NEWS
vimp 2.3.8
Major changes
- Compute standard errors using the cross-product of the EIFs by default, which handles correlated data. The previous behavior (using the square of the EIFs, which is valid for independent data) can be obtained using the argument use_square = TRUE.
- Also handle clustered data in sample splitting and cross fitting by ensuring that all observations for a given cluster are in the same fold.
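As a rough illustration of keeping clusters intact when assigning folds, here is a minimal Python sketch. It is not vimp's implementation: the helper name make_cluster_folds, the round-robin assignment, and the toy cluster IDs are all assumptions made for the example.

```python
import random
from collections import defaultdict


def make_cluster_folds(cluster_ids, n_folds, seed=0):
    """Assign a fold to each observation so that clusters are never split.

    Hypothetical helper for illustration only; not part of vimp.
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_ids))
    rng.shuffle(clusters)
    # Deal the shuffled clusters into folds round-robin.
    fold_of_cluster = {c: i % n_folds for i, c in enumerate(clusters)}
    # Every observation inherits the fold of its cluster.
    return [fold_of_cluster[c] for c in cluster_ids]


# Toy data: 12 observations in 6 clusters of unequal size.
cluster_ids = [1, 1, 2, 2, 2, 3, 4, 4, 5, 6, 6, 6]
folds = make_cluster_folds(cluster_ids, n_folds=3)

# Record which folds each cluster ended up in (should be exactly one).
folds_per_cluster = defaultdict(set)
for cid, fold in zip(cluster_ids, folds):
    folds_per_cluster[cid].add(fold)
```

Folds are balanced in the number of clusters rather than the number of observations, so fold sizes can differ when cluster sizes do.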
Minor changes
None
vimp 2.3.7
Major changes
- Allow different cutoffs to be used for PPV, NPV, sensitivity, and specificity of the full and reduced models
Minor changes
None
vimp 2.3.6 (2025-08-28)
Major changes
- Update the way that PPV, NPV, sensitivity, and specificity are calculated so that they return meaningful answers if all predictions are the same (for example, if using the mean outcome value to predict)
Minor changes
- Add tests for PPV, NPV, sensitivity, specificity
vimp 2.3.5 (2025-07-23)
Major changes
None
Minor changes
- Update reference pages to pass new CRAN checks for links and items
vimp 2.3.4
Major changes
None
Minor changes
vimp 2.3.3 (2023-08-28)
Major changes
- Add clustered bootstrap and associated unit tests
Minor changes
- Update software author list
- Fix roxygen2 CRAN bug for package documentation
vimp 2.3.2
Major changes
- Fixed bugs introduced in 2.3.1 for final_point_estimate = "average"
vimp 2.3.1 (2022-12-09)
Major changes
- In cases where sample-splitting is used (which is required for valid inference under the null hypothesis of zero variable importance), there is now the option to report a point estimate that is based on the entire dataset, rather than only the split on which inference (confidence intervals and p-values) is performed. The point estimator (using either the single split, the full dataset, or the average of the two split-specific point estimates) is valid regardless of whether the null holds or not. If this option is chosen, there may be a discrepancy between the point estimate and the interval estimate; this is likely to occur only in small-sample (or small effective sample-size, for binary outcomes) settings.
Minor changes
- For predictiveness measures that lie in [0, 1] by definition (accuracy, ANOVA, R-squared, deviance, AUC), the default is now to compute confidence intervals on the logit scale, which guarantees that the interval will also lie in [0, 1]. Note that this means the interval will not be centered at the point estimate; however, it retains the desired level of coverage.
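The effect of building the interval on the logit scale can be sketched in a few lines of Python. This is an illustration under stated assumptions, not vimp's code: it assumes a Wald interval with a delta-method standard error on the logit scale, and the estimate and standard error are made-up values.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def wald_ci(est, se, level=0.95):
    """Plain Wald interval on the original scale."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return est - z * se, est + z * se


def logit_scale_ci(est, se, level=0.95):
    """Wald interval built on the logit scale, then mapped back to (0, 1).

    Delta method: se(logit(est)) is approximately se / (est * (1 - est)).
    Requires 0 < est < 1.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    se_logit = se / (est * (1.0 - est))
    center = logit(est)
    return expit(center - z * se_logit), expit(center + z * se_logit)


# Made-up values, e.g. an estimated AUC near the boundary and its standard error.
est, se = 0.95, 0.04
plain_lo, plain_hi = wald_ci(est, se)
logit_lo, logit_hi = logit_scale_ci(est, se)
print("original scale:", plain_lo, plain_hi)
print("logit scale:   ", logit_lo, logit_hi)
```

With these inputs the original-scale interval extends past 1, while the logit-scale interval stays inside (0, 1) and is visibly asymmetric around the point estimate.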
vimp 2.3.0 (2022-11-14)
Major changes
- Predictiveness measures now have their own S3 class, which makes internal code cleaner and facilitates simpler addition of new predictiveness measures.
- In this version, the default return value of extract_sampled_split_predictions is a vector, not a list. This facilitates proper use in the new version of the package.
Minor changes
- You can now specify truncate = FALSE in vimp_ci
vimp 2.2.11
Major changes
- You can now compute variable importance using the average value under the optimal treatment rule. This includes functions measure_avg_value (computes the average value and efficient influence function) and updates to vim, cv_vim, and sp_vim.
Minor changes
vimp 2.2.10
Major changes
Minor changes
- Specify method and family for weighted EIF estimation within outer functions (vim, cv_vim, sp_vim) rather than the measure* functions. This allows compatibility for binary outcomes.
- Added a vignette for coarsened-data settings.
vimp 2.2.9
Major changes
Minor changes
- Allow for unequal numbers of cross-fitting folds between full and reduced predictiveness
vimp 2.2.8
Major changes
Minor changes
- Return objects in sp_vim that are necessary to compute the test statistics
vimp 2.2.7
Major changes
Minor changes
- Allow parallel argument to be specified for calls to CV.SuperLearner but not for calls to SuperLearner
vimp 2.2.6
Major changes
Minor changes
- Allow different types of bootstrap interval (e.g., percentile) to be computed
- More precise documentation for Z in coarsened-data settings; allow case-insensitive specification of covariate names/positions when creating Z
- V defaults to 5 if no cross-fitting folds are specified externally
- More precise documentation for cross_fitted_f1 and cross_fitted_f2 in cv_vim
- Allow non-list cross_fitted_f1 and cross_fitted_f2 in cv_vim
vimp 2.2.5 (2021-08-16)
Major changes
Minor changes
- Update how cv_vim handles an odd number of outer folds being passed with pre-computed regression function estimates. Now, you can use an odd number of folds (e.g., 5) to estimate the full and reduced regression functions and still obtain cross-validated variable importance estimates.
vimp 2.2.4 (2021-08-04)
Major changes
Minor changes
- Allow for odd number of folds in cross-fit and sampled-split VIM estimation
- Add vrc01 data as an exported object
- Change dataset for vignettes to vrc01 data
vimp 2.2.3 (2021-07-20)
Major changes
- Updated computation of standard errors. Some of the changes in v2.2.0 (namely, that the efficient influence function can be estimated on the entire dataset regardless of whether or not sample-splitting was requested) do not match with the form of the standard error estimator that we use. In this update, we ensure that independent data are used to estimate both the predictiveness and the efficient influence function; however, the nuisance functions may still be estimated on a larger portion of the data than in versions prior to v2.2.0 when cross-fitting is used.
Minor changes
- Added explicit-value tests for point estimates throughout testthat/
- Harmonized vignettes with new SE computation
- Allow C to not be specified in make_folds
vimp 2.2.2 (2021-06-14)
Major changes
None
Minor changes
- Increased tolerance for AUC vs CV-AUC
vimp 2.2.1 (2021-06-03)
Major changes
- Updated the internals of measure_auc to hew more closely to ROCR and cvAUC, using computational tricks to speed up weighted AUC and EIF computation.
Minor changes
vimp 2.2.0
Major changes
- Added argument cross_fitted_se to cv_vim and sp_vim; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions.
- Added bootstrap-based standard error estimates as an option to both vim and cv_vim; currently, this option is only available for non-sampled-split calls (i.e., with sample_splitting = FALSE)
- Updated sample-splitting behavior to match more closely with theoretical results (and improve power!). Since estimation of the nuisance regression functions (i.e., the regression of outcome on all covariates and outcome on the reduced set of covariates) can be treated as fixed in making inference, sample-splitting is only necessary for evaluating predictiveness. Thus, the final regression functions from a call to vim are based on the entire dataset, while the full and reduced predictiveness (predictiveness_full and predictiveness_reduced, along with the corresponding confidence intervals) are evaluated using separate portions of the data for the full and reduced regressions.
- Added argument sample_splitting to vim, cv_vim and sp_vim; if FALSE, sample-splitting is not used to estimate predictiveness. Note that we recommend using the default, TRUE, in all cases, since inference using sample_splitting = FALSE will be invalid for variables with truly null variable importance.
- Updated cross-fitting (also referred to as cross-validation) behavior when sample_splitting = TRUE to match more closely with theoretical results (and improve power!). In this case, we first split the data into $2K$ cross-fitting folds, and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each $k \in \{1, \ldots, K\}$ we set aside the data in sample-splitting fold 1 and cross-fitting fold $k$ [this comprises $1 / (2K)$ of the data]. We train using the remaining observations [comprising $(2K-1)/(2K)$ of the data] not in this testing fold, and we test on the originally withheld data. We repeat for the nuisance regression using the reduced set of covariates, but withhold data in sample-splitting fold 2. This update affects both cv_vim and sp_vim. If sample_splitting = FALSE, then we use standard cross-fitting.
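The fold layout described for sample_splitting = TRUE can be sketched as follows. This Python snippet only illustrates the index bookkeeping; the helper names sample_split_cross_fit and train_test_indices are hypothetical, the assignment of folds 0 to K-1 to the first sample-splitting fold is an arbitrary choice for the example, and none of it is vimp's own code.

```python
import random


def sample_split_cross_fit(n, K, seed=0):
    """Return (cf_fold, ss_fold) for n observations.

    cf_fold[i] is a cross-fitting fold in 0..2K-1; ss_fold[i] is the
    sample-splitting fold (1 or 2). Hypothetical helper, not vimp's make_folds.
    """
    rng = random.Random(seed)
    # Balanced assignment of observations to 2K cross-fitting folds.
    cf_fold = [i % (2 * K) for i in range(n)]
    rng.shuffle(cf_fold)
    # Split the 2K folds equally: folds 0..K-1 -> split 1, folds K..2K-1 -> split 2.
    ss_fold = [1 if f < K else 2 for f in cf_fold]
    return cf_fold, ss_fold


def train_test_indices(cf_fold, ss_fold, split):
    """Yield (train, test) index lists for one nuisance regression.

    The test set is one cross-fitting fold within the given sample-splitting
    fold; the training set is every observation outside that test fold.
    """
    folds_in_split = sorted({f for f, s in zip(cf_fold, ss_fold) if s == split})
    for k in folds_in_split:
        test = [i for i, f in enumerate(cf_fold) if f == k]
        train = [i for i, f in enumerate(cf_fold) if f != k]
        yield train, test


n, K = 100, 5
cf_fold, ss_fold = sample_split_cross_fit(n, K)
full_splits = list(train_test_indices(cf_fold, ss_fold, split=1))     # all covariates
reduced_splits = list(train_test_indices(cf_fold, ss_fold, split=2))  # reduced covariates
```

With n = 100 and K = 5, each test set holds 10 observations (1/(2K) of the data) and each training set holds the other 90, and the test sets used for the full and reduced regressions do not overlap.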
Minor changes
- Use >= in computing the numerator of AUC with inverse probability weights
- Update roxygen2 documentation for wrappers (vimp_*) to inherit parameters and details from cv_vim (reduces potential for documentation mismatches)
vimp 2.1.10
Major changes
None
Minor changes
- Automatically determine the family if it isn't specified; use stats::binomial() if there are only two unique outcome values, otherwise use stats::gaussian()
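The rule is simple enough to state as a one-line check. The Python sketch below mirrors the description only; guess_family is a hypothetical name, and treating a constant outcome (one unique value) as gaussian is an assumption about the "otherwise" branch.

```python
def guess_family(y):
    """Return "binomial" if y has exactly two unique values, else "gaussian".

    Hypothetical helper mirroring the changelog description; not vimp code.
    """
    return "binomial" if len(set(y)) == 2 else "gaussian"


binary_outcome = [0, 1, 1, 0, 1]
continuous_outcome = [0.3, 1.7, 2.2, 0.9]
print(guess_family(binary_outcome), guess_family(continuous_outcome))
```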
vimp 2.1.9 (2021-03-01)
Major changes
None
Minor changes
- Update sensitivity and specificity to use weak inequalities rather than strict inequalities (better aligns with cvAUC)
- Add a test of CV-AUC estimation against cvAUC
- Borrow information across folds for empirically estimated quantities (e.g., the outcome variance or probability of a certain class); asymptotically equivalent to the prior procedure, but could result in small-sample differences
- Use fold-specific EIFs for cross-validated SE estimation (again, asymptotically equivalent to the prior procedure, but could result in small-sample differences)
vimp 2.1.8
Major changes
None
Minor changes
- Allow the user to specify either an augmented inverse probability of coarsening (AIPW, the default) estimator or an IPW estimator in coarsened-at-random settings, using new argument ipc_est_type (available in vim, cv_vim, and sp_vim; also corresponding wrapper functions for each VIM and corresponding internal estimation functions)
vimp 2.1.7
Major changes
None
Minor changes
- Updated internals so that stratified estimation can be performed in outer regression functions for binary outcomes; in the case of two-phase samples, the stratification won't be used in any internal regressions with continuous outcomes
- Updated internals to allow stratification on both the outcome and observed status, so that there are sufficient cases per fold for both the phase 1 and phase 2 regressions (only used with two-phase samples)
vimp 2.1.6 (2021-01-09)
Major changes
None
Minor changes
- Updated links to DOIs and package vignettes throughout
- Updated all tests in testthat/ to use glm rather than xgboost (increases speed)
- Updated all examples to use glm rather than xgboost or ranger (increases speed, even though the regression is now misspecified for the truth)
- Removed forcats from vignette
vimp 2.1.5
Major changes
None
Minor changes
- Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the matrix of fold-specific EIFs had the wrong number of rows
- Changes to internals of measure_accuracy and measure_auc for project-wide consistency
- Update all tests in testthat/ to not explicitly load xgboost
vimp 2.1.4
Major changes
None
Minor changes
- Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the EIF had the wrong number of rows
vimp 2.1.3
Major changes
None
Minor changes
- Compute logit transforms using stats::qlogis and stats::plogis rather than bespoke functions
vimp 2.1.2
Major changes
None
Minor changes
- Bugfix from 2.1.1.1: compute the correction correctly
vimp 2.1.1.1
Major changes
None
Minor changes
- Allow confidence interval (CI) and inverse probability of coarsening corrections on different scales (e.g., log) to ensure that estimates and CIs lie in the parameter space
vimp 2.1.1
Major changes
- Compute one-step estimators of variable importance if inverse probability of censoring weights are entered. You input the weights, indicator of coarsening, and observed variables, and vimp will handle the rest.
Minor changes
- Created new vignettes "Types of VIMs" and "Using precomputed regression function estimates in vimp"
- Updated main vignette to only use run_regression = TRUE for simplicity
- Added argument verbose to sp_vim; if TRUE, messages are printed throughout fitting that display progress and verbose is passed to SuperLearner
- Change names of internal functions from cv_predictiveness_point_est and predictiveness_point_est to est_predictiveness_cv and est_predictiveness, respectively
- Removed functions cv_predictiveness_update, cv_vimp_point_est, cv_vimp_update, predictiveness_update, vimp_point_est, vimp_update; this functionality is now in est_predictiveness_cv and est_predictiveness (for the *update* functions) or directly in vim or cv_vim (for the *vimp* functions)
- Removed functions predictiveness_se and predictiveness_ci (functionality is now in vimp_se and vimp_ci, respectively)
- Changed weights argument to ipc_weights, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights
vimp 2.1.0 (2020-06-18)
Major changes
- Added functions sp_vim, sample_subsets, spvim_ics, spvim_se; these allow computation of the Shapley Population Variable Importance Measure (SPVIM)
Minor changes
None
vimp 2.0.2 (2020-04-27)
Major changes
- Removed functions sp_vim and helper functions run_sl, sample_subsets, spvim_ics, spvim_se; these will be added in a future release
- Removed function cv_vim_nodonsker, since cv_vim supersedes this function
Minor changes
- Modify examples to pass all CRAN checks
vimp 2.0.1 (2020-04-11)
Major changes
- Added new function sp_vim and helper functions run_sl, sample_subsets, spvim_ics, spvim_se; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM)
- Both cv_vim and vim now use an outer layer of sample splitting for hypothesis testing
- Added new functions vimp_auc, vimp_accuracy, vimp_deviance, vimp_rsquared
- vimp_regression is now deprecated; use vimp_anova instead
- added new function vim; each variable importance function is now a wrapper function around vim with the type argument filled in
- cv_vim_nodonsker is now deprecated; use cv_vim instead
- each variable importance function now returns a p-value based on the (possibly conservative) hypothesis test against the null of zero importance (with the exception of vimp_anova)
- each variable importance function now returns the estimates of the individual risks (with the exception of vimp_anova)
- added new functions to compute measures of predictiveness (and cross-validated measures of predictiveness), along with their influence functions
Minor changes
- Return tibbles in cv_vim, vim, merge_vim, and average_vim
vimp 1.1.6 (2019-08-26)
Major changes
None
Minor changes
- Changed tests to handle gam package update by switching library to SL.xgboost, SL.step, and SL.mean
- Added small unit tests for internal functions
vimp 1.1.5 (2019-08-09)
Major changes
None
Minor changes
- Attempt to handle gam package update in unit tests
vimp 1.1.4 (2018-10-14)
Major changes
None
Minor changes
- cv_vim and cv_vim_nodonsker now return the cross-validation folds used within the function
vimp 1.1.3 (2018-10-02)
Major changes
None
Minor changes
- users may now only specify a family for the top-level SuperLearner if run_regression = TRUE; in all cases, the second-stage SuperLearner uses a gaussian family
- if the SuperLearner chooses SL.mean as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values
vimp 1.1.2 (2018-09-20)
Major changes
- added function cv_vim_nodonsker, which computes the cross-validated naive estimator and the update on the same, single, validation fold. This does not allow for relaxation of the Donsker class conditions.
Minor changes
None
vimp 1.1.1
Major changes
- added function two_validation_set_cv, which sets up folds for V-fold cross-validation with two validation sets per fold
- changed the functionality of cv_vim: now, the cross-validated naive estimator is computed on a first validation set, while the update for the corrected estimator is computed using the second validation set (both created from two_validation_set_cv); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while making sure that the initial CV naive estimator is not biased high (due to a higher R^2 on the training data)
Minor changes
None
vimp 1.1.0 (2018-08-09)
Major changes
None
Minor changes
- changed the functionality of cv_vim: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator
vimp 1.0.0 (2018-06-24)
Major changes
- removed function vim, replaced with individual-parameter functions
- added function vimp_regression to match Python package
- cv_vim can now compute regression estimators
- renamed all internal functions; these are now vimp_ci, vimp_se, vimp_update, onestep_based_estimator
- edited vignette
- added unit tests
vimp 0.0.3
Major changes
None
Minor changes
Bugfixes etc.