NEWS

vimp 2.3.3 (2023-08-28)

Major changes

Add clustered bootstrap and associated unit tests
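
As a rough illustration of the idea behind a clustered bootstrap (not the package's internal code), the base-R sketch below resamples whole clusters with replacement, so that within-cluster dependence is carried into each bootstrap replicate. The helper name clustered_boot_se, the simulated data, and the choice of the sample mean as the statistic are all assumptions made for this example.

```r
# Hypothetical helper: bootstrap standard error of the mean,
# resampling whole clusters rather than individual rows.
clustered_boot_se <- function(y, cluster_id, b = 200) {
  ids <- unique(cluster_id)
  boot_stats <- replicate(b, {
    # Draw clusters (not rows) with replacement
    drawn <- ids[sample.int(length(ids), replace = TRUE)]
    # Keep every observation from each drawn cluster, including repeats
    rows <- unlist(lapply(drawn, function(id) which(cluster_id == id)))
    mean(y[rows])
  })
  stats::sd(boot_stats)
}

set.seed(1234)
cluster_id <- rep(1:20, each = 5)
# A shared cluster-level effect induces within-cluster correlation
y <- rnorm(20)[cluster_id] + rnorm(100, sd = 0.5)
se_clustered <- clustered_boot_se(y, cluster_id)
```

Resampling rows independently in this setting would typically understate the standard error, which is the usual motivation for resampling at the cluster level.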

Minor changes

Update software author list
Fix roxygen2 CRAN bug for package documentation

vimp 2.3.2

Major changes

Fixed bugs introduced in 2.3.1 for final_point_estimate = "average"

vimp 2.3.1 (2022-12-09)

Major changes

In cases where sample-splitting is used (which is required for valid inference under the null hypothesis of zero variable importance), there is now the option to report a point estimate that is based on the entire dataset, rather than only the split on which inference (confidence intervals and p-values) is performed. The point estimator (using either the single split, the full dataset, or the average of the two split-specific point estimates) is valid regardless of whether the null holds. If this option is chosen, there may be a discrepancy between the point estimate and the interval estimate; this is likely to occur only in small-sample (or, for binary outcomes, small effective sample size) settings.

Minor changes

For predictiveness measures that lie in [0, 1] by definition (accuracy, ANOVA, R-squared, deviance, AUC), the default is now to compute confidence intervals on the logit scale, which guarantees that the interval will also lie in [0, 1]. Note that this means the interval will not be centered at the point estimate; however, it retains the desired level of coverage.
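
To illustrate why a logit-scale interval stays inside [0, 1] but is not centered at the point estimate, here is a minimal base-R sketch using the delta method. The helper name logit_ci and the example numbers are hypothetical, and the package's internal computation may differ in detail.

```r
# Hypothetical helper: Wald interval built on the logit scale,
# then back-transformed to the original scale.
logit_ci <- function(est, se, level = 0.95) {
  z <- stats::qnorm(1 - (1 - level) / 2)
  # Delta method: the derivative of logit(p) is 1 / (p * (1 - p))
  se_logit <- se / (est * (1 - est))
  stats::plogis(stats::qlogis(est) + c(-1, 1) * z * se_logit)
}

est <- 0.95  # assumed point estimate, e.g., an AUC
se <- 0.04   # assumed standard error on the original scale

# Interval on the original scale: the upper limit can exceed 1
ci_identity <- est + c(-1, 1) * stats::qnorm(0.975) * se
# Interval on the logit scale: both limits fall strictly inside (0, 1)
ci_logit <- logit_ci(est, se)
```

Because the back-transformation is nonlinear, the logit-scale interval is asymmetric around the point estimate, extending further toward 0.5 than toward the boundary.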

vimp 2.3.0 (2022-11-14)

Major changes

Predictiveness measures now have their own S3 class, which makes internal code cleaner and facilitates simpler addition of new predictiveness measures.
In this version, the default return value of extract_sampled_split_predictions is a vector, not a list. This facilitates proper use in the new version of the package.

Minor changes

You can now specify truncate = FALSE in vimp_ci

vimp 2.2.11

Major changes

You can now compute variable importance using the average value under the optimal treatment rule. This includes functions measure_avg_value (computes the average value and efficient influence function) and updates to vim, cv_vim, and sp_vim.

Minor changes

None

vimp 2.2.10

Major changes

None

Minor changes

Specify method and family for weighted EIF estimation within outer functions (vim, cv_vim, sp_vim) rather than the measure* functions. This allows compatibility for binary outcomes.
Added a vignette for coarsened-data settings.

vimp 2.2.9

Major changes

None

Minor changes

Allow for unequal numbers of cross-fitting folds between full and reduced predictiveness

vimp 2.2.8

Major changes

None

Minor changes

Return objects in sp_vim that are necessary to compute the test statistics

vimp 2.2.7

Major changes

None

Minor changes

Allow parallel argument to be specified for calls to CV.SuperLearner but not for calls to SuperLearner

vimp 2.2.6

Major changes

None

Minor changes

Allow different types of bootstrap interval (e.g., percentile) to be computed
More precise documentation for Z in coarsened-data settings; allow case-insensitive specification of covariate names/positions when creating Z
V defaults to 5 if no cross-fitting folds are specified externally
More precise documentation for cross_fitted_f1 and cross_fitted_f2 in cv_vim
Allow non-list cross_fitted_f1 and cross_fitted_f2 in cv_vim

vimp 2.2.5 (2021-08-16)

Major changes

None

Minor changes

Update how cv_vim handles an odd number of outer folds being passed with pre-computed regression function estimates. Now, you can use an odd number of folds (e.g., 5) to estimate the full and reduced regression functions and still obtain cross-validated variable importance estimates.

vimp 2.2.4 (2021-08-04)

Major changes

None

Minor changes

Allow for odd number of folds in cross-fit and sampled-split VIM estimation
Add vrc01 data as an exported object
Change dataset for vignettes to vrc01 data

vimp 2.2.3 (2021-07-20)

Major changes

Updated computation of standard errors. Some of the changes in v2.2.0 (namely, that the efficient influence function can be estimated on the entire dataset regardless of whether or not sample-splitting was requested) do not match with the form of the standard error estimator that we use. In this update, we ensure that independent data are used to estimate both the predictiveness and the efficient influence function; however, the nuisance functions may still be estimated on a larger portion of the data than in versions prior to v2.2.0 when cross-fitting is used.

Minor changes

Added explicit-value tests for point estimates throughout testthat/
Harmonized vignettes with new SE computation
Allow C to not be specified in make_folds

vimp 2.2.2 (2021-06-14)

Major changes

None

Minor changes

Increased tolerance for AUC vs CV-AUC

vimp 2.2.1 (2021-06-03)

Major changes

Updated the internals of measure_auc to hew more closely to ROCR and cvAUC, using computational tricks to speed up weighted AUC and EIF computation.

Minor changes

Added tests for IPW AUC

vimp 2.2.0

Major changes

Added argument cross_fitted_se to cv_vim and sp_vim; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions.
Added bootstrap-based standard error estimates as an option to both vim and cv_vim; currently, this option is only available for non-sampled-split calls (i.e., with sample_splitting = FALSE)
Updated sample-splitting behavior to match more closely with theoretical results (and improve power!): namely, that since estimation of the nuisance regression functions (i.e., the regression of outcome on all covariates and outcome on the reduced set of covariates) can be treated as fixed in making inference, sample-splitting is only necessary for evaluating predictiveness. Thus, the final regression functions from a call to vim are based on the entire dataset, while the full and reduced predictiveness (predictiveness_full and predictiveness_reduced, along with the corresponding confidence intervals) is evaluated using separate portions of the data for the full and reduced regressions.
Added argument sample_splitting to vim, cv_vim and sp_vim; if FALSE, sample-splitting is not used to estimate predictiveness. Note that we recommend using the default, TRUE, in all cases, since inference using sample_splitting = FALSE will be invalid for variables with truly null variable importance.
Updated cross-fitting (also referred to as cross-validation) behavior within sample_splitting = TRUE to match more closely with theoretical results (and improve power!). In this case, we first split the data into $2K$ cross-fitting folds, and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each $k \in \{1, \ldots, K\}$ we set aside the data in sample-splitting fold 1 and cross-fitting fold $k$ [this comprises $1 / (2K)$ of the data]. We train using the remaining observations [comprising $(2K-1)/(2K)$ of the data] not in this testing fold, and we test on the originally withheld data. We repeat for the nuisance regression using the reduced set of covariates, but withhold data in sample-splitting fold 2. This update affects both cv_vim and sp_vim. If sample_splitting = FALSE, then we use standard cross-fitting.
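
The fold layout described in the last item can be sketched in base R as follows. This is only an illustration of the bookkeeping under assumed values of n and K; the variable names are hypothetical and this is not the package's own fold-construction code.

```r
set.seed(4747)
n <- 120  # assumed sample size
K <- 3    # assumed number of cross-fitting folds per sample-splitting fold

# Assign each observation to one of 2K equally sized cross-fitting folds
cf_fold <- sample(rep(seq_len(2 * K), length.out = n))

# Split the 2K folds equally between the two sample-splitting folds
# (alternating assignment is an arbitrary choice for this sketch)
ss_of_fold <- rep(c(1, 2), times = K)

# Test folds for the regression on all covariates: sample-splitting fold 1
full_test_folds <- which(ss_of_fold == 1)
# Test folds for the regression on the reduced covariates: sample-splitting fold 2
reduced_test_folds <- which(ss_of_fold == 2)

# Each test fold holds 1 / (2K) of the data; training uses the rest
test_frac <- sapply(full_test_folds, function(k) mean(cf_fold == k))
train_frac <- 1 - test_frac
```

With these values, each test fold contains 20 of the 120 observations, and each regression is trained on the remaining 100.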

Minor changes

Use >= in computing the numerator of AUC with inverse probability weights
Update roxygen2 documentation for wrappers (vimp_*) to inherit parameters and details from cv_vim (reduces potential for documentation mismatches)

vimp 2.1.10

Major changes

None

Minor changes

Automatically determine the family if it isn't specified; use stats::binomial() if there are only two unique outcome values, otherwise use stats::gaussian()
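
The rule above amounts to the following base-R sketch. The helper name guess_family is hypothetical and is not part of the package; it only restates the described logic.

```r
# Hypothetical helper restating the rule: two unique outcome values
# give a binomial family, anything else gives a gaussian family.
guess_family <- function(y) {
  if (length(unique(y)) == 2) stats::binomial() else stats::gaussian()
}

fam_binary <- guess_family(c(0, 1, 1, 0, 1))
fam_continuous <- guess_family(c(0.3, 1.7, 2.2, -0.4))
```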

vimp 2.1.9 (2021-03-01)

Major changes

None

Minor changes

Update sensitivity and specificity to use weak inequalities rather than strict inequalities (better aligns with cvAUC)
Add a test of CV-AUC estimation against cvAUC
Borrow information across folds for empirically estimated quantities (e.g., the outcome variance or probability of a certain class); asymptotically equivalent to the prior procedure, but could result in small-sample differences
Use fold-specific EIFs for cross-validated SE estimation (again, asymptotically equivalent to the prior procedure, but could result in small-sample differences)

vimp 2.1.8

Major changes

None

Minor changes

Allow the user to specify either an augmented inverse probability of coarsening (AIPW, the default) estimator in coarsened-at-random settings, or specify an IPW estimator, using new argument ipc_est_type (available in vim, cv_vim, and sp_vim; also corresponding wrapper functions for each VIM and corresponding internal estimation functions)

vimp 2.1.7

Major changes

None

Minor changes

Updated internals so that stratified estimation can be performed in outer regression functions for binary outcomes, while for two-phase samples the stratification is not used in any internal regressions with continuous outcomes
Updated internals to allow stratification on both the outcome and observed status, so that there are sufficient cases per fold for both the phase 1 and phase 2 regressions (only used with two-phase samples)

vimp 2.1.6 (2021-01-09)

Major changes

None

Minor changes

Updated links to DOIs and package vignettes throughout
Updated all tests in testthat/ to use glm rather than xgboost (increases speed)
Updated all examples to use glm rather than xgboost or ranger (increases speed, even though the regression is now misspecified for the truth)
Removed forcats from vignette

vimp 2.1.5

Major changes

None

Minor changes

Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the matrix of fold-specific EIFs had the wrong number of rows
Changes to internals of measure_accuracy and measure_auc for project-wide consistency
Update all tests in testthat/ to not explicitly load xgboost

vimp 2.1.4

Major changes

None

Minor changes

Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the EIF had the wrong number of rows

vimp 2.1.3

Major changes

None

Minor changes

Compute logit transforms using stats::qlogis and stats::plogis rather than bespoke functions

vimp 2.1.2

Major changes

None

Minor changes

Bugfix from 2.1.1.1: compute the correction correctly

vimp 2.1.1.1

Major changes

None

Minor changes

Allow confidence interval (CI) and inverse probability of coarsening corrections on different scales (e.g., log) to ensure that estimates and CIs lie in the parameter space

vimp 2.1.1

Major changes

Compute one-step estimators of variable importance if inverse probability of censoring weights are entered. You input the weights, indicator of coarsening, and observed variables, and vimp will handle the rest.

Minor changes

Created new vignettes "Types of VIMs" and "Using precomputed regression function estimates in vimp"
Updated main vignette to only use run_regression = TRUE for simplicity
Added argument verbose to sp_vim; if TRUE, messages are printed throughout fitting that display progress and verbose is passed to SuperLearner
Change names of internal functions from cv_predictiveness_point_est and predictiveness_point_est to est_predictiveness_cv and est_predictiveness, respectively
Removed functions cv_predictiveness_update, cv_vimp_point_est, cv_vimp_update, predictiveness_update, vimp_point_est, vimp_update; this functionality is now in est_predictiveness_cv and est_predictiveness (for the *update* functions) or directly in vim or cv_vim (for the *vimp* functions)
Removed functions predictiveness_se and predictiveness_ci (functionality is now in vimp_se and vimp_ci, respectively)
Changed weights argument to ipc_weights, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights

vimp 2.1.0 (2020-06-18)

Major changes

Added functions sp_vim, sample_subsets, spvim_ics, spvim_se; these allow computation of the Shapley Population Variable Importance Measure (SPVIM)

Minor changes

None

vimp 2.0.2 (2020-04-27)

Major changes

Removed functions sp_vim and helper functions run_sl, sample_subsets, spvim_ics, spvim_se; these will be added in a future release
Removed function cv_vim_nodonsker, since cv_vim supersedes this function

Minor changes

Modify examples to pass all CRAN checks

vimp 2.0.1 (2020-04-11)

Major changes

Added new function sp_vim and helper functions run_sl, sample_subsets, spvim_ics, spvim_se; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM)
Both cv_vim and vim now use an outer layer of sample splitting for hypothesis testing
Added new functions vimp_auc, vimp_accuracy, vimp_deviance, vimp_rsquared
vimp_regression is now deprecated; use vimp_anova instead
added new function vim; each variable importance function is now a wrapper function around vim with the type argument filled in
cv_vim_nodonsker is now deprecated; use cv_vim instead
each variable importance function now returns a p-value based on the (possibly conservative) hypothesis test against the null of zero importance (with the exception of vimp_anova)
each variable importance function now returns the estimates of the individual risks (with the exception of vimp_anova)
added new functions to compute measures of predictiveness (and cross-validated measures of predictiveness), along with their influence functions

Minor changes

Return tibbles in cv_vim, vim, merge_vim, and average_vim

vimp 1.1.6 (2019-08-26)

Major changes

None

Minor changes

Changed tests to handle gam package update by switching library to SL.xgboost, SL.step, and SL.mean
Added small unit tests for internal functions

vimp 1.1.5 (2019-08-09)

Major changes

None

Minor changes

Attempt to handle gam package update in unit tests

vimp 1.1.4 (2018-10-14)

Major changes

None

Minor changes

cv_vim and cv_vim_nodonsker now return the cross-validation folds used within the function

vimp 1.1.3 (2018-10-02)

Major changes

None

Minor changes

users may now only specify a family for the top-level SuperLearner if run_regression = TRUE; in all cases, the second-stage SuperLearner uses a gaussian family
if the SuperLearner chooses SL.mean as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values

vimp 1.1.2 (2018-09-20)

Major changes

added function cv_vim_nodonsker, which computes the cross-validated naive estimator and the update on the same, single, validation fold. This does not allow for relaxation of the Donsker class conditions.

Minor changes

None

vimp 1.1.1

Major changes

added function two_validation_set_cv, which sets up folds for V-fold cross-validation with two validation sets per fold
changed the functionality of cv_vim: now, the cross-validated naive estimator is computed on a first validation set, while the update for the corrected estimator is computed using the second validation set (both created from two_validation_set_cv); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while making sure that the initial CV naive estimator is not biased high (due to a higher R^2 on the training data)

Minor changes

None

vimp 1.1.0 (2018-08-09)

Major changes

None

Minor changes

changed the functionality of cv_vim: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator

vimp 1.0.0 (2018-06-24)

Major changes

removed function vim, replaced with individual-parameter functions
added function vimp_regression to match Python package
cv_vim now can compute regression estimators
renamed all internal functions; these are now vimp_ci, vimp_se, vimp_update, onestep_based_estimator
edited vignette
added unit tests