NEWS
vimp 2.3.3 (2023-08-28)
Major changes
- Add clustered bootstrap and associated unit tests
Minor changes
- Update software author list
- Fix roxygen2 CRAN bug for package documentation
vimp 2.3.2
Major changes
- Fixed bugs introduced in 2.3.1 for final_point_estimate = "average"
vimp 2.3.1 (2022-12-09)
Major changes
- In cases where sample-splitting is used (which is required for valid inference under the null hypothesis of zero variable importance), there is now the option to report a point estimate that is based on the entire dataset, rather than only the split on which inference (confidence intervals and p-values) is performed. The point estimator (using either the single split, the full dataset, or the average of the two split-specific point estimates) is valid regardless of whether the null holds or not. If this option is chosen, there may be a discrepancy between the point estimate and the interval estimate; this is likely to occur only in small-sample (or small effective sample-size, for binary outcomes) settings.
Minor changes
- For predictiveness measures that lie in [0, 1] by definition (accuracy, ANOVA, R-squared, deviance, AUC), the default is now to compute confidence intervals on the logit scale, which guarantees that the interval will also lie in [0, 1]. Note that this means the interval will not be centered at the point estimate; however, it retains the desired level of coverage.
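As a rough illustration of the logit-scale interval described above, the sketch below builds a Wald-type interval with the delta method and back-transforms it. The helper name logit_ci and the example numbers are hypothetical and not part of vimp; the package's internal computation may differ in detail.

```r
# Hypothetical helper (not a vimp function): Wald-type interval on the logit scale.
logit_ci <- function(est, se, level = 0.95) {
  z <- stats::qnorm(1 - (1 - level) / 2)
  # Delta method: the derivative of logit(p) is 1 / (p * (1 - p)).
  se_logit <- se / (est * (1 - est))
  # Build the interval on the logit scale, then map back to (0, 1).
  stats::plogis(stats::qlogis(est) + c(-1, 1) * z * se_logit)
}

est <- 0.95  # made-up point estimate (e.g., an AUC)
se <- 0.04   # made-up standard error

# Interval on the original scale: the upper limit exceeds 1 for these inputs.
wald_ci <- est + c(-1, 1) * stats::qnorm(0.975) * se
# Interval on the logit scale: both limits stay inside (0, 1).
ci <- logit_ci(est, se)
```

With these inputs the logit-scale interval is asymmetric around the point estimate, which is the behavior the note above describes.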
vimp 2.3.0 (2022-11-14)
Major changes
- Predictiveness measures now have their own S3 class, which makes internal code cleaner and facilitates simpler addition of new predictiveness measures.
- In this version, the default return value of extract_sampled_split_predictions is a vector, not a list. This facilitates proper use in the new version of the package.
Minor changes
- You can now specify truncate = FALSE in vimp_ci
vimp 2.2.11
Major changes
- You can now compute variable importance using the average value under the optimal treatment rule. This includes the function measure_avg_value (computes the average value and efficient influence function) and updates to vim, cv_vim, and sp_vim.
Minor changes
vimp 2.2.10
Major changes
Minor changes
- Specify method and family for weighted EIF estimation within outer functions (vim, cv_vim, sp_vim) rather than the measure* functions. This allows compatibility for binary outcomes.
- Added a vignette for coarsened-data settings.
vimp 2.2.9
Major changes
Minor changes
- Allow for unequal numbers of cross-fitting folds between full and reduced predictiveness
vimp 2.2.8
Major changes
Minor changes
- Return objects in sp_vim that are necessary to compute the test statistics
vimp 2.2.7
Major changes
Minor changes
- Allow the parallel argument to be specified for calls to CV.SuperLearner but not for calls to SuperLearner
vimp 2.2.6
Major changes
Minor changes
- Allow different types of bootstrap interval (e.g., percentile) to be computed
- More precise documentation for Z in coarsened-data settings; allow case-insensitive specification of covariate names/positions when creating Z
- V defaults to 5 if no cross-fitting folds are specified externally
- More precise documentation for cross_fitted_f1 and cross_fitted_f2 in cv_vim
- Allow non-list cross_fitted_f1 and cross_fitted_f2 in cv_vim
vimp 2.2.5 (2021-08-16)
Major changes
Minor changes
- Update how cv_vim handles an odd number of outer folds being passed with pre-computed regression function estimates. Now, you can use an odd number of folds (e.g., 5) to estimate the full and reduced regression functions and still obtain cross-validated variable importance estimates.
vimp 2.2.4 (2021-08-04)
Major changes
Minor changes
- Allow for odd number of folds in cross-fit and sampled-split VIM estimation
- Add vrc01 data as an exported object
- Change dataset for vignettes to vrc01 data
vimp 2.2.3 (2021-07-20)
Major changes
- Updated computation of standard errors. Some of the changes in v2.2.0 (namely, that the efficient influence function can be estimated on the entire dataset regardless of whether or not sample-splitting was requested) do not match with the form of the standard error estimator that we use. In this update, we ensure that independent data are used to estimate both the predictiveness and the efficient influence function; however, the nuisance functions may still be estimated on a larger portion of the data than in versions prior to v2.2.0 when cross-fitting is used.
Minor changes
- Added explicit-value tests for point estimates throughout testthat/
- Harmonized vignettes with new SE computation
- Allow C to not be specified in make_folds
vimp 2.2.2 (2021-06-14)
Major changes
None
Minor changes
- Increased tolerance for AUC vs CV-AUC
vimp 2.2.1 (2021-06-03)
Major changes
- Updated the internals of measure_auc to hew more closely to ROCR and cvAUC, using computational tricks to speed up weighted AUC and EIF computation.
Minor changes
vimp 2.2.0
Major changes
- Added argument cross_fitted_se to cv_vim and sp_vim; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions.
- Added bootstrap-based standard error estimates as an option to both vim and cv_vim; currently, this option is only available for non-sampled-split calls (i.e., with sample_splitting = FALSE)
- Updated sample-splitting behavior to match more closely with theoretical results (and improve power!): namely, that since estimation of the nuisance regression functions (i.e., the regression of outcome on all covariates and outcome on the reduced set of covariates) can be treated as fixed in making inference, sample-splitting is only necessary for evaluating predictiveness. Thus, the final regression functions from a call to vim are based on the entire dataset, while the full and reduced predictiveness (predictiveness_full and predictiveness_reduced, along with the corresponding confidence intervals) is evaluated using separate portions of the data for the full and reduced regressions.
- Added argument sample_splitting to vim, cv_vim and sp_vim; if FALSE, sample-splitting is not used to estimate predictiveness. Note that we recommend using the default, TRUE, in all cases, since inference using sample_splitting = FALSE will be invalid for variables with truly null variable importance.
- Updated cross-fitting (also referred to as cross-validation) behavior within sample_splitting = TRUE to match more closely with theoretical results (and improve power!). In this case, we first split the data into $2K$ cross-fitting folds, and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each $k \in \{1, \ldots, K\}$ we set aside the data in sample-splitting fold 1 and cross-fitting fold $k$ [this comprises $1 / (2K)$ of the data]. We train using the remaining observations [comprising $(2K-1)/(2K)$ of the data] not in this testing fold, and we test on the originally withheld data. We repeat for the nuisance regression using the reduced set of covariates, but withhold data in sample-splitting fold 2. This update affects both cv_vim and sp_vim. If sample_splitting = FALSE, then we use standard cross-fitting.
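The fold scheme described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper make_split_folds, the alternating assignment of cross-fitting folds to the two sample-splitting folds, and the values of n and K are all hypothetical, and the package's own fold construction may differ.

```r
# Hypothetical helper (not a vimp function): assign 2K cross-fitting folds
# and pair them off into two sample-splitting folds.
make_split_folds <- function(n, K) {
  # Balanced random assignment of observations to 2K cross-fitting folds.
  cf_fold <- sample(rep(seq_len(2 * K), length.out = n))
  # Assumption: odd-numbered folds go to sample-splitting fold 1, even to fold 2.
  ss_of_cf <- rep(1:2, times = K)
  list(cf_fold = cf_fold, ss_fold = ss_of_cf[cf_fold], ss_of_cf = ss_of_cf)
}

set.seed(2021)
n <- 120
K <- 3
folds <- make_split_folds(n, K)

# Full regression: each test set is one cross-fitting fold belonging to
# sample-splitting fold 1; training uses every other observation.
full_test_folds <- which(folds$ss_of_cf == 1)
full_splits <- lapply(full_test_folds, function(k) {
  list(test = which(folds$cf_fold == k), train = which(folds$cf_fold != k))
})

# Reduced regression: same scheme, withholding folds from sample-splitting fold 2.
reduced_test_folds <- which(folds$ss_of_cf == 2)
reduced_splits <- lapply(reduced_test_folds, function(k) {
  list(test = which(folds$cf_fold == k), train = which(folds$cf_fold != k))
})
```

In this sketch each test set holds $1/(2K)$ of the observations and each training set holds the remaining $(2K-1)/(2K)$, and the test sets used for the full and reduced regressions never overlap.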
Minor changes
- Use >= in computing the numerator of AUC with inverse probability weights
- Update roxygen2 documentation for wrappers (vimp_*) to inherit parameters and details from cv_vim (reduces potential for documentation mismatches)
vimp 2.1.10
Major changes
None
Minor changes
- Automatically determine the family if it isn't specified; use stats::binomial() if there are only two unique outcome values, otherwise use stats::gaussian()
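The rule above amounts to a one-line check. The helper name guess_family below is hypothetical and is not the package's internal function; it only mirrors the rule as stated and does not treat missing outcome values specially.

```r
# Hypothetical helper (not a vimp function) mirroring the rule described above.
guess_family <- function(y) {
  if (length(unique(y)) == 2) stats::binomial() else stats::gaussian()
}

binary_family <- guess_family(c(0, 1, 1, 0))$family              # "binomial"
continuous_family <- guess_family(c(0.2, 1.5, 3.1, 4.8))$family  # "gaussian"
```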
vimp 2.1.9 (2021-03-01)
Major changes
None
Minor changes
- Update sensitivity and specificity to use weak inequalities rather than strict inequalities (better aligns with cvAUC)
- Add a test of CV-AUC estimation against cvAUC
- Borrow information across folds for empirically estimated quantities (e.g., the outcome variance or probability of a certain class); asymptotically equivalent to the prior procedure, but could result in small-sample differences
- Use fold-specific EIFs for cross-validated SE estimation (again, asymptotically equivalent to the prior procedure, but could result in small-sample differences)
vimp 2.1.8
Major changes
None
Minor changes
- Allow the user to specify either an augmented inverse probability of coarsening (AIPW, the default) estimator in coarsened-at-random settings, or specify an IPW estimator, using new argument ipc_est_type (available in vim, cv_vim, and sp_vim; also corresponding wrapper functions for each VIM and corresponding internal estimation functions)
vimp 2.1.7
Major changes
None
Minor changes
- Updated internals so that stratified estimation can be performed in outer regression functions for binary outcomes; in the case of two-phase samples, the stratification is not used in any internal regressions with continuous outcomes
- Updated internals to allow stratification on both the outcome and observed status, so that there are sufficient cases per fold for both the phase 1 and phase 2 regressions (only used with two-phase samples)
vimp 2.1.6 (2021-01-09)
Major changes
None
Minor changes
- Updated links to DOIs and package vignettes throughout
- Updated all tests in testthat/ to use glm rather than xgboost (increases speed)
- Updated all examples to use glm rather than xgboost or ranger (increases speed, even though the regression is now misspecified for the truth)
- Removed forcats from vignette
vimp 2.1.5
Major changes
None
Minor changes
- Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the matrix of fold-specific EIFs had the wrong number of rows
- Changes to internals of measure_accuracy and measure_auc for project-wide consistency
- Update all tests in testthat/ to not explicitly load xgboost
vimp 2.1.4
Major changes
None
Minor changes
- Fixed a bug where if the number of rows in the different folds (for cross-fitting or sample-splitting) differed, the EIF had the wrong number of rows
vimp 2.1.3
Major changes
None
Minor changes
- Compute logit transforms using stats::qlogis and stats::plogis rather than bespoke functions
vimp 2.1.2
Major changes
None
Minor changes
- Bugfix from 2.1.1.1: compute the correction correctly
vimp 2.1.1.1
Major changes
None
Minor changes
- Allow confidence interval (CI) and inverse probability of coarsening corrections on different scales (e.g., log) to ensure that estimates and CIs lie in the parameter space
vimp 2.1.1
Major changes
- Compute one-step estimators of variable importance if inverse probability of censoring weights are entered. You input the weights, indicator of coarsening, and observed variables, and vimp will handle the rest.
Minor changes
- Created new vignettes "Types of VIMs" and "Using precomputed regression function estimates in vimp"
- Updated main vignette to only use run_regression = TRUE for simplicity
- Added argument verbose to sp_vim; if TRUE, messages are printed throughout fitting that display progress and verbose is passed to SuperLearner
- Change names of internal functions from cv_predictiveness_point_est and predictiveness_point_est to est_predictiveness_cv and est_predictiveness, respectively
- Removed functions cv_predictiveness_update, cv_vimp_point_est, cv_vimp_update, predictiveness_update, vimp_point_est, vimp_update; this functionality is now in est_predictiveness_cv and est_predictiveness (for the *update* functions) or directly in vim or cv_vim (for the *vimp* functions)
- Removed functions predictiveness_se and predictiveness_ci (functionality is now in vimp_se and vimp_ci, respectively)
- Changed weights argument to ipc_weights, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights
vimp 2.1.0 (2020-06-18)
Major changes
- Added functions sp_vim, sample_subsets, spvim_ics, spvim_se; these allow computation of the Shapley Population Variable Importance Measure (SPVIM)
Minor changes
None
vimp 2.0.2 (2020-04-27)
Major changes
- Removed functions sp_vim and helper functions run_sl, sample_subsets, spvim_ics, spvim_se; these will be added in a future release
- Removed function cv_vim_nodonsker, since cv_vim supersedes this function
Minor changes
- Modify examples to pass all CRAN checks
vimp 2.0.1 (2020-04-11)
Major changes
- Added new function sp_vim and helper functions run_sl, sample_subsets, spvim_ics, spvim_se; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM)
- Both cv_vim and vim now use an outer layer of sample splitting for hypothesis testing
- Added new functions vimp_auc, vimp_accuracy, vimp_deviance, vimp_rsquared
- vimp_regression is now deprecated; use vimp_anova instead
- added new function vim; each variable importance function is now a wrapper function around vim with the type argument filled in
- cv_vim_nodonsker is now deprecated; use cv_vim instead
- each variable importance function now returns a p-value based on the (possibly conservative) hypothesis test against the null of zero importance (with the exception of vimp_anova)
- each variable importance function now returns the estimates of the individual risks (with the exception of vimp_anova)
- added new functions to compute measures of predictiveness (and cross-validated measures of predictiveness), along with their influence functions
Minor changes
- Return tibbles in cv_vim, vim, merge_vim, and average_vim
vimp 1.1.6 (2019-08-26)
Major changes
None
Minor changes
- Changed tests to handle gam package update by switching library to SL.xgboost, SL.step, and SL.mean
- Added small unit tests for internal functions
vimp 1.1.5 (2019-08-09)
Major changes
None
Minor changes
- Attempt to handle gam package update in unit tests
vimp 1.1.4 (2018-10-14)
Major changes
None
Minor changes
- cv_vim and cv_vim_nodonsker now return the cross-validation folds used within the function
vimp 1.1.3 (2018-10-02)
Major changes
None
Minor changes
- users may now only specify a family for the top-level SuperLearner if run_regression = TRUE; in all cases, the second-stage SuperLearner uses a gaussian family
- if the SuperLearner chooses SL.mean as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values
vimp 1.1.2 (2018-09-20)
Major changes
- added function cv_vim_nodonsker, which computes the cross-validated naive estimator and the update on the same, single, validation fold. This does not allow for relaxation of the Donsker class conditions.
Minor changes
None
vimp 1.1.1
Major changes
- added function two_validation_set_cv, which sets up folds for V-fold cross-validation with two validation sets per fold
- changed the functionality of cv_vim: now, the cross-validated naive estimator is computed on a first validation set, while the update for the corrected estimator is computed using the second validation set (both created from two_validation_set_cv); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while making sure that the initial CV naive estimator is not biased high (due to a higher R^2 on the training data)
Minor changes
None
vimp 1.1.0 (2018-08-09)
Major changes
None
Minor changes
- changed the functionality of cv_vim: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator
vimp 1.0.0 (2018-06-24)
Major changes
- removed function vim, replaced with individual-parameter functions
- added function vimp_regression to match Python package
- cv_vim now can compute regression estimators
- renamed all internal functions; these are now vimp_ci, vimp_se, vimp_update, onestep_based_estimator
- edited vignette
- added unit tests
vimp 0.0.3
Major changes
None
Minor changes
Bugfixes etc.