Title: | Perform Inference on Algorithm-Agnostic Variable Importance |
---|---|
Description: | Calculate point estimates of and valid confidence intervals for nonparametric, algorithm-agnostic variable importance measures in high and low dimensions, using flexible estimators of the underlying regression functions. For more information about the methods, please see Williamson et al. (Biometrics, 2020), Williamson et al. (JASA, 2021), and Williamson and Feng (ICML, 2020). |
Authors: | Brian D. Williamson [aut, cre] |
Maintainer: | Brian D. Williamson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.3.4 |
Built: | 2025-02-12 20:24:10 UTC |
Source: | https://github.com/bdwilliamson/vimp |
Average the output from multiple calls to vimp_regression, for different independent groups, into a single estimate with a corresponding standard error and confidence interval.
average_vim(..., weights = rep(1/length(list(...)), length(list(...))))
... |
an arbitrary number of |
weights |
how to average the vims together, and must sum to 1; defaults to 1/(number of vims) for each vim, corresponding to the arithmetic mean |
an object of class vim containing the (weighted) average of the individual importance estimates, as well as the appropriate standard error and confidence interval.
This results in a list containing:
s - a list of the column(s) to calculate variable importance for
SL.library - a list of the libraries of learners passed to SuperLearner
full_fit - a list of the fitted values of the chosen method fit to the full data
red_fit - a list of the fitted values of the chosen method fit to the reduced data
est - a vector with the corrected estimates
naive - a vector with the naive estimates
update - a list with the influence curve-based updates
mat - a matrix with the estimated variable importance, the standard error, and the (1 - alpha) x 100% confidence interval
full_mod - a list of the objects returned by the estimation procedure for the full data regression (if applicable)
red_mod - a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)
alpha - the level, for confidence interval calculation
y - a list of the outcomes
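The averaging step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `average_estimates_sketch` and its arguments are hypothetical, and the standard error formula assumes the estimates come from independent groups, so that variances add with squared weights.

```r
# Hypothetical helper (not part of vimp): combine independent estimates.
average_estimates_sketch <- function(est, se,
                                     weights = rep(1 / length(est), length(est)),
                                     alpha = 0.05) {
  stopifnot(length(est) == length(se), length(est) == length(weights))
  # the weights must sum to 1, as required of the `weights` argument
  stopifnot(isTRUE(all.equal(sum(weights), 1)))
  avg <- sum(weights * est)
  # assumption: independent groups, so Var(sum w_i est_i) = sum w_i^2 se_i^2
  avg_se <- sqrt(sum(weights^2 * se^2))
  z <- stats::qnorm(1 - alpha / 2)
  list(est = avg, se = avg_se, ci = c(avg - z * avg_se, avg + z * avg_se))
}

res <- average_estimates_sketch(est = c(0.20, 0.30), se = c(0.05, 0.05))
res$est
res$ci
```

With the default equal weights this reduces to the arithmetic mean of the individual estimates.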
# generate the data
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# get estimates on independent splits of the data
samp <- sample(1:n, n/2, replace = FALSE)

# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y[samp], X = x[samp, ], indx = 2, V = 2,
                         run_regression = TRUE, alpha = 0.05,
                         SL.library = learners, cvControl = list(V = 2))
est_1 <- vimp_regression(Y = y[-samp], X = x[-samp, ], indx = 2, V = 2,
                         run_regression = TRUE, alpha = 0.05,
                         SL.library = learners, cvControl = list(V = 2))
ests <- average_vim(est_1, est_2, weights = c(1/2, 1/2))
Compute bootstrap-based standard error estimates for variable importance
bootstrap_se( Y = NULL, f1 = NULL, f2 = NULL, cluster_id = NULL, clustered = FALSE, type = "r_squared", b = 1000, boot_interval_type = "perc", alpha = 0.05 )
Y |
the outcome. |
f1 |
the fitted values from a flexible estimation technique
regressing Y on X. A vector of the same length as |
f2 |
the fitted values from a flexible estimation technique
regressing either (a) |
cluster_id |
vector of the same length as |
clustered |
should the bootstrap resamples be performed on clusters
rather than individual observations? Defaults to |
type |
the type of importance to compute; defaults to
|
b |
the number of bootstrap replicates (only used if |
boot_interval_type |
the type of bootstrap interval (one of |
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
a bootstrap-based standard error estimate
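As a rough illustration of what a bootstrap standard error for an R-squared-based importance looks like, the sketch below resamples observations, recomputes the difference in R-squared between the full and reduced fitted values, and summarizes the replicates. The helper names are hypothetical, `stats::lm` stands in for a flexible estimation technique, and the clustered bootstrap is not covered.

```r
# R-squared predictiveness of a set of fitted values
r_squared <- function(y, fitted) {
  1 - mean((y - fitted)^2) / mean((y - mean(y))^2)
}

# Hypothetical helper (not part of vimp): nonparametric bootstrap of the
# difference in R-squared between full (f1) and reduced (f2) fitted values
boot_se_sketch <- function(y, f1, f2, b = 1000, alpha = 0.05) {
  n <- length(y)
  stat <- function(idx) r_squared(y[idx], f1[idx]) - r_squared(y[idx], f2[idx])
  boot_stats <- replicate(b, stat(sample.int(n, n, replace = TRUE)))
  list(
    se = stats::sd(boot_stats),
    # percentile interval, in the spirit of boot_interval_type = "perc"
    ci = unname(stats::quantile(boot_stats, c(alpha / 2, 1 - alpha / 2)))
  )
}

set.seed(1)
n <- 200
x <- data.frame(x1 = stats::runif(n, -1, 1), x2 = stats::runif(n, -1, 1))
y <- x$x1 + 2 * x$x2 + stats::rnorm(n)
f1 <- stats::fitted(stats::lm(y ~ x1 + x2, data = x))  # all covariates
f2 <- stats::fitted(stats::lm(y ~ x1, data = x))       # x2 removed
boot <- boot_se_sketch(y, f1, f2, b = 200)
boot$se
```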
Check pre-computed fitted values for call to vim, cv_vim, or sp_vim
check_fitted_values( Y = NULL, f1 = NULL, f2 = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL, sample_splitting_folds = NULL, cross_fitting_folds = NULL, cross_fitted_se = TRUE, V = NULL, ss_V = NULL, cv = FALSE )
Y |
the outcome |
f1 |
estimator of the population-optimal prediction function using all covariates |
f2 |
estimator of the population-optimal prediction function using the reduced set of covariates |
cross_fitted_f1 |
cross-fitted estimator of the population-optimal prediction function using all covariates |
cross_fitted_f2 |
cross-fitted estimator of the population-optimal prediction function using the reduced set of covariates |
sample_splitting_folds |
the folds for sample-splitting (used for hypothesis testing) |
cross_fitting_folds |
the folds for cross-fitting (used for point
estimates of variable importance in |
cross_fitted_se |
logical; should cross-fitting be used to estimate standard errors? |
V |
the number of cross-fitting folds |
ss_V |
the number of folds for CV (if sample_splitting is TRUE) |
cv |
a logical flag indicating whether or not to use cross-fitting |
Ensure that inputs to vim, cv_vim, and sp_vim follow the correct formats.
None. Called for the side effect of stopping the algorithm if any inputs are in an unexpected format.
Check inputs to a call to vim, cv_vim, or sp_vim
check_inputs(Y, X, f1, f2, indx)
Y |
the outcome |
X |
the covariates |
f1 |
estimator of the population-optimal prediction function using all covariates |
f2 |
estimator of the population-optimal prediction function using the reduced set of covariates |
indx |
the index or indices of the covariate(s) of interest |
Ensure that inputs to vim, cv_vim, and sp_vim follow the correct formats.
None. Called for the side effect of stopping the algorithm if any inputs are in an unexpected format.
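A function that is called only for its side effect of stopping on bad input might look like the sketch below. The specific checks and messages are assumptions chosen for illustration; they are not taken from the package source.

```r
# Hypothetical validator (not the vimp implementation): stop early on
# malformed inputs, return NULL invisibly otherwise.
check_inputs_sketch <- function(Y, X = NULL, f1 = NULL, f2 = NULL, indx = 1) {
  if (is.null(Y)) stop("Y must be supplied.")
  if (is.null(X) && (is.null(f1) || is.null(f2))) {
    stop("Supply either X or both sets of fitted values (f1 and f2).")
  }
  if (!is.null(f1) && length(f1) != length(Y)) {
    stop("f1 must have the same length as Y.")
  }
  if (!is.null(f2) && length(f2) != length(Y)) {
    stop("f2 must have the same length as Y.")
  }
  if (!is.null(X) && any(indx < 1 | indx > ncol(X))) {
    stop("indx must refer to columns of X.")
  }
  invisible(NULL)
}

# valid call: nothing happens
ok <- check_inputs_sketch(Y = 1:3, X = data.frame(a = 1:3, b = 4:6), indx = 2)
# invalid call: the index points past the last column of X
bad <- tryCatch(
  check_inputs_sketch(Y = 1:3, X = data.frame(a = 1:3), indx = 2),
  error = function(e) conditionMessage(e)
)
```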
Create complete-case outcome, weights, and Z
create_z(Y, C, Z, X, ipc_weights)
Y |
the outcome |
C |
indicator of missing or observed |
Z |
the covariates observed in phase 1 and 2 data |
X |
all covariates |
ipc_weights |
the weights |
a list, with the complete-case outcome, weights, and Z matrix
Compute estimates and confidence intervals using cross-fitting for nonparametric intrinsic variable importance based on the population-level contrast between the oracle predictiveness using the feature(s) of interest versus not.
cv_vim( Y = NULL, X = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL, f1 = NULL, f2 = NULL, indx = 1, V = ifelse(is.null(cross_fitting_folds), 5, length(unique(cross_fitting_folds))), sample_splitting = TRUE, final_point_estimate = "split", sample_splitting_folds = NULL, cross_fitting_folds = NULL, stratified = FALSE, type = "r_squared", run_regression = TRUE, SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"), alpha = 0.05, delta = 0, scale = "identity", na.rm = FALSE, C = rep(1, length(Y)), Z = NULL, ipc_scale = "identity", ipc_weights = rep(1, length(Y)), ipc_est_type = "aipw", scale_est = TRUE, nuisance_estimators_full = NULL, nuisance_estimators_reduced = NULL, exposure_name = NULL, cross_fitted_se = TRUE, bootstrap = FALSE, b = 1000, boot_interval_type = "perc", clustered = FALSE, cluster_id = rep(NA, length(Y)), ... )
Y |
the outcome. |
X |
the covariates. If |
cross_fitted_f1 |
the predicted values on validation data from a
flexible estimation technique regressing Y on X in the training data. Provided as
either (a) a vector, where each element is
the predicted value when that observation is part of the validation fold;
or (b) a list of length V, where each element in the list is a set of predictions on the
corresponding validation data fold.
If sample-splitting is requested, then these must be estimated specially; see Details. However,
the resulting vector should be the same length as |
cross_fitted_f2 |
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in
|
f1 |
the fitted values from a flexible estimation technique
regressing Y on X. If sample-splitting is requested, then these must be
estimated specially; see Details. If |
f2 |
the fitted values from a flexible estimation technique
regressing either (a) |
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
V |
the number of folds for cross-fitting, defaults to 5. If
|
sample_splitting |
should we use sample-splitting to estimate the full and
reduced predictiveness? Defaults to |
final_point_estimate |
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ( |
sample_splitting_folds |
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if |
cross_fitting_folds |
the folds for cross-fitting. Only used if
|
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
type |
the type of importance to compute; defaults to
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_scale |
what scale should the inverse probability weight correction be applied on (if any)? Defaults to "identity". (other options are "log" and "logit") |
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
nuisance_estimators_full |
(only used if |
nuisance_estimators_reduced |
(only used if |
exposure_name |
(only used if |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
bootstrap |
should bootstrap-based standard error estimates be computed?
Defaults to |
b |
the number of bootstrap replicates (only used if |
boot_interval_type |
the type of bootstrap interval (one of |
clustered |
should the bootstrap resamples be performed on clusters
rather than individual observations? Defaults to |
cluster_id |
vector of the same length as |
... |
other arguments to the estimation tool, see "See also". |
We define the population variable importance measure (VIM) for the group of features (or single feature) $s$ with respect to the predictiveness measure $V$ by
$$\psi_{0,s} := V(f_0, P_0) - V(f_{0,s}, P_0),$$
where $f_0$ is the population predictiveness maximizing function, $f_{0,s}$ is the population predictiveness maximizing function that is only allowed to access the features with index not in $s$, and $P_0$ is the true data-generating distribution.
Cross-fitted VIM estimates are computed differently if sample-splitting is requested versus if it is not. We recommend using sample-splitting in most cases, since only in this case will inferences be valid if the variable(s) of interest have truly zero population importance. The purpose of cross-fitting is to estimate $f_0$ and $f_{0,s}$ on independent data from estimating $P_0$; this can result in improved performance, especially when using flexible learning algorithms. The purpose of sample-splitting is to estimate $f_0$ and $f_{0,s}$ on independent data; this allows valid inference under the null hypothesis of zero importance.
Without sample-splitting, cross-fitted VIM estimates are obtained by first splitting the data into $K$ folds; then using each fold in turn as a hold-out set, constructing estimators $f_{n,k}$ and $f_{n,k,s}$ of $f_0$ and $f_{0,s}$, respectively, on the training data and estimator $P_{n,k}$ of $P_0$ using the test data; and finally, computing
$$\psi_{n,s} := K^{-1} \sum_{k=1}^{K} \{V(f_{n,k}, P_{n,k}) - V(f_{n,k,s}, P_{n,k})\}.$$
With sample-splitting, cross-fitted VIM estimates are obtained by first splitting the data into $2K$ folds. These folds are further divided into 2 groups of folds. Then, for each fold $k$ in the first group, estimator $f_{n,k}$ of $f_0$ is constructed using all data besides the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data) and estimator $P_{n,k}$ of $P_0$ is constructed using the held-out data (i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k} = V(f_{n,k}, P_{n,k}).$$
Similarly, for each fold $k$ in the second group, estimator $f_{n,k,s}$ of $f_{0,s}$ is constructed using all data besides the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data) and estimator $P_{n,k}$ of $P_0$ is constructed using the held-out data (i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k,s} = V(f_{n,k,s}, P_{n,k}).$$
Finally,
$$\psi_{n,s} := K^{-1} \sum_{k=1}^{K} \{v_{n,k} - v_{n,k,s}\}.$$
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind the cv_vim function, and the
validity of the confidence intervals.
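The two cross-fitting procedures described in this section can be sketched with base R alone. This is an illustration under stated assumptions, not the package's algorithm: `stats::lm` with polynomial terms stands in for a flexible learner, R-squared is the predictiveness measure, all helper names are hypothetical, and no standard errors or tests are computed.

```r
set.seed(4747)
n <- 200
x <- data.frame(x1 = stats::runif(n, -5, 5), x2 = stats::runif(n, -5, 5))
y <- (x$x1 / 5)^2 * (x$x1 + 7) / 5 + (x$x2 / 3)^2 + stats::rnorm(n)
dat <- cbind(y = y, x)

# R-squared predictiveness of predictions on held-out data
r_squared <- function(y, pred) 1 - mean((y - pred)^2) / mean((y - mean(y))^2)

# Train on every fold except k, evaluate predictiveness on fold k
fold_predictiveness <- function(k, folds, formula) {
  fit <- stats::lm(formula, data = dat[folds != k, , drop = FALSE])
  test <- dat[folds == k, , drop = FALSE]
  r_squared(test$y, stats::predict(fit, newdata = test))
}

K <- 2
full <- y ~ poly(x1, 3) + poly(x2, 2)  # regression on all features
reduced <- y ~ poly(x1, 3)             # feature of interest (x2) removed

# Without sample-splitting: K folds, both regressions evaluated on each fold
folds <- sample(rep(seq_len(K), length.out = n))
psi_no_ss <- mean(sapply(seq_len(K), function(k) {
  fold_predictiveness(k, folds, full) - fold_predictiveness(k, folds, reduced)
}))

# With sample-splitting: 2K folds in two groups of K folds; the full
# regression is evaluated on the first group, the reduced on the second
folds_2k <- sample(rep(seq_len(2 * K), length.out = n))
v_full <- sapply(seq_len(K), fold_predictiveness,
                 folds = folds_2k, formula = full)
v_reduced <- sapply(K + seq_len(K), fold_predictiveness,
                    folds = folds_2k, formula = reduced)
psi_ss <- mean(v_full - v_reduced)

c(no_sample_splitting = psi_no_ss, sample_splitting = psi_ss)
```

In the sample-split version the full and reduced predictiveness values never share evaluation data, which is the property the text ties to valid inference under zero importance.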
In the interest of transparency, we return most of the calculations within the vim object. This results in a list including:
the column(s) to calculate variable importance for
the library of learners passed to SuperLearner
the fitted values of the chosen method fit to the full data (a list, for train and test data)
the fitted values of the chosen method fit to the reduced data (a list, for train and test data)
the estimated variable importance
the naive estimator of variable importance
the estimated efficient influence function
the estimated efficient influence function for the full regression
the estimated efficient influence function for the reduced regression
the standard error for the estimated variable importance
the (1 - alpha) x 100% confidence interval for the variable importance estimate
a decision to either reject (TRUE) or not reject (FALSE) the null hypothesis, based on a conservative test
a p-value based on the same test as test
the object returned by the estimation procedure for the full data regression (if applicable)
the object returned by the estimation procedure for the reduced data regression (if applicable)
the level, for confidence interval calculation
the folds used for hypothesis testing
the folds used for cross-fitting
the outcome
the weights
the cluster IDs
a tibble with the estimate, SE, CI, hypothesis testing decision, and p-value
An object of class vim. See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
n <- 100
p <- 2
# generate the data
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- as.matrix(smooth + stats::rnorm(n, 0, 1))

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm")

# -----------------------------------------
# using Super Learner (with a small number of folds, for illustration only)
# -----------------------------------------
set.seed(4747)
est <- cv_vim(Y = y, X = x, indx = 2, V = 2,
              type = "r_squared", run_regression = TRUE,
              SL.library = learners, cvControl = list(V = 2), alpha = 0.05)

# ------------------------------------------
# doing things by hand, and plugging them in
# (with a small number of folds, for illustration only)
# ------------------------------------------
# set up the folds
indx <- 2
V <- 2
Y <- matrix(y)
set.seed(4747)
# Note that the CV.SuperLearner should be run with an outer layer
# of 2*V folds (for V-fold cross-fitted importance)
full_cv_fit <- suppressWarnings(SuperLearner::CV.SuperLearner(
  Y = Y, X = x, SL.library = learners, cvControl = list(V = 2 * V),
  innerCvControl = list(list(V = V))
))
full_cv_preds <- full_cv_fit$SL.predict
# use the same cross-fitting folds for reduced
reduced_cv_fit <- suppressWarnings(SuperLearner::CV.SuperLearner(
  Y = Y, X = x[, -indx, drop = FALSE], SL.library = learners,
  cvControl = SuperLearner::SuperLearner.CV.control(
    V = 2 * V, validRows = full_cv_fit$folds
  ),
  innerCvControl = list(list(V = V))
))
reduced_cv_preds <- reduced_cv_fit$SL.predict
# for hypothesis testing
cross_fitting_folds <- get_cv_sl_folds(full_cv_fit$folds)
set.seed(1234)
sample_splitting_folds <- make_folds(unique(cross_fitting_folds), V = 2)
set.seed(5678)
est <- cv_vim(Y = y, cross_fitted_f1 = full_cv_preds,
              cross_fitted_f2 = reduced_cv_preds, indx = 2, delta = 0, V = V,
              type = "r_squared", cross_fitting_folds = cross_fitting_folds,
              sample_splitting_folds = sample_splitting_folds,
              run_regression = FALSE, alpha = 0.05, na.rm = TRUE)
Compute nonparametric estimates of the chosen measure of predictiveness.
est_predictiveness( fitted_values, y, a = NULL, full_y = NULL, type = "r_squared", C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(C)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(C)), ipc_est_type = "aipw", scale = "identity", na.rm = FALSE, nuisance_estimators = NULL, ... )
fitted_values |
fitted values from a regression function using the observed data. |
y |
the observed outcome. |
a |
the observed treatment assignment (may be within a specified fold,
for cross-fitted estimates). Only used if |
full_y |
the observed outcome (from the entire dataset, for cross-fitted estimates). |
type |
which parameter are you estimating (defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should NA's be removed in computation?
(defaults to |
nuisance_estimators |
(only used if |
... |
other arguments to SuperLearner, if |
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest.
A list, with: the estimated predictiveness; the estimated efficient influence function; and the predictions of the EIF based on inverse probability of censoring.
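To make "measure of predictiveness" concrete, the sketch below computes two plug-in measures from fitted values. It is a simplified, hypothetical helper: it returns only the point estimate (no efficient influence function and no coarsening correction), and the 0.5 threshold used for accuracy is an assumption.

```r
# Hypothetical helper (not the vimp implementation): plug-in predictiveness
predictiveness_sketch <- function(fitted_values, y, type = "r_squared",
                                  na.rm = FALSE) {
  if (na.rm) {
    keep <- stats::complete.cases(fitted_values, y)
    fitted_values <- fitted_values[keep]
    y <- y[keep]
  }
  switch(type,
    # proportion of outcome variance explained by the fitted values
    r_squared = 1 - mean((y - fitted_values)^2) / mean((y - mean(y))^2),
    # share of correct classifications, thresholding predictions at 0.5
    accuracy = mean((fitted_values > 0.5) == y),
    stop("Unsupported type in this sketch: ", type)
  )
}

y <- c(0, 1, 1, 0)
fitted_values <- c(0.2, 0.8, 0.4, 0.1)
r2 <- predictiveness_sketch(fitted_values, y, type = "r_squared")
acc <- predictiveness_sketch(fitted_values, y, type = "accuracy")
```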
Compute nonparametric estimates of the chosen measure of predictiveness.
est_predictiveness_cv( fitted_values, y, full_y = NULL, folds, type = "r_squared", C = rep(1, length(y)), Z = NULL, folds_Z = folds, ipc_weights = rep(1, length(C)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(C)), ipc_est_type = "aipw", scale = "identity", na.rm = FALSE, ... )
fitted_values |
fitted values from a regression function using the
observed data; a list of length V, where each object is a set of
predictions on the validation data, or a vector of the same length as |
y |
the observed outcome. |
full_y |
the observed outcome (from the entire dataset, for cross-fitted estimates). |
folds |
the cross-validation folds for the observed data. |
type |
which parameter are you estimating (defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
folds_Z |
either the cross-validation folds for the observed data (no coarsening) or a vector of folds for the fully observed data Z. |
ipc_weights |
weights for inverse probability of coarsening (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should NA's be removed in computation?
(defaults to |
... |
other arguments to SuperLearner, if |
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest. If sample-splitting is also requested (recommended, since in this case inferences will be valid even if the variable has zero true importance), then the prediction functions are trained as if $2K$-fold cross-validation were run, but are evaluated on only $K$ sets (independent between the full and reduced nuisance regression).
The estimated measure of predictiveness.
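A fold-wise version of the estimate can be sketched as follows: compute the measure on each validation fold and average. The helper name is hypothetical; it handles R-squared only, assumes folds are labeled 1 to V, uses the overall outcome mean in the denominator (an assumption about how `full_y` might be used), and ignores coarsening.

```r
# Hypothetical helper (not the vimp implementation): average fold-specific
# R-squared over validation folds
cv_predictiveness_sketch <- function(fitted_values, y, folds) {
  V <- length(unique(folds))
  # accept a list of per-fold predictions or one vector aligned with y
  if (!is.list(fitted_values)) {
    fitted_values <- split(fitted_values, folds)
  }
  y_by_fold <- split(y, folds)
  per_fold <- vapply(seq_len(V), function(v) {
    mse <- mean((y_by_fold[[v]] - fitted_values[[v]])^2)
    1 - mse / mean((y_by_fold[[v]] - mean(y))^2)
  }, numeric(1))
  mean(per_fold)
}

y <- c(1, 2, 3, 4)
folds <- c(1, 2, 1, 2)
perfect <- cv_predictiveness_sketch(y, y, folds)              # exact predictions
baseline <- cv_predictiveness_sketch(rep(mean(y), 4), y, folds)  # mean only
```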
Generic function for estimating a predictiveness measure (e.g., R-squared or classification accuracy).
estimate(x, ...)
x |
An R object. Currently, there are methods for |
... |
further arguments passed to or from other methods. |
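For readers less familiar with S3, a generic of this shape dispatches on the class of its first argument. The sketch below uses hypothetical names (`estimate_sketch` and the `*_sketch` classes) so that it does not clash with the package's own generic or classes.

```r
# A generic in the same style: dispatch on the class of x
estimate_sketch <- function(x, ...) UseMethod("estimate_sketch")

# One method per (hypothetical) measure class
estimate_sketch.r_squared_sketch <- function(x, ...) {
  1 - mean((x$y - x$fitted_values)^2) / mean((x$y - mean(x$y))^2)
}
estimate_sketch.accuracy_sketch <- function(x, ...) {
  mean((x$fitted_values > 0.5) == x$y)
}

# Small constructor that attaches the class used for dispatch
new_measure_sketch <- function(y, fitted_values, type) {
  structure(list(y = y, fitted_values = fitted_values),
            class = paste0(type, "_sketch"))
}

r2 <- estimate_sketch(new_measure_sketch(c(1, 2, 3, 4), c(1, 2, 3, 4), "r_squared"))
acc <- estimate_sketch(new_measure_sketch(c(1, 0, 1, 1), c(0.9, 0.3, 0.6, 0.2), "accuracy"))
```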
Estimate projection of EIF on fully-observed variables
estimate_eif_projection( obs_grad = NULL, C = NULL, Z = NULL, ipc_fit_type = NULL, ipc_eif_preds = NULL, ... )
obs_grad |
the estimated (observed) EIF |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
... |
other arguments to SuperLearner, if |
the projection of the EIF onto the fully-observed variables
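One way to read "projection onto the fully-observed variables" is as a regression of the observed EIF values on Z among the observed units, followed by prediction for every unit. The sketch below follows that reading with `stats::lm` in place of SuperLearner; the helper name is hypothetical, and it assumes `obs_grad` has one value per unit with `C == 1`.

```r
# Hypothetical helper (not the vimp implementation)
project_eif_sketch <- function(obs_grad, C, Z) {
  Z <- as.data.frame(Z)
  # regress the observed EIF on Z using only units with C == 1
  observed <- Z[C == 1, , drop = FALSE]
  observed$obs_grad <- obs_grad
  fit <- stats::lm(obs_grad ~ ., data = observed)
  # predict the EIF for all units, observed or not
  as.numeric(stats::predict(fit, newdata = Z))
}

Z <- data.frame(z = 1:6)
C <- c(1, 1, 0, 1, 0, 1)
obs_grad <- 2 * Z$z[C == 1]  # toy EIF values that are exactly linear in z
proj <- project_eif_sketch(obs_grad, C, Z)
```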
Estimate nuisance functions for average value-based VIMs
estimate_nuisances( fit, X, exposure_name, V = 1, SL.library, sample_splitting, sample_splitting_folds, verbose, weights, cross_fitted_se, split = 1, ... )
fit |
the fitted nuisance function estimator |
X |
the covariates. If |
exposure_name |
(only used if |
V |
the number of folds for cross-fitting, defaults to 5. If
|
SL.library |
a character vector of learners to pass to
|
sample_splitting |
should we use sample-splitting to estimate the full and
reduced predictiveness? Defaults to |
sample_splitting_folds |
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if |
verbose |
should we print progress? defaults to FALSE |
weights |
weights to pass to estimation procedure |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
split |
the sample split to use |
... |
other arguments to the estimation tool, see "See also". |
nuisance function estimators for use in the average value VIM: the treatment assignment based on the estimated optimal rule (based on the estimated outcome regression); the expected outcome under the estimated optimal rule; and the estimated propensity score.
Estimate the specified type of predictiveness
estimate_type_predictiveness(arg_lst, type)
arg_lst |
a list of arguments; from, e.g., |
type |
the type of predictiveness, e.g., |
Obtain a Point Estimate and Efficient Influence Function Estimate for a Given Predictiveness Measure
## S3 method for class 'predictiveness_measure'
estimate(x, ...)
x |
an object of class |
... |
other arguments to type-specific predictiveness measures (currently unused) |
A list with the point estimate, naive point estimate (for ANOVA only), estimated EIF, and the predictions for coarsened data EIF (for coarsened data settings only)
Use the cross-validated Super Learner and a set of specified sample-splitting folds to extract cross-fitted predictions on separate splits of the data. This is primarily for use in cases where you have already fit a CV.SuperLearner and want to use the fitted values to compute variable importance without having to re-fit. The number of folds used in the CV.SuperLearner must be even.
extract_sampled_split_predictions( cvsl_obj = NULL, sample_splitting = TRUE, sample_splitting_folds = NULL, full = TRUE, preds = NULL, cross_fitting_folds = NULL, vector = TRUE )
cvsl_obj |
An object of class |
sample_splitting |
logical; should we use sample-splitting or not?
Defaults to |
sample_splitting_folds |
A vector of folds to use for sample splitting |
full |
logical; is this the fit to all covariates ( |
preds |
a vector of predictions; must be entered unless |
cross_fitting_folds |
a vector of folds that were used in cross-fitting. |
vector |
logical; should we return a vector (where each element is the prediction when the corresponding row is in the validation fold) or a list? |
The predictions on validation data in each split-sample fold.
See CV.SuperLearner for usage of the CV.SuperLearner function.
Format a predictiveness_measure object
Nicely formats the output from a predictiveness_measure object for printing.
## S3 method for class 'predictiveness_measure'
format(x, ...)
x |
the |
... |
other options, see the generic |
Format a vim object
Nicely formats the output from a vim object for printing.
## S3 method for class 'vim'
format(x, ...)
x |
the |
... |
other options, see the generic |
Get a numeric vector with cross-validation fold IDs from CV.SuperLearner
get_cv_sl_folds(cv_sl_folds)
cv_sl_folds |
The folds from a call to |
A numeric vector with the fold IDs.
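As a rough illustration of this kind of conversion, the sketch below turns a list of validation-row indices (the shape in which CV.SuperLearner is assumed here to store its folds) into a numeric vector of fold IDs. The helper name folds_list_to_ids and the toy folds are hypothetical; this is not the package's own code.

```r
# Hypothetical helper: convert a list of validation-row indices into fold IDs.
# Assumes every row index appears in exactly one element of the list.
folds_list_to_ids <- function(folds_list) {
  n <- length(unlist(folds_list))
  fold_ids <- numeric(n)
  for (v in seq_along(folds_list)) {
    # rows listed in element v belong to validation fold v
    fold_ids[folds_list[[v]]] <- v
  }
  fold_ids
}

# toy folds: rows 1 and 4 are in fold 1, rows 2 and 5 in fold 2, and so on
toy_folds <- list(c(1, 4), c(2, 5), c(3, 6))
fold_ids <- folds_list_to_ids(toy_folds)
print(fold_ids)
```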
Obtain the type of VIM to estimate using partial matching
get_full_type(type)
type |
the partial string indicating the type of VIM |
the full string indicating the type of VIM
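Partial matching of this kind can be sketched with base R's pmatch. The list of full type names below is illustrative only (the package's actual list may differ), and match_type is a hypothetical helper, not the package function.

```r
# Illustrative set of full type names; an assumption, not the package's list.
full_types <- c("accuracy", "auc", "deviance", "r_squared", "anova")

# Hypothetical helper: resolve a partial string to a unique full type name.
match_type <- function(type, choices = full_types) {
  idx <- pmatch(type, choices)
  if (is.na(idx)) {
    # pmatch returns NA for both no match and an ambiguous match
    stop("'", type, "' does not uniquely match a known type")
  }
  choices[idx]
}

matched <- match_type("acc")
print(matched)
```

A prefix shared by several names (for example "a" in the list above) is ambiguous, so the helper stops with an error rather than guessing.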
Return test-set only data
get_test_set(arg_lst, k)
arg_lst |
a list of estimates, data, etc. |
k |
the index of interest |
the test-set only data
Create Folds for Cross-Fitting
make_folds(y, V = 2, stratified = FALSE, C = NULL, probs = rep(1/V, V))
y |
the outcome |
V |
the number of folds |
stratified |
should the folds be stratified based on the outcome? |
C |
a vector indicating whether or not the observation is fully observed; 1 denotes yes, 0 denotes no |
probs |
vector of proportions for each fold number |
a vector of folds
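The idea of fold creation, with optional stratification on the outcome, can be sketched in base R as follows. This is a simplified stand-in under stated assumptions: sketch_folds is a hypothetical name, folds are made as equal in size as possible, and the C and probs arguments are not modelled.

```r
# Hypothetical stand-in for fold creation; not the package's implementation.
sketch_folds <- function(y, V = 2, stratified = FALSE) {
  # randomly permute a balanced set of fold labels 1, ..., V
  assign_folds <- function(n) sample(rep(seq_len(V), length.out = n))
  if (!stratified) {
    return(assign_folds(length(y)))
  }
  folds <- integer(length(y))
  for (level in unique(y)) {
    # assign folds separately within each outcome level
    idx <- which(y == level)
    folds[idx] <- assign_folds(length(idx))
  }
  folds
}

set.seed(1234)
y <- rep(c(0, 1), times = c(12, 8))
folds <- sketch_folds(y, V = 2, stratified = TRUE)
print(table(y, folds))
```

With stratification, each fold receives the same share of each outcome level, which matters most when one class is rare.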
Turn folds from 2K-fold cross-fitting into individual K-fold folds
make_kfold( cross_fitting_folds, sample_splitting_folds = rep(1, length(unique(cross_fitting_folds))), C = rep(1, length(cross_fitting_folds)) )
cross_fitting_folds |
the vector of cross-fitting folds |
sample_splitting_folds |
the sample splitting folds |
C |
vector of whether or not we measured the observation in phase 2 |
the two sets of testing folds for K-fold cross-fitting
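One way to picture this step is sketched below. It assumes that the cross-fitting folds are labelled 1, ..., 2K and that sample_splitting_folds gives, for each of those labels, the split (1 or 2) it belongs to. The helper split_kfold and the names of the returned list elements are hypothetical, and the coarsening indicator C is ignored.

```r
# Hypothetical sketch: split 2K cross-fitting folds into two sets of K folds.
split_kfold <- function(cross_fitting_folds, sample_splitting_folds) {
  relabel <- function(split) {
    # fold labels assigned to this split (assumes labels are 1, ..., 2K)
    fold_labels <- which(sample_splitting_folds == split)
    in_split <- cross_fitting_folds %in% fold_labels
    # relabel the retained folds as 1, ..., K
    match(cross_fitting_folds[in_split], fold_labels)
  }
  list(full = relabel(1), reduced = relabel(2))
}

cf_folds <- c(1, 2, 3, 4, 1, 2, 3, 4)  # 2K = 4 cross-fitting folds
ss_folds <- c(1, 2, 1, 2)              # folds 1, 3 go to split 1; folds 2, 4 to split 2
k_folds <- split_kfold(cf_folds, ss_folds)
print(k_folds)
```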
Compute nonparametric estimate of classification accuracy.
measure_accuracy( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, cutoff = 0.5, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
cutoff |
The risk score cutoff at which the accuracy is evaluated, defaults to 0.5 (for the accuracy of the Bayes classifier). |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated classification accuracy of the fitted regression function; (2) the estimated influence function; and (3) the IPC EIF predictions.
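For intuition, the fully observed case (no coarsening, so no IPC correction) can be sketched in a few lines of base R. The function sketch_accuracy and the list element names are assumptions for illustration; the sketch is not the package's implementation.

```r
# Plug-in accuracy at a cutoff, with an influence function estimate.
# No inverse probability of coarsening correction is applied.
sketch_accuracy <- function(fitted_values, y, cutoff = 0.5) {
  # 1 if the thresholded prediction matches the observed label, 0 otherwise
  correct <- as.numeric((fitted_values > cutoff) == y)
  est <- mean(correct)
  list(point_est = est, eif = correct - est)
}

fitted_values <- c(0.9, 0.2, 0.6, 0.4, 0.7)
y <- c(1, 0, 0, 0, 1)
acc <- sketch_accuracy(fitted_values, y)
print(acc$point_est)
```

The influence function values are centered, so they average to zero; their sample variance is what a standard error would be built from.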
Estimate ANOVA decomposition-based variable importance.
measure_anova( full, reduced, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, ... )
full |
fitted values from a regression function of the observed outcome on the full set of covariates. |
reduced |
fitted values from a regression on the reduced set of observed covariates. |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated ANOVA (based on a one-step correction) of the fitted regression functions; (2) the estimated influence function; (3) the naive ANOVA estimate; and (4) the IPC EIF predictions.
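The naive (plug-in) estimate can be sketched as below, assuming the ANOVA-based measure is the mean squared difference between the full and reduced fitted values, scaled by the variance of the outcome. That formula is an assumption drawn from the cited methods papers, the one-step correction is not shown, and sketch_naive_anova is a hypothetical name.

```r
# Naive plug-in ANOVA-based importance (assumed form; no one-step correction).
sketch_naive_anova <- function(full, reduced, y) {
  # mean squared difference between full and reduced fits, over var(y)
  mean((full - reduced)^2) / mean((y - mean(y))^2)
}

y <- c(1, 2, 3, 4, 5)
full <- c(1.5, 2, 3, 4, 4.5)
reduced <- rep(3, 5)
naive_est <- sketch_naive_anova(full, reduced, y)
print(naive_est)
```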
Compute nonparametric estimate of AUC.
measure_auc( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated AUC of the fitted regression function; (2) the estimated influence function; and (3) the IPC EIF predictions.
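A nonparametric AUC can be sketched as the proportion of case-control pairs in which the case has the larger fitted value (the Mann-Whitney form). The sketch assumes fully observed data, counts ties as one half, and uses the hypothetical name sketch_auc; it does not compute the influence function.

```r
# Mann-Whitney style AUC: compare every case with every control.
sketch_auc <- function(fitted_values, y) {
  cases <- fitted_values[y == 1]
  controls <- fitted_values[y == 0]
  # 1 if the case is ranked above the control, 0.5 for ties, 0 otherwise
  comparisons <- outer(cases, controls, function(a, b) (a > b) + 0.5 * (a == b))
  mean(comparisons)
}

fitted_values <- c(0.9, 0.2, 0.6, 0.4, 0.7)
y <- c(1, 0, 1, 0, 0)
auc <- sketch_auc(fitted_values, y)
print(auc)
```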
Compute nonparametric estimate of the average value under the optimal treatment rule.
measure_average_value( nuisance_estimators, y, a, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "identity", na.rm = FALSE, ... )
nuisance_estimators |
a list of nuisance function estimators on the observed data (may be within a specified fold, for cross-fitted estimates). Specifically: an estimator of the optimal treatment rule; an estimator of the propensity score under the estimated optimal treatment rule; and an estimator of the outcome regression when treatment is assigned according to the estimated optimal rule. |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
a |
the observed treatment assignment (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated average value under the optimal treatment rule; (2) the estimated influence function; and (3) the IPC EIF predictions.
Compute nonparametric estimate of cross-entropy.
measure_cross_entropy( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "identity", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated cross-entropy of the fitted regression function; (2) the estimated influence function; and (3) the IPC EIF predictions.
Compute nonparametric estimate of deviance.
measure_deviance( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated deviance of the fitted regression function; (2) the estimated influence function; and (3) the IPC EIF predictions.
Compute nonparametric estimate of mean squared error.
measure_mse( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "identity", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated mean squared error of the fitted regression function; (2) the estimated influence function; and (3) the IPC EIF predictions.
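In the fully observed case, the estimate and its influence function can be sketched directly. The function name sketch_mse and the list element names are assumptions for illustration, and no IPC correction is applied.

```r
# Plug-in mean squared error with an influence function estimate.
sketch_mse <- function(fitted_values, y) {
  sq_err <- (y - fitted_values)^2
  est <- mean(sq_err)
  # each observation's contribution, centered at the estimate
  list(point_est = est, eif = sq_err - est)
}

y <- c(1, 2, 3, 4)
fitted_values <- c(1.5, 2, 2.5, 5)
mse <- sketch_mse(fitted_values, y)
print(mse$point_est)
```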
Compute nonparametric estimate of NPV.
measure_npv( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, cutoff = 0.5, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
cutoff |
The risk score cutoff at which the NPV is evaluated.
Fitted values above |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated NPV of the fitted regression function using specified cutoff; (2) the estimated influence function; and (3) the IPC EIF predictions.
Compute nonparametric estimate of PPV.
measure_ppv( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, cutoff = 0.5, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
cutoff |
The risk score cutoff at which the PPV is evaluated.
Fitted values above |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated PPV of the fitted regression function using specified cutoff; (2) the estimated influence function; and (3) the IPC EIF predictions.
Estimate R-squared
measure_r_squared( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated R-squared of the fitted regression function; (2) the estimated influence function; and (3) the IPC EIF predictions.
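As a point of reference, a plug-in R-squared can be written as one minus the ratio of the mean squared error to the outcome variance. The sketch below assumes fully observed data and a variance with denominator n; sketch_r_squared is a hypothetical name, and the influence function is not computed.

```r
# Plug-in R-squared: 1 - MSE / Var(y), using the 1/n variance.
sketch_r_squared <- function(fitted_values, y) {
  mse <- mean((y - fitted_values)^2)
  var_y <- mean((y - mean(y))^2)
  1 - mse / var_y
}

y <- c(1, 2, 3, 4)
fitted_values <- c(1.5, 2, 2.5, 5)
r2 <- sketch_r_squared(fitted_values, y)
print(r2)
```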
Compute nonparametric estimate of sensitivity.
measure_sensitivity( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, cutoff = 0.5, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
cutoff |
The risk score cutoff at which the sensitivity is evaluated.
Fitted values above |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated sensitivity of the fitted regression function using specified cutoff; (2) the estimated influence function; and (3) the IPC EIF predictions.
Compute nonparametric estimate of specificity.
measure_specificity( fitted_values, y, full_y = NULL, C = rep(1, length(y)), Z = NULL, ipc_weights = rep(1, length(y)), ipc_fit_type = "external", ipc_eif_preds = rep(1, length(y)), ipc_est_type = "aipw", scale = "logit", na.rm = FALSE, nuisance_estimators = NULL, a = NULL, cutoff = 0.5, ... )
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
y |
the observed outcome (may be within a specified fold, for cross-fitted estimates). |
full_y |
the observed outcome (not used, defaults to |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
nuisance_estimators |
not used; for compatibility with |
a |
not used; for compatibility with |
cutoff |
The risk score cutoff at which the specificity is evaluated.
Fitted values above |
... |
other arguments to SuperLearner, if |
A named list of: (1) the estimated specificity of the fitted regression function using specified cutoff; (2) the estimated influence function; and (3) the IPC EIF predictions.
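The four cutoff-based measures (sensitivity, specificity, PPV, and NPV) differ only in which observations they condition on. The sketch below computes plug-in versions of all four, assuming fully observed binary outcomes and that fitted values above the cutoff are classified as 1. The helper sketch_cutoff_measures is hypothetical and computes no influence functions.

```r
# Plug-in cutoff-based measures from thresholded fitted values.
sketch_cutoff_measures <- function(fitted_values, y, cutoff = 0.5) {
  pred <- as.numeric(fitted_values > cutoff)
  c(
    sensitivity = mean(pred[y == 1] == 1),  # among true 1s, predicted 1
    specificity = mean(pred[y == 0] == 0),  # among true 0s, predicted 0
    ppv = mean(y[pred == 1] == 1),          # among predicted 1s, truly 1
    npv = mean(y[pred == 0] == 0)           # among predicted 0s, truly 0
  )
}

fitted_values <- c(0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1)
y <- c(1, 0, 0, 0, 1, 1, 0)
cutoff_measures <- sketch_cutoff_measures(fitted_values, y)
print(cutoff_measures)
```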
Merge multiple vim objects into one
Take the output from multiple different calls to vimp_regression and merge into a single vim object; mostly used for plotting results.
merge_vim(...)
... |
an arbitrary number of |
an object of class vim containing all of the output from the individual vim objects. This results in a list containing:
s - a list of the column(s) to calculate variable importance for
SL.library - a list of the libraries of learners passed to SuperLearner
full_fit - a list of the fitted values of the chosen method fit to the full data
red_fit - a list of the fitted values of the chosen method fit to the reduced data
est - a vector with the corrected estimates
naive - a vector with the naive estimates
eif - a list with the influence curve-based updates
se - a vector with the standard errors
ci - a matrix with the CIs
mat - a tibble with the estimated variable importance, the standard errors, and the (1 - alpha) x 100% confidence intervals
full_mod - a list of the objects returned by the estimation procedure for the full data regression (if applicable)
red_mod - a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)
alpha - a list of the levels, for confidence interval calculation
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y, X = x, indx = 2, V = 2,
                         run_regression = TRUE, alpha = 0.05,
                         SL.library = learners, cvControl = list(V = 2))
est_1 <- vimp_regression(Y = y, X = x, indx = 1, V = 2,
                         run_regression = TRUE, alpha = 0.05,
                         SL.library = learners, cvControl = list(V = 2))

# merge the two vim objects into one
ests <- merge_vim(est_1, est_2)
Construct a Predictiveness Measure
predictiveness_measure( type = character(), y = numeric(), a = numeric(), fitted_values = numeric(), cross_fitting_folds = rep(1, length(fitted_values)), full_y = NULL, nuisance_estimators = list(), C = rep(1, length(y)), Z = NULL, folds_Z = cross_fitting_folds, ipc_weights = rep(1, length(y)), ipc_fit_type = "SL", ipc_eif_preds = numeric(), ipc_est_type = "aipw", scale = "identity", na.rm = TRUE, ... )
type |
the measure of interest (e.g., "accuracy", "auc", "r_squared") |
y |
the outcome of interest |
a |
the exposure of interest (only used if |
fitted_values |
fitted values from a regression function using the observed data (may be within a specified fold, for cross-fitted estimates). |
cross_fitting_folds |
folds for cross-fitting, if used to obtain the fitted values. If not used, a vector of ones. |
full_y |
the observed outcome (not used, defaults to |
nuisance_estimators |
a list of nuisance function estimators on the
observed data (may be within a specified fold, for cross-fitted estimates).
For the average value measure: an estimator of the optimal treatment rule ( |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either |
folds_Z |
either the cross-validation folds for the observed data (no coarsening) or a vector of folds for the fully observed data Z. |
ipc_weights |
weights for inverse probability of coarsening (IPC) (e.g., inverse weights from a two-phase sample) weighted estimation. Assumed to be already inverted. (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_fit_type |
if "external", then use |
ipc_eif_preds |
if |
ipc_est_type |
IPC correction, either |
scale |
if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform). |
na.rm |
logical; should |
... |
other arguments to SuperLearner, if |
An object of class "predictiveness_measure", with the following attributes:
Print predictiveness_measure objects
Prints out a table of the point estimate and standard error for a predictiveness_measure object.
## S3 method for class 'predictiveness_measure'
print(x, ...)
x |
the |
... |
other options, see the generic |
Print vim objects
Prints out the table of estimates, confidence intervals, and standard errors for a vim object.
## S3 method for class 'vim'
print(x, ...)
x |
the |
... |
other options, see the generic |
Process argument list for Super Learner estimation of the EIF
process_arg_lst(arg_lst)
arg_lst |
the list of arguments for Super Learner |
a list of modified arguments for EIF estimation
Run a Super Learner for the provided subset of features
run_sl(
  Y = NULL, X = NULL, V = 5, SL.library = "SL.glm",
  univariate_SL.library = NULL, s = 1, cv_folds = NULL,
  sample_splitting = TRUE, ss_folds = NULL, split = 1,
  verbose = FALSE, progress_bar = NULL, indx = 1,
  weights = rep(1, nrow(X)), cross_fitted_se = TRUE,
  full = NULL, vector = TRUE, ...
)
Y |
the outcome |
X |
the covariates |
V |
the number of folds |
SL.library |
the library of candidate learners |
univariate_SL.library |
the library of candidate learners for single-covariate regressions |
s |
the subset of interest |
cv_folds |
the CV folds |
sample_splitting |
logical; should we use sample-splitting for predictiveness estimation? |
ss_folds |
the sample-splitting folds; only used if
|
split |
the split to use for sample-splitting; only used if
|
verbose |
should we print progress? defaults to FALSE |
progress_bar |
the progress bar to print to (only if verbose = TRUE) |
indx |
the index to pass to progress bar (only if verbose = TRUE) |
weights |
weights to pass to estimation procedure |
cross_fitted_se |
if |
full |
should this be considered a "full" or "reduced" regression?
If |
vector |
should we return a vector ( |
... |
other arguments to Super Learner |
a list of length V, with the results of predicting on the hold-out data for each v in 1 through V
Creates the Z and W matrices and a list of sampled subsets, S, for SPVIM estimation.
sample_subsets(p, gamma, n)
p |
the number of covariates |
gamma |
the fraction of the sample size to sample (e.g., |
n |
the sample size |
a list, with elements Z (the matrix encoding presence/absence of each feature in the uniquely sampled subsets), S (the list of unique sampled subsets), W (the matrix of weights), and z_counts (the number of times each subset was sampled)
p <- 10
gamma <- 1
n <- 100
set.seed(100)
subset_lst <- sample_subsets(p, gamma, n)
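As a rough illustration of what sampling subsets from a Shapley-type distribution involves, the base-R sketch below draws subset sizes k (with 0 < k < p) with probability proportional to (p - 1) / (k (p - k)), picks the members of each subset uniformly at random, and tabulates the unique draws. This is a sketch under stated assumptions, not the internals of sample_subsets: the helper name draw_shapley_subsets is hypothetical, and the handling of the empty and full subsets and the construction of the weight matrix W are not shown.

```r
# Hypothetical helper (not part of vimp): sample feature subsets so that
# subset sizes follow a Shapley-kernel distribution.
draw_shapley_subsets <- function(p, n_draws) {
  sizes <- 1:(p - 1)
  # total probability mass given to all subsets of each size k
  size_prob <- (p - 1) / (sizes * (p - sizes))
  size_prob <- size_prob / sum(size_prob)
  k <- sample(sizes, n_draws, replace = TRUE, prob = size_prob)
  # given the size, choose the members uniformly at random
  subsets <- lapply(k, function(kk) sort(sample.int(p, kk)))
  # presence/absence matrix: one row per draw, one column per feature
  Z <- t(vapply(subsets, function(s) as.numeric(seq_len(p) %in% s), numeric(p)))
  keys <- apply(Z, 1, paste, collapse = "")
  list(
    Z = Z[!duplicated(keys), , drop = FALSE],
    z_counts = as.numeric(table(factor(keys, levels = unique(keys))))
  )
}

set.seed(100)
draws <- draw_shapley_subsets(p = 10, n_draws = 100)
dim(draws$Z)         # number of unique subsets by number of features
sum(draws$z_counts)  # equals the number of draws
```

Because small and large subsets receive most of the probability mass, repeated draws of the same subset are common, which is why the counts are tracked alongside the unique rows.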
Return an estimator on a different scale
scale_est(obs_est = NULL, grad = NULL, scale = "identity")
obs_est |
the observed VIM estimate |
grad |
the estimated efficient influence function |
scale |
the scale to compute on |
It may be of interest to return an estimate (or confidence interval) on a different scale than originally measured. For example, computing a confidence interval (CI) for a VIM value that lies in (0,1) on the logit scale ensures that the CI also lies in (0, 1).
the scaled estimate
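To make the logit-scale argument concrete, here is a minimal base-R sketch of a Wald interval that is computed on the logit scale and then back-transformed. The helper logit_scale_ci is hypothetical and is not the implementation of scale_est; it assumes a point estimate strictly inside (0, 1) and an identity-scale standard error, and uses the delta method for the transformed standard error.

```r
# Hypothetical helper (not part of vimp): Wald CI computed on the logit scale.
logit_scale_ci <- function(est, se, alpha = 0.05) {
  stopifnot(est > 0, est < 1)
  logit_est <- qlogis(est)            # log(est / (1 - est))
  logit_se <- se / (est * (1 - est))  # delta method: d/dx logit(x) = 1 / (x (1 - x))
  z <- qnorm(1 - alpha / 2)
  plogis(logit_est + c(-1, 1) * z * logit_se)  # back-transform to (0, 1)
}

# an identity-scale interval, 0.03 +/- 1.96 * 0.02, would dip below 0
ci <- logit_scale_ci(est = 0.03, se = 0.02)
ci
```

The back-transformed interval is asymmetric around the point estimate, which is the price paid for keeping both endpoints inside (0, 1).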
Compute estimates and confidence intervals for the SPVIMs, using cross-fitting.
sp_vim(
  Y = NULL, X = NULL, V = 5, type = "r_squared",
  SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"),
  univariate_SL.library = NULL, gamma = 1, alpha = 0.05, delta = 0,
  na.rm = FALSE, stratified = FALSE, verbose = FALSE,
  sample_splitting = TRUE, final_point_estimate = "split",
  C = rep(1, length(Y)), Z = NULL, ipc_scale = "identity",
  ipc_weights = rep(1, length(Y)), ipc_est_type = "aipw",
  scale = "identity", scale_est = TRUE, cross_fitted_se = TRUE, ...
)
Y |
the outcome. |
X |
the covariates. If |
V |
the number of folds for cross-fitting, defaults to 5. If
|
type |
the type of importance to compute; defaults to
|
SL.library |
a character vector of learners to pass to
|
univariate_SL.library |
(optional) a character vector of learners to
pass to |
gamma |
the fraction of the sample size to use when sampling subsets
(e.g., |
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
verbose |
should |
sample_splitting |
should we use sample-splitting to estimate the full and
reduced predictiveness? Defaults to |
final_point_estimate |
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ( |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_scale |
what scale should the inverse probability weight correction be applied on (if any)? Defaults to "identity". (other options are "log" and "logit") |
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
... |
other arguments to the estimation tool, see "See also". |
We define the SPVIM as the weighted average of the population
difference in predictiveness over all subsets of features not containing
feature $j$.
This is equivalent to finding the solution to a population weighted least squares problem. This key fact allows us to estimate the SPVIM using weighted least squares, where we first sample subsets from the power set of all possible features using the Shapley sampling distribution; then use cross-fitting to obtain estimators of the predictiveness of each sampled subset; and finally, solve the least squares problem given in Williamson and Feng (2020).
See the paper by Williamson and Feng (2020) for more details on the mathematics behind this function, and the validity of the confidence intervals.
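For reference, the verbal definition above corresponds to the standard Shapley value formula. The notation here is introduced for illustration only: $v_0(s)$ denotes the oracle predictiveness attainable using only the features in subset $s$, and $p$ denotes the number of features. Under those assumptions, the SPVIM of feature $j$ can be sketched as follows; Williamson and Feng (2020) remains the authoritative statement.

```latex
\psi_{0,j} = \sum_{s \subseteq \{1, \ldots, p\} \setminus \{j\}}
  \frac{1}{p} \binom{p-1}{|s|}^{-1}
  \left\{ v_0(s \cup \{j\}) - v_0(s) \right\}
```

The weights sum to 1 over all subsets not containing $j$, since there are $\binom{p-1}{k}$ subsets of each size $k = 0, \ldots, p - 1$.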
In the interest of transparency, we return most of the calculations
within the vim object. This results in a list containing:
the library of learners passed to SuperLearner
the estimated predictiveness measure for each sampled subset
the fitted values on the entire dataset from the chosen method for each sampled subset
the cross-fitted predicted values from the chosen method for each sampled subset
the estimated SPVIM value for each feature
the influence functions for each sampled subset
the contributions to the variance from estimating predictiveness
the contributions to the variance from sampling subsets
a list of the SPVIM influence function contributions
the standard errors for the estimated variable importance
the $(1 - \alpha) \times 100$% confidence intervals based on the variable importance estimates
p-values for the null hypothesis test of zero importance for each variable
the test statistic for each null hypothesis test of zero importance
a hypothesis testing decision for each null hypothesis test (for each variable having zero importance)
the fraction of the sample size used when sampling subsets
the level, for confidence interval calculation
the delta value used for hypothesis testing
the outcome
the weights
the scale on which CIs were computed
a tibble with the estimates, SEs, CIs, hypothesis testing decisions, and p-values
An object of class vim. See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
n <- 100
p <- 2
# generate the data
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))
# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2
# generate Y ~ Normal (smooth, 1)
y <- as.matrix(smooth + stats::rnorm(n, 0, 1))
# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm")
# -----------------------------------------
# using Super Learner (with a small number of CV folds,
# for illustration only)
# -----------------------------------------
set.seed(4747)
est <- sp_vim(Y = y, X = x, V = 2, type = "r_squared",
              SL.library = learners, alpha = 0.05)
Compute the influence functions for the contribution from sampling observations and subsets.
spvim_ics(Z, z_counts, W, v, psi, G, c_n, ics, measure)
Z |
the matrix of presence/absence of each feature (columns) in each sampled subset (rows) |
z_counts |
the number of times each unique subset was sampled |
W |
the matrix of weights |
v |
the estimated predictiveness measures |
psi |
the estimated SPVIM values |
G |
the constraint matrix |
c_n |
the constraint values |
ics |
a list of influence function values for each predictiveness measure |
measure |
the type of measure (e.g., "r_squared" or "auc") |
The processes for sampling observations and sampling subsets are independent. Thus, we can compute the influence function separately for each sampling process. For further details, see the paper by Williamson and Feng (2020).
a named list of length 2; contrib_v is the contribution from estimating V, while contrib_s is the contribution from sampling subsets.
Compute standard error estimates based on the estimated influence function for a SPVIM value of interest.
spvim_se(ics, idx = 1, gamma = 1, na_rm = FALSE)
ics |
the influence function estimates based on the contributions
from sampling observations and sampling subsets: a list of length two
resulting from a call to |
idx |
the index of interest |
gamma |
the proportion of the sample size used when sampling subsets |
na_rm |
remove |
Since the processes for sampling observations and subsets are independent, the variance for a given SPVIM estimator is simply the sum of the variances based on sampling observations and on sampling subsets.
The standard error estimate for the desired SPVIM value.
spvim_ics for how the influence functions are estimated.
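The additive variance decomposition described above can be illustrated with a small base-R sketch. It is not the implementation of spvim_se: the helper combined_se is hypothetical, and it assumes each input is a vector of mean-zero influence function values with one entry per sampled unit (observations in one vector, subsets in the other), so that each variance contribution is a mean square divided by the number of units.

```r
# Hypothetical helper (not part of vimp): combine two independent variance
# contributions into a single standard error.
combined_se <- function(ic_obs, ic_subsets) {
  var_obs <- mean(ic_obs^2) / length(ic_obs)          # from sampling observations
  var_sub <- mean(ic_subsets^2) / length(ic_subsets)  # from sampling subsets
  sqrt(var_obs + var_sub)                             # variances add under independence
}

set.seed(1)
ic_obs <- rnorm(100, sd = 2)      # simulated influence values, observations
ic_subsets <- rnorm(100, sd = 1)  # simulated influence values, subsets
se <- combined_se(ic_obs, ic_subsets)
se
```

Sampling more subsets (a larger gamma) shrinks only the second term, so the standard error can never fall below the contribution from sampling observations.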
Compute estimates of and confidence intervals for nonparametric intrinsic variable importance based on the population-level contrast between the oracle predictiveness using the feature(s) of interest versus not.
vim(
  Y = NULL, X = NULL, f1 = NULL, f2 = NULL, indx = 1,
  type = "r_squared", run_regression = TRUE,
  SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"),
  alpha = 0.05, delta = 0, scale = "identity", na.rm = FALSE,
  sample_splitting = TRUE, sample_splitting_folds = NULL,
  final_point_estimate = "split", stratified = FALSE,
  C = rep(1, length(Y)), Z = NULL, ipc_scale = "identity",
  ipc_weights = rep(1, length(Y)), ipc_est_type = "aipw",
  scale_est = TRUE, nuisance_estimators_full = NULL,
  nuisance_estimators_reduced = NULL, exposure_name = NULL,
  bootstrap = FALSE, b = 1000, boot_interval_type = "perc",
  clustered = FALSE, cluster_id = rep(NA, length(Y)), ...
)
Y |
the outcome. |
X |
the covariates. If |
f1 |
the fitted values from a flexible estimation technique
regressing Y on X. A vector of the same length as |
f2 |
the fitted values from a flexible estimation technique
regressing either (a) |
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
type |
the type of importance to compute; defaults to
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
sample_splitting |
should we use sample-splitting to estimate the full and
reduced predictiveness? Defaults to |
sample_splitting_folds |
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if |
final_point_estimate |
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ( |
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_scale |
what scale should the inverse probability weight correction be applied on (if any)? Defaults to "identity". (other options are "log" and "logit") |
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
nuisance_estimators_full |
(only used if |
nuisance_estimators_reduced |
(only used if |
exposure_name |
(only used if |
bootstrap |
should bootstrap-based standard error estimates be computed?
Defaults to |
b |
the number of bootstrap replicates (only used if |
boot_interval_type |
the type of bootstrap interval (one of |
clustered |
should the bootstrap resamples be performed on clusters
rather than individual observations? Defaults to |
cluster_id |
vector of the same length as |
... |
other arguments to the estimation tool, see "See also". |
We define the population variable importance measure (VIM) for the
group of features (or single feature) $s$ with respect to the
predictiveness measure $V$ by
$$\psi_{0,s} := V(f_0, P_0) - V(f_{0,s}, P_0),$$
where $f_0$ is the population predictiveness maximizing function,
$f_{0,s}$ is the population predictiveness maximizing function that is
only allowed to access the features with index not in $s$, and $P_0$ is
the true data-generating distribution. VIM estimates are obtained by obtaining
estimators $f_n$ and $f_{n,s}$ of $f_0$ and $f_{0,s}$, respectively;
obtaining an estimator $P_n$ of $P_0$; and finally,
setting $\psi_{n,s} := V(f_n, P_n) - V(f_{n,s}, P_n)$.
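A plug-in estimate of this kind can be sketched in a few lines of base R, with R-squared as the predictiveness measure. This is an illustration only and does not call the package: ordinary least squares via lm stands in for the flexible estimators (such as a Super Learner), the data are simulated, and the names dat, r_squared, f_full, f_reduced, and psi_hat are introduced here.

```r
set.seed(4747)
n <- 200
dat <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

# predictiveness measure: R-squared of predictions f for outcome y
r_squared <- function(f, y) 1 - mean((y - f)^2) / mean((y - mean(y))^2)

# full regression uses all features; reduced regression drops s = {x1}
f_full <- fitted(lm(y ~ x1 + x2, data = dat))
f_reduced <- fitted(lm(y ~ x2, data = dat))

# plug-in VIM estimate: predictiveness of full fit minus that of reduced fit
psi_hat <- r_squared(f_full, dat$y) - r_squared(f_reduced, dat$y)
psi_hat
```

Because both fits are evaluated on the data used to train them, this naive version tends to be optimistic, which is part of the motivation for cross-fitting.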
In the interest of transparency, we return most of the calculations
within the vim object. This results in a list including:
the column(s) to calculate variable importance for
the library of learners passed to SuperLearner
the type of risk-based variable importance measured
the fitted values of the chosen method fit to the full data
the fitted values of the chosen method fit to the reduced data
the estimated variable importance
the naive estimator of variable importance (only used if type = "anova")
the estimated efficient influence function
the estimated efficient influence function for the full regression
the estimated efficient influence function for the reduced regression
the standard error for the estimated variable importance
the $(1 - \alpha) \times 100$% confidence interval for the variable importance estimate
a decision to either reject (TRUE) or not reject (FALSE) the null hypothesis, based on a conservative test
a p-value based on the same test as test
the object returned by the estimation procedure for the full data regression (if applicable)
the object returned by the estimation procedure for the reduced data regression (if applicable)
the level, for confidence interval calculation
the folds used for sample-splitting (used for hypothesis testing)
the outcome
the weights
the cluster IDs
a tibble with the estimate, SE, CI, hypothesis testing decision, and p-value
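The relationship between the estimate, standard error, confidence interval, test decision, and p-value in such a summary can be sketched with a generic Wald calculation in base R. This is an assumption-laden illustration: the helper wald_summary and its column names are hypothetical, the test shown is a simple one-sided z-test of importance no greater than delta, and the conservative test used by the package may differ in its details.

```r
# Hypothetical helper (not part of vimp): Wald-type summary of one estimate.
wald_summary <- function(est, se, alpha = 0.05, delta = 0) {
  ci <- est + c(-1, 1) * qnorm(1 - alpha / 2) * se
  test_statistic <- (est - delta) / se
  p_value <- 1 - pnorm(test_statistic)  # one-sided: H0 is importance <= delta
  data.frame(est = est, se = se, cil = ci[1], ciu = ci[2],
             test = p_value < alpha, p_value = p_value)
}

out <- wald_summary(est = 0.12, se = 0.04)
out
```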
An object of classes vim and the type of risk-based measure.
See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -1, 1)))
# apply the function to the x's
f <- function(x) 0.5 + 0.3*x[1] + 0.2*x[2]
smooth <- apply(x, 1, function(z) f(z))
# generate Y ~ Bernoulli (smooth)
y <- matrix(rbinom(n, size = 1, prob = smooth))
# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm")
# using Y and X; use class-balanced folds
est_1 <- vim(y, x, indx = 2, type = "accuracy",
             alpha = 0.05, run_regression = TRUE,
             SL.library = learners, cvControl = list(V = 2),
             stratified = TRUE)
# using pre-computed fitted values
set.seed(4747)
V <- 2
full_fit <- SuperLearner::CV.SuperLearner(Y = y, X = x,
                                          SL.library = learners,
                                          cvControl = list(V = 2),
                                          innerCvControl = list(list(V = V)))
full_fitted <- SuperLearner::predict.SuperLearner(full_fit)$pred
# fit the data with only X1
reduced_fit <- SuperLearner::CV.SuperLearner(Y = full_fitted,
                                             X = x[, -2, drop = FALSE],
                                             SL.library = learners,
                                             cvControl = list(V = 2, validRows = full_fit$folds),
                                             innerCvControl = list(list(V = V)))
reduced_fitted <- SuperLearner::predict.SuperLearner(reduced_fit)$pred
est_2 <- vim(Y = y, f1 = full_fitted, f2 = reduced_fitted,
             indx = 2, run_regression = FALSE, alpha = 0.05,
             stratified = TRUE, type = "accuracy",
             sample_splitting_folds = get_cv_sl_folds(full_fit$folds))
Compute estimates of and confidence intervals for nonparametric
difference in classification accuracy-based intrinsic variable importance.
This is a wrapper function for cv_vim, with type = "accuracy".
vimp_accuracy(
  Y = NULL, X = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL,
  f1 = NULL, f2 = NULL, indx = 1, V = 10, run_regression = TRUE,
  SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"),
  alpha = 0.05, delta = 0, na.rm = FALSE,
  final_point_estimate = "split", cross_fitting_folds = NULL,
  sample_splitting_folds = NULL, stratified = TRUE,
  C = rep(1, length(Y)), Z = NULL, ipc_weights = rep(1, length(Y)),
  scale = "logit", ipc_est_type = "aipw", scale_est = TRUE,
  cross_fitted_se = TRUE, ...
)
Y |
the outcome. |
X |
the covariates. If |
cross_fitted_f1 |
the predicted values on validation data from a
flexible estimation technique regressing Y on X in the training data. Provided as
either (a) a vector, where each element is
the predicted value when that observation is part of the validation fold;
or (b) a list of length V, where each element in the list is a set of predictions on the
corresponding validation data fold.
If sample-splitting is requested, then these must be estimated specially; see Details. However,
the resulting vector should be the same length as |
cross_fitted_f2 |
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in
|
f1 |
the fitted values from a flexible estimation technique
regressing Y on X. If sample-splitting is requested, then these must be
estimated specially; see Details. If |
f2 |
the fitted values from a flexible estimation technique
regressing either (a) |
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
V |
the number of folds for cross-fitting, defaults to 10. If
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
final_point_estimate |
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ( |
cross_fitting_folds |
the folds for cross-fitting. Only used if
|
sample_splitting_folds |
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if |
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
... |
other arguments to the estimation tool, see "See also". |
We define the population variable importance measure (VIM) for the
group of features (or single feature) $s$ with respect to the
predictiveness measure $V$ by
$$\psi_{0,s} := V(f_0, P_0) - V(f_{0,s}, P_0),$$
where $f_0$ is the population predictiveness maximizing function,
$f_{0,s}$ is the population predictiveness maximizing function that is
only allowed to access the features with index not in $s$, and $P_0$ is
the true data-generating distribution.
Cross-fitted VIM estimates are computed differently if sample-splitting
is requested versus if it is not. We recommend using sample-splitting
in most cases, since only in this case will inferences be valid if
the variable(s) of interest have truly zero population importance.
The purpose of cross-fitting is to estimate $f_0$ and $f_{0,s}$
on independent data from estimating $P_0$; this can result in improved
performance, especially when using flexible learning algorithms. The purpose
of sample-splitting is to estimate $f_0$ and $f_{0,s}$ on independent
data; this allows valid inference under the null hypothesis of zero importance.
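Without sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $K$ folds; then using each fold in turn as a
hold-out set, constructing estimators $f_{n,k}$ and $f_{n,k,s}$ of
$f_0$ and $f_{0,s}$, respectively, on the training data and estimator
$P_{n,k}$ of $P_0$ using the test data; and finally, computing
$$\psi_{n,s} := K^{-1} \sum_{k=1}^K \{V(f_{n,k}, P_{n,k}) - V(f_{n,k,s}, P_{n,k})\}.$$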
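The cross-fitting procedure without sample-splitting can be sketched in base R as follows. This is an illustration rather than the package's code: lm stands in for the flexible estimators, R-squared is used as the predictiveness measure, the data are simulated, and the names dat, r_squared, fold_vims, and psi_cf are introduced here.

```r
set.seed(4747)
n <- 200
dat <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

r_squared <- function(f, y) 1 - mean((y - f)^2) / mean((y - mean(y))^2)

K <- 5
folds <- sample(rep(seq_len(K), length.out = n))  # random fold labels

# for each fold: fit on the other folds, evaluate on the held-out fold
fold_vims <- vapply(seq_len(K), function(k) {
  train <- dat[folds != k, ]
  test <- dat[folds == k, ]
  f_full <- predict(lm(y ~ x1 + x2, data = train), newdata = test)
  f_reduced <- predict(lm(y ~ x2, data = train), newdata = test)  # drops x1
  r_squared(f_full, test$y) - r_squared(f_reduced, test$y)
}, numeric(1))

# cross-fitted VIM estimate: average of the fold-specific differences
psi_cf <- mean(fold_vims)
psi_cf
```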
With sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $2K$ folds. These folds are further divided
into 2 groups of folds. Then, for each fold $k$ in the first group,
estimator $f_{n,k}$ of $f_0$ is constructed using all data besides
the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data) and
estimator $P_{n,k}$ of $P_0$ is constructed using the held-out data
(i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k} = V(f_{n,k}, P_{n,k}).$$
Similarly, for each fold $k$ in the second group,
estimator $f_{n,k,s}$ of $f_{0,s}$ is constructed using all data
besides the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data)
and estimator $P_{n,k}$ of $P_0$ is constructed using the held-out
data (i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k,s} = V(f_{n,k,s}, P_{n,k}).$$
Finally,
$$\psi_{n,s} := K^{-1} \sum_{k=1}^K \{v_{n,k} - v_{n,k,s}\}.$$
See the paper by Williamson, Gilbert, Simon, and Carone for more
details on the mathematics behind the cv_vim function, and the
validity of the confidence intervals.
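The sample-splitting variant can be sketched in the same spirit. The base-R code below is an illustration under stated assumptions, not the cv_vim implementation: lm replaces the flexible estimators, R-squared is the predictiveness measure, each regression is fit on all data outside the held-out fold, and the names holdout_v, v_full, v_reduced, and psi_ss are hypothetical. What matters is that the full and reduced predictiveness values are evaluated on disjoint sets of held-out folds.

```r
set.seed(4747)
n <- 400
dat <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

r_squared <- function(f, y) 1 - mean((y - f)^2) / mean((y - mean(y))^2)

K <- 2
folds <- sample(rep(seq_len(2 * K), length.out = n))  # 2K folds in total

# fit on all data outside fold k, evaluate predictiveness on fold k
holdout_v <- function(k, formula) {
  fit <- lm(formula, data = dat[folds != k, ])
  test <- dat[folds == k, ]
  r_squared(predict(fit, newdata = test), test$y)
}

# first group (folds 1..K): full regression; second group: reduced regression
v_full <- vapply(seq_len(K), holdout_v, numeric(1), formula = y ~ x1 + x2)
v_reduced <- vapply(K + seq_len(K), holdout_v, numeric(1), formula = y ~ x2)

# sample-split, cross-fitted VIM estimate
psi_ss <- mean(v_full - v_reduced)
psi_ss
```

Compared with the version without sample-splitting, each predictiveness value here is based on half as many held-out folds, which is the efficiency cost of obtaining valid inference under the null hypothesis of zero importance.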
In the interest of transparency, we return most of the calculations
within the vim object. This results in a list including:
the column(s) to calculate variable importance for
the library of learners passed to SuperLearner
the fitted values of the chosen method fit to the full data (a list, for train and test data)
the fitted values of the chosen method fit to the reduced data (a list, for train and test data)
the estimated variable importance
the naive estimator of variable importance
the estimated efficient influence function
the estimated efficient influence function for the full regression
the estimated efficient influence function for the reduced regression
the standard error for the estimated variable importance
the $(1 - \alpha) \times 100$% confidence interval for the variable importance estimate
a decision to either reject (TRUE) or not reject (FALSE) the null hypothesis, based on a conservative test
a p-value based on the same test as test
the object returned by the estimation procedure for the full data regression (if applicable)
the object returned by the estimation procedure for the reduced data regression (if applicable)
the level, for confidence interval calculation
the folds used for hypothesis testing
the folds used for cross-fitting
the outcome
the weights
the cluster IDs
a tibble with the estimate, SE, CI, hypothesis testing decision, and p-value
An object of classes vim and vim_accuracy.
See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -1, 1)))
# apply the function to the x's
f <- function(x) 0.5 + 0.3*x[1] + 0.2*x[2]
smooth <- apply(x, 1, function(z) f(z))
# generate Y ~ Bernoulli (smooth)
y <- matrix(rbinom(n, size = 1, prob = smooth))
# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")
# estimate (with a small number of folds, for illustration only)
est <- vimp_accuracy(y, x, indx = 2,
                     alpha = 0.05, run_regression = TRUE,
                     SL.library = learners, V = 2,
                     cvControl = list(V = 2))
Compute estimates of and confidence intervals for nonparametric ANOVA-based
intrinsic variable importance. This is a wrapper function for cv_vim,
with type = "anova". This type has limited functionality compared to other
types; in particular, null hypothesis tests are not possible using
type = "anova". If you want to do null hypothesis testing on an equivalent
population parameter, use vimp_rsquared instead.
vimp_anova(
  Y = NULL, X = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL,
  indx = 1, V = 10, run_regression = TRUE,
  SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"),
  alpha = 0.05, delta = 0, na.rm = FALSE, cross_fitting_folds = NULL,
  stratified = FALSE, C = rep(1, length(Y)), Z = NULL,
  ipc_weights = rep(1, length(Y)), scale = "logit",
  ipc_est_type = "aipw", scale_est = TRUE, cross_fitted_se = TRUE, ...
)
Y |
the outcome. |
X |
the covariates. If |
cross_fitted_f1 |
the predicted values on validation data from a
flexible estimation technique regressing Y on X in the training data. Provided as
either (a) a vector, where each element is
the predicted value when that observation is part of the validation fold;
or (b) a list of length V, where each element in the list is a set of predictions on the
corresponding validation data fold.
If sample-splitting is requested, then these must be estimated specially; see Details. However,
the resulting vector should be the same length as |
cross_fitted_f2 |
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in
|
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
V |
the number of folds for cross-fitting, defaults to 10 (the value shown in the usage for this wrapper). If
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
cross_fitting_folds |
the folds for cross-fitting. Only used if
|
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
... |
other arguments to the estimation tool, see "See also". |
We define the population ANOVA parameter for the group of features (or single feature) $s$ by
$$\psi_{0,s} := E_0\{f_0(X) - f_{0,s}(X)\}^2 / \mathrm{var}_0(Y),$$
where $f_0$ is the population conditional mean using all features,
$f_{0,s}$ is the population conditional mean using the features with
index not in $s$, and $E_0$ and $\mathrm{var}_0$ denote expectation and
variance under the true data-generating distribution, respectively.
Cross-fitted ANOVA estimates are computed by first
splitting the data into $K$ folds; then using each fold in turn as a
hold-out set, constructing estimators $f_{n,k}$ and $f_{n,k,s}$ of
$f_0$ and $f_{0,s}$, respectively, on the training data and estimator
$E_{n,k}$ of $E_0$ using the test data; and finally, computing
$$\psi_{n,s} := K^{-1}\sum_{k=1}^K E_{n,k}\{f_{n,k}(X) - f_{n,k,s}(X)\}^2 / \mathrm{var}_n(Y),$$
where $\mathrm{var}_n$ is the empirical variance.
See the paper by Williamson, Gilbert, Simon, and Carone for more
details on the mathematics behind this function.
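The cross-fitting recipe described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: lm() stands in for a flexible learner, the reduced regression is fit by regressing the outcome on the remaining features, the number of folds K and the simulated data are made up for the example, and only the plug-in average is computed (no influence-function-based correction).

```r
# Sketch of a cross-fitted, plug-in ANOVA-type estimate (assumptions:
# lm() as the learner, K = 5 folds, simulated data; not vimp internals).
set.seed(4747)
n <- 500
x <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
y <- 1 + 2 * x$x1 + 0.5 * x$x2 + rnorm(n)
s <- 1                                    # index of the feature of interest
K <- 5
folds <- sample(rep(seq_len(K), length.out = n))

fold_terms <- vapply(seq_len(K), function(k) {
  train <- folds != k
  test <- folds == k
  dat_train <- cbind(y = y[train], x[train, , drop = FALSE])
  # full regression: all features; reduced regression: features not in s
  full_fit <- lm(y ~ ., data = dat_train)
  red_cols <- setdiff(names(dat_train), names(x)[s])
  red_fit <- lm(y ~ ., data = dat_train[, red_cols, drop = FALSE])
  # evaluate both fits on the held-out fold
  f_full <- predict(full_fit, newdata = x[test, , drop = FALSE])
  f_red <- predict(red_fit, newdata = x[test, , drop = FALSE])
  mean((f_full - f_red)^2)
}, numeric(1))

# average the fold-specific terms and scale by the empirical variance of Y
anova_est <- mean(fold_terms) / mean((y - mean(y))^2)
print(anova_est)
```

In practice the wrapper handles fold creation, the learner library, and the standard error; the sketch only shows how the fold-wise averages combine into a single point estimate.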
An object of classes vim and vim_anova. See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# estimate (with a small number of folds, for illustration only)
est <- vimp_anova(y, x, indx = 2,
           alpha = 0.05, run_regression = TRUE,
           SL.library = learners, V = 2, cvControl = list(V = 2))
Compute estimates of and confidence intervals for nonparametric difference
in $AUC$-based intrinsic variable importance. This is a wrapper function for
cv_vim, with type = "auc".
vimp_auc( Y = NULL, X = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL, f1 = NULL, f2 = NULL, indx = 1, V = 10, run_regression = TRUE, SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"), alpha = 0.05, delta = 0, na.rm = FALSE, final_point_estimate = "split", cross_fitting_folds = NULL, sample_splitting_folds = NULL, stratified = TRUE, C = rep(1, length(Y)), Z = NULL, ipc_weights = rep(1, length(Y)), scale = "logit", ipc_est_type = "aipw", scale_est = TRUE, cross_fitted_se = TRUE, ... )
Y |
the outcome. |
X |
the covariates. If |
cross_fitted_f1 |
the predicted values on validation data from a
flexible estimation technique regressing Y on X in the training data. Provided as
either (a) a vector, where each element is
the predicted value when that observation is part of the validation fold;
or (b) a list of length V, where each element in the list is a set of predictions on the
corresponding validation data fold.
If sample-splitting is requested, then these must be estimated specially; see Details. However,
the resulting vector should be the same length as |
cross_fitted_f2 |
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in
|
f1 |
the fitted values from a flexible estimation technique
regressing Y on X. If sample-splitting is requested, then these must be
estimated specially; see Details. If |
f2 |
the fitted values from a flexible estimation technique
regressing either (a) |
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
V |
the number of folds for cross-fitting, defaults to 10 (the value shown in the usage for this wrapper). If
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
final_point_estimate |
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ( |
cross_fitting_folds |
the folds for cross-fitting. Only used if
|
sample_splitting_folds |
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if |
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
... |
other arguments to the estimation tool, see "See also". |
We define the population variable importance measure (VIM) for the
group of features (or single feature) $s$ with respect to the
predictiveness measure $V$ by
$$\psi_{0,s} := V(f_0, P_0) - V(f_{0,s}, P_0),$$
where $f_0$ is the population predictiveness maximizing function,
$f_{0,s}$ is the population predictiveness maximizing function that is only
allowed to access the features with index not in $s$, and $P_0$ is the true
data-generating distribution.
Cross-fitted VIM estimates are computed differently if sample-splitting
is requested versus if it is not. We recommend using sample-splitting
in most cases, since only in this case will inferences be valid if
the variable(s) of interest have truly zero population importance.
The purpose of cross-fitting is to estimate $f_0$ and $f_{0,s}$
on independent data from estimating $P_0$; this can result in improved
performance, especially when using flexible learning algorithms. The purpose
of sample-splitting is to estimate $f_0$ and $f_{0,s}$ on independent
data; this allows valid inference under the null hypothesis of zero importance.
Without sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $K$ folds; then using each fold in turn as a
hold-out set, constructing estimators $f_{n,k}$ and $f_{n,k,s}$ of
$f_0$ and $f_{0,s}$, respectively, on the training data and estimator
$P_{n,k}$ of $P_0$ using the test data; and finally, computing
$$\psi_{n,s} := K^{-1}\sum_{k=1}^K \{V(f_{n,k}, P_{n,k}) - V(f_{n,k,s}, P_{n,k})\}.$$
With sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $2K$ folds. These folds are further divided
into 2 groups of folds. Then, for each fold $k$ in the first group,
estimator $f_{n,k}$ of $f_0$ is constructed using all data besides
the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data) and
estimator $P_{n,k}$ of $P_0$ is constructed using the held-out data
(i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k} = V(f_{n,k}, P_{n,k}).$$
Similarly, for each fold $k$ in the second group,
estimator $f_{n,k,s}$ of $f_{0,s}$ is constructed using all data
besides the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data)
and estimator $P_{n,k}$ of $P_0$ is constructed using the held-out
data (i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k,s} = V(f_{n,k,s}, P_{n,k}).$$
Finally,
$$\psi_{n,s} := K^{-1}\sum_{k=1}^K \{v_{n,k} - v_{n,k,s}\}.$$
See the paper by Williamson, Gilbert, Simon, and Carone for more
details on the mathematics behind the cv_vim function, and the
validity of the confidence intervals.
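As a rough illustration of the cross-fitted estimate without sample-splitting, the base R sketch below averages fold-specific differences in AUC between a full and a reduced regression. Everything here is an assumption made for the example: logistic regression via glm() stands in for a flexible learner, auc() is a helper written for this sketch (it is not a vimp or ROCR function), and the data are simulated.

```r
# Sketch: cross-fitted difference in AUC, no sample-splitting (base R only).
set.seed(1234)
n <- 400
x <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
y <- rbinom(n, size = 1, prob = plogis(2 * x$x1 + 0.5 * x$x2))
s <- 1                                    # index of the feature of interest
K <- 4

# stratify folds by outcome so every fold contains both classes
folds <- integer(n)
folds[y == 1] <- sample(rep(seq_len(K), length.out = sum(y == 1)))
folds[y == 0] <- sample(rep(seq_len(K), length.out = sum(y == 0)))

# empirical AUC via the Mann-Whitney rank statistic (hypothetical helper)
auc <- function(pred, outcome) {
  r <- rank(pred)
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

fold_diffs <- vapply(seq_len(K), function(k) {
  train <- cbind(y = y[folds != k], x[folds != k, , drop = FALSE])
  test_x <- x[folds == k, , drop = FALSE]
  test_y <- y[folds == k]
  # fit both regressions on the training folds
  full_fit <- glm(y ~ ., data = train, family = binomial())
  red_train <- train[, names(train) != names(x)[s], drop = FALSE]
  red_fit <- glm(y ~ ., data = red_train, family = binomial())
  # evaluate predictiveness of each fit on the held-out fold
  v_full <- auc(predict(full_fit, newdata = test_x, type = "response"), test_y)
  v_red <- auc(predict(red_fit, newdata = test_x, type = "response"), test_y)
  v_full - v_red
}, numeric(1))

vim_est <- mean(fold_diffs)
print(vim_est)
```

Because the full and reduced predictiveness are evaluated on the same held-out folds here, this variant does not give valid inference when the true importance is zero, which is the motivation for sample-splitting given above.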
In the interest of transparency, we return most of the calculations
within the vim object. This results in a list including:
the column(s) to calculate variable importance for
the library of learners passed to SuperLearner
the fitted values of the chosen method fit to the full data (a list, for train and test data)
the fitted values of the chosen method fit to the reduced data (a list, for train and test data)
the estimated variable importance
the naive estimator of variable importance
the estimated efficient influence function
the estimated efficient influence function for the full regression
the estimated efficient influence function for the reduced regression
the standard error for the estimated variable importance
the $(1 - \alpha) \times 100$% confidence interval for the variable importance estimate
a decision to either reject (TRUE) or not reject (FALSE) the null hypothesis, based on a conservative test
a p-value based on the same test as test
the object returned by the estimation procedure for the full data regression (if applicable)
the object returned by the estimation procedure for the reduced data regression (if applicable)
the level, for confidence interval calculation
the folds used for hypothesis testing
the folds used for cross-fitting
the outcome
the weights
the cluster IDs
a tibble with the estimate, SE, CI, hypothesis testing decision, and p-value
An object of classes vim and vim_auc. See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package, and performance for specific usage of the ROCR package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -1, 1)))

# apply the function to the x's
f <- function(x) 0.5 + 0.3*x[1] + 0.2*x[2]
smooth <- apply(x, 1, function(z) f(z))

# generate Y ~ Bernoulli (smooth)
y <- matrix(rbinom(n, size = 1, prob = smooth))

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# estimate (with a small number of folds, for illustration only)
est <- vimp_auc(y, x, indx = 2,
           alpha = 0.05, run_regression = TRUE,
           SL.library = learners, V = 2, cvControl = list(V = 2))
Compute confidence intervals for the true variable importance parameter.
vimp_ci(est, se, scale = "identity", level = 0.95, truncate = TRUE)
est |
estimate of variable importance, e.g., from a call to |
se |
estimate of the standard error of |
scale |
scale to compute interval estimate on (defaults to "identity": compute Wald-type CI). |
level |
confidence level of the interval (defaults to 0.95). |
truncate |
truncate CIs to have lower limit at (or above) zero? |
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest.
The Wald-based confidence interval for the true importance of the given group of left-out covariates.
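For orientation, a Wald-type interval on the identity scale with optional truncation at zero can be sketched as follows. wald_ci() is a hypothetical helper written for this illustration; it is not the body of vimp_ci(), it ignores the scale argument, and the estimates and standard errors are made-up numbers.

```r
# Sketch of a Wald-type interval: est +/- z * se, lower limit truncated at 0.
# wald_ci() is a hypothetical helper, not part of the vimp package.
wald_ci <- function(est, se, level = 0.95, truncate = TRUE) {
  z <- stats::qnorm(1 - (1 - level) / 2)
  ci <- cbind(lower = est - z * se, upper = est + z * se)
  if (truncate) {
    # importance is non-negative, so negative lower limits are set to zero
    ci[, "lower"] <- pmax(ci[, "lower"], 0)
  }
  ci
}

# two made-up importance estimates with their standard errors
ci <- wald_ci(est = c(0.20, 0.03), se = c(0.05, 0.04))
print(ci)
```

The second interval shows the effect of truncation: its untruncated lower limit would be negative, so it is reported as zero.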
Compute estimates of and confidence intervals for nonparametric
deviance-based intrinsic variable importance. This is a wrapper function for
cv_vim, with type = "deviance".
vimp_deviance( Y = NULL, X = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL, f1 = NULL, f2 = NULL, indx = 1, V = 10, run_regression = TRUE, SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"), alpha = 0.05, delta = 0, na.rm = FALSE, final_point_estimate = "split", cross_fitting_folds = NULL, sample_splitting_folds = NULL, stratified = TRUE, C = rep(1, length(Y)), Z = NULL, ipc_weights = rep(1, length(Y)), scale = "logit", ipc_est_type = "aipw", scale_est = TRUE, cross_fitted_se = TRUE, ... )
Y |
the outcome. |
X |
the covariates. If |
cross_fitted_f1 |
the predicted values on validation data from a
flexible estimation technique regressing Y on X in the training data. Provided as
either (a) a vector, where each element is
the predicted value when that observation is part of the validation fold;
or (b) a list of length V, where each element in the list is a set of predictions on the
corresponding validation data fold.
If sample-splitting is requested, then these must be estimated specially; see Details. However,
the resulting vector should be the same length as |
cross_fitted_f2 |
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in
|
f1 |
the fitted values from a flexible estimation technique
regressing Y on X. If sample-splitting is requested, then these must be
estimated specially; see Details. If |
f2 |
the fitted values from a flexible estimation technique
regressing either (a) |
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
V |
the number of folds for cross-fitting, defaults to 10 (the value shown in the usage for this wrapper). If
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
final_point_estimate |
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ( |
cross_fitting_folds |
the folds for cross-fitting. Only used if
|
sample_splitting_folds |
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if |
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
... |
other arguments to the estimation tool, see "See also". |
We define the population variable importance measure (VIM) for the
group of features (or single feature) $s$ with respect to the
predictiveness measure $V$ by
$$\psi_{0,s} := V(f_0, P_0) - V(f_{0,s}, P_0),$$
where $f_0$ is the population predictiveness maximizing function,
$f_{0,s}$ is the population predictiveness maximizing function that is only
allowed to access the features with index not in $s$, and $P_0$ is the true
data-generating distribution.
Cross-fitted VIM estimates are computed differently if sample-splitting
is requested versus if it is not. We recommend using sample-splitting
in most cases, since only in this case will inferences be valid if
the variable(s) of interest have truly zero population importance.
The purpose of cross-fitting is to estimate $f_0$ and $f_{0,s}$
on independent data from estimating $P_0$; this can result in improved
performance, especially when using flexible learning algorithms. The purpose
of sample-splitting is to estimate $f_0$ and $f_{0,s}$ on independent
data; this allows valid inference under the null hypothesis of zero importance.
Without sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $K$ folds; then using each fold in turn as a
hold-out set, constructing estimators $f_{n,k}$ and $f_{n,k,s}$ of
$f_0$ and $f_{0,s}$, respectively, on the training data and estimator
$P_{n,k}$ of $P_0$ using the test data; and finally, computing
$$\psi_{n,s} := K^{-1}\sum_{k=1}^K \{V(f_{n,k}, P_{n,k}) - V(f_{n,k,s}, P_{n,k})\}.$$
With sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $2K$ folds. These folds are further divided
into 2 groups of folds. Then, for each fold $k$ in the first group,
estimator $f_{n,k}$ of $f_0$ is constructed using all data besides
the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data) and
estimator $P_{n,k}$ of $P_0$ is constructed using the held-out data
(i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k} = V(f_{n,k}, P_{n,k}).$$
Similarly, for each fold $k$ in the second group,
estimator $f_{n,k,s}$ of $f_{0,s}$ is constructed using all data
besides the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data)
and estimator $P_{n,k}$ of $P_0$ is constructed using the held-out
data (i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k,s} = V(f_{n,k,s}, P_{n,k}).$$
Finally,
$$\psi_{n,s} := K^{-1}\sum_{k=1}^K \{v_{n,k} - v_{n,k,s}\}.$$
See the paper by Williamson, Gilbert, Simon, and Carone for more
details on the mathematics behind the cv_vim function, and the
validity of the confidence intervals.
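The sample-splitting variant can be sketched in base R as well: one group of folds is used only to evaluate the full regression and the other only to evaluate the reduced regression, so the two predictiveness estimates come from disjoint held-out data. The sketch rests on several assumptions that are not taken from the package: glm() stands in for a flexible learner, each fit is trained on all observations outside its evaluation fold, and dev_measure() is a deviance-style stand-in (one minus the ratio of the cross-entropy to the cross-entropy of the marginal outcome rate) written for this example.

```r
# Sketch: cross-fitted VIM with sample-splitting (base R, simulated data).
set.seed(2024)
n <- 600
x <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
y <- rbinom(n, size = 1, prob = plogis(2 * x$x1 + 0.5 * x$x2))
s <- 1                                    # index of the feature of interest
K <- 3
folds <- sample(rep(seq_len(2 * K), length.out = n))  # twice K folds in total
full_group <- seq_len(K)          # folds that evaluate the full regression
reduced_group <- K + seq_len(K)   # folds that evaluate the reduced regression

# deviance-style predictiveness (hypothetical helper, assumed for illustration)
dev_measure <- function(pred, outcome) {
  pred <- pmin(pmax(pred, 1e-8), 1 - 1e-8)
  p_bar <- mean(outcome)
  ce <- -mean(outcome * log(pred) + (1 - outcome) * log(1 - pred))
  ce_null <- -(p_bar * log(p_bar) + (1 - p_bar) * log(1 - p_bar))
  1 - ce / ce_null
}

# fit on everything outside fold k, then score on fold k
fit_and_score <- function(k, drop_s) {
  train <- cbind(y = y[folds != k], x[folds != k, , drop = FALSE])
  if (drop_s) {
    train <- train[, names(train) != names(x)[s], drop = FALSE]
  }
  fit <- glm(y ~ ., data = train, family = binomial())
  pred <- predict(fit, newdata = x[folds == k, , drop = FALSE],
                  type = "response")
  dev_measure(pred, y[folds == k])
}

v_full <- vapply(full_group, fit_and_score, numeric(1), drop_s = FALSE)
v_red <- vapply(reduced_group, fit_and_score, numeric(1), drop_s = TRUE)

# average of the paired differences across the K fold pairs
vim_est <- mean(v_full - v_red)
print(vim_est)
```

The design choice to evaluate the two regressions on disjoint folds costs some precision but is what allows valid inference when the true importance is zero.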
In the interest of transparency, we return most of the calculations
within the vim object. This results in a list including:
the column(s) to calculate variable importance for
the library of learners passed to SuperLearner
the fitted values of the chosen method fit to the full data (a list, for train and test data)
the fitted values of the chosen method fit to the reduced data (a list, for train and test data)
the estimated variable importance
the naive estimator of variable importance
the estimated efficient influence function
the estimated efficient influence function for the full regression
the estimated efficient influence function for the reduced regression
the standard error for the estimated variable importance
the $(1 - \alpha) \times 100$% confidence interval for the variable importance estimate
a decision to either reject (TRUE) or not reject (FALSE) the null hypothesis, based on a conservative test
a p-value based on the same test as test
the object returned by the estimation procedure for the full data regression (if applicable)
the object returned by the estimation procedure for the reduced data regression (if applicable)
the level, for confidence interval calculation
the folds used for hypothesis testing
the folds used for cross-fitting
the outcome
the weights
the cluster IDs
a tibble with the estimate, SE, CI, hypothesis testing decision, and p-value
An object of classes vim and vim_deviance. See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -1, 1)))

# apply the function to the x's
f <- function(x) 0.5 + 0.3*x[1] + 0.2*x[2]
smooth <- apply(x, 1, function(z) f(z))

# generate Y ~ Bernoulli (smooth)
y <- matrix(stats::rbinom(n, size = 1, prob = smooth))

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# estimate (with a small number of folds, for illustration only)
est <- vimp_deviance(y, x, indx = 2,
           alpha = 0.05, run_regression = TRUE,
           SL.library = learners, V = 2, cvControl = list(V = 2))
Perform a hypothesis test against the null hypothesis of zero importance by:
(i) for a user-specified level $\alpha$, compute a
$(1 - \alpha) \times 100$% confidence interval around the predictiveness for both the full and reduced regression functions (these must be estimated on independent splits of the data);
(ii) if the intervals do not overlap, reject the null hypothesis.
vimp_hypothesis_test( predictiveness_full, predictiveness_reduced, se, delta = 0, alpha = 0.05 )
predictiveness_full |
the estimated predictiveness of the regression including the covariate(s) of interest. |
predictiveness_reduced |
the estimated predictiveness of the regression excluding the covariate(s) of interest. |
se |
the estimated standard error of the variable importance estimator |
delta |
the value of the |
alpha |
the desired type I error rate (defaults to 0.05). |
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest.
a list, with: the hypothesis testing decision (TRUE if the null hypothesis is rejected, FALSE otherwise); the p-value from the hypothesis test; and the test statistic from the hypothesis test.
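The two-step interval-overlap procedure in the description can be sketched as below. This is a sketch of the procedure as described, not the body of vimp_hypothesis_test(): overlap_test() is a hypothetical helper, and it assumes separate standard errors se_full and se_reduced for the two predictiveness estimates, whereas the function documented here takes a single se argument. The numbers are made up.

```r
# Sketch of the interval-overlap test; overlap_test(), se_full and
# se_reduced are hypothetical names introduced for this illustration.
overlap_test <- function(v_full, v_reduced, se_full, se_reduced, alpha = 0.05) {
  z <- stats::qnorm(1 - alpha / 2)
  # step (i): Wald-type intervals around each predictiveness estimate
  ci_full <- v_full + c(-1, 1) * z * se_full
  ci_reduced <- v_reduced + c(-1, 1) * z * se_reduced
  # step (ii): reject when the two intervals do not overlap
  reject <- ci_full[1] > ci_reduced[2] || ci_full[2] < ci_reduced[1]
  list(reject = reject, ci_full = ci_full, ci_reduced = ci_reduced)
}

# clearly separated predictiveness estimates: intervals do not overlap
res_far <- overlap_test(v_full = 0.80, v_reduced = 0.60,
                        se_full = 0.02, se_reduced = 0.03)
# close predictiveness estimates: intervals overlap
res_near <- overlap_test(v_full = 0.80, v_reduced = 0.75,
                         se_full = 0.02, se_reduced = 0.03)
print(c(far = res_far$reject, near = res_near$reject))
```

Requiring non-overlap of two separate intervals is a conservative rule, since it rejects less often than a test on the difference at the same level.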
Compute estimates of and confidence intervals for nonparametric
ANOVA-based intrinsic variable importance. This is a wrapper function for
cv_vim, with type = "anova".
This function is deprecated in vimp version 2.0.0.
vimp_regression( Y = NULL, X = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL, indx = 1, V = 10, run_regression = TRUE, SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"), alpha = 0.05, delta = 0, na.rm = FALSE, cross_fitting_folds = NULL, stratified = FALSE, C = rep(1, length(Y)), Z = NULL, ipc_weights = rep(1, length(Y)), scale = "identity", ipc_est_type = "aipw", scale_est = TRUE, cross_fitted_se = TRUE, ... )
Y |
the outcome. |
X |
the covariates. If |
cross_fitted_f1 |
the predicted values on validation data from a
flexible estimation technique regressing Y on X in the training data. Provided as
either (a) a vector, where each element is
the predicted value when that observation is part of the validation fold;
or (b) a list of length V, where each element in the list is a set of predictions on the
corresponding validation data fold.
If sample-splitting is requested, then these must be estimated specially; see Details. However,
the resulting vector should be the same length as |
cross_fitted_f2 |
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in
|
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
V |
the number of folds for cross-fitting, defaults to 10 (the value shown in the usage for this wrapper). If
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
cross_fitting_folds |
the folds for cross-fitting. Only used if
|
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
... |
other arguments to the estimation tool, see "See also". |
We define the population ANOVA parameter for the group of features (or single feature) $s$ by
$$\psi_{0,s} := E_0\{f_0(X) - f_{0,s}(X)\}^2 / \mathrm{var}_0(Y),$$
where $f_0$ is the population conditional mean using all features,
$f_{0,s}$ is the population conditional mean using the features with
index not in $s$, and $E_0$ and $\mathrm{var}_0$ denote expectation and
variance under the true data-generating distribution, respectively.
Cross-fitted ANOVA estimates are computed by first
splitting the data into $K$ folds; then using each fold in turn as a
hold-out set, constructing estimators $f_{n,k}$ and $f_{n,k,s}$ of
$f_0$ and $f_{0,s}$, respectively, on the training data and estimator
$E_{n,k}$ of $E_0$ using the test data; and finally, computing
$$\psi_{n,s} := K^{-1}\sum_{k=1}^K E_{n,k}\{f_{n,k}(X) - f_{n,k,s}(X)\}^2 / \mathrm{var}_n(Y),$$
where $\mathrm{var}_n$ is the empirical variance.
See the paper by Williamson, Gilbert, Simon, and Carone for more
details on the mathematics behind this function.
An object of classes vim and vim_regression. See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")

# estimate (with a small number of folds, for illustration only)
est <- vimp_regression(y, x, indx = 2,
           alpha = 0.05, run_regression = TRUE,
           SL.library = learners, V = 2, cvControl = list(V = 2))
Compute estimates of and confidence intervals for nonparametric $R^2$-based
intrinsic variable importance. This is a wrapper function for cv_vim, with type = "r_squared".
vimp_rsquared( Y = NULL, X = NULL, cross_fitted_f1 = NULL, cross_fitted_f2 = NULL, f1 = NULL, f2 = NULL, indx = 1, V = 10, run_regression = TRUE, SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"), alpha = 0.05, delta = 0, na.rm = FALSE, final_point_estimate = "split", cross_fitting_folds = NULL, sample_splitting_folds = NULL, stratified = FALSE, C = rep(1, length(Y)), Z = NULL, ipc_weights = rep(1, length(Y)), scale = "logit", ipc_est_type = "aipw", scale_est = TRUE, cross_fitted_se = TRUE, ... )
Y |
the outcome. |
X |
the covariates. If |
cross_fitted_f1 |
the predicted values on validation data from a
flexible estimation technique regressing Y on X in the training data. Provided as
either (a) a vector, where each element is
the predicted value when that observation is part of the validation fold;
or (b) a list of length V, where each element in the list is a set of predictions on the
corresponding validation data fold.
If sample-splitting is requested, then these must be estimated specially; see Details. However,
the resulting vector should be the same length as |
cross_fitted_f2 |
the predicted values on validation data from a
flexible estimation technique regressing either (a) the fitted values in
|
f1 |
the fitted values from a flexible estimation technique
regressing Y on X. If sample-splitting is requested, then these must be
estimated specially; see Details. If |
f2 |
the fitted values from a flexible estimation technique
regressing either (a) |
indx |
the indices of the covariate(s) to calculate variable importance for; defaults to 1. |
V |
the number of folds for cross-fitting; the usage above sets the default to 10. If
|
run_regression |
if outcome Y and covariates X are passed to
|
SL.library |
a character vector of learners to pass to
|
alpha |
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval. |
delta |
the value of the |
na.rm |
should we remove NAs in the outcome and fitted values
in computation? (defaults to |
final_point_estimate |
if sample splitting is used, should the final point estimates
be based on only the sample-split folds used for inference ( |
cross_fitting_folds |
the folds for cross-fitting. Only used if
|
sample_splitting_folds |
the folds used for sample-splitting;
these identify the observations that should be used to evaluate
predictiveness based on the full and reduced sets of covariates, respectively.
Only used if |
stratified |
if run_regression = TRUE, then should the generated folds be stratified based on the outcome (helps to ensure class balance across cross-validation folds) |
C |
the indicator of coarsening (1 denotes observed, 0 denotes unobserved). |
Z |
either (i) NULL (the default, in which case the argument
|
ipc_weights |
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]). |
scale |
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit") |
ipc_est_type |
the type of procedure used for coarsened-at-random
settings; options are "ipw" (for inverse probability weighting) or
"aipw" (for augmented inverse probability weighting).
Only used if |
scale_est |
should the point estimate be scaled to be greater than or equal to 0?
Defaults to |
cross_fitted_se |
should we use cross-fitting to estimate the standard
errors ( |
... |
other arguments to the estimation tool, see "See also". |
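To illustrate what the scale argument changes, the sketch below compares a Wald interval on the original ("identity") scale with one built on the logit scale and transformed back. It assumes a delta-method standard error on the logit scale, which is a common construction and not necessarily the package's exact computation; the values of est and se are made up for illustration.

```r
# Hypothetical inputs: a point estimate in (0, 1) and its standard error.
est <- 0.30
se <- 0.05
alpha <- 0.05
z <- stats::qnorm(1 - alpha / 2)

# Interval on the original ("identity") scale.
ci_identity <- est + c(-1, 1) * z * se

# Interval on the logit scale via the delta method (an assumption):
# d/dp logit(p) = 1 / (p * (1 - p)), so se_logit = se / (est * (1 - est)).
se_logit <- se / (est * (1 - est))
ci_logit <- stats::plogis(stats::qlogis(est) + c(-1, 1) * z * se_logit)
```

The back-transformed interval always stays inside (0, 1), which is the usual motivation for a transformed scale when the estimand is bounded.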
We define the population variable importance measure (VIM) for the
group of features (or single feature) $s$ with respect to the
predictiveness measure $V$ by
$$\psi_{0,s} := V(f_0, P_0) - V(f_{0,s}, P_0),$$
where $f_0$ is the population predictiveness maximizing function,
$f_{0,s}$ is the population predictiveness maximizing function that is only
allowed to access the features with index not in $s$, and $P_0$ is the true
data-generating distribution.
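As a concrete instance for type = "r_squared", the predictiveness of a prediction function $f$ under a distribution $P$ can be written in the usual $R^2$ form. This is an assumption based on the standard definition of $R^2$ and is not stated on this page. Here $f_0$ denotes the best predictor using all features, $f_{0,s}$ the best predictor that cannot use the features in $s$, and $P_0$ the true distribution.

```latex
V(f, P) = 1 - \frac{E_P\left[\{Y - f(X)\}^2\right]}{\mathrm{Var}_P(Y)},
\qquad
V(f_0, P_0) - V(f_{0,s}, P_0)
  = \frac{E_{P_0}\left[\{Y - f_{0,s}(X)\}^2\right] - E_{P_0}\left[\{Y - f_0(X)\}^2\right]}{\mathrm{Var}_{P_0}(Y)}.
```

Under this choice, the importance of the features in $s$ is the increase in mean squared error caused by withholding them, expressed as a fraction of the outcome variance.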
Cross-fitted VIM estimates are computed differently if sample-splitting
is requested versus if it is not. We recommend using sample-splitting
in most cases, since only in this case will inferences be valid if
the variable(s) of interest have truly zero population importance.
The purpose of cross-fitting is to estimate $f_0$ and $f_{0,s}$
on independent data from estimating $P_0$; this can result in improved
performance, especially when using flexible learning algorithms. The purpose
of sample-splitting is to estimate $f_0$ and $f_{0,s}$ on independent
data; this allows valid inference under the null hypothesis of zero importance.
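Without sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $K$ folds; then using each fold in turn as a
hold-out set, constructing estimators $f_{n,k}$ and $f_{n,k,s}$ of
$f_0$ and $f_{0,s}$, respectively, on the training data and estimator
$P_{n,k}$ of $P_0$ using the test data; and finally, computing
$$\psi_{n,s} := \frac{1}{K}\sum_{k=1}^K \{V(f_{n,k}, P_{n,k}) - V(f_{n,k,s}, P_{n,k})\}.$$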
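The procedure without sample-splitting can be sketched in base R as follows. This is a minimal illustration under stated assumptions and not the package's implementation: lm() stands in for the flexible learner, predictiveness is taken to be $R^2$ (one minus mean squared error over outcome variance), and the reduced regression is fit directly to the outcome. The names r_squared, fold_vims, and psi_hat are hypothetical.

```r
# Simulated data, following the style of the package examples.
set.seed(4747)
n <- 200
x <- data.frame(x1 = stats::runif(n, -5, 5), x2 = stats::runif(n, -5, 5))
y <- (x$x1 / 5)^2 * (x$x1 + 7) / 5 + (x$x2 / 3)^2 + stats::rnorm(n)
dat <- cbind(y = y, x)

s <- 2   # index of the feature of interest
K <- 5   # number of cross-fitting folds
folds <- sample(rep(seq_len(K), length.out = n))

# Assumed predictiveness measure: R^2 evaluated on held-out data.
r_squared <- function(obs, pred) {
  1 - mean((obs - pred)^2) / mean((obs - mean(obs))^2)
}

# For each fold: fit on the training data, evaluate on the held-out fold.
fold_vims <- vapply(seq_len(K), function(k) {
  train <- dat[folds != k, , drop = FALSE]
  test <- dat[folds == k, , drop = FALSE]
  full_fit <- stats::lm(y ~ ., data = train)
  # Reduced regression: drop the feature indexed by s (column s + 1 of dat).
  red_fit <- stats::lm(y ~ ., data = train[, -(s + 1), drop = FALSE])
  v_full <- r_squared(test$y, stats::predict(full_fit, newdata = test))
  v_red <- r_squared(test$y, stats::predict(red_fit, newdata = test))
  v_full - v_red
}, numeric(1))

# Average the fold-specific differences in predictiveness.
psi_hat <- mean(fold_vims)
```

Both regressions are evaluated on the same held-out fold here, which is exactly what sample-splitting avoids.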
With sample-splitting, cross-fitted VIM estimates are obtained by first
splitting the data into $2K$ folds. These folds are further divided
into 2 groups of folds. Then, for each fold $k$ in the first group,
estimator $f_{n,k}$ of $f_0$ is constructed using all data besides
the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data) and
estimator $P_{n,k}$ of $P_0$ is constructed using the held-out data
(i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k} = V(f_{n,k}, P_{n,k}).$$
Similarly, for each fold $k$ in the second group,
estimator $f_{n,k,s}$ of $f_{0,s}$ is constructed using all data
besides the kth fold in the group (i.e., $(2K - 1)/(2K)$ of the data)
and estimator $P_{n,k}$ of $P_0$ is constructed using the held-out
data (i.e., $1/(2K)$ of the data); then, computing
$$v_{n,k,s} = V(f_{n,k,s}, P_{n,k}).$$
Finally,
$$\psi_{n,s} := \frac{1}{K}\sum_{k=1}^K \{v_{n,k} - v_{n,k,s}\}.$$
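The sample-splitting variant can be sketched in the same way. This is a hedged illustration and not the package's code: lm() again stands in for the learner, predictiveness is assumed to be $R^2$, and each estimator is trained on all data outside the held-out fold. The helper fold_predictiveness and the object names are hypothetical.

```r
set.seed(20)
n <- 400
x <- data.frame(x1 = stats::runif(n, -5, 5), x2 = stats::runif(n, -5, 5))
y <- (x$x1 / 5)^2 * (x$x1 + 7) / 5 + (x$x2 / 3)^2 + stats::rnorm(n)
dat <- cbind(y = y, x)

s <- 2   # index of the feature of interest
K <- 2   # folds per group, so 2K folds in total
folds <- sample(rep(seq_len(2 * K), length.out = n))
# Folds 1..K form the first group (full regression);
# folds (K + 1)..2K form the second group (reduced regression).

# Assumed predictiveness measure: R^2 evaluated on held-out data.
r_squared <- function(obs, pred) {
  1 - mean((obs - pred)^2) / mean((obs - mean(obs))^2)
}

# Fit on everything except fold k, then evaluate on fold k.
fold_predictiveness <- function(k, drop_cols = integer(0)) {
  train <- dat[folds != k, , drop = FALSE]
  test <- dat[folds == k, , drop = FALSE]
  if (length(drop_cols) > 0) {
    train <- train[, -drop_cols, drop = FALSE]
  }
  fit <- stats::lm(y ~ ., data = train)
  r_squared(test$y, stats::predict(fit, newdata = test))
}

# Full-regression predictiveness on the first group of folds.
v_full <- vapply(seq_len(K), fold_predictiveness, numeric(1))
# Reduced-regression predictiveness on the second, disjoint group of folds.
v_red <- vapply(K + seq_len(K), fold_predictiveness, numeric(1),
                drop_cols = s + 1)

# Difference of the group averages.
psi_hat_ss <- mean(v_full) - mean(v_red)
```

Because the full and reduced predictiveness values are evaluated on disjoint observations, their difference does not degenerate when the feature has zero true importance, at the cost of using only half the data for each evaluation.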
See the paper by Williamson, Gilbert, Simon, and Carone for more
details on the mathematics behind the cv_vim function, and the
validity of the confidence intervals.
In the interest of transparency, we return most of the calculations
within the vim object. This results in a list including:
the column(s) to calculate variable importance for
the library of learners passed to SuperLearner
the fitted values of the chosen method fit to the full data (a list, for train and test data)
the fitted values of the chosen method fit to the reduced data (a list, for train and test data)
the estimated variable importance
the naive estimator of variable importance
the estimated efficient influence function
the estimated efficient influence function for the full regression
the estimated efficient influence function for the reduced regression
the standard error for the estimated variable importance
the $(1 - \alpha) \times 100$% confidence interval for the variable importance estimate
a decision to either reject (TRUE) or not reject (FALSE) the null hypothesis, based on a conservative test
a p-value based on the same test as test
the object returned by the estimation procedure for the full data regression (if applicable)
the object returned by the estimation procedure for the reduced data regression (if applicable)
the level, for confidence interval calculation
the folds used for hypothesis testing
the folds used for cross-fitting
the outcome
the weights
the cluster IDs
a tibble with the estimate, SE, CI, hypothesis testing decision, and p-value
An object of classes vim and vim_rsquared.
See Details for more information.
SuperLearner for specific usage of the SuperLearner function and package.
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))
# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2
# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)
# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
learners <- c("SL.glm", "SL.mean")
# estimate (with a small number of folds, for illustration only)
est <- vimp_rsquared(y, x, indx = 2, alpha = 0.05, run_regression = TRUE,
                     SL.library = learners, V = 2, cvControl = list(V = 2))
Compute standard error estimates for estimates of variable importance.
vimp_se( eif_full, eif_reduced, cross_fit = TRUE, sample_split = TRUE, na.rm = FALSE )
eif_full |
the estimated efficient influence function (EIF) based on the full set of covariates. |
eif_reduced |
the estimated EIF based on the reduced set of covariates. |
cross_fit |
logical; was cross-fitting used to compute the EIFs?
(defaults to |
sample_split |
logical; was sample-splitting used? (defaults to |
na.rm |
logical; should NA's be removed in computation?
(defaults to |
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest.
The standard error for the estimated variable importance for the given group of left-out covariates.
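A rough sketch of how a standard error can be formed from two estimated influence functions is given below. It is an assumption-laden illustration and not the body of vimp_se: it treats the EIF values as mean-zero, uses the average squared EIF as the variance estimate, and ignores cross-fitting and missing values. The function se_from_eifs is hypothetical.

```r
# Hypothetical helper: standard error from estimated EIF values.
se_from_eifs <- function(eif_full, eif_reduced, sample_split = TRUE) {
  if (sample_split) {
    # Full and reduced EIFs come from independent halves of the data,
    # so their variance contributions add.
    sqrt(mean(eif_full^2) / length(eif_full) +
           mean(eif_reduced^2) / length(eif_reduced))
  } else {
    # Same observations for both, so work with the EIF of the difference.
    eif <- eif_full - eif_reduced
    sqrt(mean(eif^2) / length(eif))
  }
}

# Made-up, centered EIF values for illustration only.
set.seed(1)
eif_full <- stats::rnorm(100)
eif_full <- eif_full - mean(eif_full)
eif_reduced <- stats::rnorm(100)
eif_reduced <- eif_reduced - mean(eif_reduced)

se_split <- se_from_eifs(eif_full, eif_reduced, sample_split = TRUE)
se_nosplit <- se_from_eifs(eif_full, eif_reduced, sample_split = FALSE)
```

The sample-split branch never combines the two EIF vectors observation by observation, so they may have different lengths; the other branch requires them to be aligned.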
A dataset containing neutralization sensitivity – measured using inhibitory concentration, the quantity of antibody necessary to neutralize a fraction of viruses in a given sample – and viral features including: amino acid sequence features (measured using HXB2 coordinates), geographic region of origin, subtype, and viral geometry. Accessed from the Los Alamos National Laboratory's (LANL's) Compile, Analyze, and tally Neutralizing Antibody Panels (CATNAP) database.
data("vrc01")
A data frame with 611 rows and 837 variables:
Viral sequence identifiers
Dummy variables encoding the viral subtype as 0/1. Possible subtypes are 01_AE, 02_AG, 07_BC, A1, A1C, A1D, B, C, D, O, Other.
Dummy variables encoding the geographic region of origin as 0/1. Regions are Asia, Europe/Americas, North Africa, and Southern Africa.
A binary indicator of whether or not the IC-50 (the concentration at which 50% of viruses are neutralized) was right-censored. Right-censoring is a proxy for a resistant virus.
A binary indicator of whether or not the IC-80 (the concentration at which 80% of viruses are neutralized) was right-censored. Right-censoring is a proxy for a resistant virus.
Continuous IC-50. If neutralization sensitivity for the virus was assessed in multiple studies, the geometric mean was taken.
Continuous IC-80. If neutralization sensitivity for the virus was assessed in multiple studies, the geometric mean was taken.
Amino acid sequence features denoting the presence (1) or absence (0)
of a residue at the given HXB2-referenced site. For example, hxb2.46.E.1mer
records the presence of an E at HXB2-referenced site 46.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
Amino acid sequence feature denoting the presence (1) or absence (0) of a residue at the given HXB2-referenced site.
The total number of sequons in various areas of the HIV viral envelope protein (nine variables).
The number of cysteines in various areas of the HIV viral envelope protein (four variables).
The length of various areas of the HIV viral envelope protein (six variables).
The steric bulk of residues at critical locations (three variables).
https://github.com/benkeser/vrc01/blob/master/data/fulldata.csv