opendataval.dataval.influence package#

Submodules#

opendataval.dataval.influence.influence module#

class opendataval.dataval.influence.influence.InfluenceFunction(*args, **kwargs)#

Bases: DataEvaluator, ModelMixin

Influence Function Data evaluation implementation.

TODO: it may be useful to compute gradients of the validation dataset in batches to save time/space. TODO: implement the \(H^{-1}\) (inverse-Hessian) term; the current implementation uses only first-order gradients.
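Since the current implementation is first-order only (no inverse-Hessian term), each training point's influence reduces to a dot product between its loss gradient and the aggregated validation-loss gradient. The sketch below illustrates this with a hypothetical helper; the function name and flattened-gradient layout are assumptions for illustration, not the library's API.

```python
import numpy as np

def first_order_influence(train_grads, valid_grads):
    """Sketch of a first-order influence score (no Hessian inverse).

    `train_grads` is an (n_train, d) array and `valid_grads` an
    (n_valid, d) array of hypothetical flattened per-example loss
    gradients. Each training point's score is the dot product of its
    gradient with the summed validation gradient.
    """
    total_valid_grad = valid_grads.sum(axis=0)  # shape (d,)
    return train_grads @ total_valid_grad       # shape (n_train,)
```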

References#

Parameters#

grad_args : tuple, optional

Positional arguments passed to the model.grad function

grad_kwargs : dict[str, Any], optional

Keyword arguments passed to the model.grad function

evaluate_data_values() ndarray#

Return influence (data values) for each training data point.

Returns#

np.ndarray

Predicted data values for each training data point

input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#

Store and transform input data for Influence Function Data Valuation.

Parameters#

x_train : torch.Tensor

Data covariates

y_train : torch.Tensor

Data labels

x_valid : torch.Tensor

Test+Held-out covariates

y_valid : torch.Tensor

Test+Held-out labels

input_model(pred_model: GradientModel)#

Input the prediction model with gradient.

Parameters#

pred_model : GradientModel

Prediction model with a gradient

train_data_values(*args, **kwargs)#

Trains model to compute influence of each data point (data values).

References#

Parameters#

args : tuple[Any], optional

Training positional args

kwargs : dict[str, Any], optional

Training keyword arguments

opendataval.dataval.influence.infsub module#

class opendataval.dataval.influence.infsub.InfluenceSubsample(*args, **kwargs)#

Bases: DataEvaluator, ModelMixin

Influence computed through subsamples implementation.

Computes the influence of each training example on the validation dataset through a closely related subsampled influence.

References#

Parameters#

samples : int, optional

Number of models to fit to find data values, by default 1000

proportion : float, optional

Proportion of data points in each sample; the cardinality of each subset is \((p)(num\_points)\), by default 0.7 as specified by V. Feldman and C. Zhang

random_state : RandomState, optional

Random initial state, by default None

evaluate_data_values() ndarray#

Return data values for each training data point.

Compute data values using the Influence Subsample data valuator. For each data point, the value is the difference between the average performance of all subsets that include the point and the average performance of all subsets that exclude it.

Returns#

np.ndarray

Predicted data values/selection for every training data point
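The difference-of-averages computation can be sketched as follows. The helper name and the boolean membership-matrix representation are illustrative assumptions, not the library's internals.

```python
import numpy as np

def subsample_influence(subsets, perfs):
    """Difference-of-averages sketch (hypothetical helper).

    `subsets` is a boolean (num_samples, num_points) membership matrix
    marking which training points were in each fitted subset; `perfs`
    holds the validation performance of the model fit on each subset.
    A point's value is the mean performance over subsets containing it
    minus the mean performance over subsets excluding it.
    """
    subsets = np.asarray(subsets, dtype=bool)
    perfs = np.asarray(perfs, dtype=float)
    num_points = subsets.shape[1]
    values = np.empty(num_points)
    for i in range(num_points):
        in_i = subsets[:, i]
        values[i] = perfs[in_i].mean() - perfs[~in_i].mean()
    return values
```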

input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#

Store and transform input data for Influence Subsample Data Valuation.

Parameters#

x_train : torch.Tensor

Data covariates

y_train : torch.Tensor

Data labels

x_valid : torch.Tensor

Test+Held-out covariates

y_valid : torch.Tensor

Test+Held-out labels

train_data_values(*args, **kwargs)#

Trains model to predict data values.

Trains the Influence Subsample Data Valuator by sampling subsets of cardinality \((p)(num\_points)\) and computing the performance with and without the \(i\)-th data point. This form of sampling is similar to the Shapley value when \(p\) is \(0.5\) (V. Feldman). Likewise, if we sample uniformly across all subsets rather than subsets of a specific cardinality, it is similar to the Banzhaf value.

Parameters#

args : tuple[Any], optional

Training positional args

kwargs : dict[str, Any], optional

Training keyword arguments
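The fixed-cardinality sampling scheme described above can be sketched as below. The function name and `seed` parameter are assumptions for illustration; the default values mirror the documented `samples` and `proportion` parameters.

```python
import numpy as np

def sample_subsets(num_points, samples=1000, proportion=0.7, seed=0):
    """Sketch of the subset-sampling scheme (hypothetical helper).

    Draws `samples` boolean membership rows, each selecting a fixed
    cardinality of round(proportion * num_points) training points
    uniformly at random without replacement.
    """
    rng = np.random.default_rng(seed)
    cardinality = round(proportion * num_points)
    subsets = np.zeros((samples, num_points), dtype=bool)
    for row in subsets:
        row[rng.choice(num_points, size=cardinality, replace=False)] = True
    return subsets
```

Sampling uniformly over all subset sizes instead of a fixed cardinality would yield the Banzhaf-like variant mentioned above.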

Module contents#

NOTE InfluenceSubsample was previously named InfluenceFunctionEval and may be referred to as such in the demos. This docstring is here to clarify the naming distinction between InfluenceFunction and InfluenceSubsample.