opendataval.dataval.influence package#

Submodules#

opendataval.dataval.influence.influence module#

class opendataval.dataval.influence.influence.InfluenceFunction(*args, **kwargs)#

Bases: DataEvaluator, ModelMixin

Influence Function Data evaluation implementation.

TODO: It may be useful to compute gradients of the validation dataset in batches to save time and space.
TODO: Implement the inverse-Hessian \(H^{-1}\) term; the current implementation uses first-order gradients only.

Parameters#

grad_args (tuple, optional): Positional arguments passed to the model.grad function
grad_kwargs (dict[str, Any], optional): Keyword arguments passed to the model.grad function
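
The first-order idea mentioned in the class description can be illustrated with a small, self-contained sketch. It is not the library's implementation: it assumes a linear model with squared-error loss, and the helper names `per_sample_grads` and `first_order_influence` are hypothetical. Each training point is scored by the dot product between its loss gradient and the mean validation gradient; no \(H^{-1}\) term is applied.

```python
import numpy as np


def per_sample_grads(w, x, y):
    """Gradient of 0.5 * (x @ w - y)**2 with respect to w, one row per sample."""
    residual = x @ w - y          # shape (n,)
    return residual[:, None] * x  # shape (n, d)


def first_order_influence(w, x_train, y_train, x_valid, y_valid):
    """Dot product of each training gradient with the mean validation gradient."""
    train_grads = per_sample_grads(w, x_train, y_train)              # (n_train, d)
    valid_grad = per_sample_grads(w, x_valid, y_valid).mean(axis=0)  # (d,)
    return train_grads @ valid_grad                                  # (n_train,)


# Toy data (assumption): noiseless linear targets and untrained weights.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
x_train = rng.normal(size=(8, 3))
y_train = x_train @ true_w
x_valid = rng.normal(size=(4, 3))
y_valid = x_valid @ true_w
w = np.zeros(3)

influence = first_order_influence(w, x_train, y_train, x_valid, y_valid)
print(influence.shape)
```

Under this sign convention, a positive score means that a small gradient-descent step on that training point would, to first order, reduce the validation loss.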

evaluate_data_values() → ndarray#

Return influence (data values) for each training data point.

Returns#

np.ndarray: Predicted data values for each training data point

input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#

Store and transform input data for Influence Function Data Valuation.

Parameters#

x_train (torch.Tensor): Data covariates
y_train (torch.Tensor): Data labels
x_valid (torch.Tensor): Test+Held-out covariates
y_valid (torch.Tensor): Test+Held-out labels

input_model(pred_model: GradientModel)#

Input the prediction model with gradient.

Parameters#

pred_model (GradientModel): Prediction model with a gradient

train_data_values(*args, **kwargs)#

Trains model to compute influence of each data point (data values).

Parameters#

args (tuple[Any], optional): Training positional arguments
kwargs (dict[str, Any], optional): Training keyword arguments

opendataval.dataval.influence.infsub module#

class opendataval.dataval.influence.infsub.InfluenceSubsample(*args, **kwargs)#

Bases: DataEvaluator, ModelMixin

Influence computed through subsamples implementation.

Computes the influence of each training example on the validation dataset through the closely related subsampled influence.

Parameters#

samples (int, optional): Number of models to fit when estimating the data values, by default 1000
proportion (float, optional): Proportion of data points in each sample; the cardinality of each subset is \((p)(num_points)\), by default 0.7 as specified by V. Feldman and C. Zhang
random_state (RandomState, optional): Random initial state, by default None

evaluate_data_values() → ndarray#

Return data values for each training data point.

Compute data values using the Influence Subsample data valuator. For each data point, takes the average performance of all sampled subsets that include the point minus the average performance of all sampled subsets that exclude it.

Returns#

np.ndarray: Predicted data values/selection for every training data point
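
The "include minus exclude" difference described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `subsample_influence` and the `membership`/`performance` arrays are hypothetical, with one row of `membership` per sampled subset and one performance score per subset.

```python
import numpy as np


def subsample_influence(membership, performance):
    """Mean score of subsets containing each point minus mean score of those without it.

    membership: (n_samples, n_points) boolean array, True where a point is in the subset.
    performance: (n_samples,) score of the model trained on each subset.
    """
    membership = np.asarray(membership, dtype=bool)
    performance = np.asarray(performance, dtype=float)

    n_in = membership.sum(axis=0)                       # subsets containing each point
    n_out = membership.shape[0] - n_in                  # subsets missing each point
    sum_in = membership.T.astype(float) @ performance   # total score with the point
    sum_out = performance.sum() - sum_in                # total score without the point

    # Points that were always (or never) sampled get a zero mean for the empty side.
    mean_in = np.divide(sum_in, n_in, out=np.zeros_like(sum_in), where=n_in > 0)
    mean_out = np.divide(sum_out, n_out, out=np.zeros_like(sum_out), where=n_out > 0)
    return mean_in - mean_out


membership = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 0]]
performance = [0.9, 0.8, 0.5, 0.7]
values = subsample_influence(membership, performance)
print(values)
```

In this toy input, the first point appears in the three best-scoring subsets, so it receives the largest value; the other two points receive negative values.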

input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#

Store and transform input data for Influence Subsample Data Valuation.

Parameters#

x_train (torch.Tensor): Data covariates
y_train (torch.Tensor): Data labels
x_valid (torch.Tensor): Test+Held-out covariates
y_valid (torch.Tensor): Test+Held-out labels

train_data_values(*args, **kwargs)#

Trains model to predict data values.

Trains the Influence Subsample Data Valuator by sampling subsets of cardinality \((p)(num_points)\) and computing the performance with and without the \(i\)-th data point. The form of sampling is similar to the Shapley value when \(p\) is \(0.5\) (V. Feldman). Likewise, if we sample not from the subsets of a specific cardinality but uniformly across all subsets, it is similar to the Banzhaf value.

Parameters#

args (tuple[Any], optional): Training positional arguments
kwargs (dict[str, Any], optional): Training keyword arguments
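
The sampling step described above can be sketched roughly as below. The names `sample_subsets` and `toy_utility` are hypothetical, the rounding of the subset size is an assumption, and `toy_utility` is only a stand-in for "fit a model on the subset and score it on the validation data"; a real evaluator would train the prediction model there.

```python
import numpy as np


def sample_subsets(num_points, samples, proportion, rng):
    """Draw `samples` random subsets, each holding round(proportion * num_points) points."""
    subset_size = round(proportion * num_points)  # rounding rule is an assumption
    membership = np.zeros((samples, num_points), dtype=bool)
    for s in range(samples):
        chosen = rng.choice(num_points, size=subset_size, replace=False)
        membership[s, chosen] = True
    return membership


def toy_utility(included):
    """Hypothetical score: mean hidden quality of the included points."""
    quality = np.array([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, -2.0])
    return quality[included].mean()


rng = np.random.default_rng(0)
membership = sample_subsets(num_points=10, samples=200, proportion=0.7, rng=rng)
performance = np.array([toy_utility(row) for row in membership])
print(membership.shape, performance.shape)
```

The resulting `membership` and `performance` arrays are the raw material for the per-point averages: each row records which points were in a subset, and each score records how well that subset performed.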

Module contents#

NOTE: InfluenceSubsample was previously named InfluenceFunctionEval and may be referred to as such in the demos. This docstring clarifies the naming of InfluenceFunction and InfluenceSubsample.