opendataval.dataval.ame package#

Submodules#

opendataval.dataval.ame.ame module#

class opendataval.dataval.ame.ame.AME(*args, **kwargs)#

Bases: DataEvaluator, ModelMixin

Implementation of Average Marginal Effect Data Valuation.

References#

Parameters#

num_models : int, optional

Number of models to bag/aggregate, by default 1000

random_state : RandomState, optional

Random initial state, by default None

evaluate_data_values() → ndarray#

Return data values for each training data point.

Compute data values using the coefficients of the Lasso regression according to Lin et al.

Returns#

np.ndarray

Predicted data values/selection for every training data point
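The idea of reading data values off Lasso coefficients can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the helper names (`soft_threshold`, `lasso_data_values`), the column centering, the fixed penalty `alpha`, and the coordinate-descent solver are all assumptions made for illustration. Each row of `subsets` is a 0/1 inclusion mask, and `performance` holds the score of the model trained on that subset.

```python
from itertools import product


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_data_values(subsets, performance, alpha=0.01, num_sweeps=100):
    """Hypothetical helper: regress performance on inclusion masks with an L1 penalty.

    Minimizes (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent,
    where X holds the column-centered masks and y the centered performance.
    The returned coefficients play the role of per-point data values.
    """
    n, d = len(subsets), len(subsets[0])
    col_means = [sum(row[j] for row in subsets) / n for j in range(d)]
    X = [[row[j] - col_means[j] for j in range(d)] for row in subsets]
    mean_perf = sum(performance) / n
    residual = [p - mean_perf for p in performance]  # equals y - Xw while w = 0
    w = [0.0] * d
    for _ in range(num_sweeps):
        for j in range(d):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # point always (or never) sampled: no signal, keep 0
            # Correlation of column j with the residual that excludes its own effect.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            for i in range(n):
                residual[i] -= X[i][j] * delta
            w[j] = new_w
    return w


# Toy design: every subset of 3 points; point 0 helps, point 1 is irrelevant, point 2 hurts.
subsets = [list(mask) for mask in product([0, 1], repeat=3)]
performance = [1.0 * m[0] - 0.5 * m[2] for m in subsets]
values = lasso_data_values(subsets, performance)
```

In this toy setup the helpful point receives a positive value, the irrelevant point is shrunk to exactly zero, and the harmful point receives a negative value, which is the sparsity behavior that motivates using Lasso here.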

train_data_values(*args, **kwargs)#

Trains model to predict data values.

Trains the AME model by fitting bagging models at several sampling proportions and aggregating the resulting subsets and performance metrics.

Parameters#

args : tuple[Any], optional

Training positional args

kwargs : dict[str, Any], optional

Training keyword arguments
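The aggregation step described for `train_data_values` can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `collect_subsets_and_performance` and `evaluate_subset` are hypothetical names, and the proportions `(0.2, 0.4, 0.6, 0.8)` are placeholder values chosen for the example.

```python
import random


def collect_subsets_and_performance(
    num_points, evaluate_subset, proportions, num_models, seed=None
):
    """Hypothetical helper: pool (mask, performance) pairs across sampling proportions.

    For every proportion, num_models inclusion masks are drawn and scored by the
    caller-supplied evaluate_subset callable. The results from all proportions are
    concatenated into one design matrix and one performance vector.
    """
    rng = random.Random(seed)
    subsets, performance = [], []
    for proportion in proportions:
        for _ in range(num_models):
            mask = [1 if rng.random() < proportion else 0 for _ in range(num_points)]
            subsets.append(mask)
            performance.append(evaluate_subset(mask))
    return subsets, performance


# Stand-in scorer (an assumption): performance is just the number of points kept.
subsets, performance = collect_subsets_and_performance(
    num_points=6,
    evaluate_subset=sum,
    proportions=(0.2, 0.4, 0.6, 0.8),
    num_models=5,
    seed=0,
)
```

Pooling masks drawn at different proportions gives the later regression a mix of small and large subsets, so each point is observed both with and without many of the others.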

class opendataval.dataval.ame.ame.BaggingEvaluator(*args, **kwargs)#

Bases: DataEvaluator, ModelMixin

Bagging Data Evaluator; samples data points from \(Bernoulli(proportion)\).

References#

Parameters#

num_models : int, optional

Number of models to bag/aggregate, by default 1000

proportion : float, optional

Bernoulli probability with which each data point is sampled, by default 1.0

random_state : RandomState, optional

Random initial state, by default None
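To make the role of `num_models` and `proportion` concrete, here is a minimal sketch of Bernoulli subset sampling. The function name `sample_bernoulli_subsets` is hypothetical, and the standard-library `random` module stands in for whatever random state the evaluator actually uses.

```python
import random


def sample_bernoulli_subsets(num_points, num_models, proportion, seed=None):
    """Hypothetical helper: draw one 0/1 inclusion mask per bagged model.

    Each training point is kept independently with probability `proportion`,
    so a mask row lists which points one model would be trained on.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < proportion else 0 for _ in range(num_points)]
        for _ in range(num_models)
    ]


half = sample_bernoulli_subsets(num_points=5, num_models=3, proportion=0.5, seed=0)
full = sample_bernoulli_subsets(num_points=5, num_models=3, proportion=1.0, seed=0)
```

With `proportion=1.0` every point is always kept, so all models see the full training set; smaller proportions produce the varied subsets that make the performance differences informative.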

evaluate_data_values()#

Return data values for each training data point.

Compute data values using the coefficients of the Lasso regression, as used by Lin et al. for the AME evaluator.

Returns#

np.ndarray

Predicted data values/selection for every training data point

get_subset_perf()#

Return the subsets and performance, used by AME DataEvaluator.

input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#

Store and transform input data for Bagging Evaluator.

Parameters#

x_train : torch.Tensor

Data covariates

y_train : torch.Tensor

Data labels

x_valid : torch.Tensor

Test+Held-out covariates

y_valid : torch.Tensor

Test+Held-out labels

train_data_values(*args, **kwargs)#

Trains model to predict data values.

Trains the bagging models on sampled subsets and records the performance of each subset; these subset and performance pairs are then used to compute the data values.

Parameters#

args : tuple[Any], optional

Training positional args

kwargs : dict[str, Any], optional

Training keyword arguments
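The pairing of subsets with their performance can be sketched with a toy model. Everything here is an assumption made for illustration: the mean predictor stands in for the real prediction model, negative mean squared error stands in for the configured metric, and `evaluate_subsets` is a hypothetical name.

```python
def fit_mean_predictor(labels):
    """Toy 'model': always predicts the mean training label (0.0 for an empty subset)."""
    return sum(labels) / len(labels) if labels else 0.0


def negative_mse(prediction, y_valid):
    """Toy metric: negative mean squared error, so higher is better."""
    return -sum((prediction - y) ** 2 for y in y_valid) / len(y_valid)


def evaluate_subsets(y_train, y_valid, subsets):
    """Hypothetical helper: train on each masked subset and score it on validation data."""
    performance = []
    for mask in subsets:
        kept = [y for y, keep in zip(y_train, mask) if keep]
        performance.append(negative_mse(fit_mean_predictor(kept), y_valid))
    return performance


y_train = [1.0, 1.0, 1.0, 9.0]  # the last label is an outlier
y_valid = [1.0, 1.0]
subsets = [[1, 1, 1, 1], [1, 1, 1, 0]]  # with and without the outlier
performance = evaluate_subsets(y_train, y_valid, subsets)
```

Dropping the outlier improves the validation score in this example, and that kind of difference between subsets is the signal a downstream regression on the masks can attribute to individual points.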

Module contents#