opendataval.dataval.ame package#
Submodules#
opendataval.dataval.ame.ame module#
- class opendataval.dataval.ame.ame.AME(*args, **kwargs)#
Bases: DataEvaluator, ModelMixin
Implementation of Average Marginal Effect Data Valuation.
References#
Parameters#
- num_models : int, optional
Number of models to bag/aggregate, by default 1000
- random_state : RandomState, optional
Random initial state, by default None
- evaluate_data_values() → ndarray#
Return data values for each training data point.
Computes the data values from the coefficients of the Lasso regression, following Lin et al. [1].
Returns#
- np.ndarray
Predicted data values/selection for every training data point
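The Lasso step described above can be illustrated with a dependency-free sketch. Everything in it is an assumption made for illustration: the helper names `soft_threshold` and `lasso_coordinate_descent`, the toy additive utility, the column centering, and the fixed penalty `alpha` are not taken from the library, which may standardize its inputs and choose the penalty differently.

```python
import random
from statistics import fmean


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, num_sweeps=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    coef = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(num_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # constant column after centering carries no information
            # Correlation of column j with the residual that ignores w_j itself.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j]) for i in range(n)) / n
            new_coef = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Toy data (hypothetical): each point adds a fixed amount to the utility.
point_quality = [3.0, 0.0, -2.0, 1.0]
rng = random.Random(0)
subsets = [[1 if rng.random() < 0.5 else 0 for _ in point_quality] for _ in range(200)]
performance = [sum(q for keep, q in zip(mask, point_quality) if keep) for mask in subsets]

# Center each membership column and the performance vector before regressing.
col_means = [fmean(col) for col in zip(*subsets)]
X = [[m - mu for m, mu in zip(mask, col_means)] for mask in subsets]
perf_mean = fmean(performance)
y = [p - perf_mean for p in performance]

# One coefficient per training point; larger means more helpful in this toy setup.
data_values = lasso_coordinate_descent(X, y, alpha=0.01)
```

In this toy setup the coefficients rank the points in the same order as `point_quality`, slightly shrunk toward zero by the L1 penalty.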
- train_data_values(*args, **kwargs)#
Trains model to predict data values.
Trains the AME model by fitting bagging models at several sampling proportions and aggregating the sampled subsets and their performance metrics.
Parameters#
- args : tuple[Any], optional
Training positional args
- kwargs : dict[str, Any], optional
Training keyword arguments
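The aggregation across proportions might look roughly like the sketch below. It is not the library's code: `toy_utility` and `collect_subsets_and_performance` are hypothetical names, the utility is a stand-in for training and scoring a real model, and the proportion values are illustrative because the documentation does not list them.

```python
import random


def toy_utility(mask, point_quality):
    """Hypothetical stand-in for training on a subset and scoring on validation data."""
    return sum(q for keep, q in zip(mask, point_quality) if keep)


def collect_subsets_and_performance(point_quality, proportions, num_models, seed=0):
    """Sample num_models Bernoulli masks per proportion and pool them with their scores."""
    rng = random.Random(seed)
    subsets, performance = [], []
    for proportion in proportions:
        for _ in range(num_models):
            # Each point joins the subset independently with probability `proportion`.
            mask = [1 if rng.random() < proportion else 0 for _ in point_quality]
            subsets.append(mask)
            performance.append(toy_utility(mask, point_quality))
    return subsets, performance


subsets, performance = collect_subsets_and_performance(
    point_quality=[3.0, 0.0, -2.0, 1.0],
    proportions=(0.2, 0.4, 0.6, 0.8),  # illustrative values, not taken from the docs
    num_models=50,
)
```

The pooled `subsets` and `performance` lists have one row per sampled model, which is the shape a downstream regression over subset membership would need.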
- class opendataval.dataval.ame.ame.BaggingEvaluator(*args, **kwargs)#
Bases: DataEvaluator, ModelMixin
Bagging Data Evaluator, samples data points from \(Bernoulli(proportion)\).
References#
[1] J. Lin, A. Zhang, M. Lecuyer, J. Li, A. Panda, and S. Sen, Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments, arXiv.org, 2022. Available: https://arxiv.org/abs/2206.10013.
Parameters#
- num_models : int, optional
Number of models to bag/aggregate, by default 1000
- proportion : float, optional
Bernoulli probability with which each data point is sampled, by default 1.0
- random_state : RandomState, optional
Random initial state, by default None
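To make the role of `proportion` concrete, here is a minimal sketch of Bernoulli subset sampling using only the standard library. The function name `sample_bernoulli_subsets` and its arguments are hypothetical and only mirror the parameters listed above; the library's own sampling code may differ.

```python
import random


def sample_bernoulli_subsets(num_points, num_models, proportion, seed=None):
    """Return num_models inclusion masks; each entry is 1 with probability `proportion`."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < proportion else 0 for _ in range(num_points)]
        for _ in range(num_models)
    ]


# Five masks over eight training points, each point kept with probability 0.5.
masks = sample_bernoulli_subsets(num_points=8, num_models=5, proportion=0.5, seed=0)

# With proportion=1.0 every point is always included, so all masks are identical.
full_masks = sample_bernoulli_subsets(num_points=8, num_models=5, proportion=1.0, seed=0)
```

Because a proportion of 1.0 yields identical all-ones masks, subsets only differ from one model to the next when the proportion is below 1.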
- evaluate_data_values()#
Return data values for each training data point.
Compute data values using the coefficients of the Lasso regression, as used by Lin et al. for the AME evaluator.
Returns#
- np.ndarray
Predicted data values/selection for every training data point
- get_subset_perf()#
Return the sampled subsets and their performance, for use by the AME DataEvaluator.
- input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#
Store and transform input data for Bagging Evaluator.
Parameters#
- x_train : torch.Tensor
Data covariates
- y_train : torch.Tensor
Data labels
- x_valid : torch.Tensor
Test+Held-out covariates
- y_valid : torch.Tensor
Test+Held-out labels
- train_data_values(*args, **kwargs)#
Trains model to predict data values.
Trains the Bagging model to obtain subsets and the corresponding performance evaluations of those subsets, which are then used to compute the data values.
Parameters#
- args : tuple[Any], optional
Training positional args
- kwargs : dict[str, Any], optional
Training keyword arguments