opendataval.dataval.RobustVolumeShapley
- class opendataval.dataval.RobustVolumeShapley(*args, **kwargs)
Robust Volume Shapley and Volume Shapley data valuation implementation.
Although this DataEvaluator uses the same TMC-Shapley algorithm as the semivalue evaluators, by default it does not use the Gelman-Rubin (GR) convergence statistic to decide when to stop sampling. Instead, a fixed number of samples is taken, which is closer to the original implementation here: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation/tree/main
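The fixed-sample scheme described above can be sketched in NumPy. This is a minimal illustration, not opendataval's sampler: it assumes the plain (non-robust) volume Vol(X) = sqrt(det(X^T X)) as the utility, draws a fixed number of random permutations with no convergence check, and truncates a permutation once the running volume is within a relative tolerance of the full volume. The names `volume` and `tmc_shapley_fixed` and the parameters `num_samples`, `truncation_tol`, and `seed` are hypothetical.

```python
import numpy as np


def volume(x: np.ndarray) -> float:
    """Volume of the rows of x, sqrt(det(X^T X)); 0.0 for an empty set."""
    if x.shape[0] == 0:
        return 0.0
    # Clip small negative determinants caused by floating-point error.
    return float(np.sqrt(max(np.linalg.det(x.T @ x), 0.0)))


def tmc_shapley_fixed(
    x: np.ndarray,
    num_samples: int = 100,
    truncation_tol: float = 1e-6,
    seed: int = 0,
) -> np.ndarray:
    """Monte Carlo Shapley estimate using a fixed number of permutations."""
    n = x.shape[0]
    rng = np.random.default_rng(seed)
    full = volume(x)
    values = np.zeros(n)
    for _ in range(num_samples):  # fixed budget, no GR-style stopping rule
        perm = rng.permutation(n)
        prev = 0.0
        for k, idx in enumerate(perm):
            # Truncate once the prefix is within a relative tolerance of the
            # full volume; the remaining points get a marginal contribution of 0.
            if abs(full - prev) < truncation_tol * full:
                break
            curr = volume(x[perm[: k + 1]])
            values[idx] += curr - prev
            prev = curr
    return values / num_samples
```

Passing `truncation_tol=0.0` disables truncation; each permutation's marginal contributions then telescope to the full volume, so the estimates sum to `volume(x)` up to floating-point error.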
Parameters
- sampler : Sampler, optional
Sampler used to compute the marginal contributions. Can be found in the sampler module; by default, a GrTMCSampler constructed from *args, **kwargs.
- robust : bool, optional
Whether to use the robust volume measure, which trades off a “more refined representation of diversity for greater robustness to replication”, by default True
- omega : Optional[float], optional
Width (discretization coefficient) of the d-cubes into which x_train is split, required if robust is True, by default 0.05
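A minimal sketch of how `omega` and the robust measure might interact, under stated assumptions: each row of x_train is assigned to the d-cube of side length `omega` that contains it, each occupied cube is represented by the mean of its rows, and the volume of those representatives is multiplied by a replication factor rho(c) = 1 + alpha + ... + alpha^c for a cube holding c rows. The helper names `discretize` and `robust_volume`, and `alpha = 0.1`, are illustrative assumptions rather than the library's code or defaults.

```python
import numpy as np


def discretize(x: np.ndarray, omega: float) -> dict:
    """Map each occupied d-cube (side length omega) to the row indices inside it."""
    cubes: dict = {}
    for i, row in enumerate(x):
        key = tuple(np.floor(row / omega).astype(int))
        cubes.setdefault(key, []).append(i)
    return cubes


def robust_volume(x: np.ndarray, omega: float = 0.05, alpha: float = 0.1) -> float:
    """Replication-robust volume under the assumptions stated above."""
    if x.shape[0] == 0:
        return 0.0
    cubes = discretize(x, omega)
    # Assumption: each occupied cube is represented by the mean of its rows.
    reps = np.stack([x[idx].mean(axis=0) for idx in cubes.values()])
    vol = np.sqrt(max(np.linalg.det(reps.T @ reps), 0.0))
    # Assumption: a cube with c rows contributes rho(c) = 1 + alpha + ... + alpha**c,
    # which stays below 1 / (1 - alpha), so replicas add little extra volume.
    rho = np.prod([(1 - alpha ** (len(idx) + 1)) / (1 - alpha) for idx in cubes.values()])
    return float(vol * rho)
```

In this sketch, replicating one of two orthogonal unit vectors 100 times multiplies the plain volume by 10, while the robust volume grows by less than a factor of 1 / (1 - alpha), because the replicas fall into the same cube and rho(c) is bounded.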
Mixins
- ModelLessMixin
Mixin for a data evaluator that doesn’t require a model or evaluation metric.
- __init__(sampler: Sampler | None = None, robust: bool = True, omega: float | None = None, *args, **kwargs)
Methods
- __init__([sampler, robust, omega])
- embeddings(*tensors): Returns embeddings for the input tensors.
- evaluate_data_values(): Return data values for each training data point.
- input_data(x_train, y_train, x_valid, y_valid): Store and transform input data for volume-based evaluators.
- input_fetcher(fetcher): Input data from a DataFetcher object.
- setup(fetcher[, pred_model, metric]): Input model, metric, and data into the DataEvaluator.
- train(fetcher[, pred_model, metric]): Store and transform data, then train model to predict data values.
- train_data_values(*args, **kwargs): Trains model to predict data values.
Attributes
- Evaluators
- data_values: Cached data values.
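The call order implied by the methods above (store the data, compute the values, then read them back) can be mirrored by a small hypothetical class. `ToyVolumeEvaluator` is not part of opendataval: it only reuses the method names `input_data`, `train_data_values`, and `evaluate_data_values`, ignores labels and validation data as a model-less evaluator would, and computes exact volume Shapley values by enumerating every permutation. That enumeration is feasible only for a handful of points and stands in for the sampler.

```python
import itertools
import math

import numpy as np


class ToyVolumeEvaluator:
    """Hypothetical model-less evaluator mirroring the method names above."""

    def input_data(self, x_train, y_train, x_valid, y_valid):
        # Volume-based valuation uses only x_train; labels and validation data are ignored.
        self.x_train = np.asarray(x_train, dtype=float)
        return self

    def train_data_values(self):
        n = self.x_train.shape[0]
        values = np.zeros(n)
        # Exact Shapley values: average marginal volume gain over all n! orderings.
        for perm in itertools.permutations(range(n)):
            prev = 0.0
            for k, idx in enumerate(perm):
                curr = self._volume(self.x_train[list(perm[: k + 1])])
                values[idx] += curr - prev
                prev = curr
        self._values = values / math.factorial(n)  # cached for later reads
        return self

    def evaluate_data_values(self):
        return self._values

    @staticmethod
    def _volume(x):
        # sqrt(det(X^T X)), clipped at 0 to absorb floating-point error.
        return float(np.sqrt(max(np.linalg.det(x.T @ x), 0.0)))
```

For two orthogonal unit vectors in two dimensions, each point receives 0.5, since either point alone spans zero volume and the pair spans a volume of 1.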