opendataval.dataval.RobustVolumeShapley

class opendataval.dataval.RobustVolumeShapley(*args, **kwargs)

Robust Volume Shapley and Volume Shapley data valuation implementation.

Although this DataEvaluator uses the same TMC-Shapley algorithm as the semivalue evaluators, it defaults to the implementation without the GR (Gelman-Rubin) convergence statistic. Instead, a fixed number of samples is taken, which is closer to the original implementation: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation/tree/main
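To illustrate the fixed-sample approach, here is a minimal sketch of permutation-based Monte Carlo Shapley estimation. It is not the library's implementation: the function name `monte_carlo_shapley`, the `utility` callback, and the `n_permutations` argument are hypothetical, and the truncation step of TMC-Shapley is omitted for brevity. The point is only that the loop runs a fixed number of times rather than stopping on a convergence statistic.

```python
import random


def monte_carlo_shapley(n_points, utility, n_permutations=200, seed=0):
    """Estimate Shapley values from a fixed number of random permutations.

    `utility` maps a list of point indices to a float. No convergence
    check is performed; the loop always runs `n_permutations` times.
    """
    rng = random.Random(seed)
    values = [0.0] * n_points
    for _ in range(n_permutations):
        order = list(range(n_points))
        rng.shuffle(order)
        coalition = []
        prev = utility(coalition)
        for idx in order:
            coalition.append(idx)
            curr = utility(coalition)
            # Marginal contribution of idx given the points before it.
            values[idx] += curr - prev
            prev = curr
    return [v / n_permutations for v in values]


# Toy utility (an assumption for illustration): the sum of per-point weights.
# For an additive utility, each point's Shapley value equals its own weight.
weights = [1.0, 2.0, 3.0]
estimates = monte_carlo_shapley(3, lambda s: sum(weights[i] for i in s))
```

In the evaluator itself, the utility would be the volume (or robust volume) of the selected rows of x_train rather than a sum of weights.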

Parameters

sampler : Sampler, optional

Sampler used to compute the marginal contributions. Available samplers can be found in sampler. By default, *args and **kwargs are passed to GrTMCSampler.

robust : bool, optional

Whether to use the robust volume measure, which trades off a “more refined representation of diversity for greater robustness to replication”, by default True

omega : Optional[float], optional

Width/discretization coefficient used to split x_train into a set of d-cubes; required if robust is True, by default 0.05
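The effect of robust and omega can be sketched in plain Python. This is a hedged illustration under stated assumptions, not the library's code: the helper names (`determinant`, `volume`, `robust_volume`), the floor-based assignment of rows to d-cubes, and the default `alpha = 1 / (10 * n)` are all assumptions made for this example. The sketch takes the volume as the square root of the determinant of the Gram matrix, replaces the rows in each d-cube of width omega with their mean, and multiplies the resulting volume by a per-cube factor that saturates as the cube's row count grows.

```python
import math
from collections import defaultdict


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def volume(x):
    """Vol(X) = sqrt(|det(X^T X)|) for a list of d-dimensional rows."""
    if not x:
        return 0.0
    d = len(x[0])
    gram = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    return math.sqrt(abs(determinant(gram)))


def robust_volume(x, omega=0.05, alpha=None):
    """Volume of per-cube mean rows, scaled by a saturating count factor."""
    if not x:
        return 0.0
    if alpha is None:
        alpha = 1.0 / (10 * len(x))  # assumed default, not from this page
    cubes = defaultdict(list)
    for row in x:
        # Assumed discretization: index of the width-omega cell per axis.
        cubes[tuple(math.floor(v / omega) for v in row)].append(row)
    x_tilde = [[sum(col) / len(rows) for col in zip(*rows)]
               for rows in cubes.values()]
    scale = 1.0
    for rows in cubes.values():
        # Geometric sum 1 + alpha + ... + alpha^count, bounded by 1/(1-alpha).
        scale *= sum(alpha ** p for p in range(len(rows) + 1))
    return volume(x_tilde) * scale


x = [[1.0, 0.0], [0.0, 1.0]]
replicated = x * 50  # every row copied 50 times
plain = volume(replicated)
robust = robust_volume(replicated)
```

In this toy case, replication inflates the plain volume by a factor of 50, while the robust measure stays close to the volume of the two distinct rows. This is the robustness to replication that the robust flag trades for a coarser view of diversity.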

Mixins

ModelLessMixin

Mixin for a data evaluator that doesn’t require a model or evaluation metric.

__init__(sampler: Sampler | None = None, robust: bool = True, omega: float | None = None, *args, **kwargs)

Methods

__init__([sampler, robust, omega])

embeddings(*tensors)

Return embeddings for the input tensors.

evaluate_data_values()

Return data values for each training data point.

input_data(x_train, y_train, x_valid, y_valid)

Store and transform input data for volume-based evaluators.

input_fetcher(fetcher)

Input data from a DataFetcher object.

setup(fetcher[, pred_model, metric])

Input model, metric, and data into the DataEvaluator.

train(fetcher[, pred_model, metric])

Store and transform data, then train model to predict data values.

train_data_values(*args, **kwargs)

Trains model to predict data values.

Attributes

Evaluators

data_values

Cached data values.