opendataval.dataval.volume package#

Submodules#

opendataval.dataval.volume.rvs module#

class opendataval.dataval.volume.rvs.RobustVolumeShapley(*args, **kwargs)#

Bases: DataEvaluator, ModelLessMixin

Robust Volume Shapley and Volume Shapley data valuation implementation.

Although this DataEvaluator uses the same TMC-Shapley algorithm as the semivalue evaluators, it defaults to the implementation without the GR (Gelman-Rubin) statistic. Instead, a fixed number of samples is taken, which is closer to the original implementation: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation/tree/main
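To make the "fixed number of samples" idea concrete, here is a minimal NumPy sketch of permutation-sampling Shapley values with a volume utility. It is not the library's code: the names `volume` and `fixed_sample_shapley` are hypothetical, the utility is assumed to be sqrt(det(XᵀX)), and the truncation step of TMC-Shapley is omitted for brevity.

```python
import numpy as np


def volume(x: np.ndarray) -> float:
    """Assumed utility: sqrt(det(X^T X)); the empty set has volume 0."""
    if x.shape[0] == 0:
        return 0.0
    gram = x.T @ x
    # Clamp tiny negative determinants caused by floating-point error.
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))


def fixed_sample_shapley(
    x: np.ndarray, num_permutations: int = 200, seed: int = 0
) -> np.ndarray:
    """Average marginal volume gains over a fixed number of permutations."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    values = np.zeros(n)
    for _ in range(num_permutations):
        perm = rng.permutation(n)
        prev = 0.0
        for k, idx in enumerate(perm, start=1):
            curr = volume(x[perm[:k]])
            values[idx] += curr - prev  # marginal contribution of point idx
            prev = curr
    return values / num_permutations


x = np.random.default_rng(1).normal(size=(6, 2))
values = fixed_sample_shapley(x, num_permutations=20)
```

Because each permutation's marginal gains telescope, the estimated values sum to the volume of the full data set regardless of how many permutations are drawn. The loop stops after `num_permutations` passes rather than checking a convergence statistic.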

References#

Parameters#

sampler : Sampler, optional

Sampler used to compute the marginal contributions. Samplers can be found in sampler; by default, *args and **kwargs are passed to GrTMCSampler.

robust : bool, optional

Whether to use the robust volume measure, which trades off a “more refined representation of diversity for greater robustness to replication”, by default True

omega : Optional[float], optional

Width/discretization coefficient for x_train to be split into a set of d-cubes, required if robust is True, by default 0.05
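The effect of robust and omega can be sketched as follows, assuming the replication-robust volume definition from the volume-based valuation work linked above: the volume of the compressed matrix is multiplied by a factor ρ = Σ_{p=0}^{φ} α^p for each occupied d-cube containing φ points. The function name `robust_volume` and the default `alpha=0.1` are hypothetical choices for illustration, not the library's values.

```python
import numpy as np


def robust_volume(
    x_tilde: np.ndarray, cube_counts: dict[tuple, int], alpha: float = 0.1
) -> float:
    """Volume of x_tilde scaled by a per-cube replication factor."""
    gram = x_tilde.T @ x_tilde
    vol = np.sqrt(max(np.linalg.det(gram), 0.0))
    factor = 1.0
    for count in cube_counts.values():
        # Geometric sum 1 + alpha + ... + alpha**count, bounded by 1 / (1 - alpha),
        # so piling copies into one cube adds very little.
        factor *= sum(alpha**p for p in range(count + 1))
    return float(vol * factor)


x_tilde = np.eye(2)
few_copies = robust_volume(x_tilde, {(0, 0): 1, (1, 1): 3}, alpha=0.5)
many_copies = robust_volume(x_tilde, {(0, 0): 1, (1, 1): 300}, alpha=0.5)
```

Since each factor is bounded, replicating a point hundreds of times barely changes the result, which is the robustness to replication that the robust flag refers to.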

Mixins#

ModelLessMixin

Mixin for a data evaluator that doesn’t require a model or evaluation metric.

evaluate_data_values() → ndarray#

Return data values for each training data point.

Returns#

np.ndarray

Predicted data values/selection for each training input data point

input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#

Store and transform input data for volume-based evaluators.

Parameters#

x_train : torch.Tensor

Data covariates

y_train : torch.Tensor

Data labels

x_valid : torch.Tensor

Test+Held-out covariates, unused parameter

y_valid : torch.Tensor

Test+Held-out labels, unused parameter

train_data_values(*args, **kwargs)#

Trains model to predict data values.

Parameters#

args : tuple[Any], optional

Training positional arguments

kwargs : dict[str, Any], optional

Training keyword arguments

Returns#

self : object

Returns a trained Data Evaluator.

opendataval.dataval.volume.rvs.compute_robust_volumes(x_tilde: Tensor, hypercubes: dict[tuple, int])#
opendataval.dataval.volume.rvs.compute_x_tilde_and_counts(x: Tensor, omega: float)#

Compresses the original feature matrix x to x_tilde with the specified omega.

Returns#

np.ndarray

Compressed form of x as a d-cube

dict[tuple, int]

A dictionary mapping each d-cube to the number of points it contains
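The compression step can be illustrated with a short NumPy sketch. It assumes that each point is assigned to a d-cube by flooring its min-shifted coordinates divided by omega, and that each occupied cube is represented by the mean of its points. The name `compress_to_cubes` is hypothetical and the sketch works on NumPy arrays rather than tensors.

```python
import math
from collections import Counter, defaultdict

import numpy as np


def compress_to_cubes(
    x: np.ndarray, omega: float
) -> tuple[np.ndarray, dict[tuple, int]]:
    """Group rows of x into d-cubes of width omega and average each group."""
    mins = x.min(axis=0)  # shift so that cube indices start at 0
    counts: Counter = Counter()
    members: dict[tuple, list] = defaultdict(list)
    for row in x:
        key = tuple(math.floor(v / omega) for v in (row - mins))
        counts[key] += 1
        members[key].append(row)
    # One representative row (the mean) per occupied cube, in insertion order.
    x_tilde = np.stack([np.mean(rows, axis=0) for rows in members.values()])
    return x_tilde, dict(counts)


x = np.array([[0.0, 0.0], [0.01, 0.02], [0.55, 0.55]])
x_tilde, cube_counts = compress_to_cubes(x, omega=0.1)
```

Here the first two rows fall into the same cube and are merged into a single averaged row, while the third row occupies its own cube.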

Module contents#