opendataval.dataval.volume package
Submodules
opendataval.dataval.volume.rvs module
- class opendataval.dataval.volume.rvs.RobustVolumeShapley(*args, **kwargs)
Bases: DataEvaluator, ModelLessMixin
Robust Volume Shapley and Volume Shapley data valuation implementation.
Although this DataEvaluator uses the same TMC-Shapley algorithm as the semivalue evaluators, it defaults to the implementation without the GR (Gelman-Rubin) statistic. Instead, a fixed number of samples is taken, which is closer to the original implementation: https://github.com/ZhaoxuanWu/VolumeBased-DataValuation/tree/main
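To illustrate what "a fixed number of samples" means here, the following is a minimal sketch of permutation-based Shapley estimation that stops after a preset number of permutations rather than checking a convergence statistic. It is not the library's sampler: `permutation_shapley`, `utility`, and the toy additive utility are hypothetical names introduced for this example, and the truncation step of TMC-Shapley is omitted for brevity.

```python
import random
from typing import Callable, Sequence


def permutation_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_samples: int = 200,
    seed: int = 0,
) -> list[float]:
    """Estimate Shapley values from a fixed number of random permutations."""
    rng = random.Random(seed)
    totals = [0.0] * n_points
    for _ in range(n_samples):
        perm = list(range(n_points))
        rng.shuffle(perm)
        prev = utility([])  # utility of the empty coalition
        for k, idx in enumerate(perm):
            curr = utility(perm[: k + 1])
            totals[idx] += curr - prev  # marginal contribution of point idx
            prev = curr
    # Average the marginal contributions over all sampled permutations.
    return [t / n_samples for t in totals]


# Toy additive utility (an assumption for this sketch): each point adds a fixed weight.
weights = [0.5, 1.0, 2.0]
values = permutation_shapley(3, lambda subset: sum(weights[i] for i in subset))
```

With an additive utility, every marginal contribution of a point equals its weight, so the estimates match `weights` regardless of the number of permutations. In the volume-based setting, the utility would instead be a volume measure of the selected rows.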
Parameters
- sampler : Sampler, optional
Sampler used to compute the marginal contributions. Samplers can be found in sampler; by default, *args and **kwargs are passed to GrTMCSampler.
- robust : bool, optional
Whether to use the robust volume measure, which trades off a “more refined representation of diversity for greater robustness to replication”, by default True
- omega : Optional[float], optional
Width/discretization coefficient for x_train to be split into a set of d-cubes, required if robust is True, by default 0.05
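As a rough illustration of how `omega` could split the covariates into d-cubes, the sketch below assigns each point to a cube by flooring each coordinate divided by `omega`, then keeps one mean representative and a count per non-empty cube. The function name `discretize` and the floor-based indexing are assumptions made for this example, not the library's confirmed implementation.

```python
import math
from collections import defaultdict


def discretize(points: list[list[float]], omega: float):
    """Group points into d-cubes of side `omega`; return cube means and counts."""
    members: dict[tuple, list[list[float]]] = defaultdict(list)
    for p in points:
        # Cube index: one integer per coordinate.
        key = tuple(math.floor(v / omega) for v in p)
        members[key].append(p)
    # One representative (the coordinate-wise mean) per non-empty cube.
    x_tilde = [
        [sum(col) / len(rows) for col in zip(*rows)] for rows in members.values()
    ]
    counts = {key: len(rows) for key, rows in members.items()}
    return x_tilde, counts


# Three replicated points fall into one cube; the fourth gets its own cube.
points = [[0.01, 0.02], [0.01, 0.02], [0.01, 0.02], [0.32, 0.41]]
x_tilde, counts = discretize(points, omega=0.05)
# counts -> {(0, 0): 3, (6, 8): 1}
```

Replicated rows collapse into a single representative, which is the intuition behind the robustness to replication: copying a point raises a cube's count but does not add a new row to the compressed matrix. A larger `omega` merges more points per cube and therefore gives a coarser view of diversity.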
Mixins
- ModelLessMixin
Mixin for a data evaluator that doesn’t require a model or evaluation metric.
- evaluate_data_values() -> ndarray
Return data values for each training data point.
Returns
- np.ndarray
Predicted data value/selection for each training data point
- input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)
Store and transform input data for volume-based evaluators.
Parameters
- x_train : torch.Tensor
Data covariates
- y_train : torch.Tensor
Data labels
- x_valid : torch.Tensor
Test+Held-out covariates, unused parameter
- y_valid : torch.Tensor
Test+Held-out labels, unused parameter
- opendataval.dataval.volume.rvs.compute_robust_volumes(x_tilde: Tensor, hypercubes: dict[tuple, int])
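The body of this function is not shown on this page. As a hedged sketch of the general idea, a robust volume can be formed from inputs of this shape by multiplying the volume of the compressed matrix, sqrt(det(XᵀX)), by one bounded replication factor per cube. The names `gram_determinant` and `robust_volume_sketch`, the explicit `alpha` parameter, and the geometric-sum form of the factor are assumptions for illustration. Plain Python lists are used in place of tensors to keep the example self-contained.

```python
import math


def gram_determinant(x: list[list[float]]) -> float:
    """det(X^T X) via Gaussian elimination with partial pivoting."""
    d = len(x[0])
    gram = [[sum(row[i] * row[j] for row in x) for j in range(d)] for i in range(d)]
    det = 1.0
    for c in range(d):
        pivot = max(range(c, d), key=lambda r: abs(gram[r][c]))
        if gram[pivot][c] == 0.0:
            return 0.0  # singular Gram matrix
        if pivot != c:
            gram[c], gram[pivot] = gram[pivot], gram[c]
            det = -det  # a row swap flips the sign
        det *= gram[c][c]
        for r in range(c + 1, d):
            factor = gram[r][c] / gram[c][c]
            for k in range(c, d):
                gram[r][k] -= factor * gram[c][k]
    return det


def robust_volume_sketch(
    x_tilde: list[list[float]], hypercubes: dict[tuple, int], alpha: float
) -> float:
    """Volume of x_tilde scaled by a per-cube replication factor (0 <= alpha < 1)."""
    volume = math.sqrt(max(gram_determinant(x_tilde), 0.0))
    factor = 1.0
    for count in hypercubes.values():
        # Geometric sum 1 + alpha + ... + alpha**count, bounded by 1 / (1 - alpha).
        factor *= (1.0 - alpha ** (count + 1)) / (1.0 - alpha)
    return volume * factor


x_tilde = [[1.0, 0.0], [0.0, 2.0]]
hypercubes = {(0, 0): 1, (1, 1): 3}
rv = robust_volume_sketch(x_tilde, hypercubes, alpha=0.5)
```

Because each factor is bounded, adding many copies of the same point can only increase the result by a limited amount, whereas genuinely new directions in the data increase the determinant itself.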