class opendataval.dataval.ShapEvaluator(*args, **kwargs)

Abstract class for all semivalue-based methods of computing data values.



sampler : Sampler, optional

Sampler used to compute the marginal contribution, by default uses TMC-Shapley with a Gelman-Rubin statistic terminator. Samplers are found in sampler


sampler : Sampler, optional

Sampler used to compute the marginal contributions. Samplers can be found in opendataval/margcontrib/. By default GrTMCSampler, with any additional arguments passed to the sampler's constructor.

gr_threshold : float, optional

Convergence threshold for the Gelman-Rubin statistic, by default 1.05. Computing Shapley values exactly is NP-hard, so they are estimated with MCMC sampling.

max_mc_epochs : int, optional

Max number of outer epochs of MCMC sampling, by default 100

models_per_epoch : int, optional

Number of model fittings to take per epoch prior to checking GR convergence, by default 100

min_models : int, optional

Minimum samples before checking MCMC convergence, by default 1000

min_cardinality : int, optional

Minimum cardinality of a training set, must be passed as kwarg, by default 5

cache_name : str, optional

Unique cache_name of the model, used to cache marginal contributions. Set to None to disable caching, by default “”, which is set to a unique value for an object

random_state : RandomState, optional

Random initial state, by default None
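
To make the role of gr_threshold more concrete, here is a minimal sketch of the textbook Gelman-Rubin statistic computed over several independent sampling chains. It is an illustration only: the function name gelman_rubin and the chain layout are assumptions, and the library's own sampler may compute the statistic differently.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Textbook Gelman-Rubin statistic for equal-length chains of floats.

    Hypothetical helper for illustration; not the library's implementation.
    """
    m = len(chains)      # number of chains
    n = len(chains[0])   # samples per chain
    if m < 2 or n < 2:
        raise ValueError("need at least 2 chains with at least 2 samples each")

    # W: average of the within-chain sample variances
    within = mean([variance(chain) for chain in chains])
    # B / n: sample variance of the per-chain means
    between_over_n = variance([mean(chain) for chain in chains])

    if within == 0:
        return 1.0 if between_over_n == 0 else float("inf")

    # Pooled estimate of the target variance
    var_hat = (n - 1) / n * within + between_over_n
    return (var_hat / within) ** 0.5


# Chains that agree stay under a 1.05 threshold; chains that disagree do not.
agreeing = gelman_rubin([[1, 2, 3], [1, 2, 3]])
disagreeing = gelman_rubin([[0, 1, 0, 1], [10, 11, 10, 11]])
```

Under this reading, sampling would continue until the statistic drops below gr_threshold or the max_mc_epochs budget is exhausted.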

__init__(sampler: Sampler | None = None, *args, **kwargs)




Compute the weights for each cardinality of training set.

evaluate(y, y_hat)

Evaluate performance of the specified metric between label and predictions.


Return data values for each training data point.

input_data(x_train, y_train, x_valid, y_valid)

Store and transform input data for semi-value samplers.


Input data from a DataFetcher object.


Input the evaluation metric.


Input the prediction model.

input_model_metric(pred_model, metric)

Input the prediction model and the evaluation metric.

setup(fetcher[, pred_model, metric])

Inputs model, metric and data into Data Evaluator.

train(fetcher[, pred_model, metric])

Store and transform data, then train model to predict data values.

train_data_values(*args, **kwargs)

Uses the sampler to train the model and find marginal contributions and data values.
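
As a rough picture of what "finding marginal contributions" means, the sketch below estimates each point's average marginal contribution by sampling random permutations. It is a simplified stand-in, not the library's sampler: the toy utility function, the name permutation_marginals, and its parameters are all hypothetical, and truncation and Gelman-Rubin stopping are left out.

```python
import random


def utility(subset, scores):
    """Toy additive utility (hypothetical): sum of per-point scores.

    In practice this would be the metric of a model trained on `subset`.
    """
    return sum(scores[i] for i in subset)


def permutation_marginals(scores, num_permutations=200, seed=0):
    """Average marginal contribution of each point over random permutations."""
    rng = random.Random(seed)
    n = len(scores)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        prev = utility(subset, scores)
        for i in order:
            # Add point i and record how much the utility changed.
            subset.append(i)
            curr = utility(subset, scores)
            totals[i] += curr - prev
            prev = curr
    return [total / num_permutations for total in totals]


scores = [0.5, 1.0, -0.25]
estimates = permutation_marginals(scores)
```

Because the toy utility is additive, each estimate equals the point's own score, which makes the sketch easy to sanity-check before swapping in a real model and metric.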




Cached data values.
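
The method summaries above mention a weight for each training-set cardinality. For reference, the sketch below computes the classical Shapley weights, under which a marginal contribution measured on a subset of size k receives weight 1 / (n * C(n-1, k)). The function name shapley_weights is hypothetical, and a concrete subclass may define its weights differently, for example when the sampler already draws cardinalities uniformly.

```python
from math import comb


def shapley_weights(n):
    """Classical Shapley weight for each subset size k = 0..n-1.

    k is the size of a subset that excludes the point being valued.
    Hypothetical helper for illustration only.
    """
    return [1.0 / (n * comb(n - 1, k)) for k in range(n)]


n = 5
weights = shapley_weights(n)
# There are comb(n - 1, k) subsets of each size k, so the weights,
# counted once per subset, sum to 1.
total = sum(comb(n - 1, k) * w for k, w in enumerate(weights))
```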