opendataval.dataval.GrTMCSampler

class opendataval.dataval.GrTMCSampler(*args, **kwargs)

Truncated Monte Carlo (TMC) sampler with a Gelman-Rubin convergence terminator, used by semivalue-based methods of computing data values.

Evaluators that share marginal contributions should share a sampler.

Parameters

gr_threshold : float, optional

Convergence threshold for the Gelman-Rubin statistic, by default 1.05. Computing Shapley values exactly is NP-hard, so they are approximated with MCMC sampling.

max_mc_epochs : int, optional

Max number of outer epochs of MCMC sampling, by default 100

models_per_epoch : int, optional

Number of models to fit per epoch before checking GR convergence, by default 100

min_models : int, optional

Minimum samples before checking MCMC convergence, by default 1000

min_cardinality : int, optional

Minimum cardinality of a training set; must be passed as a keyword argument, by default 5

cache_name : str, optional

Unique name under which the model's marginal contributions are cached. Set to None to disable caching. By default "", which is replaced with a value unique to the object

random_state : RandomState, optional

Random initial state, by default None

__init__(gr_threshold: float = 1.05, max_mc_epochs: int = 100, models_per_epoch: int = 100, min_models: int = 1000, min_cardinality: int = 5, cache_name: str | None = '', random_state: RandomState | None = None)
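The way gr_threshold, max_mc_epochs, models_per_epoch and min_models interact can be illustrated with a minimal standard-library sketch. It is not the library's implementation: the helper names gelman_rubin and run_until_converged, the n_chains argument, and the round-robin split of samples into chains are all assumptions made for illustration, and models_per_epoch is assumed to be divisible by n_chains so that the chains stay the same length.

```python
import random
import statistics


def gelman_rubin(chains):
    """Gelman-Rubin statistic (R-hat) for equal-length chains of floats."""
    n = len(chains[0])  # samples per chain
    # W: mean of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the per-chain means
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    if within == 0:
        return 1.0 if between_over_n == 0 else float("inf")
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


def run_until_converged(draw, n_chains=4, gr_threshold=1.05, max_mc_epochs=100,
                        models_per_epoch=100, min_models=1000):
    """Hypothetical stopping schedule: sample in epochs, stop once R-hat is small."""
    chains = [[] for _ in range(n_chains)]
    samples = 0
    r_hat = float("inf")
    for _ in range(max_mc_epochs):
        for i in range(models_per_epoch):
            chains[i % n_chains].append(draw())  # round-robin over chains
        samples += models_per_epoch
        if samples < min_models:
            continue  # too few samples to check convergence yet
        r_hat = gelman_rubin(chains)
        if r_hat < gr_threshold:
            break
    return samples, r_hat


rng = random.Random(0)
samples, r_hat = run_until_converged(lambda: rng.gauss(0.0, 1.0))
print(samples, r_hat)
```

In this sketch, no convergence check happens before min_models samples have been drawn, and sampling stops after at most max_mc_epochs * models_per_epoch samples even if the threshold is never reached.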

Methods

__init__([gr_threshold, max_mc_epochs, ...])

compute_marginal_contribution(*args, **kwargs)

Compute the marginal contributions for semivalue-based data evaluators.

set_coalition(coalition)

Initializes storage for finding the marginal contribution of each data point.

set_evaluator(value_func)

Sets the function used to evaluate the utility of a coalition.
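As a rough illustration of what a single truncated Monte Carlo pass computes, the sketch below walks one random permutation and records each point's marginal contribution to a utility function. It is a simplified stand-in rather than the library's code: tmc_marginals, the tol and patience truncation rule, and the additive toy utility are all hypothetical. The starting coalition holds the first min_cardinality points of the permutation, so those points receive no marginal in that pass.

```python
import random


def tmc_marginals(num_points, utility, min_cardinality=5, tol=1e-8,
                  patience=10, rng=None):
    """One truncated Monte Carlo pass over a random permutation (illustrative)."""
    rng = rng or random.Random()
    order = list(range(num_points))
    rng.shuffle(order)

    marginals = [0.0] * num_points
    coalition = order[:min_cardinality]  # smallest training set that is evaluated
    prev = utility(coalition)
    small_steps = 0  # consecutive negligible marginal contributions

    for idx in order[min_cardinality:]:
        coalition.append(idx)
        curr = utility(coalition)
        marginals[idx] = curr - prev
        small_steps = small_steps + 1 if abs(curr - prev) < tol else 0
        prev = curr
        if small_steps >= patience:
            break  # truncate: further points are assumed to add almost nothing
    return marginals


# Toy additive utility: the value of a coalition is the sum of its weights.
weights = [0.1 * (i + 1) for i in range(20)]
marginals = tmc_marginals(20, lambda c: sum(weights[i] for i in c),
                          rng=random.Random(0))
print(marginals)
```

With an additive utility, every point added after the starting coalition recovers its own weight, which makes the toy case easy to check by hand. Averaging such passes over many permutations gives the Monte Carlo estimate.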

Attributes

CACHE

Cached marginal contributions.

GR_MAX

Default maximum Gelman-Rubin statistic.
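The relationship between cache_name and the class-level CACHE attribute can be pictured with a small stand-alone sketch. The CachedSampler class, its _expensive helper and the returned values below are hypothetical and only mimic the documented behaviour: a shared name reuses stored results, None disables caching, and the default "" falls back to a key unique to the object.

```python
class CachedSampler:
    """Hypothetical stand-in showing a class-level cache keyed by cache_name."""

    CACHE = {}  # class attribute: shared by every instance

    def __init__(self, cache_name=""):
        # None disables caching; "" falls back to a key unique to this object
        self.cache_name = None if cache_name is None else (cache_name or id(self))
        self.calls = 0  # counts how often the expensive path actually runs

    def _expensive(self):
        self.calls += 1
        return [0.1, 0.2, 0.3]  # placeholder marginal contributions

    def compute_marginal_contribution(self):
        if self.cache_name is not None and self.cache_name in self.CACHE:
            return self.CACHE[self.cache_name]
        result = self._expensive()
        if self.cache_name is not None:
            self.CACHE[self.cache_name] = result
        return result


first = CachedSampler("shared")
second = CachedSampler("shared")
first.compute_marginal_contribution()
second.compute_marginal_contribution()  # served from the cache
print(first.calls, second.calls)
```

Under these assumptions, two samplers created with the same cache_name do the expensive work only once, which is the same effect as having evaluators share one sampler.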