opendataval.dataval.oob package

Submodules

opendataval.dataval.oob.oob module

class opendataval.dataval.oob.oob.DataOob(*args, **kwargs)

Bases: DataEvaluator, ModelMixin

Data Out-of-Bag data valuation implementation.

An input evaluation metric is valid here if it compares a single data point against the several predictions made for it. Examples include accuracy and L2 distance.
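As a minimal sketch of what such a per-point metric looks like, each function below takes one data point's label and the array of predictions made for it, and returns a single score. The names `accuracy` and `neg_mean_l2` are hypothetical (not opendataval functions), and negating the L2 distance so that a higher score means a closer fit is an assumption.

```python
import numpy as np


def accuracy(label, preds):
    """Fraction of the predictions for one data point that match its class label."""
    preds = np.asarray(preds)
    return float(np.mean(preds == label))


def neg_mean_l2(label, preds):
    """Negative mean L2 distance between one label vector and each prediction row.

    label: shape (d,); preds: shape (k, d), one row per prediction.
    The sign is flipped (an assumption) so that a larger score means a closer fit.
    """
    diffs = np.asarray(preds, dtype=float) - np.asarray(label, dtype=float)
    return -float(np.mean(np.linalg.norm(diffs, axis=1)))


# One data point, predicted by several different models.
print(accuracy(1, [1, 1, 0, 1]))                          # 0.75
print(neg_mean_l2([0.0, 0.0], [[3.0, 4.0], [0.0, 0.0]]))  # -2.5
```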


Parameters

num_models : int, optional

Number of models to bag/aggregate, by default 1000

proportion : float, optional

Proportion of data points in the in-bag sample. sample_size = len(dataset) * proportion, by default 1.0

random_state : RandomState, optional

Random initial state, by default None
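The three parameters can be illustrated with a minimal sampling sketch. It assumes that in-bag indices are drawn uniformly with replacement (a bootstrap) and that a point is out-of-bag for a model when it was never drawn for that model. `draw_bags` is a hypothetical helper, not an opendataval function.

```python
from typing import List, Optional, Tuple

import numpy as np


def draw_bags(
    num_points: int,
    num_models: int = 1000,
    proportion: float = 1.0,
    random_state: Optional[np.random.RandomState] = None,
) -> Tuple[np.ndarray, List[np.ndarray]]:
    """Hypothetical helper: draw one in-bag sample per model and derive its OOB set."""
    rs = random_state if random_state is not None else np.random.RandomState()
    # sample_size = len(dataset) * proportion, as in the parameter description
    sample_size = int(num_points * proportion)
    # Each row holds one model's in-bag indices, drawn with replacement (assumption)
    in_bags = rs.randint(0, num_points, size=(num_models, sample_size))
    out_bags = []
    for in_bag in in_bags:
        # A point is out-of-bag if it was drawn zero times for this model
        counts = np.bincount(in_bag, minlength=num_points)
        out_bags.append(np.flatnonzero(counts == 0))
    return in_bags, out_bags


in_bags, out_bags = draw_bags(
    num_points=10, num_models=3, proportion=0.5, random_state=np.random.RandomState(0)
)
print(in_bags.shape)  # (3, 5)
```

Under this with-replacement assumption and proportion = 1.0, each point is left out of a given model with probability (1 - 1/n)^n, which approaches e^{-1} (about 37%) for large n, so every point typically collects many out-of-bag predictions across num_models models.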

evaluate_data_values() → ndarray

Return data values for each training data point.

Compute data values by evaluating how the out-of-bag (OOB) predictions for each data point compare to its training label.

Returns

np.ndarray

Predicted data values/selection for every training data point
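Assuming the OOB predictions from all models are concatenated into one array, together with a mapping from each training index to its rows in that array, the evaluation step reduces to one metric call per data point. This minimal sketch uses accuracy on class labels; `oob_values` is a hypothetical name, and giving a point that was never out-of-bag a value of 0 is an assumption.

```python
import numpy as np


def oob_values(y_train, oob_pred, oob_rows):
    """Hypothetical sketch of the evaluation step.

    y_train:  (n,) training class labels
    oob_pred: (m,) OOB class predictions from all models, concatenated
    oob_rows: dict mapping a training index to its rows in oob_pred
    """
    values = np.zeros(len(y_train))  # points never left out keep a value of 0
    for i, rows in oob_rows.items():
        # Compare point i's label with every OOB prediction made for it
        values[i] = np.mean(oob_pred[rows] == y_train[i])
    return values


y_train = np.array([0, 1, 1])
oob_pred = np.array([0, 0, 1, 1, 0])
print(oob_values(y_train, oob_pred, {0: [0, 1], 1: [2, 4]}).tolist())  # [1.0, 0.5, 0.0]
```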

input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)

Store and transform input data for the Data Out-of-Bag evaluator.

Parameters

x_train : torch.Tensor

Data covariates

y_train : torch.Tensor

Data labels

x_valid : torch.Tensor

Test and held-out covariates, unused by DataOob

y_valid : torch.Tensor

Test and held-out labels, unused by DataOob

train_data_values(*args, **kwargs)

Trains model to predict data values.

Trains the Data Out-of-Bag model by bagging a model and collecting all out-of-bag predictions. Each data point is then evaluated against its out-of-bag predictions.

Parameters

args : tuple[Any], optional

Training positional arguments

kwargs : dict[str, Any], optional

Training keyword arguments
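The training loop described above can be sketched end to end with NumPy alone. This is a minimal illustration under several assumptions: bootstrap sampling with replacement, a hypothetical nearest-centroid classifier standing in for the evaluator's prediction model, and a plain `defaultdict(list)` standing in for `GroupingIndex`. None of `fit_centroids`, `predict_centroids`, or `train_oob` are opendataval functions.

```python
from collections import defaultdict

import numpy as np


def fit_centroids(x, y):
    """Hypothetical stand-in model: one mean vector per class seen in the bag."""
    classes = np.unique(y)
    return classes, np.stack([x[y == c].mean(axis=0) for c in classes])


def predict_centroids(model, x):
    classes, centroids = model
    # Squared distance from each row of x to each centroid; pick the closest class
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def train_oob(x_train, y_train, num_models=100, proportion=1.0, seed=0):
    """Hypothetical sketch: bag a model and collect every out-of-bag prediction."""
    rs = np.random.RandomState(seed)
    n = len(x_train)
    sample_size = int(n * proportion)
    preds = []                     # OOB predictions, appended model by model
    positions = defaultdict(list)  # training index -> rows in the concatenated predictions
    row = 0
    for _ in range(num_models):
        in_bag = rs.randint(0, n, size=sample_size)  # drawn with replacement (assumption)
        out_bag = np.flatnonzero(np.bincount(in_bag, minlength=n) == 0)
        if out_bag.size == 0:
            continue  # every point was in-bag, so there is nothing to predict
        model = fit_centroids(x_train[in_bag], y_train[in_bag])
        preds.append(predict_centroids(model, x_train[out_bag]))
        for i in out_bag:
            positions[int(i)].append(row)
            row += 1
    oob_pred = np.concatenate(preds) if preds else np.empty(0, dtype=y_train.dtype)
    return oob_pred, positions


# Two well-separated clusters, one per class
rs = np.random.RandomState(1)
x = np.vstack([rs.normal(0, 1, (10, 2)), rs.normal(10, 1, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
oob_pred, positions = train_oob(x, y, num_models=50)
```

Each data value can then be computed by comparing `y[i]` with `oob_pred[positions[i]]` using a per-point metric such as accuracy.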

class opendataval.dataval.oob.oob.GroupingIndex(start: int = 0)

Bases: defaultdict[int, list[int]]

Maps each stored value to the list of stack positions at which it was inserted.

Parameters

start : int, optional

Starting insertion position, increments after each insertion, by default 0

add_indices(values: list[int])

Add values to the defaultdict, recording each value's position in the stack in insertion order.
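Based only on the description above, `GroupingIndex` can be approximated as a `defaultdict(list)` that maps each added value to every stack position at which it was inserted. `GroupingIndexSketch` below is a hypothetical reconstruction for illustration, not the library's source.

```python
from collections import defaultdict
from typing import Iterable


class GroupingIndexSketch(defaultdict):
    """Hypothetical reconstruction: value -> list of insertion positions."""

    def __init__(self, start: int = 0):
        super().__init__(list)  # missing keys start as empty lists
        self.position = start   # next insertion position in the stack

    def add_indices(self, values: Iterable[int]) -> None:
        """Record the position of each value in order, advancing after every insertion."""
        for value in values:
            self[value].append(self.position)
            self.position += 1


index = GroupingIndexSketch()
index.add_indices([3, 5])  # positions 0 and 1
index.add_indices([5, 7])  # positions 2 and 3
print(dict(index))         # {3: [0], 5: [1, 2], 7: [3]}
```

In the Data-OOB setting, the values would presumably be out-of-bag training indices and the positions rows of the concatenated prediction array, so all of a point's OOB predictions can be looked up after training.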

Module contents