opendataval.dataval.oob package#
Submodules#
opendataval.dataval.oob.oob module#
- class opendataval.dataval.oob.oob.DataOob(*args, **kwargs)#
Bases: DataEvaluator, ModelMixin
Data Out-of-Bag data valuation implementation.
An input evaluation metric is valid if it can compare a single data point across several predictions. Examples include accuracy and L2 distance.
Parameters#
- num_models : int, optional
Number of models to bag/aggregate, by default 1000
- proportion : float, optional
Proportion of data points in the in-bag sample, where sample_size = len(dataset) * proportion, by default 1.0
- random_state : RandomState, optional
Random initial state, by default None
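The relationship between `proportion`, the in-bag sample, and the out-of-bag points can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `draw_in_bag`, the use of `round`, and sampling with replacement are all assumptions made for illustration.

```python
import random


def draw_in_bag(num_points, proportion=1.0, rng=None):
    """Hypothetical helper: draw one in-bag sample and list the points left out.

    Assumes sampling with replacement, so some points stay out-of-bag
    even when proportion is 1.0.
    """
    rng = rng or random.Random()
    sample_size = round(num_points * proportion)  # sample_size = len(dataset) * proportion
    in_bag = [rng.randrange(num_points) for _ in range(sample_size)]
    out_of_bag = sorted(set(range(num_points)) - set(in_bag))
    return in_bag, out_of_bag


in_bag, out_of_bag = draw_in_bag(10, proportion=1.0, rng=random.Random(0))
print("in-bag indices:    ", in_bag)
print("out-of-bag indices:", out_of_bag)
```

Under these assumptions, each of the `num_models` bagged models would receive its own in-bag sample, and only the points it never saw contribute out-of-bag predictions.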
- evaluate_data_values() -> ndarray#
Return data values for each training data point.
Compute data values by evaluating how the OOB labels compare to training labels.
Returns#
- np.ndarray
Predicted data values/selection for every training data point
- input_data(x_train: Tensor, y_train: Tensor, x_valid: Tensor, y_valid: Tensor)#
Store and transform input data for Data Out-Of-Bag Evaluator.
Parameters#
- x_train : torch.Tensor
Data covariates
- y_train : torch.Tensor
Data labels
- x_valid : torch.Tensor
Test+Held-out covariates, unused by DataOob
- y_valid : torch.Tensor
Test+Held-out labels, unused by DataOob
- train_data_values(*args, **kwargs)#
Trains model to predict data values.
Trains the Data Out-of-Bag model by bagging a model and collecting all out-of-bag predictions. Each data point is then evaluated against its out-of-bag predictions.
Parameters#
- args : tuple[Any], optional
Training positional args
- kwargs : dict[str, Any], optional
Training keyword arguments
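The bagging procedure described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the DataOob implementation: the function names `nearest_neighbor_predict` and `oob_data_values` are hypothetical, a toy 1-nearest-neighbor rule stands in for the bagged model, sampling is assumed to be with replacement, and accuracy is used as the per-point metric.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """Toy stand-in model: return the label of the closest in-bag point."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def oob_data_values(x, y, num_models=200, proportion=1.0, seed=0):
    """Hypothetical sketch: value of a point = its mean out-of-bag accuracy."""
    rng = random.Random(seed)
    n = len(x)
    sample_size = round(n * proportion)
    correct = [0] * n  # OOB predictions that matched the training label
    counts = [0] * n   # number of models for which the point was out-of-bag
    for _ in range(num_models):
        in_bag = [rng.randrange(n) for _ in range(sample_size)]
        in_bag_set = set(in_bag)
        bag_x = [x[i] for i in in_bag]
        bag_y = [y[i] for i in in_bag]
        for i in range(n):
            if i in in_bag_set:
                continue  # only points the model never saw are scored
            counts[i] += 1
            prediction = nearest_neighbor_predict(bag_x, bag_y, x[i])
            correct[i] += int(prediction == y[i])
    # Points that were never out-of-bag get 0.0 in this sketch.
    return [c / k if k else 0.0 for c, k in zip(correct, counts)]


x = [0, 1, 2, 3.5, 10, 11, 12, 13]
y = [0, 0, 0, 1, 1, 1, 1, 1]  # the point at x=3.5 sits among label-0 neighbors
values = oob_data_values(x, y)
for xi, yi, vi in zip(x, y, values):
    print(f"x={xi:<4} label={yi} value={vi:.2f}")
```

In this toy data the point at `x=3.5` is usually predicted as label 0 by models that did not train on it, so it receives the lowest value, which is the behavior one would expect for a suspicious label.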
- class opendataval.dataval.oob.oob.GroupingIndex(start: int = 0)#
Bases: defaultdict[int, list[int]]
Stores value and position of insertion in a stack.
Parameters#
- start : int, optional
Starting insertion position, increments after each insertion, by default 0
- add_indices(values: list[int])#
Add values to defaultdict and record position in stack in-order.
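One way to read the description above is as a `defaultdict` that maps each value to the list of stack positions at which it was inserted. The sketch below follows that reading; the class name `GroupingIndexSketch` and the `position` attribute are hypothetical, and the library's actual internals may differ.

```python
from collections import defaultdict


class GroupingIndexSketch(defaultdict):
    """Hypothetical sketch: maps each value to the positions where it was inserted."""

    def __init__(self, start=0):
        super().__init__(list)
        self.position = start  # next insertion position (assumed attribute name)

    def add_indices(self, values):
        """Record the running position of every value, in order."""
        for value in values:
            self[value].append(self.position)
            self.position += 1


index = GroupingIndexSketch()
index.add_indices([2, 0, 2])  # positions 0, 1, 2
index.add_indices([1, 2])     # positions 3, 4
print(dict(index))            # {2: [0, 2, 4], 0: [1], 1: [3]}
```

A structure like this could let out-of-bag predictions from many models be concatenated into one flat sequence while still recovering, for each training index, where its predictions sit in that sequence.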