DataFetcher for the data set used for the experiment. All exper_func take a
DataFetcher as an argument to have access to all data points and noisy indices.
pred_model : Model
Prediction model for the DataEvaluators
train_kwargs : dict[str, Any], optional
Training keyword arguments for the prediction model, by default None
Experiment function; runs an experiment on a DataEvaluator and the data of
the associated DataFetcher. Output must be a dict with the results of the
experiment. NOTE: the results must all be at most 1-dimensional but need
not all be the same length.
save_output : bool, optional
Whether to save the outputs to self.output_dir, by default False
eval_kwargs : dict[str, Any], optional
Additional keyword arguments to be passed to the exper_func
Name of the data set; must be registered with Register
cache_dir : Union[str, pathlib.Path], optional
Directory where the loaded data is cached, by default None, which uses
Register.CACHE_DIR
force_download : bool, optional
Forces download from source URL, by default False
train_count : Union[int, float]
Number/proportion of training points
valid_count : Union[int, float]
Number/proportion of validation points
test_count : Union[int, float]
Number/proportion of test points
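The Union[int, float] counts follow the number-or-proportion convention above; a sketch of how such a count might be resolved (resolve_count is a hypothetical helper, not part of the library):

```python
def resolve_count(count, total):
    # Hypothetical helper: an int is an absolute number of points,
    # a float is a proportion of the full data set.
    if isinstance(count, float):
        return round(count * total)
    return count
```

For example, resolve_count(0.1, 1000) and resolve_count(100, 1000) both select 100 points.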
add_noise : Callable
If None, no changes are made. Takes the required DataFetcher argument and
adds noise to the data points of the DataFetcher as needed. Returns a
dict[str, np.ndarray] with the updated np.ndarrays, used to update the
data loader, with the following keys:
"x_train" – Updated training covariates with noise, optional
"y_train" – Updated training labels with noise, optional
"x_valid" – Updated validation covariates with noise, optional
"y_valid" – Updated validation labels with noise, optional
"x_test" – Updated testing covariates with noise, optional
"y_test" – Updated testing labels with noise, optional
"noisy_train_indices" – Indices of the training data set with noise
noise_kwargs : dict[str, Any], optional
Keyword arguments passed to add_noise, by default None
random_state : RandomState, optional
Random initial state, by default None
model_name : str, optional
Name of the preset model, check model_factory() for preset models,
by default None
device : torch.device, optional
Tensor device for acceleration, by default torch.device("cpu")
Experiment function; runs an experiment on a DataEvaluator and the data of
the associated DataFetcher. Output must be a dict with the results of the
experiment. NOTE: the results must all be at most 1-dimensional but need
not all be the same length.
fig : Figure, optional
Matplotlib Figure on which each experiment result is plotted, by default None
row : int, optional
Number of rows of subplots in the plot, by default set to num_evaluators/col
col : int, optional
Number of columns of subplots in the plot, by default 2
save_output : bool, optional
Whether to save the outputs to self.output_dir, by default False
eval_kwargs : dict[str, Any], optional
Additional keyword arguments to be passed to the exper_func
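The default layout above can be sketched as follows, assuming row rounds num_evaluators / col up so that every evaluator gets a subplot:

```python
import math

def subplot_grid(num_evaluators, col=2, row=None):
    # col defaults to 2; row defaults to ceil(num_evaluators / col)
    if row is None:
        row = math.ceil(num_evaluators / col)
    return row, col
```

For example, subplot_grid(5) yields a 3x2 grid with one empty panel.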
dict containing a list of the proportion of noisy indices found after exploring
the (i*percentile) least valuable data points. If plot is not None,
also returns optimal and random search performances as lists
"axis" – Proportion of data values explored currently.
"corrupt_found" – Proportion of corrupted data values found currently
"optimal" – Optimal proportion of corrupted values found currently, i.e.,
as if the inspected data points contained only corrupted samples until
the corrupted samples were completely exhausted.
"random" – Expected proportion of corrupted samples found under random
exploration: corrupted samples are found in proportion to the amount of
corruption in the data set.
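The four lists could be computed as in this sketch, where sorted_is_noisy marks corrupted points ordered from least to most valuable (the argument name and the binning are illustrative):

```python
import numpy as np

def detection_curves(sorted_is_noisy, num_bins=10):
    # sorted_is_noisy: bool array ordered from least to most valuable,
    # True where the point is corrupted
    n = len(sorted_is_noisy)
    num_noisy = int(np.sum(sorted_is_noisy))
    axis, found, optimal, random_ = [], [], [], []
    for i in range(1, num_bins + 1):
        explored = n * i // num_bins
        axis.append(explored / n)
        found.append(int(np.sum(sorted_is_noisy[:explored])) / num_noisy)
        # Optimal: every inspected point is corrupted until exhausted
        optimal.append(min(explored, num_noisy) / num_noisy)
        # Random: corrupted points found in proportion to data explored
        random_.append(explored / n)
    return {"axis": axis, "corrupt_found": found,
            "optimal": optimal, "random": random_}
```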
Evaluate accuracy after removing data points with data values above a threshold.
For each subplot, displays the proportion of the data set with data values less
than the specified data value (x-axis) and the performance of the model when all
data points with data values greater than the specified value are removed. This implementation
was inspired by V. Feldman and C. Zhang in their paper [1] where the same principle
was applied to memorization functions.
dict containing the thresholds of data values examined, proportion of training
data points removed, and performance after those data points were removed.
"axis" – Thresholds of data values examined. For a given threshold,
considers the subset of data points with data values below it.
"frac_datapoints_explored" – Proportion of data points with data values
below the specified threshold
f"{metric}_at_datavalues" – Performance metric when the data points with
data values above the specified threshold are removed
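A sketch of the thresholded-removal loop described above; perf_fn is a hypothetical callback that scores the model trained on the kept points, and the generic key "perf_at_datavalues" stands in for the f"{metric}_at_datavalues" key:

```python
import numpy as np

def remove_high_low_curve(data_values, perf_fn, num_bins=4):
    # perf_fn: hypothetical callback scoring the model trained on `keep`
    data_values = np.asarray(data_values, dtype=float)
    thresholds = np.linspace(data_values.min(), data_values.max(), num_bins)
    axis, frac_explored, perf = [], [], []
    for t in thresholds:
        keep = np.where(data_values <= t)[0]  # points below the threshold
        axis.append(float(t))
        frac_explored.append(len(keep) / len(data_values))
        perf.append(perf_fn(keep))
    # "perf_at_datavalues" stands in for f"{metric}_at_datavalues"
    return {"axis": axis,
            "frac_datapoints_explored": frac_explored,
            "perf_at_datavalues": perf}
```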
Compute the F1 score (of a 2NN classifier) of the data evaluator
on the noisy indices. KMeans labels are random, but because of the convexity of
KMeans, the data points with the highest and lowest data values receive
different labels and belong to the most valuable and least valuable groups,
respectively. Thus, for the F1 score, the least valuable group is mapped to
one and the most valuable group to zero.
Indices of the data points in each cluster; because of the convexity of KMeans,
the first sequence represents the lower-value group and the second the
higher-value group.
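The grouping and F1 computation can be sketched with a hand-rolled 1D 2-means (to keep the example self-contained); the low-value cluster is predicted noisy (label one) and compared against the true noisy indices:

```python
import numpy as np

def kmeans_f1(data_values, noisy_indices):
    # Hand-rolled 1D 2-means: centers start at the min and max value
    values = np.asarray(data_values, dtype=float)
    lo, hi = values.min(), values.max()
    for _ in range(100):
        # Assign each point to the nearer center (True -> high-value cluster)
        assign = np.abs(values - lo) > np.abs(values - hi)
        new_lo, new_hi = values[~assign].mean(), values[assign].mean()
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    pred_noisy = ~assign  # low-value cluster -> predicted noisy (label one)
    true_noisy = np.zeros(len(values), dtype=bool)
    true_noisy[list(noisy_indices)] = True
    tp = int(np.sum(pred_noisy & true_noisy))
    fp = int(np.sum(pred_noisy & ~true_noisy))
    fn = int(np.sum(~pred_noisy & true_noisy))
    return 2 * tp / max(2 * tp + fp + fn, 1)
```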