opendataval.dataloader.Register

class opendataval.dataloader.Register(dataset_name: str, one_hot: bool = False, cacheable: bool = False, presplit: bool = False)

Register a data set by defining its name and adding functions to retrieve data.

Registers data sets to be fetched by the DataFetcher and allows specific transformations to be applied to a data set. Creating one Register object per data set keeps separate data sets distinct from one another.

Parameters

dataset_name : str

Data set name

one_hot : bool, optional

Whether the data set's labels are one-hot encoded, by default False

cacheable : bool, optional

Whether the data set can be downloaded and cached, by default False

presplit : bool, optional

Whether the data set comes already split, by default False
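The `one_hot` parameter concerns one-hot label encoding, where each label is stored as an indicator vector rather than as an integer class id. The helper below is a hypothetical, dependency-free illustration of that encoding (the name `one_hot_encode` is an assumption); it is not part of opendataval.

```python
from typing import Sequence


def one_hot_encode(labels: Sequence[int], num_classes: int) -> list[list[int]]:
    """Hypothetical helper: map integer class ids to one-hot rows."""
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0] * num_classes
        row[label] = 1  # mark the position of this example's class
        encoded.append(row)
    return encoded


print(one_hot_encode([0, 2, 1], num_classes=3))
# [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```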

Warns

Warning

Register keeps track of all registered data set names, which must be unique. If a duplicate name is registered, the user is warned.
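The duplicate-name check can be pictured as a class-level dictionary shared by every instance. The sketch below is a hypothetical stand-in (`ToyRegistry` and its `datasets` attribute are assumptions, not opendataval internals); it warns on a repeated name and, as its own assumption, then replaces the earlier entry.

```python
import warnings
from typing import ClassVar


class ToyRegistry:
    """Hypothetical stand-in for a registry keyed by unique data set names."""

    # Shared by every instance: maps data set name -> registered object
    datasets: ClassVar[dict[str, "ToyRegistry"]] = {}

    def __init__(self, dataset_name: str) -> None:
        self.dataset_name = dataset_name
        if dataset_name in ToyRegistry.datasets:
            # Duplicate name: warn instead of raising; this sketch then
            # replaces the earlier entry (an assumption, not library behavior)
            warnings.warn(f"{dataset_name!r} is already registered", stacklevel=2)
        ToyRegistry.datasets[dataset_name] = self


first = ToyRegistry("toy")
second = ToyRegistry("toy")  # emits a UserWarning
```

Warning rather than raising keeps interactive sessions usable when a module that registers data sets is re-imported.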

__init__(dataset_name: str, one_hot: bool = False, cacheable: bool = False, presplit: bool = False)

Methods

__init__(dataset_name[, one_hot, cacheable, ...])

add_covar_transform(transform)

Add a covariate transform, applied after the data is fetched.

add_label_transform(transform)

Add a label transform, applied after the data is fetched.

from_covar_func(func, *args, **kwargs)

Register a data set from two Callables; this method registers the covariates Callable.

from_covar_label_func(func, *args, **kwargs)

Register a data set from a single Callable that returns (covariates, labels).

from_csv(filepath, label_columns)

Register a data set from a CSV file.

from_data(covar, label[, one_hot])

Register a data set from covariate and label NumPy arrays.

from_label_func(func, *args, **kwargs)

Register a data set from two Callables; this method registers the labels Callable.

from_numpy(array, label_columns)

Register a data set from a single NumPy array whose label_columns hold the labels.

from_pandas(df, label_columns)

Register a data set from a pandas DataFrame.

load_data([cache_dir, force_download])

Retrieve data using the registered data input functions.
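The methods above split registration into three steps: attach a data source, attach optional transforms, then load. The class below is a hypothetical, dependency-free sketch of that workflow. `ToyRegister`, `from_rows`, and `scale` are assumptions; `from_rows` mimics the `label_columns` idea of `from_numpy` with plain lists in place of a NumPy array, and the deferred call in `from_covar_label_func` and the `return self` chaining are choices of this sketch, not confirmed library behavior.

```python
from typing import Any, Callable, Optional, Sequence

Rows = list[list[float]]
Transform = Callable[[Rows], Rows]


class ToyRegister:
    """Hypothetical sketch of the Register workflow; internals are assumptions."""

    def __init__(self, dataset_name: str) -> None:
        self.dataset_name = dataset_name
        self._loader: Optional[Callable[[], tuple[Rows, Rows]]] = None
        self._covar_transforms: list[Transform] = []
        self._label_transforms: list[Transform] = []

    def from_covar_label_func(
        self, func: Callable[..., tuple[Rows, Rows]], *args: Any, **kwargs: Any
    ) -> "ToyRegister":
        # Store the callable and its arguments; nothing runs until load_data()
        self._loader = lambda: func(*args, **kwargs)
        return self

    def from_rows(
        self, rows: Sequence[Sequence[float]], label_columns: Sequence[int]
    ) -> "ToyRegister":
        # Split one 2-D table into covariate columns and label columns
        n_cols = len(rows[0])
        label_idx = [c % n_cols for c in label_columns]  # allow negative indices
        covar_idx = [c for c in range(n_cols) if c not in label_idx]
        covar = [[float(r[c]) for c in covar_idx] for r in rows]
        labels = [[float(r[c]) for c in label_idx] for r in rows]
        return self.from_covar_label_func(lambda: (covar, labels))

    def add_covar_transform(self, transform: Transform) -> "ToyRegister":
        self._covar_transforms.append(transform)
        return self

    def add_label_transform(self, transform: Transform) -> "ToyRegister":
        self._label_transforms.append(transform)
        return self

    def load_data(self) -> tuple[Rows, Rows]:
        if self._loader is None:
            raise RuntimeError(f"no data source registered for {self.dataset_name!r}")
        covar, labels = self._loader()
        # Transforms run after fetching, in the order they were added
        for transform in self._covar_transforms:
            covar = transform(covar)
        for transform in self._label_transforms:
            labels = transform(labels)
        return covar, labels


def scale(factor: float) -> Transform:
    """Hypothetical transform: multiply every value by `factor`."""
    return lambda rows: [[v * factor for v in row] for row in rows]


toy = ToyRegister("toy").from_rows([[1, 2, 0], [3, 4, 1]], label_columns=[-1])
toy.add_covar_transform(scale(0.5))
covar, labels = toy.load_data()
print(covar, labels)  # [[0.5, 1.0], [1.5, 2.0]] [[0.0], [1.0]]
```

Deferring the call until `load_data` means an expensive loader runs only when the data set is actually requested.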

Attributes

CACHE_DIR

Default directory to cache downloads to.

Datasets

Directory (registry) of all registered and downloadable data sets.
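The `cacheable` flag, `CACHE_DIR`, and the `cache_dir`/`force_download` arguments of `load_data` point to a download-once-then-reuse pattern. The function below is a hypothetical sketch of that pattern using only the standard library; `fetch_with_cache`, the `download` callable, and the one-file-per-name cache layout are assumptions, not opendataval internals.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_with_cache(
    name: str,
    download: Callable[[], bytes],
    cache_dir: Path,
    force_download: bool = False,
) -> bytes:
    """Hypothetical: return cached bytes for `name`, downloading only when needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / name
    if cached.exists() and not force_download:
        return cached.read_bytes()  # cache hit: skip the download
    data = download()
    cached.write_bytes(data)  # store (or refresh) the cached copy
    return data


downloads: list[str] = []


def fake_download() -> bytes:
    downloads.append("called")
    return b"x1,x2,y\n1,2,0\n"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data_files"
    fetch_with_cache("toy.csv", fake_download, cache)  # downloads
    fetch_with_cache("toy.csv", fake_download, cache)  # served from cache
    fetch_with_cache("toy.csv", fake_download, cache, force_download=True)  # re-downloads

print(len(downloads))  # 2
```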