opendataval.dataloader.Register
- class opendataval.dataloader.Register(dataset_name: str, one_hot: bool = False, cacheable: bool = False, presplit: bool = False)
Register a data set by defining its name and adding functions to retrieve data.
Registers data sets to be fetched by the DataFetcher. Also allows specific transformations to be applied to a data set. This gives the benefit of creating Register objects to distinguish separate data sets.
Parameters
- dataset_name : str
Data set name
- one_hot : bool, optional
Whether the data set's labels are one-hot encoded, by default False
- cacheable : bool, optional
Whether the data set can be downloaded and cached, by default False
- presplit : bool, optional
Whether the data set was presplit, by default False
Warns
- Warning
Register keeps track of all registered data set names, and every name must be unique. If a duplicate name is registered, the user is warned.
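The duplicate-name check can be pictured with a toy registry. This is only a sketch under stated assumptions: `ToyRegister` and its `_seen` set are hypothetical names made up for illustration, not opendataval's code, and the real class may track names and word its warning differently.

```python
import warnings


class ToyRegister:
    """Hypothetical stand-in for illustration; not opendataval's Register."""

    _seen = set()  # names registered so far, shared across all instances

    def __init__(self, dataset_name: str):
        if dataset_name in ToyRegister._seen:
            # Warn instead of raising, mirroring the "Warns" behavior described above.
            warnings.warn(f"'{dataset_name}' is already registered; names must be unique")
        ToyRegister._seen.add(dataset_name)
        self.dataset_name = dataset_name


# Capture warnings so the duplicate registration can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ToyRegister("iris")
    ToyRegister("iris")  # same name again triggers a warning

duplicate_warnings = len(caught)
```

In this sketch the second registration still succeeds, so the warning is only advisory.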
- __init__(dataset_name: str, one_hot: bool = False, cacheable: bool = False, presplit: bool = False)
Methods
- __init__(dataset_name[, one_hot, cacheable, ...])
- add_covar_transform(transform): Add covariate transform after data is fetched.
- add_label_transform(transform): Add label transform after data is fetched.
- from_covar_func(func, *args, **kwargs): Register data set from 2 Callables, registers covariates Callable.
- from_covar_label_func(func, *args, **kwargs): Register data set from Callable -> (covariates, labels).
- from_csv(filepath, label_columns): Register data set from csv file.
- from_data(covar, label[, one_hot]): Register data set from covariate and label numpy array.
- from_label_func(func, *args, **kwargs): Register data set from 2 Callables, registers labels Callable.
- from_numpy(array, label_columns): Register data set from covariate and label numpy array.
- from_pandas(df, label_columns): Register data set from pandas data frame.
- load_data([cache_dir, force_download]): Retrieve data from specified data input functions.
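To make the flow behind these methods concrete (register a callable, attach transforms, then load), here is a minimal toy sketch. It assumes a simplified model: `ToyRegister` is a hypothetical class written for this example, it uses plain lists instead of numpy arrays, and it ignores caching, `*args`/`**kwargs`, and the other `from_*` constructors. It is not opendataval's implementation, and the real signatures and return types may differ.

```python
from typing import Callable, Optional


class ToyRegister:
    """Hypothetical, simplified stand-in; not opendataval's implementation."""

    def __init__(self, dataset_name: str):
        self.dataset_name = dataset_name
        self.covar_label_func: Optional[Callable] = None
        self.covar_transform: Optional[Callable] = None
        self.label_transform: Optional[Callable] = None

    def from_covar_label_func(self, func: Callable) -> Callable:
        # Store a callable returning (covariates, labels). Returning it
        # unchanged lets this method double as a decorator.
        self.covar_label_func = func
        return func

    def add_covar_transform(self, transform: Callable) -> "ToyRegister":
        self.covar_transform = transform
        return self

    def add_label_transform(self, transform: Callable) -> "ToyRegister":
        self.label_transform = transform
        return self

    def load_data(self):
        if self.covar_label_func is None:
            raise ValueError("no data function registered")
        covar, labels = self.covar_label_func()
        # Transforms run only after the data has been fetched.
        if self.covar_transform is not None:
            covar = self.covar_transform(covar)
        if self.label_transform is not None:
            labels = self.label_transform(labels)
        return covar, labels


toy = ToyRegister("toy")


@toy.from_covar_label_func
def fetch():
    # Made-up data purely for illustration.
    return [[1.0, 2.0], [3.0, 4.0]], [0, 1]


toy.add_covar_transform(lambda rows: [[x / 4.0 for x in row] for row in rows])
toy.add_label_transform(lambda ys: [y + 1 for y in ys])
covar, labels = toy.load_data()
```

Keeping the transforms separate from the fetch function means the same raw data source could be registered once and post-processed in different ways.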
Attributes
- CACHE_DIR: Default directory to cache downloads to.
- Datasets: Creates a directory for all registered/downloadable data set functions.