ml4chem.data package
Submodules
ml4chem.data.handler module
ml4chem.data.parser module
ml4chem.data.preprocessing module
class ml4chem.data.preprocessing.Preprocessing(preprocessor, purpose)
Bases: object
A wrapper for preprocessing data with sklearn
This class is intended as a wrapper around sklearn. The idea is to make it easier to preprocess data without placing much burden on users.
- Parameters
preprocessor (tuple) – Tuple with structure: (‘name’, {kwargs}).
purpose (str) – Supported purposes are ‘training’ and ‘inference’.
Notes
The list of preprocessing modules available on sklearn and options can be found at:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing
If you need a preprocessor that is not implemented yet, file a bug report or follow the structure shown below to implement it yourself (PRs are very welcome). In principle, all preprocessors can be implemented.
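As an illustration of the (‘name’, {kwargs}) tuple described above, the following is a minimal sketch of how such a tuple could be mapped onto an sklearn preprocessor. The lookup table `AVAILABLE` and the helper `build_preprocessor` are hypothetical names introduced for this example; they are not part of ML4Chem, and the set of names ML4Chem actually accepts may differ.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Hypothetical lookup table for this sketch; not ML4Chem's own registry.
AVAILABLE = {
    "minmaxscaler": MinMaxScaler,
    "standardscaler": StandardScaler,
    "normalizer": Normalizer,
}


def build_preprocessor(spec):
    """Build an sklearn preprocessor from a ('name', {kwargs}) tuple."""
    name, kwargs = spec
    try:
        cls = AVAILABLE[name.lower()]
    except KeyError:
        raise ValueError(f"Unsupported preprocessor: {name!r}") from None
    # Forward the keyword arguments unchanged to the sklearn class.
    return cls(**(kwargs or {}))


# Scale each feature column to the range [-1, 1].
scaler = build_preprocessor(("MinMaxScaler", {"feature_range": (-1, 1)}))
stacked_features = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
scaled_features = scaler.fit_transform(stacked_features)
```

Supporting an additional preprocessor in a design like this would amount to adding one entry to the lookup table, since the keyword arguments are passed through to sklearn as given.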
fit(stacked_features, scheduler)
Fit features
- Parameters
stacked_features (list) – List of stacked features.
scheduler (str) – The dask scheduler to use.
- Returns
scaled_features – Scaled features using requested preprocessor.
- Return type
list
save_to_file(preprocessor, path)
Save the preprocessor object to file
- Parameters
preprocessor (obj) – Preprocessing object
path (str) – Path to save .prep file.
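The workflow implied by the two purposes and by save_to_file can be sketched with plain sklearn: fit the preprocessor on training data, store it, and reuse it unchanged at inference time. This sketch makes two assumptions that the documentation above does not confirm: that ‘inference’ means transforming without refitting, and that joblib is a suitable on-disk format for the .prep file. It does not call the ML4Chem class itself.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Training: fit the preprocessor and scale the training features.
train_features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
scaler = StandardScaler()
scaled_train = scaler.fit_transform(train_features)

# Persist the fitted object. The .prep extension follows the docs above;
# using joblib as the file format is an assumption of this sketch.
path = os.path.join(tempfile.mkdtemp(), "scaler.prep")
joblib.dump(scaler, path)

# Inference: reload the object and only transform, so new data are scaled
# with the statistics learned during training.
restored = joblib.load(path)
new_features = np.array([[3.0, 4.0]])
scaled_new = restored.transform(new_features)
```

Refitting at inference time would scale new data with different statistics than the model saw during training, which is why the fitted object is stored and reloaded rather than rebuilt.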
ml4chem.data.serialization module
ml4chem.data.serialization.dump(data, filename='data.db')
Serialize data
This function dumps data and ML4Chem dictionaries to file, serialized with msgpack or torch (depending on the model).
- Parameters
data (dict or array) – A dictionary or array containing data to be saved to file using msgpack.
filename (str) – Name of the file to save on disk.
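For readers unfamiliar with msgpack, the following is a minimal sketch of writing a dictionary to a file and reading it back with the msgpack Python package. The functions `dump_sketch` and `load_sketch` are hypothetical stand-ins written for this example; they do not reproduce the ML4Chem implementation, which may handle arrays and torch objects differently.

```python
import os
import tempfile

import msgpack


def dump_sketch(data, filename="data.db"):
    """Serialize a plain Python dictionary to a file with msgpack."""
    with open(filename, "wb") as handle:
        handle.write(msgpack.packb(data, use_bin_type=True))


def load_sketch(filename):
    """Read a msgpack file back into Python objects."""
    with open(filename, "rb") as handle:
        return msgpack.unpackb(handle.read(), raw=False)


# Illustrative data only; the keys are not an ML4Chem schema.
data = {"energies": [-1.5, -2.25], "symbols": ["H", "O"], "n_atoms": 2}
filename = os.path.join(tempfile.mkdtemp(), "data.db")
dump_sketch(data, filename)
restored = load_sketch(filename)
```

Plain msgpack only handles built-in types such as dictionaries, lists, strings and numbers, so a NumPy array would first need converting (for example with `.tolist()`) in a sketch like this one.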