ml4chem.data package¶

Submodules¶

ml4chem.data.handler module¶

ml4chem.data.parser module¶

ml4chem.data.preprocessing module¶

class ml4chem.data.preprocessing.Preprocessing(preprocessor, purpose)[source]¶

Bases: object

A wrapper for preprocessing data with sklearn

This class is intended as a wrapper around sklearn. The idea is to make it easier to preprocess data without placing much burden on users.

Parameters

preprocessor (tuple) – Tuple with structure: (‘name’, {kwargs}).
purpose (str) – Supported purposes are: ‘training’, ‘inference’.

Notes

The list of preprocessing modules available on sklearn and options can be found at:

https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing

If you need a preprocessor that is not implemented yet, create a bug report or follow the structure shown below to implement it yourself (PRs are very welcome). In principle, all preprocessors can be implemented.
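As a minimal sketch of how a (‘name’, {kwargs}) tuple could be resolved into a scikit-learn preprocessor: the helper `build_preprocessor` below is hypothetical and is not part of ML4Chem. It assumes only that the name matches a class in `sklearn.preprocessing` exactly; the names ML4Chem itself accepts may differ.

```python
import numpy as np
from sklearn import preprocessing


def build_preprocessor(spec):
    """Hypothetical helper: turn ('name', {kwargs}) into a sklearn object."""
    name, kwargs = spec
    # Look the class up by name in sklearn.preprocessing, e.g. "MinMaxScaler".
    cls = getattr(preprocessing, name)
    # An empty or missing kwargs dict falls back to the sklearn defaults.
    return cls(**(kwargs or {}))


# Illustrative data: three samples with two features each.
features = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])

scaler = build_preprocessor(("MinMaxScaler", {"feature_range": (-1, 1)}))
scaled = scaler.fit_transform(features)
```

Here each column of `scaled` is rescaled to the requested range independently of the other columns.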

fit(stacked_features, scheduler)[source]¶

Fit features

Parameters

stacked_features (list) – List of stacked features.
scheduler (str) – The scheduler to be used in dask.

Returns

scaled_features – Scaled features using requested preprocessor.

Return type

list

save_to_file(preprocessor, path)[source]¶

Save the preprocessor object to file

Parameters

preprocessor (obj) – Preprocessing object
path (str) – Path to save .prep file.

set(purpose)[source]¶

Set a preprocessing method

Parameters

purpose (str) – Supported purposes are: ‘training’, ‘inference’.

Returns

Preprocessor object.

transform(raw_features)[source]¶

Transform features to scaled features

Given a Preprocessor object, we return features.

Parameters

raw_features (list) – Unscaled features.

Returns

scaled_features – Scaled features using the scaler set in self.set().

Return type

list
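The split between the ‘training’ and ‘inference’ purposes can be illustrated with scikit-learn alone. The sketch below does not use the Preprocessing class or dask; it assumes a `StandardScaler` and made-up feature arrays, and only shows the general pattern of fitting once and then reusing the fitted object.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative arrays, not ML4Chem feature objects.
train_features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
new_features = np.array([[3.0, 4.0]])

scaler = StandardScaler()

# Training: learn the per-column mean and standard deviation, then scale.
scaled_train = scaler.fit_transform(train_features)

# Inference: reuse the learned statistics without refitting.
scaled_new = scaler.transform(new_features)
```

Refitting on the new data instead would scale it with different statistics than the ones the model saw during training.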

ml4chem.data.serialization module¶

ml4chem.data.serialization.dump(data, filename='data.db')[source]¶

Serialize data

This function dumps data and ML4Chem dictionaries to file, serialized with msgpack or torch (depending on the model).

Parameters

data (dict or array) – A dictionary or array containing data to be saved to file using msgpack.
filename (str) – Name of the file to save to disk.

ml4chem.data.serialization.load(filename)[source]¶

Load a msgpack file

Parameters

filename (str) – Path of file to load from disk.
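For orientation, a msgpack round trip of the kind these two functions wrap might look like the sketch below. It calls the `msgpack` package directly rather than `dump` and `load`, whose internals are not documented here; the dictionary contents and the temporary path are made up for illustration.

```python
import os
import tempfile

import msgpack

# Illustrative payload, not an actual ML4Chem dictionary.
data = {"energies": [-1.5, -2.25], "symbols": ["H", "O"]}

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "data.db")

    # Serialize the dictionary to bytes and write it to disk.
    with open(filename, "wb") as f:
        f.write(msgpack.packb(data, use_bin_type=True))

    # Read the bytes back and deserialize them.
    with open(filename, "rb") as f:
        restored = msgpack.unpackb(f.read(), raw=False)
```

Plain msgpack handles built-in types such as dicts, lists, strings, and floats; NumPy arrays or torch tensors would need to be converted first or stored with a different serializer.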

ml4chem.data.utils module¶

Module contents¶