ml4chem.data package¶

Submodules¶

ml4chem.data.handler module¶

class ml4chem.data.handler.Data(images, purpose=None)[source]¶

Bases: object

A Data class

A suitable data structure is essential for developing machine-learning models. In general, a model receives a data set (X) and a target vector (y). This class is intended to arrange both in a format that can be vectorized and used not only with neural networks but also with support vector machines.

The central object here is the data set.

Parameters

images (list or object) – List of images. The supported image format is that of ASE.
purpose (str) – Whether the data is for training or inference. Supported strings are: “training” and “inference”.

get_data(purpose=None)[source]¶

A method to get data

Parameters

purpose (str) – The purpose of the data so that structure is prepared accordingly. Supported are: ‘training’, ‘inference’

Returns

self.images (dict) – Ordered dictionary of images corresponding to order of self.targets list.
self.targets (list) – Targets used for training the model.
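
The pairing described above, an ordered dictionary of images whose order matches the targets list, can be sketched in plain Python. This is only an illustration under stated assumptions, not ML4Chem's implementation: the helper `arrange`, the dict-based images, and the `name` and `energy` keys are hypothetical stand-ins for ASE images, and returning no targets for inference is a choice made for this sketch.

```python
from collections import OrderedDict


def arrange(images, purpose="training"):
    """Hypothetical helper: pair images with targets in a fixed order."""
    # Each image is a plain dict here (a stand-in for an ASE image).
    ordered = OrderedDict((image["name"], image) for image in images)
    if purpose == "training":
        # Targets are read in the same order as the dictionary keys.
        targets = [image["energy"] for image in ordered.values()]
        return ordered, targets
    # Assumption of this sketch: inference data carries no targets.
    return ordered


# Made-up illustrative values.
images = [
    {"name": "water", "energy": -14.2},
    {"name": "methane", "energy": -24.0},
]
ordered, targets = arrange(images)
inference_only = arrange(images, purpose="inference")
```

Keeping the two containers in the same order means the i-th key of the dictionary always corresponds to the i-th target.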

get_total_number_atoms()[source]¶

Get the total number of atoms

get_unique_element_symbols(images=None, purpose=None)[source]¶

Get the unique element symbols in the data set

Parameters

images (list) – List of ASE images.
purpose (str) – The supported categories are: ‘training’, ‘inference’.
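
As a rough illustration of what collecting unique element symbols (and counting atoms) involves, the following sketch works on plain lists of chemical symbols rather than ASE images. The function name `unique_element_symbols` and the sorted output are assumptions made for this example; the library method may order or store its result differently.

```python
def unique_element_symbols(images):
    """Hypothetical helper: collect the distinct symbols across all images."""
    symbols = set()
    for image in images:
        # Each image is a list of chemical symbols (stand-in for an ASE image).
        symbols.update(image)
    # Sorting is a choice of this sketch to make the result deterministic.
    return sorted(symbols)


images = [
    ["O", "H", "H"],            # water
    ["C", "H", "H", "H", "H"],  # methane
]
unique = unique_element_symbols(images)

# The total number of atoms is the sum of the image sizes.
total_atoms = sum(len(image) for image in images)
```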

is_valid_structure(images)[source]¶

Check if the data has a valid structure

Parameters: images (list of atoms) – List of images.
Returns: valid – Whether or not the structure is valid.
Return type: bool

prepare_images(images, purpose=None)[source]¶

Prepare images for use with ML4Chem

Parameters

images (list or object) – List of images.
purpose (str) – The purpose of the data so that structure is prepared accordingly. Supported are: ‘training’, ‘inference’

to_pandas()[source]¶

Convert data to pandas DataFrame

ml4chem.data.parser module¶

class ml4chem.data.parser.SinglePointCalculator(implemented_properties=None)[source]¶

Bases: ase.calculators.calculator.Calculator

A SinglePointCalculator class

This class creates a fake calculator that is used to populate calc.results dictionaries in ASE objects.

Parameters: implemented_properties (list) – List with supported properties.

static get_forces(atoms)[source]¶

Get atomic forces

Parameters: atoms (obj) – Atoms object.
Returns: forces – The atomic forces of the molecule.

static get_potential_energy(atoms)[source]¶

Get the potential energy

Parameters: atoms (obj) – Atoms object.
Returns: energy – The energy of the molecule.

ml4chem.data.parser.ani_to_ase(hdf5file, data_keys, trajfile=None)[source]¶

Convert ANI data to ASE

Parameters

hdf5file (hdf5, list) – hdf5 file loaded using pyanitools (or list of them).
data_keys (list) – List of keys to extract data.
trajfile (str, optional) – Name of trajectory file to be saved, by default None.

Returns

atoms – A list of Atoms objects.

ml4chem.data.parser.cjson_parser(cjsonfile, trajfile=None)[source]¶

Parse CJSON files

Parameters

cjsonfile (str) – Path to the CJSON file.
trajfile (str, optional) – Name of trajectory file to be saved, by default None.

Returns

atoms – A list of Atoms objects.
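
For orientation, the sketch below reads atomic numbers and coordinates from a small Chemical JSON string using only the standard library. It assumes the common CJSON layout in which atomic numbers sit under `atoms.elements.number` and coordinates are a flat list under `atoms.coords.3d`. The helper `read_atoms` is hypothetical, the geometry values are made up for illustration, and the fields that cjson_parser actually reads may differ.

```python
import json

# Illustrative CJSON content (approximate water geometry, made-up values).
cjson_text = """
{
  "chemicalJson": 1,
  "atoms": {
    "elements": {"number": [8, 1, 1]},
    "coords": {"3d": [0.0, 0.0, 0.117,
                      0.0, 0.757, -0.469,
                      0.0, -0.757, -0.469]}
  }
}
"""


def read_atoms(text):
    """Hypothetical helper: return atomic numbers and (x, y, z) positions."""
    data = json.loads(text)
    numbers = data["atoms"]["elements"]["number"]
    flat = data["atoms"]["coords"]["3d"]
    # The flat coordinate list is regrouped into one triple per atom.
    positions = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    return numbers, positions


numbers, positions = read_atoms(cjson_text)
```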

ml4chem.data.parser.cjson_to_ase(cjson)[source]¶

ml4chem.data.parser.get_total_energy(cjson)[source]¶

ml4chem.data.preprocessing module¶

ml4chem.data.serialization module¶

ml4chem.data.utils module¶

ml4chem.data.utils.ase_to_xyz(atoms, comment='', file=True)[source]¶

Convert an ASE Atoms object to xyz

This function is useful for saving xyz data to a DataFrame.
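
To make the target format concrete, here is a minimal sketch of building an xyz string from symbols and positions: an atom count line, a comment line, then one line per atom. The helper `to_xyz` is hypothetical and takes plain Python lists rather than an ASE Atoms object; the number formatting is a choice of this sketch and not necessarily what ase_to_xyz produces.

```python
def to_xyz(symbols, positions, comment=""):
    """Hypothetical helper: format symbols and positions as an xyz string."""
    # Line 1: number of atoms. Line 2: free-form comment.
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, positions):
        # One atom per line: symbol followed by Cartesian coordinates.
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines)


xyz = to_xyz(
    ["H", "H"],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)],
    comment="H2",
)
```

Because the result is a single string, it can be stored directly in a DataFrame cell.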

ml4chem.data.utils.split_data(images, training_name='training_images.traj', test_name='test_images.traj', randomize=True, test_set=20, logfile='data_split.log')[source]¶

Split data into training and test sets

Parameters

images (str or object) – A path to an ASE trajectory file or a list of Atoms objects.
training_name (str, optional) – Name of the training set trajectory file, by default ‘training_images.traj’
test_name (str, optional) – Name of the test set file, by default ‘test_images.traj’
randomize (bool, optional) – Randomize indices of images, by default True
test_set (int, optional) – Percentage of the data to be used as test set, by default 20
logfile (str, optional) – Log file name, by default ‘data_split.log’
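
The interplay of the randomize and test_set parameters can be sketched with a percentage-based index split. This is an illustration under assumptions, not the library's code: `split_indices` and its `seed` argument are hypothetical, rounding the test size and taking the test indices from the front of the (shuffled) list are choices of this sketch, and writing trajectory or log files is left out.

```python
import random


def split_indices(n_images, test_set=20, randomize=True, seed=None):
    """Hypothetical helper: split image indices into training and test."""
    indices = list(range(n_images))
    if randomize:
        # A seeded generator keeps the shuffle reproducible.
        random.Random(seed).shuffle(indices)
    # test_set is a percentage of the whole data set.
    n_test = int(round(n_images * test_set / 100))
    return indices[n_test:], indices[:n_test]


training, test = split_indices(10, test_set=20, seed=0)
ordered_training, ordered_test = split_indices(10, test_set=20, randomize=False)
```

With 10 images and test_set=20, two indices go to the test set and eight to the training set, and every image lands in exactly one of them.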


Module contents¶