ml4chem.data package
Submodules
ml4chem.data.handler module
class ml4chem.data.handler.Data(images, purpose=None)
Bases: object
A Data class
An adequate data structure is essential for developing machine-learning models. In general, a model receives a data set (X) and a target vector (y). In principle, this class arranges them in a format that can be vectorized and used not only with neural networks but also with support vector machines.
The central object here is the data set.
- Parameters
images (list or object) – List of images in ASE format.
purpose (str) – Whether the data are for training or inference. Supported strings are “training” and “inference”.
get_data(purpose=None)
A method to get the data.
- Parameters
purpose (str) – The purpose of the data, so that the structure is prepared accordingly. Supported values are ‘training’ and ‘inference’.
- Returns
self.images (dict) – Ordered dictionary of images, in the same order as the self.targets list.
self.targets (list) – Targets used for training the model.
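The pairing of an ordered image dictionary with a parallel targets list can be sketched in plain Python. This is a minimal sketch, not ml4chem's implementation: images are plain dicts with "symbols", "positions" and an "energy" target instead of ASE Atoms objects, and keying each image by an MD5 hash of its contents (via the hypothetical helper image_hash) is an assumption made for illustration.

```python
from collections import OrderedDict
import hashlib


def image_hash(symbols, positions):
    """Hypothetical helper: hash an image from its symbols and positions."""
    text = repr((tuple(symbols), tuple(tuple(p) for p in positions)))
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def arrange_data(images, purpose="training"):
    """Sketch of get_data: return (ordered images, targets).

    Each image is assumed (hypothetically) to be a dict with "symbols",
    "positions" and, for training data, an "energy" target.
    """
    if purpose not in ("training", "inference"):
        raise ValueError("purpose must be 'training' or 'inference'")
    ordered = OrderedDict()
    targets = []
    for image in images:
        key = image_hash(image["symbols"], image["positions"])
        if key in ordered:
            continue  # skip duplicate images so keys and targets stay aligned
        ordered[key] = image
        if purpose == "training":
            # Targets are appended in the same order the images are inserted.
            targets.append(image["energy"])
    return ordered, targets
```

Because insertion order is preserved, the i-th value of the dictionary always corresponds to the i-th entry of the targets list, which is the property the Returns section describes.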
get_unique_element_symbols(images=None, purpose=None)
Get the unique element symbols in the data set.
- Parameters
images (list of images) – ASE objects.
purpose (str) – The supported categories are: ‘training’, ‘inference’.
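Collecting the unique element symbols amounts to a set union over every image's chemical symbols. The sketch below assumes only that each image exposes get_chemical_symbols(), as ase.Atoms does; FakeAtoms is a hypothetical stand-in so the example runs without ASE, and returning a sorted list is an assumption (ml4chem may organize the result differently, for example by purpose).

```python
class FakeAtoms:
    """Hypothetical minimal stand-in for ase.Atoms."""

    def __init__(self, symbols):
        self._symbols = list(symbols)

    def get_chemical_symbols(self):
        return list(self._symbols)


def unique_element_symbols(images):
    """Return the sorted, de-duplicated element symbols across all images."""
    symbols = set()
    for atoms in images:
        symbols.update(atoms.get_chemical_symbols())
    return sorted(symbols)
```

Sorting gives a deterministic order, which is convenient when the symbols are later used to index per-element models or features.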
is_valid_structure(images)
Check whether the data have a valid structure.
- Parameters
images (list of atoms) – List of images.
- Returns
valid – Whether or not the structure is valid.
- Return type
bool
ml4chem.data.parser module
class ml4chem.data.parser.SinglePointCalculator(implemented_properties=None)
Bases: ase.calculators.calculator.Calculator
A SinglePointCalculator class
This class creates a fake calculator that is used to populate calc.results dictionaries in ASE objects.
- Parameters
implemented_properties (list) – List of supported properties.
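The idea of a "fake" calculator is that it performs no computation: it only stores results read from a data set and returns them on request. The class below is a hypothetical, ASE-free sketch of that pattern; PrecomputedCalculator and get_property are illustrative names, not ml4chem's API. (ASE itself also ships ase.calculators.singlepoint.SinglePointCalculator for a similar purpose.)

```python
class PrecomputedCalculator:
    """Hypothetical stand-in for a calculator that only replays stored results."""

    def __init__(self, implemented_properties=None, **results):
        self.implemented_properties = list(implemented_properties or [])
        # Refuse to store properties the calculator does not declare.
        unsupported = set(results) - set(self.implemented_properties)
        if unsupported:
            raise ValueError(f"unsupported properties: {sorted(unsupported)}")
        self.results = dict(results)

    def get_property(self, name):
        """Return a stored result; nothing is ever recomputed."""
        if name not in self.implemented_properties:
            raise KeyError(f"{name!r} is not an implemented property")
        if name not in self.results:
            raise KeyError(f"no stored value for {name!r}")
        return self.results[name]
```

Validating against implemented_properties up front means a data set cannot silently attach a property the calculator claims not to support.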
ml4chem.data.parser.ani_to_ase(hdf5file, data_keys, trajfile=None)
Convert ANI data to ASE.
- Parameters
hdf5file (hdf5, list) – HDF5 file loaded using pyanitools (or a list of them).
data_keys (list) – List of keys to extract data.
trajfile (str, optional) – Name of the trajectory file to be saved, by default None.
- Returns
atoms – A list of Atoms objects.
- Return type
list
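An ANI record stores one set of species together with many conformers, so the conversion produces one image per conformer. This sketch assumes each record looks like what pyanitools' data loader yields (a mapping with "species", "coordinates" shaped n_conformers × n_atoms × 3, and an energy array); the function name and the energy_key parameter are hypothetical, and plain dicts stand in for ase.Atoms objects.

```python
def ani_record_to_images(record, energy_key="energies"):
    """Hypothetical sketch: expand one ANI record into one image per conformer."""
    symbols = list(record["species"])
    conformers = record["coordinates"]
    energies = record[energy_key]
    if len(conformers) != len(energies):
        raise ValueError("need exactly one energy per conformer")
    images = []
    for positions, energy in zip(conformers, energies):
        if len(positions) != len(symbols):
            raise ValueError("each conformer needs one position per atom")
        images.append(
            {
                "symbols": list(symbols),  # fresh copy per image
                "positions": [tuple(p) for p in positions],
                "energy": float(energy),
            }
        )
    return images
```

ANI-1 energies are stored in Hartree while ASE conventionally uses eV, so a real conversion may also need to rescale the targets.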
ml4chem.data.preprocessing module
ml4chem.data.serialization module
ml4chem.data.utils module
ml4chem.data.utils.ase_to_xyz(atoms, comment='', file=True)
Convert an ASE object to xyz format.
This function is useful for saving xyz data to a DataFrame.
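The XYZ format itself is simple: the number of atoms on the first line, a free-form comment on the second, then one "symbol x y z" line per atom. The sketch below builds that text from plain symbols and positions rather than an ASE object, and returning a string (so it can be stored, for instance, in a DataFrame cell) is an assumption about how such output might be used; the six-decimal precision is arbitrary.

```python
def to_xyz(symbols, positions, comment=""):
    """Format one structure as an XYZ block and return it as a string."""
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, positions):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"
```

For example, to_xyz(["H", "H"], [(0, 0, 0), (0, 0, 0.74)], "H2") yields a four-line block starting with "2" and "H2".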
ml4chem.data.utils.split_data(images, training_name='training_images.traj', test_name='test_images.traj', randomize=True, test_set=20, logfile='data_split.log')
Split data into training and test sets.
- Parameters
images (str or object) – A path to an ASE trajectory file or a list of Atoms objects.
training_name (str, optional) – Name of the training set trajectory file, by default ‘training_images.traj’.
test_name (str, optional) – Name of the test set trajectory file, by default ‘test_images.traj’.
randomize (bool, optional) – Whether to randomize the indices of the images, by default True.
test_set (int, optional) – Percentage of the data to be used as the test set, by default 20.
logfile (str, optional) – Log file name, by default ‘data_split.log’.
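The core of a percentage-based split is an optional shuffle followed by slicing. This sketch works on indices only and skips the trajectory and log files; the seed parameter is a hypothetical addition for reproducibility, and truncating the test-set size with int() is an assumed rounding rule that ml4chem may not share.

```python
import random


def split_indices(n_images, test_set=20, randomize=True, seed=None):
    """Split range(n_images) into (training, test) index lists.

    test_set is a percentage, mirroring the documented parameter.
    """
    if not 0 <= test_set <= 100:
        raise ValueError("test_set must be a percentage between 0 and 100")
    indices = list(range(n_images))
    if randomize:
        # A dedicated Random instance keeps the global RNG state untouched.
        random.Random(seed).shuffle(indices)
    n_test = int(n_images * test_set / 100)
    return indices[n_test:], indices[:n_test]
```

With 10 images and the default test_set=20, two indices go to the test set and eight to the training set; the selected indices can then be used to pick images for each trajectory file.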