Trainer Module¶

Abstract Base Class¶

class orchestrator.trainer.trainer_base.Trainer(**kwargs)[source]¶

Bases: Recorder, ABC

Abstract base class to manage the training of different potentials

The trainer class is responsible for handling the loading/assignment of training data, as well as the actual process of training a potential

__init__(**kwargs)[source]¶: set variables and initialize the recorder and default workflow

default_wf¶: default workflow to use within the trainer class

abstract checkpoint_trainer()[source]¶

checkpoint the trainer module into the checkpoint file

save necessary internal variables into a dict with key checkpoint_name and write to the (json) checkpoint file for restart capabilities

abstract restart_trainer()[source]¶

restart the trainer module from the checkpoint file

check if the checkpoint_file has an entry matching the checkpoint_name and set internal variables accordingly if so
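The checkpoint/restart pattern described above can be sketched with the standard library alone. The helper names `write_checkpoint` and `read_checkpoint` and the stored fields are hypothetical and are not part of the orchestrator API. Only the idea of a JSON file holding one dict per `checkpoint_name` is taken from the description.

```python
import json
import os
import tempfile


def write_checkpoint(checkpoint_file, checkpoint_name, state):
    """Store `state` under `checkpoint_name`, keeping other entries intact."""
    data = {}
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            data = json.load(f)
    data[checkpoint_name] = state
    with open(checkpoint_file, "w") as f:
        json.dump(data, f, indent=2)


def read_checkpoint(checkpoint_file, checkpoint_name):
    """Return the saved state for `checkpoint_name`, or None if absent."""
    if not os.path.exists(checkpoint_file):
        return None
    with open(checkpoint_file) as f:
        return json.load(f).get(checkpoint_name)


# Usage with a throwaway file; the state keys are illustrative only.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
write_checkpoint(checkpoint_path, "trainer", {"last_calc_id": 7})
restored = read_checkpoint(checkpoint_path, "trainer")
missing = read_checkpoint(checkpoint_path, "not_saved")
```

Keying the file by `checkpoint_name` lets several modules share one checkpoint file without overwriting each other's entries.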

abstract train(path_type, potential, storage, dataset_list, workflow=None, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, write_training_script=True, upload_to_kimkit=True)[source]¶

Train the potential based on the specific trainer details

This is a main method of the trainer class, and uses the parameters supplied at instantiation to perform the potential training by minimizing a loss function.

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (Potential) – potential to be trained. The actual model itself is set as an attribute of the Potential object
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (either boolean or np.ndarray) – True to read the weights from the dataset, or a numpy array of weights
Default: False
write_training_script (bool) – True to write a training script in the working trainer directory
Default: True
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

trained model, loss object

Return type:

implementation dependent

abstract submit_train(path_type, potential, storage, dataset_list, workflow, job_details, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, upload_to_kimkit=True)[source]¶

Asynchronously train the potential based on the trainer details

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (Potential) – potential to be trained. The actual model itself is set as an attribute of the Potential object
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (either boolean or np.ndarray) – True to read the weights from the dataset, or a numpy array of weights
Default: False
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

calculation ID of the submitted job

Return type:

int

abstract load_from_submitted_training(calc_id, potential, workflow)[source]¶: reload a potential that was trained via a submitted job
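To illustrate what the abstract interface above requires of a concrete trainer, here is a minimal sketch using Python's `abc` module. `TrainerSketch`, `MinimalTrainer` and `IncompleteTrainer` are hypothetical stand-ins written for this example. They mirror the five abstract method names listed above but omit the Recorder base class and all real training logic.

```python
from abc import ABC, abstractmethod


class TrainerSketch(ABC):
    """Hypothetical stand-in declaring the abstract methods named above."""

    @abstractmethod
    def checkpoint_trainer(self):
        """Write internal state to the checkpoint file."""

    @abstractmethod
    def restart_trainer(self):
        """Restore internal state from the checkpoint file."""

    @abstractmethod
    def train(self, path_type, potential, storage, dataset_list, **kwargs):
        """Train synchronously and return (model, loss)."""

    @abstractmethod
    def submit_train(self, path_type, potential, storage, dataset_list,
                     workflow, job_details, **kwargs):
        """Submit a training job and return its calculation ID."""

    @abstractmethod
    def load_from_submitted_training(self, calc_id, potential, workflow):
        """Reload a potential trained by a submitted job."""


class MinimalTrainer(TrainerSketch):
    """Implements every abstract method, so it can be instantiated."""

    def checkpoint_trainer(self):
        pass

    def restart_trainer(self):
        pass

    def train(self, path_type, potential, storage, dataset_list, **kwargs):
        return "model", "loss"  # placeholder return values

    def submit_train(self, path_type, potential, storage, dataset_list,
                     workflow, job_details, **kwargs):
        return 0  # placeholder calculation ID

    def load_from_submitted_training(self, calc_id, potential, workflow):
        pass


class IncompleteTrainer(TrainerSketch):
    """Overrides only train(); instantiation is expected to fail."""

    def train(self, path_type, potential, storage, dataset_list, **kwargs):
        return None, None


complete = MinimalTrainer()
try:
    IncompleteTrainer()
    incomplete_ok = True
except TypeError:
    incomplete_ok = False
```

A subclass that leaves any abstract method unimplemented raises `TypeError` on instantiation, which is why each concrete trainer below defines all five methods.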

Concrete Implementations¶

KLIFF base class¶

class orchestrator.trainer.kliff.kliff.KLIFFTrainer(training_split=0.8, loss_method='mse', max_evals=1000, optimization_method='L-BFGS-B', scratch=None, **kwargs)[source]¶

Bases: Trainer

Train and deploy a potential using KLIFF

The trainer class is responsible for handling the loading/assignment of training data, as well as the actual process of training a potential. One should use specific subclasses of KLIFFTrainer instead of this base class.

Parameters:

training_split (float) – Fraction of the dataset to be allocated for training (e.g., 0.8 for 80%). Defaults to 0.8.
loss_method (str) – The type of loss function to be used during training (e.g., “mse” for mean squared error).
max_evals (int) – Maximum number of evaluations (e.g., iterations or function calls) for the optimizer. Defaults to 1000.
optimization_method (str) – The optimization algorithm to employ for training the potential (e.g., “L-BFGS-B”, “Adam”)
scratch (str, optional) – Path to a directory for storing temporary or scratch files during training. If None, it defaults to ‘./scratch_kliff’ within the execution directory.
kwargs (dict) – Arbitrary keyword arguments that may be used by specific subclasses or for advanced configuration options.

__init__(training_split=0.8, loss_method='mse', max_evals=1000, optimization_method='L-BFGS-B', scratch=None, **kwargs)[source]¶

set variables and initialize the recorder and default workflow

Parameters:

training_split (float) – Fraction of the dataset to be allocated for training (e.g., 0.8 for 80%). Defaults to 0.8.
loss_method (str) – The type of loss function to be used during training (e.g., “mse” for mean squared error).
max_evals (int) – Maximum number of evaluations (e.g., iterations or function calls) for the optimizer. Defaults to 1000.
optimization_method (str) – The optimization algorithm to employ for training the potential (e.g., “L-BFGS-B”, “Adam”)
scratch (str, optional) – Path to a directory for storing temporary or scratch files during training. If None, it defaults to ‘./scratch_kliff’ within the execution directory.
kwargs (dict) – Arbitrary keyword arguments that may be used by specific subclasses or for advanced configuration options.
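As a rough illustration of what a `training_split` fraction means, the sketch below divides a list of configurations into a training part and a held-out remainder. The function `split_dataset`, its `seed` argument and the shuffling step are assumptions made for this example. How KLIFFTrainer actually selects the training subset is not stated here.

```python
import random


def split_dataset(configs, training_split=0.8, seed=0):
    """Shuffle `configs` and split them into (train, held_out) lists."""
    if not 0.0 < training_split <= 1.0:
        raise ValueError("training_split must be in (0, 1]")
    shuffled = list(configs)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(training_split * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


train_set, held_out = split_dataset(range(10), training_split=0.8)
all_train, nothing_left = split_dataset(range(10), training_split=1.0)
```

With `training_split=1.0` every configuration is used for training and the held-out list is empty.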

train(path_type, potential, storage, dataset_list, workflow=None, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, upload_to_kimkit=True)[source]¶

Train the potential based on the specific trainer details

KLIFFTrainer should not be used for training, it is a parent class to specific implementations

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (Potential) – potential to be trained. The actual model itself is set as an attribute of the Potential object
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (bool) – True to read per-atom weights from the dataset
Default: False
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

trained model, loss object

Return type:

implementation dependent
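The role of `eweight`, `fweight` and `vweight` can be sketched as scaling factors on the energy, force and stress terms of a loss. The exact functional form is defined by the concrete trainer and its backend, so the weighted sum of mean squared errors below is an assumption, as are the function name `weighted_mse` and the dict layout of its inputs.

```python
def weighted_mse(pred, ref, eweight=1.0, fweight=1.0, vweight=1.0):
    """Weighted sum of squared-error terms for energy, forces and stress.

    `pred` and `ref` are dicts with keys "energy" (float), "forces" and
    "stress" (flat lists of floats). This layout is hypothetical.
    """
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    e_term = (pred["energy"] - ref["energy"]) ** 2
    f_term = mse(pred["forces"], ref["forces"])
    v_term = mse(pred["stress"], ref["stress"])
    return eweight * e_term + fweight * f_term + vweight * v_term


ref = {"energy": 1.0, "forces": [0.0, 0.0], "stress": [0.0]}
pred = {"energy": 2.0, "forces": [1.0, 1.0], "stress": [2.0]}
loss_default = weighted_mse(pred, ref)
loss_energy_only = weighted_mse(pred, ref, fweight=0.0, vweight=0.0)
```

Setting a weight to zero removes that quantity from the fit, which is one way to train on energies alone when force or stress data are unreliable.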

submit_train(path_type, potential, storage_args, storage, dataset_list, workflow=None, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, upload_to_kimkit=True)[source]¶

Asynchronously train the potential based on the trainer details

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (Potential) – potential to be trained. The actual model itself is set as an attribute of the Potential object
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (bool) – Per-atom weights for the loss function. If a boolean value is provided, the weights are assumed to be present in the provided dataset.
Default: False
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

calculation ID of the submitted job

Return type:

int

DNN sub-class¶

class orchestrator.trainer.kliff.kliff_dunn_trainer.DUNNTrainer(use_gpu=False, loss_method='mse', epochs=100, batch_size=32, learning_rate=0.001, training_split=0.8, optimizer='Adam', log_per_atom_pred=True, **kwargs)[source]¶

Bases: KLIFFTrainer

Train and deploy a fully connected neural network based on Behler-Parrinello symmetry functions. This trainer uses the KIM DUNN driver for deploying the potential, which has a higher-performance C++ backend and built-in support for UQ.

The trainer class is responsible for handling the loading/assignment of training data, as well as the actual process of training a potential. This trainer is intended to be used with KLIFF NeuralNetwork models, such as KliffBPPotential.

Parameters:

use_gpu (bool) – Whether to use a GPU for training
Default: False
loss_method (str) – Loss function to use
Default: ‘mse’
epochs (int) – Number of epochs to train the model
Default: 100
batch_size (int) – Number of configurations per mini-batch
Default: 32
learning_rate (float) – Learning rate used by the optimizer
Default: 0.001
training_split (float) – Fraction of data to use for training (rest for validation)
Default: 0.8
optimizer (str) – Optimizer to use for training
Default: ‘Adam’
log_per_atom_pred (bool) – Whether to log per-atom predictions during training for both in-memory and submitted jobs
Default: True
kwargs (dict) – Additional keyword arguments passed to the superclass.

__init__(use_gpu=False, loss_method='mse', epochs=100, batch_size=32, learning_rate=0.001, training_split=0.8, optimizer='Adam', log_per_atom_pred=True, **kwargs)[source]¶

Train and deploy a DNN potential using KLIFF

Parameters:

use_gpu (bool) – Whether to use a GPU for training
Default: False
loss_method (str) – Loss function to use
Default: ‘mse’
epochs (int) – Number of epochs to train the model
Default: 100
batch_size (int) – Number of configurations per mini-batch
Default: 32
learning_rate (float) – Learning rate used by the optimizer
Default: 0.001
training_split (float) – Fraction of data to use for training (rest for validation)
Default: 0.8
optimizer (str) – Optimizer to use for training
Default: ‘Adam’
log_per_atom_pred (bool) – Whether to log per-atom predictions during training for both in-memory and submitted jobs
Default: True
kwargs (dict) – Additional keyword arguments passed to the superclass.

checkpoint_trainer()[source]¶

checkpoint the trainer module into the checkpoint file

save necessary internal variables into a dict with key checkpoint_name and write to the (json) checkpoint file for restart capabilities

restart_trainer()[source]¶

restart the trainer module from the checkpoint file

check if the checkpoint_file has an entry matching the checkpoint_name and set internal variables accordingly if so

train(path_type, potential, storage, dataset_list, workflow=None, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, upload_to_kimkit=True)[source]¶

Train a DNN potential using KLIFF

This is the main method of the trainer class, and uses the parameters supplied at instantiation to perform the potential training by minimizing a loss function.

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (KliffBPPotential) – KliffBPPotential class object containing model to be trained as an attribute
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (bool) – Per-atom weights for the loss function. If a boolean value is provided, the weights are assumed to be present in the provided dataset.
Default: False
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

trained model, loss object

Return type:

NeuralNetwork, Loss (KLIFF)

submit_train(path_type, potential, storage, dataset_list, workflow, job_details, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, upload_to_kimkit=True)[source]¶

Asynchronously train the potential based on the trainer details

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (Potential) – potential to be trained. The actual model itself is set as an attribute of the Potential object
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (bool) – Per-atom weights for the loss function. If a boolean value is provided, the weights are assumed to be present in the provided dataset.
Default: False
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

calculation ID of the submitted job

Return type:

int

load_from_submitted_training(calc_id, potential, workflow)[source]¶

reload a potential that was trained via a submitted job

Parameters:

calc_id (int) – calculation ID of the submitted training job
potential (KliffBPPotential) – KliffBPPotential class object that will be updated with the model saved to disk after the training job.
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
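The asynchronous flow documented above (submit a job, keep the returned calculation ID, later reload the trained model into the potential) can be sketched with in-memory stand-ins. `FakePotential` and `FakeAsyncTrainer` are hypothetical classes written for this example. They take fewer arguments than the real methods, do not talk to a scheduler, and skip the step of waiting for the job to finish.

```python
class FakePotential:
    """Hypothetical stand-in for a Potential; holds the model attribute."""

    def __init__(self):
        self.model = None


class FakeAsyncTrainer:
    """Hypothetical stand-in that mimics the submit/load call order."""

    def __init__(self):
        self._jobs = {}

    def submit_train(self, path_type, potential, dataset_list):
        # A real trainer would hand the job to a scheduler via the workflow.
        calc_id = len(self._jobs)
        self._jobs[calc_id] = {
            "path_type": path_type,
            "datasets": list(dataset_list),
        }
        return calc_id

    def load_from_submitted_training(self, calc_id, potential):
        # A real trainer would read the model saved to disk by the job.
        job = self._jobs[calc_id]
        potential.model = f"model trained on {len(job['datasets'])} dataset(s)"


trainer = FakeAsyncTrainer()
potential = FakePotential()
calc_id = trainer.submit_train("dnn_fit", potential, ["dataset-handle-1"])
trainer.load_from_submitted_training(calc_id, potential)
```

The integer ID is the only link between the two calls, so it is worth persisting if the submitting process may exit before the job completes.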

Parametric model sub-class¶

class orchestrator.trainer.kliff.kliff_parametric_trainer.ParametricModelTrainer(model_name, params_to_update, training_split=1.0, loss_method='mse', max_evals=1000, optimization_method='L-BFGS-B', scratch=None, **kwargs)[source]¶

Bases: KLIFFTrainer

Train and deploy a general parametric model potential using KLIFF

Parameters:

model_name (str) – name of the model to train
params_to_update (list) – List of model parameters to update during training
training_split (float) – Fraction of data to use for training (rest for validation)
Default: 1.0
loss_method (str) – Loss function to use
Default: ‘mse’
max_evals (int) – Maximum number of optimization evaluations
Default: 1000
optimization_method (str) – Optimization algorithm to use
Default: ‘L-BFGS-B’
scratch (str or None) – Path to scratch directory for temporary files
Default: None

__init__(model_name, params_to_update, training_split=1.0, loss_method='mse', max_evals=1000, optimization_method='L-BFGS-B', scratch=None, **kwargs)[source]¶

Train and deploy a general parametric model potential using KLIFF

Parameters:

model_name (str) – name of the model to train
params_to_update (list) – List of model parameters to update during training
training_split (float) – Fraction of data to use for training (rest for validation)
Default: 1.0
loss_method (str) – Loss function to use
Default: ‘mse’
max_evals (int) – Maximum number of optimization evaluations
Default: 1000
optimization_method (str) – Optimization algorithm to use
Default: ‘L-BFGS-B’
scratch (str or None) – Path to scratch directory for temporary files
Default: None

checkpoint_trainer()[source]¶

checkpoint the trainer module into the checkpoint file

save necessary internal variables into a dict with key checkpoint_name and write to the (json) checkpoint file for restart capabilities

restart_trainer()[source]¶

restart the trainer module from the checkpoint file

check if the checkpoint_file has an entry matching the checkpoint_name and set internal variables accordingly if so

train(path_type, potential, storage, dataset_list, workflow=None, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, upload_to_kimkit=True)[source]¶

Train a parametric potential using KLIFF

This is the main method of the trainer class, and uses the parameters supplied at instantiation to perform the potential training by minimizing a loss function.

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (KIMPotential) – KIMPotential class object containing model to be trained as an attribute
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (bool) – Per-atom weights for the loss function. If a boolean value is provided, the weights are assumed to be present in the provided dataset.
Default: False
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

trained model, loss object

Return type:

KIMModel, None

submit_train(path_type, potential, storage_args, workflow, job_details, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, upload_to_kimkit=True)[source]¶

Asynchronously train the potential based on the trainer details

Return type:

int

load_from_submitted_training(calc_id, potential, workflow)[source]¶: reload a potential that was trained via a submitted job

FitSnap class¶

class orchestrator.trainer.fitsnap.FitSnapTrainer(**kwargs)[source]¶

Bases: Trainer

Train and deploy a potential using FitSnap

__init__(**kwargs)[source]¶: Train and deploy a general parametric model potential using FitSnap

checkpoint_trainer()[source]¶

checkpoint the trainer module into the checkpoint file

save necessary internal variables into a dict with key checkpoint_name and write to the (json) checkpoint file for restart capabilities

restart_trainer()[source]¶

restart the trainer module from the checkpoint file

check if the checkpoint_file has an entry matching the checkpoint_name and set internal variables accordingly if so

train(path_type, potential, storage, dataset_list, workflow=None, eweight=1.0, fweight=1.0, vweight=1.0, per_atom_weights=False, write_training_script=True, upload_to_kimkit=True)[source]¶

Train a Snap potential using FitSnap

This is the main method of the trainer class, and uses the parameters supplied in the FitSnap settings file to perform the potential training

Parameters:

path_type (str) – if write_training_script=True, specifier for the workflow path, to differentiate training runs; else, the raw path to save files
potential (fitsnap instance) – FitSnapPotential class object containing fitsnap instance
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (either boolean or np.ndarray) – True to read from dataset, or numpy array, or a str for a numpy.loadtxt compatible filepath
Default: False
write_training_script (bool) – True to write a training script in the workflow created directory
Default: True. This should always be left enabled unless train() is being called from a submit_train() workflow.
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

trained model, error metrics

Return type:

fitsnap instance, fitsnap error attribute

Asynchronously train the potential based on the trainer details

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (Potential) – potential to be trained. The actual model itself is set as an attribute of the Potential object
storage (Storage) – an instance of the storage class
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
job_details (dict) – job parameters such as walltime or # of nodes
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (either boolean or np.ndarray) – True to read from dataset, or numpy array, or a str for a numpy.loadtxt compatible filepath
Default: False
upload_to_kimkit (bool) – True to upload to kimkit repository

dataset_list (list[str]) – the list of dataset_handles (e.g. collabfit-IDs) within the storage object to use as the dataset.

Returns:

calculation ID of the submitted job

Return type:

int

load_from_submitted_training(calc_id, potential, workflow)[source]¶

reload a potential that was trained via a submitted job

Parameters:

calc_id (int) – calculation ID of the submitted training job
potential (KliffBPPotential) – KliffBPPotential class object that will be updated with the model saved to disk after the training job.
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None

ChIMES class¶

class orchestrator.trainer.chimes.ChIMESTrainer(exe_chimes_fit_1, exe_chimes_fit_2, fit_directory='_ChIMES_FIT', **kwargs)[source]¶

Bases: Trainer

Train and deploy a potential using ChIMES

The trainer class is responsible for handling the loading/assignment of training data, as well as the actual process of training a potential. This trainer is intended to be used with ChIMES models trained on ASE training data. WARNING: the fit directory will be overwritten during any call to the train functions.

__init__(exe_chimes_fit_1, exe_chimes_fit_2, fit_directory='_ChIMES_FIT', **kwargs)[source]¶

Initialize the ChIMESTrainer.

Parameters:

exe_chimes_fit_1 (str) – Path to the first ChIMES fitting executable - /build/chimes_lsq (executable)
exe_chimes_fit_2 (str) – Path to the second ChIMES fitting executable - src/chimes_lsq.py (python script)
fit_directory (Optional[str]) – Directory for fitting outputs. WARNING: this directory location will be overwritten during any call to a training function
kwargs (dict) – Additional keyword arguments for the base Trainer.
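Because the fit directory is overwritten on every training call, it may be worth copying earlier results aside first. The helper below is a minimal sketch of that precaution using only the standard library. `backup_fit_directory` and the `backups` folder name are hypothetical and not part of ChIMESTrainer.

```python
import os
import shutil
import tempfile


def backup_fit_directory(fit_directory, backup_root):
    """Copy an existing fit directory aside; return the copy's path or None."""
    if not os.path.isdir(fit_directory):
        return None  # nothing to protect yet
    os.makedirs(backup_root, exist_ok=True)
    name = os.path.basename(os.path.normpath(fit_directory))
    target = os.path.join(backup_root, name)
    # dirs_exist_ok requires Python 3.8 or newer.
    shutil.copytree(fit_directory, target, dirs_exist_ok=True)
    return target


# Usage in a throwaway location; file names are illustrative only.
work = tempfile.mkdtemp()
fit_dir = os.path.join(work, "_ChIMES_FIT")
os.makedirs(fit_dir)
with open(os.path.join(fit_dir, "params.txt"), "w") as f:
    f.write("previous fit\n")

saved = backup_fit_directory(fit_dir, os.path.join(work, "backups"))
absent = backup_fit_directory(os.path.join(work, "no_such_dir"),
                              os.path.join(work, "backups"))
```

Calling such a helper before each training run keeps the most recent previous fit available even after the directory is reused.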

checkpoint_trainer()[source]¶

checkpoint the trainer module into the checkpoint file

save necessary internal variables into a dict with key checkpoint_name and write to the (json) checkpoint file for restart capabilities

Return type:

None

restart_trainer()[source]¶

restart the trainer module from the checkpoint file

check if the checkpoint_file has an entry matching the checkpoint_name and set internal variables accordingly if so

Return type:

None

Train a ChIMES potential

This is the main method of the trainer class, and uses the parameters supplied in the ChIMES settings file to perform the potential training in the fit_directory location specified at instantiation.

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs; currently unused in this function
potential (ChIMESPotential instance) – class object containing ChIMES instance
storage (Storage) – Storage instance to pull data from
dataset_list (list[str]) – List of dataset handles to train with
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
Default: None
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (boolean) – True to read from dataset
Default: False
write_training_script (bool) – True to write a training script in the working trainer directory
Default: True
upload_to_kimkit (bool) – Upload to kimkit after training.
Default: True

Returns:

Tuple of (trained ChIMES model, error metric).

Return type:

tuple[ChIMES, float]

Asynchronously train the potential based on the trainer details

This is a main method of the trainer class, and uses the parameters supplied at instantiation to perform the potential training by minimizing a loss function. While train() works synchronously, this method submits training to a job scheduler. Unless fit_directory is set as an absolute path, it will be a local version in the working directory generated by the Workflow.

Parameters:

path_type (str) – specifier for the workflow path, to differentiate training runs
potential (Potential) – potential to be trained. The actual model itself is set as an attribute of the Potential object
storage (Storage) – Storage instance to pull data from
dataset_list (list[str]) – List of dataset handles to train with
workflow (Workflow) – the workflow for managing path definition and job submission, if none are supplied, will use the default workflow defined in this class
job_details (dict) – job parameters such as walltime or # of nodes
eweight (float) – weight of energy data in the loss function
fweight (float) – weight of the force data in the loss function
vweight (float) – weight of the stress data in the loss function
per_atom_weights (boolean) – True to read from dataset
Default: False
upload_to_kimkit (bool) – Upload to kimkit after training
Default: True

Returns:

calculation ID of the submitted job

Return type:

int

load_from_submitted_training(calc_id, potential, workflow)[source]¶

reload a potential that was trained via a submitted job

Parameters:

calc_id (int) – calculation ID of the submitted training job
potential (ChIMESPotential) – ChIMESPotential class object that will be updated with the model saved to disk after the training job.
workflow (Workflow) – the workflow for managing path definition and job submission

Return type:

None

Trainer Builder¶

orchestrator.trainer.factory.trainer_factory = <orchestrator.utils.module_factory.ModuleFactory object>¶: default factory for trainers, includes DNN (kliff) and KLIFF (parametric model)

class orchestrator.trainer.factory.TrainerBuilder(factory=<orchestrator.utils.module_factory.ModuleFactory object>)[source]¶

Bases: ModuleBuilder

Constructor for trainers added in the factory

set the factory to be used for the builder. The default is to use the trainer_factory generated at the end of this module. A user defined ModuleFactory can optionally be supplied instead.

Parameters:

factory (ModuleFactory) – a trainer factory
Default: trainer_factory

__init__(factory=<orchestrator.utils.module_factory.ModuleFactory object>)[source]¶

constructor for the TrainerBuilder, sets the factory to build from

Parameters:

factory (ModuleFactory) – a trainer factory
Default: trainer_factory

build(trainer_type, trainer_args=None)[source]¶

Return an instance of the specified trainer

The build method takes the specifier and input arguments to construct a concrete trainer instance.

Parameters:

trainer_type (str) – token of a trainer which has been added to the factory
trainer_args (dict) – arguments to control trainer behavior

Returns:

instantiated concrete Trainer

Return type:

Trainer
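The builder pattern described above, where a string token selects a registered class and a dict of arguments configures it, can be sketched as follows. `SketchFactory`, `SketchBuilder`, `DemoTrainer`, the `register` and `lookup` methods and the token `"demo"` are all hypothetical. They stand in for ModuleFactory and TrainerBuilder, whose internals are not shown here.

```python
class SketchFactory:
    """Hypothetical registry mapping string tokens to classes."""

    def __init__(self):
        self._registry = {}

    def register(self, token, cls):
        self._registry[token] = cls

    def lookup(self, token):
        # Raises KeyError for tokens that were never registered.
        return self._registry[token]


class SketchBuilder:
    """Hypothetical builder that instantiates classes from a factory."""

    def __init__(self, factory):
        self.factory = factory

    def build(self, trainer_type, trainer_args=None):
        if trainer_args is None:
            trainer_args = {}
        cls = self.factory.lookup(trainer_type)
        return cls(**trainer_args)


class DemoTrainer:
    """Placeholder class used only to exercise the builder."""

    def __init__(self, epochs=100):
        self.epochs = epochs


factory = SketchFactory()
factory.register("demo", DemoTrainer)
builder = SketchBuilder(factory)

built = builder.build("demo", {"epochs": 5})
default_built = builder.build("demo")
try:
    builder.build("unknown")
    unknown_failed = False
except KeyError:
    unknown_failed = True
```

Selecting the class by token means the trainer type and its arguments can both come from a plain configuration file.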

orchestrator.trainer.factory.trainer_builder = <orchestrator.trainer.factory.TrainerBuilder object>¶: trainer builder object which can be imported for use in other modules