Trainer

See the full API for the module at Trainer Module. The abstract base class Trainer provides the standard interface for all of the concrete implementations.

This module manages both the organization of data (for training, verification, or analysis) and the training of interatomic potential (IAP) models using that data. A trainer may be specific to a certain model or class of models, or may be universal.

List of Available Trainers

Several primary groups of trainers are currently available: the FitSnapTrainer, used with FitSnapPotential; the ChIMESTrainer, used with ChIMESPotential; and the DUNNTrainer and ParametricModelTrainer, used with KliffBPPotential and the generic KIMPotential.

Training Modes

The trainer module defines two training modes, in-memory and distributed, which are executed by the train() and submit_train() methods, respectively. The distributed approach saves the current potential to disk and then creates a job script that restarts the orchestrator on a compute node to perform the training, saving the trained potential to disk at the end. By default, both methods attempt to upload the trained potential to kimkit, incrementing the version number if working from an existing kimkit potential, or creating a new kimID otherwise. The trainer_potential_workflow_test() serves as an example of how to use trainers, though it disables the kimkit upload and pulls data from test databases.

While the in-memory approach can be used for small tests, it is generally advised to use the distributed version (or to call the train() method on a compute node). Note also that the submit_train() method requires writing a model to disk and reading it back. Some of the KIMPotential architectures do not support this and can therefore only use the in-memory training mode.
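The guidance above can be read as a small decision rule. The following is only a sketch of that rule: choose_training_method and its two boolean arguments are hypothetical names introduced for illustration and are not part of the trainer module.

```python
def choose_training_method(supports_disk_io: bool, on_compute_node: bool) -> str:
    """Return the name of the trainer method suggested by the guidance above.

    Hypothetical helper: both arguments are illustrative flags that the
    caller is assumed to know about their own potential and environment.
    """
    if not supports_disk_io:
        # submit_train() writes the model to disk and reads it back,
        # so a potential that cannot be written must train in memory.
        return "train"
    if on_compute_node:
        # Already running on a compute node: in-memory training is fine.
        return "train"
    # Default recommendation: hand the work off to a compute node.
    return "submit_train"


print(choose_training_method(supports_disk_io=True, on_compute_node=False))
```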

Hyperparameter and Settings Control

The train() and submit_train() methods require four things: a string that names the directory path the Workflow will create, the related Potential class object, a Storage class object, and a list of the dataset IDs to pull from the storage object. In addition, submit_train() requires a Workflow object and a dictionary of job settings (e.g. walltime, number of nodes).
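To make the difference between the two argument lists concrete, here is a minimal stand-in that only records how it was called. It is not the orchestrator's Trainer class: the class name SketchTrainer, every parameter name, the job-setting keys, and the dataset ID strings are assumptions chosen for illustration, so consult the Trainer Module API for the real signatures.

```python
from dataclasses import dataclass, field


@dataclass
class SketchTrainer:
    """Hypothetical stand-in that records calls instead of training."""
    calls: list = field(default_factory=list)

    def train(self, path_type, potential, storage, dataset_list, **hyperparams):
        # In-memory mode: the four required inputs plus optional
        # training-specific hyperparameters.
        self.calls.append(("train", path_type, list(dataset_list), hyperparams))
        return potential

    def submit_train(self, path_type, potential, storage, dataset_list,
                     workflow, job_details, **hyperparams):
        # Distributed mode: the same four inputs, plus a workflow object
        # and a dictionary of job settings.
        if workflow is None:
            raise ValueError("submit_train() needs a workflow object")
        self.calls.append(("submit_train", path_type, list(dataset_list),
                           dict(job_details), hyperparams))
        return len(self.calls) - 1  # placeholder handle for the submitted job


# Placeholder objects standing in for Potential, Storage, and Workflow.
potential, storage, workflow = object(), object(), object()
trainer = SketchTrainer()

trainer.train("trainer_demo", potential, storage, ["dataset-id-1"])
trainer.submit_train(
    "trainer_demo", potential, storage, ["dataset-id-1"],
    workflow, {"walltime": 60, "nodes": 1},  # hypothetical setting keys
)
```

Keeping the four shared inputs in the same order for both methods means a script can switch between in-memory and distributed training by adding only the workflow and job settings.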

The hyperparameters that control the form of the potential or its descriptors are set by the relevant Potential object. The train() and submit_train() methods accept additional optional arguments for training-specific hyperparameters, such as data weighting and the number of epochs.

See the Potential page for potential initialization details, with details for specific potential types located at Specific Potential Details.

See additional details for the Kliff trainers at Trainer Manifest for KLIFF.

Optional Atomic Data Weighting

All currently available trainers support weighting of force data at atomic-level granularity.

For FitSnapTrainer, the per_atom_weights flag can be supplied as data in list/np.ndarray form, as a filepath to a np.loadtxt() compatible file, or as a True boolean. A True boolean will attempt to locate the data within the storage object datasets under the label atomic_weights. FitSNAP neural networks use a data loader structure that does not currently support individual atom weighting. All linear and quadratic SNAP models are compatible with per-atom weighting.

For DUNNTrainer and ParametricModelTrainer, per_atom_weights currently accepts only a True/False boolean; when True, the trainer will attempt to locate the data within the storage object datasets under the label atomic_weights.
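The accepted forms of per_atom_weights described above can be summarized with a small dispatch function. This is a sketch only, not how the trainers resolve the flag internally: classify_per_atom_weights, its bool_only switch, and the returned labels are hypothetical names introduced here.

```python
from pathlib import Path
from typing import Any


def classify_per_atom_weights(value: Any, bool_only: bool = False) -> str:
    """Label which documented input form `value` corresponds to.

    Hypothetical helper. Set bool_only=True to mimic trainers that
    accept only a True/False flag (DUNNTrainer, ParametricModelTrainer).
    """
    if isinstance(value, bool):
        # True: look for 'atomic_weights' in the storage datasets.
        return "storage" if value else "disabled"
    if bool_only:
        raise TypeError("this trainer accepts only True/False for per_atom_weights")
    if isinstance(value, (str, Path)):
        # A path to a file that np.loadtxt() can read.
        return "file"
    if hasattr(value, "__len__"):
        # A list or array of weights passed directly.
        return "inline"
    raise TypeError(f"unsupported per_atom_weights value: {value!r}")


for candidate in (True, [1.0, 0.5, 0.5], "weights.txt"):
    print(type(candidate).__name__, classify_per_atom_weights(candidate))
```

The boolean check comes first so that True and False are never mistaken for weight data.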

Additional Details

Additional details on the kliff trainers are on these pages:

Inheritance Graph

Inheritance diagram of orchestrator.trainer.chimes, orchestrator.trainer.factory, orchestrator.trainer.fitsnap, orchestrator.trainer.kliff.kliff, orchestrator.trainer.kliff.kliff_dunn_trainer, orchestrator.trainer.kliff.kliff_parametric_trainer