Descriptor Module

Abstract Base Class

class orchestrator.computer.descriptor.descriptor_base.DescriptorBase(**kwargs)[source]

Bases: Computer

Abstract base class for descriptor calculations

The descriptor class manages the construction and parsing of atomic descriptors to provide training or reference data. The input will consist of an atomic configuration and calculation parameters, and the output will be the descriptors corresponding to that configuration. These may be environment-level, configuration-level, or something else depending upon the implementation.

OUTPUT_KEY = 'descriptors'
atoms_file_name = 'atoms_with_descriptors.xyz'
init_args_file_name = 'descriptor_init_args.json'
init_args_subdir = 'descriptor_init_args_temp_files'
compute_args_file_name = 'descriptor_compute_args.json'
compute_args_subdir = 'descriptor_compute_args_temp_files'
script_file_name = 'descriptor_compute_script.py'
abstract compute(atoms, **kwargs)[source]

Runs the calculation for a single atomic configuration. This method is intended for serial (non-distributed) use, outside of a full orchestrator workflow.

Parameters:

atoms (Atoms) – the ASE Atoms object

Returns:

(N, D) array of D-dimensional descriptors for all N atoms.

Return type:

np.ndarray

abstract compute_batch(list_of_atoms, **kwargs)[source]

Runs the calculation for a batch of atomic configurations. This method is intended for serial (non-distributed) use, outside of a full orchestrator workflow.

Parameters:

list_of_atoms (list) – a list of ASE Atoms objects

Returns:

list of (N, D) arrays of D-dimensional descriptors corresponding to the descriptors of each atomic configuration

Return type:

list
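
The relationship between compute() and compute_batch() can be illustrated with a minimal sketch. It uses a stand-in abstract class rather than the real DescriptorBase, which requires the orchestrator package and ASE. The names ToyDescriptorBase and CoordinationCount are hypothetical, configurations are plain lists of positions instead of Atoms objects, results are nested lists instead of NumPy arrays, and compute_batch() is given a default implementation here purely for brevity (in DescriptorBase it is abstract).

```python
from abc import ABC, abstractmethod
import math


class ToyDescriptorBase(ABC):
    """Hypothetical stand-in for DescriptorBase; not the orchestrator class."""

    @abstractmethod
    def compute(self, atoms, **kwargs):
        """Return an (N, D) nested list for one configuration."""

    def compute_batch(self, list_of_atoms, **kwargs):
        # One (N, D) result per configuration, in input order.
        return [self.compute(atoms, **kwargs) for atoms in list_of_atoms]


class CoordinationCount(ToyDescriptorBase):
    """Toy D=1 descriptor: number of neighbors within a cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def compute(self, atoms, **kwargs):
        rows = []
        for i, ri in enumerate(atoms):
            count = sum(
                1
                for j, rj in enumerate(atoms)
                if i != j and math.dist(ri, rj) <= self.cutoff
            )
            rows.append([count])  # one row of length D=1 per atom
        return rows


dimer = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
trimer = dimer + [(0.0, 0.0, 10.0)]
results = CoordinationCount(cutoff=3.0).compute_batch([dimer, trimer])
```

Here results holds one entry per configuration, and each entry has one row per atom, mirroring the "list of (N, D) arrays" return value described above.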

get_run_command(**kwargs)[source]

Return the command to run calculations within a workflow. This allows for distributed execution of compute().

Returns:

string for execution via command line

Return type:

str

get_batched_run_command(**kwargs)[source]

Similar to get_run_command(), this function is meant to support executing compute_batch() within a workflow.

Returns:

string for execution via command line

Return type:

str

run(path_type, compute_args, configs, workflow=None, job_details=None, batch_size=1, verbose=False)[source]

Main function to compute the descriptors for a collection of atomic configurations.

The run() method covers the submission half of the computer's functionality: it takes atomic configurations as input and handles the submission of the calculations that produce the results. configs is a dataset of one or more structures. run() creates an independent job for each batch of structures using the supplied workflow, with job_details parameterizing the job submission.

Parameters:
  • path_type (str) – specifier for the workflow path, to differentiate calculation types

  • compute_args (dict) – input arguments to fill out the input file

  • configs (list) – list of configurations, as ASE Atoms objects, for which to compute descriptors

  • workflow (Workflow) – the workflow for managing job submission. If none is supplied, the default workflow defined in this class is used.

    Default: None

  • job_details (dict) – dict that includes any additional parameters for running the job (passed to submit_job())

    Default: None

  • batch_size (int) – number of configurations to pass to compute() at once. The default of 1 disables batching.

  • verbose (Optional[bool]) – if True, show progress

Returns:

a list of calculation IDs from the workflow.

Return type:

list
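
How batch_size translates into jobs can be sketched as follows. This is only an illustration of the "one job per batch" behavior described above: the helper name split_into_batches is hypothetical, the strings stand in for Atoms objects, and the handling of a final, smaller batch is an assumption rather than something the documentation specifies.

```python
def split_into_batches(configs, batch_size=1):
    """Split configs into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [
        configs[start:start + batch_size]
        for start in range(0, len(configs), batch_size)
    ]


configs = ["cfg0", "cfg1", "cfg2", "cfg3", "cfg4"]  # placeholders for Atoms
batches = split_into_batches(configs, batch_size=2)

# One job would be submitted per batch, so five configurations with
# batch_size=2 would lead to three jobs and three calculation IDs.
num_jobs = len(batches)
```

With the default batch_size=1, the same helper yields one batch per configuration, which corresponds to no batching.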

write_input(run_path, compute_args, configs)[source]

Generate input files for running the calculation.

This method will write the requisite input files in the run_path. Specific implementations may leverage additional helper functions to construct the input. Notably, any arguments that are passed as in-memory arrays will be written out to temporary files, which will be removed later by .cleanup().

Parameters:
  • run_path (str) – directory path where the file is written

  • compute_args (dict) – arguments for the computer

  • configs (list or Atoms) – the configurations, given as a single Atoms object or a list of Atoms objects.

Returns:

name of written input file

Return type:

str
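
The temporary-file pattern mentioned above (in-memory arrays written to disk, then removed during cleanup) can be sketched generically. This is not the orchestrator implementation: the function name offload_arrays, the directory name args_temp_files, the file name compute_args.json, and the use of JSON for the offloaded data are all assumptions chosen to keep the example self-contained.

```python
import json
import os
import shutil
import tempfile


def offload_arrays(compute_args, run_path, subdir="args_temp_files"):
    """Write list-valued arguments to files and return JSON-safe arguments."""
    temp_dir = os.path.join(run_path, subdir)
    os.makedirs(temp_dir, exist_ok=True)
    safe_args = {}
    for key, value in compute_args.items():
        if isinstance(value, list):
            # In-memory data goes to its own file; keep only the path.
            file_path = os.path.join(temp_dir, f"{key}.json")
            with open(file_path, "w") as handle:
                json.dump(value, handle)
            safe_args[key] = file_path
        else:
            safe_args[key] = value
    return safe_args, temp_dir


run_path = tempfile.mkdtemp()
args = {"cutoff": 5.0, "weights": [0.1, 0.2, 0.7]}
safe_args, temp_dir = offload_arrays(args, run_path)

# The small JSON input file now references the offloaded data by path.
input_file = os.path.join(run_path, "compute_args.json")
with open(input_file, "w") as handle:
    json.dump(safe_args, handle)

# Stand-in for the later cleanup step: remove the temporary files.
shutil.rmtree(temp_dir)
temp_files_removed = not os.path.exists(temp_dir)
```

Keeping the offloaded files in a dedicated subdirectory means cleanup can remove them in one step without touching the input file or the results.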

parse_for_storage(run_path, cleanup=True)[source]

Process calculation output as ASE Atoms, then clean up.

Use ASE’s read() function to parse the xyz file written by this module, then run cleanup() to remove any unnecessary temporary files.

Parameters:
  • run_path (str) – directory where the output file resides

  • cleanup (bool) – a flag indicating whether to delete the temporary files.

    Default: True

Returns:

Atoms of the configurations with attached properties and metadata

Return type:

list of Atoms

class orchestrator.computer.descriptor.descriptor_base.AtomCenteredDescriptor(**kwargs)[source]

Bases: DescriptorBase

save_results(descriptors, save_dir='.', list_of_configs=None, **kwargs)[source]

Save descriptors to a file.

Since these results are per-atom descriptors, they will be saved in the .arrays dictionary of an Atoms object. Note that this code assumes that the ASE file used to compute the results already exists in save_dir.

Parameters:
  • descriptors (np.ndarray or list[np.ndarray]) – the computed descriptors

  • list_of_configs (list or Atoms) – the atomic configurations for which the descriptors were computed. Must be provided so that descriptors can be attached and saved on the correct Atoms objects.

  • save_dir (str) – folder in which to save the results

class orchestrator.computer.descriptor.descriptor_base.ConfigurationDescriptor(**kwargs)[source]

Bases: DescriptorBase

For generating configuration-level descriptors.

save_results(descriptors, save_dir='.', list_of_configs=None, **kwargs)[source]

Save descriptors to a file.

Since these results are configuration-level descriptors, they will be saved in the .info dictionary of an Atoms object. Note that this code assumes that the ASE file used to compute the results already exists in save_dir.

Parameters:
  • descriptors (np.ndarray or list[np.ndarray]) – the computed descriptors

  • list_of_configs (list or Atoms) – the atomic configurations for which the descriptors were computed. Must be provided so that descriptors can be attached and saved on the correct Atoms objects.

  • save_dir (str) – folder in which to save the results
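
The difference between the two save_results() variants is where the descriptors are attached: per-atom data in .arrays, configuration-level data in .info. The sketch below shows that distinction with a hypothetical ToyAtoms stand-in that exposes only those two dictionaries (ASE Atoms objects provide dictionaries with the same names). Storing the values under the key 'descriptors' is an assumption based on OUTPUT_KEY, and the helper names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAtoms:
    """Hypothetical stand-in exposing only the two dicts used here."""
    symbols: list
    arrays: dict = field(default_factory=dict)  # per-atom data, length N
    info: dict = field(default_factory=dict)    # per-configuration data


OUTPUT_KEY = "descriptors"  # same value as DescriptorBase.OUTPUT_KEY


def attach_per_atom(atoms, descriptors):
    # Atom-centered case: one D-dimensional row per atom.
    if len(descriptors) != len(atoms.symbols):
        raise ValueError("expected one descriptor row per atom")
    atoms.arrays[OUTPUT_KEY] = descriptors


def attach_per_config(atoms, descriptor):
    # Configuration-level case: a single vector for the whole structure.
    atoms.info[OUTPUT_KEY] = descriptor


dimer = ToyAtoms(symbols=["Cu", "Cu"])
attach_per_atom(dimer, [[0.1, 0.2], [0.3, 0.4]])
attach_per_config(dimer, [0.2, 0.3])
```

The length check in attach_per_atom reflects why list_of_configs must be supplied: per-atom descriptors only make sense when matched to the configuration they were computed for.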

Concrete Implementations

class orchestrator.computer.descriptor.kliff.KLIFFDescriptor(descriptor_type, cut_dists, cut_name, hyperparams)[source]

Bases: AtomCenteredDescriptor

Leverages the KLIFF library and its built-in descriptors.

supported_descriptor_types = ['symmetry_function', 'bispectrum']
__init__(descriptor_type, cut_dists, cut_name, hyperparams)[source]
Parameters:
  • descriptor_type (str) – the type of the descriptors to evaluate. See supported_descriptor_types for available options.

  • cut_dists (dict) – the cutoff distances for each element pairing. For example: {'Cu-Cu': 3.5}.

  • cut_name (str) – name of the cutoff function, such as cos, P3, or P7.

  • hyperparams (dict or str) – A dictionary of the hyperparams of the descriptor or a string to select the predefined hyperparams.
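
For systems with several elements, the cut_dists dictionary can be generated rather than typed by hand. The sketch below is a convenience helper, not part of the module: build_cut_dists is a hypothetical name, it applies one cutoff to every pairing, and it assumes that listing each unordered pair once (for example 'Cu-Zr' but not 'Zr-Cu') is sufficient, which the documentation above does not state.

```python
from itertools import combinations_with_replacement


def build_cut_dists(species, cutoff):
    """Return {'A-B': cutoff} for every unordered pair of species."""
    ordered = sorted(set(species))
    return {
        f"{a}-{b}": float(cutoff)
        for a, b in combinations_with_replacement(ordered, 2)
    }


cut_dists = build_cut_dists(["Zr", "Cu"], 3.5)

# Keyword arguments in the shape documented for KLIFFDescriptor.__init__.
# The hyperparams value is an empty placeholder because the valid keys
# depend on the chosen descriptor type.
init_kwargs = {
    "descriptor_type": "symmetry_function",
    "cut_dists": cut_dists,
    "cut_name": "cos",
    "hyperparams": {},
}
```

The resulting init_kwargs dictionary only mirrors the documented parameter names; a real call would need a valid hyperparams value.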

compute(atoms, **kwargs)[source]

Compute the atomic descriptors for a single supercell. See .compute_batch for arguments.

Return type:

ndarray

compute_batch(list_of_atoms, **kwargs)[source]

Computes atomic descriptors for all atomic configurations in the list.

Parameters:

list_of_atoms (list of ASE.Atoms objects) – the list of atomic configurations for which to compute the atomic descriptors

Returns:

list of descriptors for each atomic configuration from list_of_atoms

Return type:

list

get_colabfit_property_definition(name=None)[source]

A ‘property definition’ is a dictionary used by the ColabFit storage module for exactly specifying the details (data type, shape, description, etc.) of each field required for uniquely defining a given property. This function must be implemented in order to support storage of the computed results in the ColabFit module.

Parameters:

name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.

Returns:

the property definition

Return type:

dict

get_colabfit_property_map(name=None)[source]

Returns a default property map that can be used to extract a ColabFit property from an ASE.Atoms object. This assumes that the values being extracted are stored in their default locations based on the specific Computer module (usually within the compute() or compute_batch() functions).

A ‘property map’ is similar to a ‘property definition’, but instead tells ColabFit how to extract the keys specified in the property definition from an ASE.Atoms object. This function must be implemented in order to support storage of the computed results in the ColabFit module.

Parameters:

name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.

Returns:

the property map

Return type:

dict

class orchestrator.computer.descriptor.quests.QUESTSDescriptor(num_nearest_neighbors=32, cutoff=5.0, species=None)[source]

Bases: AtomCenteredDescriptor

Leverages the QUESTS library for model-agnostic descriptors.

__init__(num_nearest_neighbors=32, cutoff=5.0, species=None)[source]
Parameters:
  • num_nearest_neighbors (int) – the number of nearest neighbors considered in the calculation. Determines the dimensionality of the QUESTS descriptor: (2*num_nearest_neighbors)-1

  • cutoff (float) – the cutoff distance, in angstroms, used in the calculation

  • species (list[str]) – the species list. If provided, all species-species interactions are computed and concatenated in order. If not provided, the species-agnostic version is used.
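
The dimensionality formula can be checked with a short worked example. The helper names below are hypothetical, and the calculation covers only the species-agnostic case (species=None); the dimensionality when a species list is provided is not stated here and is not modeled.

```python
def quests_descriptor_dim(num_nearest_neighbors=32):
    """Dimensionality D of the species-agnostic descriptor: 2k - 1."""
    if num_nearest_neighbors < 1:
        raise ValueError("num_nearest_neighbors must be positive")
    return 2 * num_nearest_neighbors - 1


def expected_shape(num_atoms, num_nearest_neighbors=32):
    # Shape of the (N, D) array returned for one configuration.
    return (num_atoms, quests_descriptor_dim(num_nearest_neighbors))


default_dim = quests_descriptor_dim()   # k = 32, so D = 2*32 - 1 = 63
small_dim = quests_descriptor_dim(8)    # k = 8, so D = 2*8 - 1 = 15
shape_for_four_atoms = expected_shape(4)
```

With the default of 32 nearest neighbors, a four-atom configuration would therefore yield a (4, 63) array.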

compute(atoms, **kwargs)[source]

Computes the QUESTS descriptors for one configuration of atoms.

Parameters:

atoms (ASE.Atoms object) – the atomic structure to compute descriptors for

Returns:

(N, D) array of D-dimensional QUESTS descriptors corresponding to the N atoms in the atomic configuration, where D equals (2*num_nearest_neighbors)-1

Return type:

ndarray

compute_batch(list_of_atoms, **kwargs)[source]

Computes the QUESTS descriptors for all configurations in the list.

Parameters:

list_of_atoms (list of ASE.Atoms objects) – atomic structures to compute descriptors for

Returns:

list of (N, D) arrays of D-dimensional QUESTS descriptors corresponding to the descriptors of each atomic configuration of N atoms, where D equals (2*num_nearest_neighbors)-1

Return type:

list

get_colabfit_property_definition(name=None)[source]

A ‘property definition’ is a dictionary used by the ColabFit storage module for exactly specifying the details (data type, shape, description, etc.) of each field required for uniquely defining a given property. This function must be implemented in order to support storage of the computed results in the ColabFit module.

Parameters:

name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.

Returns:

the property definition

Return type:

dict

get_colabfit_property_map(name=None)[source]

Returns a default property map that can be used to extract a ColabFit property from an ASE.Atoms object. This assumes that the values being extracted are stored in their default locations based on the specific Computer module (usually within the compute() or compute_batch() functions).

A ‘property map’ is similar to a ‘property definition’, but instead tells ColabFit how to extract the keys specified in the property definition from an ASE.Atoms object. This function must be implemented in order to support storage of the computed results in the ColabFit module.

Parameters:

name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.

Returns:

the property map

Return type:

dict