Descriptor Module
Abstract Base Class
- class orchestrator.computer.descriptor.descriptor_base.DescriptorBase(**kwargs)
Bases: Computer
Abstract base class for descriptor calculations.
The descriptor class manages the construction and parsing of atomic descriptors to provide training or reference data. The input will consist of an atomic configuration and calculation parameters, and the output will be the descriptors corresponding to that configuration. These may be environment-level, configuration-level, or something else depending upon the implementation.
- OUTPUT_KEY = 'descriptors'
- atoms_file_name = 'atoms_with_descriptors.xyz'
- init_args_file_name = 'descriptor_init_args.json'
- init_args_subdir = 'descriptor_init_args_temp_files'
- compute_args_file_name = 'descriptor_compute_args.json'
- compute_args_subdir = 'descriptor_compute_args_temp_files'
- script_file_name = 'descriptor_compute_script.py'
- abstract compute(atoms, **kwargs)
Runs the calculation for a single atomic configuration. It is intended for serial (non-distributed) use outside of a full orchestrator workflow.
- Parameters:
atoms (Atoms) – the ASE Atoms object
- Returns:
(N, D) array of D-dimensional descriptors for all N atoms.
- Return type:
np.ndarray
- abstract compute_batch(list_of_atoms, **kwargs)
Runs the calculation for a batch of atomic configurations. It is intended for serial (non-distributed) use outside of a full orchestrator workflow.
- Parameters:
list_of_atoms (list) – a list of ASE Atoms objects
- Returns:
list of (N, D) arrays of D-dimensional descriptors corresponding to the descriptors of each atomic configuration
- Return type:
list
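As a rough illustration of the compute()/compute_batch() contract, the sketch below defines a hypothetical stand-alone class. It does not inherit from the real DescriptorBase, so it runs without the orchestrator installed, and its toy descriptor (a per-atom neighbor count within a cutoff, ignoring periodic boundaries) is invented purely for illustration. compute() accepts any object with a get_positions() method, such as an ASE Atoms object; a tiny stand-in is included so the example runs without ASE.

```python
import numpy as np


class NeighborCountDescriptor:
    """Hypothetical toy descriptor mirroring the compute/compute_batch contract.

    Not part of the orchestrator; periodic boundary conditions are ignored.
    """

    def __init__(self, cutoff=3.0):
        self.cutoff = cutoff

    def compute(self, atoms, **kwargs):
        # (N, 3) Cartesian positions; ASE Atoms provides get_positions()
        pos = np.asarray(atoms.get_positions(), dtype=float)
        # pairwise distance matrix, shape (N, N)
        dists = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        # count neighbors within the cutoff, excluding the atom itself
        counts = (dists < self.cutoff).sum(axis=1) - 1
        # return an (N, D) array with D = 1
        return counts.reshape(-1, 1).astype(float)

    def compute_batch(self, list_of_atoms, **kwargs):
        # one (N_i, D) array per configuration, in input order
        return [self.compute(atoms, **kwargs) for atoms in list_of_atoms]


class _Positions:
    """Minimal stand-in exposing get_positions(), for use without ASE."""

    def __init__(self, positions):
        self._positions = positions

    def get_positions(self):
        return np.array(self._positions, dtype=float)


desc = NeighborCountDescriptor(cutoff=1.5)
dimer = _Positions([[0, 0, 0], [1, 0, 0]])
trimer = _Positions([[0, 0, 0], [1, 0, 0], [5, 0, 0]])
out = desc.compute_batch([dimer, trimer])
```

Here compute_batch() simply maps compute() over the list; a real implementation may instead vectorize across configurations, as long as it returns one (N, D) array per configuration.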
- get_run_command(**kwargs)
Return the command to run calculations within a workflow. This allows for distributed execution of compute().
- Returns:
string for execution via command line
- Return type:
str
- get_batched_run_command(**kwargs)
Similar to get_run_command(), this function is meant to support executing compute_batch() within a workflow.
- Returns:
string for execution via command line
- Return type:
str
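The exact command these methods return is not documented here. The sketch below only illustrates assembling such a command string safely; it reuses the file names from the class attributes above (script_file_name, init_args_file_name, compute_args_file_name), but the argument order and the "--batched" flag are hypothetical.

```python
import shlex

# File names copied from the DescriptorBase class attributes
SCRIPT_FILE_NAME = "descriptor_compute_script.py"
INIT_ARGS_FILE_NAME = "descriptor_init_args.json"
COMPUTE_ARGS_FILE_NAME = "descriptor_compute_args.json"


def build_run_command(python="python", batched=False):
    """Assemble a shell command string (hypothetical CLI layout).

    The real get_run_command()/get_batched_run_command() may pass
    different arguments; this only shows quoting a command safely.
    """
    args = [python, SCRIPT_FILE_NAME, INIT_ARGS_FILE_NAME, COMPUTE_ARGS_FILE_NAME]
    if batched:
        args.append("--batched")  # hypothetical flag, for illustration only
    # shlex.join quotes each argument so paths with spaces stay intact
    return shlex.join(args)


cmd = build_run_command()
batched_cmd = build_run_command(batched=True)
```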
- run(path_type, compute_args, configs, workflow=None, job_details=None, batch_size=1, verbose=False)
Main function to compute the descriptors for a collection of atomic configurations.
The run method implements half of the computer's main functionality: it takes atomic configurations as input and handles the submission of calculations to obtain the computed results. configs is a dataset of one or more structures. run() creates an independent job for each batch of structures using the supplied workflow, with job_details parameterizing the job submission.
- Parameters:
path_type (str) – specifier for the workflow path, to differentiate calculation types
compute_args (dict) – input arguments to fill out the input file
configs (list) – list of configurations, as ASE Atoms objects, for which to compute descriptors
workflow (Workflow) – the workflow for managing job submission. If none is supplied, the default workflow defined in this class is used. Default: None
job_details (dict) – dict that includes any additional parameters for running the job (passed to submit_job()). Default: {}
batch_size (int) – number of configurations to pass to compute() at once. The default of 1 does not do any batching. Default: 1
verbose (Optional[bool]) – if True, show progress. Default: False
- Returns:
a list of calculation IDs from the workflow.
- Return type:
list
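run() is documented to create one job per batch of structures. The internal batching logic is not shown in this reference, so the following is only a sketch of the splitting step under that description; split_into_batches is a hypothetical helper, not an orchestrator function.

```python
def split_into_batches(configs, batch_size=1):
    """Split configurations into consecutive batches (hypothetical helper).

    Mirrors the documented behaviour of run(): each batch would become
    one independent job; batch_size=1 means one configuration per job.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [configs[i:i + batch_size] for i in range(0, len(configs), batch_size)]


# five configurations with batch_size=2 -> three jobs (the last one smaller)
batches = split_into_batches(list(range(5)), batch_size=2)
```

Under this reading, the number of submitted jobs (and therefore the length of the returned list of calculation IDs) would be ceil(len(configs) / batch_size).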
- write_input(run_path, compute_args, configs)
Generate input files for running the calculation.
This method will write the requisite input files in the run_path. Specific implementations may leverage additional helper functions to construct the input. Notably, any arguments that are passed as in-memory arrays will be written out to temporary files, which will be removed later by .cleanup().
- Parameters:
run_path (str) – directory path where the file is written
compute_args (dict) – arguments for the computer
configs (list or Atoms) – the configurations, as a list of Atoms objects or a single Atoms object.
- Returns:
name of written input file
- Return type:
str
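The note about in-memory arrays being written to temporary files can be sketched as follows. This is an assumed scheme, not the orchestrator's code: the JSON file name and subdirectory are taken from the compute_args_file_name and compute_args_subdir class attributes, while storing arrays as .npy files via np.save and replacing them with their paths is a hypothetical choice.

```python
import json
import os
import tempfile

import numpy as np

# Names copied from the DescriptorBase class attributes
COMPUTE_ARGS_FILE_NAME = "descriptor_compute_args.json"
COMPUTE_ARGS_SUBDIR = "descriptor_compute_args_temp_files"


def write_compute_args(run_path, compute_args):
    """Write compute_args to JSON, spilling arrays to .npy files (sketch).

    Returns the name of the written JSON file, like write_input().
    """
    serializable = {}
    for key, value in compute_args.items():
        if isinstance(value, np.ndarray):
            subdir = os.path.join(run_path, COMPUTE_ARGS_SUBDIR)
            os.makedirs(subdir, exist_ok=True)
            array_path = os.path.join(subdir, f"{key}.npy")
            np.save(array_path, value)
            # store the file path in place of the in-memory array
            serializable[key] = array_path
        else:
            serializable[key] = value
    with open(os.path.join(run_path, COMPUTE_ARGS_FILE_NAME), "w") as fh:
        json.dump(serializable, fh)
    return COMPUTE_ARGS_FILE_NAME


run_dir = tempfile.mkdtemp()
name = write_compute_args(run_dir, {"cutoff": 5.0, "weights": np.ones(3)})
```

In this scheme, a later cleanup step would only need to delete the temporary subdirectory (for example with shutil.rmtree).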
- parse_for_storage(run_path, cleanup=True)
Process calculation output as ASE Atoms, then clean up.
Use ASE’s read() function to parse the xyz file written by this module, then run cleanup() to remove any unnecessary temporary files.
- Parameters:
run_path (str) – directory where the output file resides
cleanup (bool) – a flag indicating whether to delete the temporary files.
Default: True
- Returns:
Atoms of the configurations with attached properties and metadata
- Return type:
list of Atoms
- class orchestrator.computer.descriptor.descriptor_base.AtomCenteredDescriptor(**kwargs)
Bases: DescriptorBase
- save_results(descriptors, save_dir='.', list_of_configs=None, **kwargs)
Save descriptors to a file.
Since these results are per-atom descriptors, they will be saved in the .arrays dictionary of an Atoms object. Note that this code assumes that the ASE file used to compute the results already exists in save_dir.
- Parameters:
descriptors (np.ndarray or list[np.ndarray]) – the computed descriptors
list_of_configs (list or Atoms) – the atomic configurations for which the descriptors were computed. Must be provided so that descriptors can be attached and saved on the correct Atoms objects.
save_dir (str) – folder in which to save the results. Default: '.'
- class orchestrator.computer.descriptor.descriptor_base.ConfigurationDescriptor(**kwargs)
Bases: DescriptorBase
For generating configuration-level descriptors.
- save_results(descriptors, save_dir='.', list_of_configs=None, **kwargs)
Save descriptors to a file.
Since these results are configuration-level descriptors, they will be saved in the .info dictionary of an Atoms object. Note that this code assumes that the ASE file used to compute the results already exists in save_dir.
- Parameters:
descriptors (np.ndarray or list[np.ndarray]) – the computed descriptors
list_of_configs (list or Atoms) – the atomic configurations for which the descriptors were computed. Must be provided so that descriptors can be attached and saved on the correct Atoms objects.
save_dir (str) – folder in which to save the results. Default: '.'
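For configuration-level results, each configuration receives a single descriptor vector in its .info dictionary. The sketch below uses a minimal stand-in for ASE Atoms. The 'descriptors' key and the rule that a 2-D array holds one row per configuration are assumptions made for illustration.

```python
import numpy as np


class _Config:
    """Minimal stand-in with the .info dict that ASE Atoms exposes."""

    def __init__(self):
        self.info = {}


def attach_per_config(descriptors, list_of_configs, key="descriptors"):
    """Store one 1-D descriptor per configuration in its .info (sketch)."""
    if not isinstance(list_of_configs, (list, tuple)):
        list_of_configs = [list_of_configs]
    if isinstance(descriptors, np.ndarray):
        # (D,) -> one configuration; (M, D) -> one row per configuration
        descriptors = list(np.atleast_2d(descriptors))
    if len(descriptors) != len(list_of_configs):
        raise ValueError("need one descriptor per configuration")
    for config, desc in zip(list_of_configs, descriptors):
        config.info[key] = np.asarray(desc).ravel()
    return list_of_configs


configs = attach_per_config(np.arange(6.0).reshape(2, 3), [_Config(), _Config()])
single = attach_per_config(np.array([1.0, 2.0]), _Config())
```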
Concrete Implementations
- class orchestrator.computer.descriptor.kliff.KLIFFDescriptor(descriptor_type, cut_dists, cut_name, hyperparams)
Bases: AtomCenteredDescriptor
Leverages the KLIFF library and its built-in descriptors.
- supported_descriptor_types = ['symmetry_function', 'bispectrum']
- __init__(descriptor_type, cut_dists, cut_name, hyperparams)
- Parameters:
descriptor_type (str) – the type of the descriptors to evaluate. See supported_descriptor_types for available options.
cut_dists (dict) – the cutoff distances for each element pairing. For example: {'Cu-Cu': 3.5}.
cut_name (str) – name of the cutoff function, such as cos, P3, and P7.
hyperparams (dict or str) – a dictionary of descriptor hyperparameters, or a string selecting a predefined set of hyperparameters.
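The expected shapes of these constructor arguments can be made concrete with a small validation sketch. check_kliff_args is a hypothetical helper, not part of the orchestrator or KLIFF. The supported types are copied from supported_descriptor_types, the "A-B" key format follows the example above, and the cutoff values and the "set51" preset name are placeholders; consult the KLIFF documentation for the valid preset names.

```python
SUPPORTED_DESCRIPTOR_TYPES = ["symmetry_function", "bispectrum"]


def check_kliff_args(descriptor_type, cut_dists, cut_name, hyperparams):
    """Sanity-check KLIFFDescriptor-style arguments (hypothetical helper).

    Returns the sorted list of element symbols found in cut_dists.
    """
    if descriptor_type not in SUPPORTED_DESCRIPTOR_TYPES:
        raise ValueError(f"unsupported descriptor_type: {descriptor_type!r}")
    if not isinstance(cut_name, str):
        raise TypeError("cut_name must be a string such as 'cos'")
    if not isinstance(hyperparams, (dict, str)):
        raise TypeError("hyperparams must be a dict or a preset name")
    elements = set()
    for pair, dist in cut_dists.items():
        # keys look like 'Cu-Cu': two element symbols joined by '-'
        parts = pair.split("-")
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"malformed pair key: {pair!r}")
        if float(dist) <= 0:
            raise ValueError(f"cutoff for {pair} must be positive")
        elements.update(parts)
    return sorted(elements)


elements = check_kliff_args(
    "symmetry_function",
    {"Cu-Cu": 3.5, "Cu-Ni": 3.4, "Ni-Ni": 3.3},  # illustrative cutoffs
    "cos",
    "set51",  # placeholder preset name; check KLIFF for valid names
)
```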
- compute(atoms, **kwargs)
Compute the atomic descriptors for a single supercell. See .compute_batch for arguments.
- Return type:
ndarray
- compute_batch(list_of_atoms, **kwargs)
Computes atomic descriptors for all atomic configurations in the list.
- Parameters:
list_of_atoms (list of ASE.Atoms objects) – the list of atomic configurations for which to compute the atomic descriptors
- Returns:
list of descriptors for each atomic configuration from list_of_atoms
- Return type:
list
- get_colabfit_property_definition(name=None)
A ‘property definition’ is a dictionary used by the ColabFit storage module for exactly specifying the details (data type, shape, description, etc.) of each field required for uniquely defining a given property. This function must be implemented in order to support storage of the computed results in the ColabFit module.
- Parameters:
name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.
- Returns:
the property definition
- Return type:
dict
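As an illustration of what a property definition looks like, the dictionary below follows the KIM-style layout on which ColabFit property definitions are based, to the best of our knowledge: top-level identifying keys plus one entry per data field with its type, unit flag, extent, and description. The property id, name, and field choices are placeholders and are not the definition that KLIFFDescriptor actually returns.

```python
# Hypothetical property definition for per-atom descriptors; the id,
# name, and field layout are placeholders, not the orchestrator's.
descriptor_property_definition = {
    "property-id": "tag:example@example.org,2024-01-01:property/atomic-descriptors",
    "property-name": "atomic-descriptors",
    "property-title": "Per-atom descriptors",
    "property-description": "D-dimensional descriptor vector for each of the N atoms",
    "descriptors": {
        "type": "float",
        "has-unit": False,
        "extent": [":", ":"],  # (N, D): both dimensions may vary
        "required": True,
        "description": "(N, D) array of per-atom descriptors",
    },
}


def field_names(definition):
    """Return the data fields of a definition (entries whose value is a dict)."""
    return [k for k, v in definition.items() if isinstance(v, dict)]
```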
- get_colabfit_property_map(name=None)
Returns a default property map that can be used to extract a ColabFit property from an ASE.Atoms object. This assumes that the values being extracted are stored in their default locations based on the specific Computer module (usually within the compute() or compute_batch() functions).
A ‘property map’ is similar to a ‘property definition’, but instead tells ColabFit how to extract the keys specified in the property definition from an ASE.Atoms object. This function must be implemented in order to support storage of the computed results in the ColabFit module.
- Parameters:
name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.
- Returns:
the property map
- Return type:
dict
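The role of a property map (pointing each field of a property definition at the key where the value is stored on an Atoms object) can be shown with a simplified, hypothetical example. ColabFit's real map schema carries more structure, such as units and metadata, so treat this only as an illustration of the lookup idea. With ASE, atoms.info and atoms.arrays would be passed in.

```python
import numpy as np

# Simplified, hypothetical property map: each definition field points at
# the key under which the value is stored on the Atoms object.
descriptor_property_map = {
    "atomic-descriptors": [
        {"descriptors": {"field": "descriptors", "units": None}},
    ]
}


def extract_property(property_map, name, info, arrays):
    """Pull each mapped field from info (per-config) or arrays (per-atom)."""
    values = {}
    for entry in property_map[name]:
        for prop_key, spec in entry.items():
            field = spec["field"]
            # configuration-level data lives in .info, per-atom data in .arrays
            if field in info:
                values[prop_key] = info[field]
            elif field in arrays:
                values[prop_key] = arrays[field]
            else:
                raise KeyError(f"field {field!r} not found on the configuration")
    return values


extracted = extract_property(
    descriptor_property_map,
    "atomic-descriptors",
    info={},
    arrays={"descriptors": np.zeros((4, 2))},
)
```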
- class orchestrator.computer.descriptor.quests.QUESTSDescriptor(num_nearest_neighbors=32, cutoff=5.0, species=None)
Bases: AtomCenteredDescriptor
Leverages the QUESTS library for model-agnostic descriptors.
- __init__(num_nearest_neighbors=32, cutoff=5.0, species=None)
- Parameters:
num_nearest_neighbors (int) – the number of nearest neighbors considered in the calculation. Determines the dimensionality D of the QUESTS descriptor: D = (2*num_nearest_neighbors)-1
cutoff (float) – the cutoff distance, in angstroms, used in the calculation
species (list[str]) – the species list. If provided, all species-species interactions are computed and concatenated in order. If not provided, the species-agnostic version is used.
- compute(atoms, **kwargs)
Computes the QUESTS descriptors for one configuration of atoms.
- Parameters:
atoms (ASE.Atoms object) – the atomic structure to compute descriptors for
- Returns:
(N, D) array of D-dimensional QUESTS descriptors corresponding to the N atoms in the atomic configuration, where D equals (2*num_nearest_neighbors)-1
- Return type:
ndarray
- compute_batch(list_of_atoms, **kwargs)
Computes the QUESTS descriptors for all configurations in the list.
- Parameters:
list_of_atoms (list of ASE.Atoms objects) – atomic structures for which to compute descriptors
- Returns:
list of (N, D) arrays of D-dimensional QUESTS descriptors corresponding to the descriptors of each atomic configuration of N atoms, where D equals (2*num_nearest_neighbors)-1
- Return type:
list
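The dimensionality rule D = (2*num_nearest_neighbors)-1 gives D = 63 for the default of 32 neighbors. The sketch below checks compute_batch()-style output against that rule; it assumes the species-agnostic variant (species=None), since the species-resolved variant concatenates several blocks and will be wider. The zero arrays stand in for real QUESTS output, and the helper names are hypothetical.

```python
import numpy as np


def quests_dimension(num_nearest_neighbors=32):
    """D = (2 * num_nearest_neighbors) - 1 for the species-agnostic version."""
    return 2 * num_nearest_neighbors - 1


def check_batch_shapes(descriptor_list, atom_counts, num_nearest_neighbors=32):
    """Verify compute_batch-style output: one (N_i, D) array per configuration."""
    d = quests_dimension(num_nearest_neighbors)
    if len(descriptor_list) != len(atom_counts):
        return False
    return all(np.shape(x) == (n, d) for x, n in zip(descriptor_list, atom_counts))


# stand-in output for two configurations with 8 and 12 atoms
fake_output = [np.zeros((8, 63)), np.zeros((12, 63))]
ok = check_batch_shapes(fake_output, [8, 12])
```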
- get_colabfit_property_definition(name=None)
A ‘property definition’ is a dictionary used by the ColabFit storage module for exactly specifying the details (data type, shape, description, etc.) of each field required for uniquely defining a given property. This function must be implemented in order to support storage of the computed results in the ColabFit module.
- Parameters:
name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.
- Returns:
the property definition
- Return type:
dict
- get_colabfit_property_map(name=None)
Returns a default property map that can be used to extract a ColabFit property from an ASE.Atoms object. This assumes that the values being extracted are stored in their default locations based on the specific Computer module (usually within the compute() or compute_batch() functions).
A ‘property map’ is similar to a ‘property definition’, but instead tells ColabFit how to extract the keys specified in the property definition from an ASE.Atoms object. This function must be implemented in order to support storage of the computed results in the ColabFit module.
- Parameters:
name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.
- Returns:
the property map
- Return type:
dict