Computer Module

Abstract Base Class

class orchestrator.computer.computer_base.Computer(**kwargs)[source]

Bases: Recorder, ABC

Abstract base class for the computer.

OUTPUT_KEY = None
__init__(**kwargs)[source]

Initialize the Recorder mixin class.

Sets up logging configuration and creates a logger instance named after the class of the object using it.

Parameters:
  • args – Positional arguments passed to other supers in the MRO.

  • kwargs – Keyword arguments passed to other supers in the MRO.

compute(atoms, **kwargs)[source]

Runs the calculation for a single atomic configuration. This method can be used serially (non-distributed), outside of a full orchestrator workflow.

Parameters:

atoms (Atoms) – the ASE Atoms object

Return type:

ndarray

Returns:

the computed value; its meaning depends on the sub-class

compute_batch(list_of_atoms, **kwargs)[source]

Runs the calculation for a batch of atomic configurations. This method can be used serially (non-distributed), outside of a full orchestrator workflow.

Parameters:
  • list_of_atoms (list) – a list of ASE Atoms objects

  • kwargs (dict) – any additional keyword arguments to be passed to the calculator method

Returns:

a list of values equivalent to [self.compute(atoms, **kwargs) for atoms in list_of_atoms]

Return type:

list
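The equivalence between compute_batch() and repeated compute() calls can be sketched as follows. This is a minimal illustration, not the orchestrator implementation: ComputerSketch and AtomCounter are hypothetical stand-ins, and plain Python lists take the place of ASE Atoms objects so the example runs without ASE installed.

```python
from abc import ABC, abstractmethod


class ComputerSketch(ABC):
    """Hypothetical stand-in for Computer; not the orchestrator class."""

    @abstractmethod
    def compute(self, atoms, **kwargs):
        """Return a value for one configuration."""

    def compute_batch(self, list_of_atoms, **kwargs):
        # One compute() call per configuration, forwarding the same kwargs.
        return [self.compute(atoms, **kwargs) for atoms in list_of_atoms]


class AtomCounter(ComputerSketch):
    """Toy sub-class: the 'calculation' is a scaled atom count."""

    def compute(self, atoms, scale=1, **kwargs):
        return scale * len(atoms)


# Plain lists stand in for ASE Atoms objects (assumption for illustration).
configs = [["H", "H"], ["O", "H", "H"]]
batch_result = AtomCounter().compute_batch(configs, scale=2)
```

Because the keyword arguments are forwarded unchanged, every configuration in the batch is evaluated with the same settings.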

abstract get_run_command(**kwargs)[source]

Return the command to run calculations within a workflow. This allows for distributed execution of compute().

This method formats the run command, while kwargs can be used to pass any necessary extra parameters to the specific implementations.

Returns:

implementation dependent

Return type:

implementation dependent

abstract get_batched_run_command(**kwargs)[source]

Similar to get_run_command(), this function is meant to support executing compute_batch() within a workflow.

Returns:

implementation dependent

Return type:

implementation dependent

abstract run(path_type, workflow=None)[source]

Executes the calculation across a provided workflow. Note that sub-classes may have implementations with additional arguments.

Parameters:
  • path_type (str) – specifier for the workflow path, to differentiate calculation types.

  • workflow (Workflow) – the workflow for managing job submission. If none is supplied, the default workflow defined in this class is used.

    Default: None

Returns:

a list of calculation IDs from the workflow.

Return type:

list

save_labeled_configs(data_pointers, storage=None, dataset_name=None, dataset_handle=None, workflow=None, cleanup=True)[source]

Extract and save computed data to storage.

Once the calculations are complete, the data they generate must be integrated with the structural configuration in a consistent framework to be used for training. This is done by parsing and ingesting the configuration and attached data into a dataset handled by the Storage module.

Parameters:
  • data_pointers (list (of Atoms or int or str)) – configs, calc_ids, or explicit paths associated with each config. If calc_ids or explicit paths are supplied, they should point to ASE-readable files from which to load the Atoms objects. If calc_ids are supplied, the path is extracted from the JobStatus. Calc IDs are generally preferred as they can also carry metadata with them.

  • storage (Storage) – specific module that handles the storage of data.

    Default: None

  • dataset_name (str) – Name of the dataset in the database. If None, then the class default (date stamped) is used.

    Default: None

  • dataset_handle (str) – the handle to identify where in Storage the configurations should be saved.

  • workflow (Workflow) – the workflow for managing job submission. If none is supplied, the default workflow defined in this class is used. Should be consistent with the workflow supplied for the run calls.

    Default: None

  • cleanup (bool) – a flag indicating whether to delete the temporary files.

    Default: True

Returns:

dataset handle

Return type:

str
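The three kinds of data pointer described above can be told apart by type. The sketch below is only an illustration under stated assumptions: classify_pointer and FakeAtoms are hypothetical names, the mapping of int to calc_id and str to path is inferred from the parameter description, and the example path is made up.

```python
def classify_pointer(pointer):
    """Hypothetical helper: label a data pointer by its Python type."""
    # bool is a subclass of int, so reject it before the int check.
    if isinstance(pointer, bool):
        raise TypeError("bool is not a valid data pointer")
    if isinstance(pointer, int):
        return "calc_id"  # the path would be looked up from the JobStatus
    if isinstance(pointer, str):
        return "path"  # explicit path to an ASE-readable file
    return "config"  # anything else is treated as an in-memory configuration


class FakeAtoms:
    """Stand-in for ase.Atoms so the sketch runs without ASE installed."""


# Illustrative pointers only; the path below is not a real location.
pointers = [17, "runs/calc_17/out.xyz", FakeAtoms()]
labels = [classify_pointer(p) for p in pointers]
```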

abstract write_input(run_path, input_args)[source]

Writes any input data necessary for the calculation to the run path.

abstract parse_for_storage(run_path, cleanup)[source]

Processes calculation output to extract data in a consistent format.

Parameters:
  • run_path (str) – directory where the output resides

  • cleanup (bool) – a flag indicating whether to delete the temporary files.

    Default: True

Returns:

depends upon implementation

Return type:

depends upon implementation, but should always be a list

abstract save_results(compute_results, save_dir, **kwargs)[source]

Save calculation output to a file. Implementation dependent.

Note that this function should also store any metadata associated with the calculation.

Parameters:
  • compute_results (np.ndarray or list[np.ndarray]) – the output of .compute() or .compute_batch()

  • save_dir (str) – folder in which to save the results
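Since the methods marked abstract above must all be overridden before a sub-class can be instantiated, a partial implementation fails at construction time. The sketch below shows this with Python's abc module. ComputerSketch and IncompleteComputer are hypothetical stand-ins that only copy the documented signatures; they are not the orchestrator classes.

```python
from abc import ABC, abstractmethod


class ComputerSketch(ABC):
    """Hypothetical stand-in that mirrors the abstract methods listed above."""

    @abstractmethod
    def get_run_command(self, **kwargs): ...

    @abstractmethod
    def get_batched_run_command(self, **kwargs): ...

    @abstractmethod
    def run(self, path_type, workflow=None): ...

    @abstractmethod
    def write_input(self, run_path, input_args): ...

    @abstractmethod
    def parse_for_storage(self, run_path, cleanup): ...

    @abstractmethod
    def save_results(self, compute_results, save_dir, **kwargs): ...


class IncompleteComputer(ComputerSketch):
    """Overrides only one abstract method, so it stays abstract."""

    def get_run_command(self, **kwargs):
        return "echo run"  # placeholder command for illustration


try:
    IncompleteComputer()
    instantiated = True
except TypeError:
    # abc refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False
```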

cleanup(run_path=None)[source]

Removes any temporary files that were created for job execution.

Parameters:

run_path (str) – the parent directory containing the temp file subdir. If None, the method is not being called by a batch job, so it should delete the init_args.

data_from_calc_ids(data_pointers, workflow=None, cleanup=True)[source]

Return the parsed data from a list of calculation IDs.

Parameters:
  • data_pointers (list) – list of calc_ids from which to extract the computed results

  • workflow (Workflow) – the workflow for managing job submission. If none is supplied, the default workflow defined in this class is used.

    Default: None

  • cleanup (bool) – a flag indicating whether to delete the temporary files.

    Default: True

Returns:

a list of the computed values

Return type:

list

get_colabfit_property_definition(name=None)[source]

A ‘property definition’ is a dictionary used by the ColabFit storage module for exactly specifying the details (data type, shape, description, etc.) of each field required for uniquely defining a given property. This function must be implemented in order to support storage of the computed results in the ColabFit module.

Parameters:

name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.

Returns:

the property definition

Return type:

dict
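As a rough illustration of what such a dictionary may look like, the sketch below builds a property definition for a single scalar energy. The key names assume the KIM-style layout (property-id, property-name, and per-field type, has-unit, extent, required, description) and should be checked against the ColabFit documentation; the property name and ID are invented for the example and are not returned by any Computer in this module.

```python
# Hypothetical property definition for a scalar energy. Key names assume the
# KIM-style layout and are not taken from this module.
energy_definition = {
    "property-id": "tag:example@example.org,2024-01-01:property/example-energy",
    "property-name": "example-energy",
    "property-title": "Example total energy",
    "property-description": "Total energy of one atomic configuration.",
    "energy": {
        "type": "float",
        "has-unit": True,
        "extent": [],  # an empty extent denotes a scalar
        "required": True,
        "description": "Total energy of the configuration.",
    },
}

# Fields are the entries that are not "property-*" bookkeeping keys.
field_names = [k for k in energy_definition if not k.startswith("property-")]
```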

get_colabfit_property_map(name=None)[source]

Returns a default property map that can be used to extract a ColabFit property from an ASE.Atoms object. This assumes that the values being extracted are stored in their default locations based on the specific Computer module (usually within the compute() or compute_batch() functions).

A ‘property map’ is similar to a ‘property definition’, but instead tells ColabFit how to extract the keys specified in the property definition from an ASE.Atoms object. This function must be implemented in order to support storage of the computed results in the ColabFit module.

Parameters:

name (str) – the name of the property. Only needs to be provided if the Computer can return multiple properties.

Returns:

the property map

Return type:

dict
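The relationship between a property map and the stored values can be sketched as below. The map layout (property name, then a list of dictionaries pairing each definition key with a field and units) is an assumption to be checked against the ColabFit documentation. extract_fields is a hypothetical helper, and a plain dict stands in for the info attribute of an ASE.Atoms object.

```python
# Hypothetical property map; the layout is assumed, not taken from this module.
energy_map = {
    "example-energy": [
        {"energy": {"field": "energy", "units": "eV"}}
    ]
}


def extract_fields(info, property_map):
    """Hypothetical helper: pull mapped values out of an info-style dict."""
    extracted = {}
    for prop_name, entries in property_map.items():
        for entry in entries:
            # Each spec names where the value is stored and which units it has.
            extracted[prop_name] = {
                key: {"value": info[spec["field"]], "units": spec["units"]}
                for key, spec in entry.items()
            }
    return extracted


# A plain dict stands in for atoms.info (assumption for illustration).
atoms_info = {"energy": -3.5}
extracted = extract_fields(atoms_info, energy_map)
```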