3. Your first workchain¶
Here we will continue where the previous tutorial left of. In particular we will now create a dedicated workchain that calls the VASP workchain and thus eventually executes VASP for each volume. The workchain makes sure the total energies are extracted and stored. In the end we will even use some Python functions to locate the minimum and store that as well. We hope that you after this tutorial will finally start to appreciate the convenience of AiiDA-VASP and maybe also start to get a feel for what is possible. Even better, when performing calculations like this, especially when the workchain stack becomes much more complex than showed in this example, the workflow becomes standardized and versioned, so it is easier to share and collaborate. Also, reproducibility is finally possible in practice. Let us start to develop the workchain. As before we assume you have completed the steps of the previous tutorial.
In order to make it as easy as possible, we have created a cookie cutter recipe for creating
AiiDA-VASP workchains. This is located in cookiecutters/workchain
on the root of the
repository. If you have not cloned it, you can download only the cookiecutter folder using:
$ svn checkout https://github.com/aiida-vasp/aiida-vasp/trunk/cookiecutters
What we want to do is to create a workchain that calculates some equation of state (EOS), in this
case over volume. For simplicity we will call the workchain EosWorkChain
. It is always recommended to
dissect the operations we will perform into parts before writing the workchain. In this way, it
is easier to reuse larger parts of the workchain at a later point. Also, it is easier
to understand what is going on from just inspecting the definitions of the inputs, steps and outputs
of the workchain. Please, have a look at Workchains and references therein before continuing.
Also, consider to look into the tutorial on writing workflows in the AiiDA documentation before
continuing.
First, let us Make sure you have
cookiecutter
installed:$ pip install cookiecutter
Now we simply execute the
cookiecutter
program using theworkchains
folder as the template:$ cookiecutter cookiecutters/workchains workchain_name [testchain]: eos workchain_docstring [This workchain is a testchain]: This workchain will accept a dictionary of structures and extract the total energies for each structure. The data is saved and the energy minimum is calculated and stored. next_workchain_to_call [vasp.vasp]:
A new folder
eos
was created which contains theeos.py
workchain file. Inspect it. Maybe the most important part of a workchain is thedefine
function, which, as its name implies, defines the workchain. For this particular case it looks like:def define(cls, spec): super(EosWorkChain, cls).define(spec) spec.expose_inputs(cls._next_workchain, exclude=['structure']) spec.input_namespace( 'structures', valid_type=DataFactory('structure'), dynamic=True, help='a dictionary of structures to use' ) spec.exit_code(0, 'NO_ERROR', message='the sun is shining') spec.exit_code(420, 'ERROR_NO_CALLED_WORKCHAIN', message='no called workchain detected') spec.exit_code(500, 'ERROR_UNKNOWN', message='unknown error detected in the eos workchain') spec.outline( cls.initialize, while_(cls.run_next_workchains)( cls.init_next_workchain, cls.run_next_workchain, cls.verify_next_workchain, cls.extract_quantities ), cls.finalize ) # yapf: disable spec.output('quantities', valid_type=DataFactory('array'), help='a container for various quantities') spec.expose_outputs(cls._next_workchain)
We have defined some inputs:
All inputs defined in the
_next_workchain
, which in this case is the VASP workchain are also available for this workchain by usingexpose_inputs
. You can of course expose inputs from other workchains as long as AiiDA can find them. In this way, we do not have to repeat the definition of more basic workchains that can easily be repeated, like the VASP workchain.A namespace called
structures
, which is defined as a dictionary. Since we here do not know how many structures will be supplied, we have defined this input as a dynamic namespace. However, we know the dictionary should only containStructureData
values.
Then we have defined some outputs:
We attach all outputs from
_next_workchain
, i.e. the VASP workchain to this workchain (basically a link is created to avoid storing things twice). Similar to the inputs,expose_outputs
can be leveraged.
And some generic exit codes.
Finally, there is the
outline
section which tells the workchain which functions to run. In this case we have:A function that initializes what we need,
initialize
.A while loop that iterates over all the structures.
A
init_next_workchain
function that initializes what we need for the next workchain at the current iteration (here we make sure the specific structure is set).A
run_next_workchain
which basically executes VASP workchain.A
verify_next_workchain
which verifies that there is a valid VASP workchain and inherits any exit codes present from VASP workchain.A
extract_results
which gets what we need from the outputs.A
finalize
which stores the results.
Notice that the
cookiecutter
gave all this automatically, which is rather useful as a starting point to develop new workchains.The workchain is however not yet ready. We need to extract the total energies and specify the general workchain further. Please make the following changes to the generated workchain:
--- /home/docs/checkouts/readthedocs.org/user_builds/aiida-vasp/checkouts/latest/tutorials/eos_base.py +++ /home/docs/checkouts/readthedocs.org/user_builds/aiida-vasp/checkouts/latest/tutorials/eos.py @@ -11,6 +11,8 @@ import random import numpy as np +from scipy.interpolate import interp1d +from scipy.optimize import minimize from aiida.common.extendeddicts import AttributeDict from aiida.engine import WorkChain, append_, calcfunction, while_ @@ -51,13 +53,19 @@ cls.init_next_workchain, cls.run_next_workchain, cls.verify_next_workchain, - cls.extract_quantities + cls.extract_volume_and_energy ), cls.finalize ) # yapf: disable - spec.output('quantities', valid_type=DataFactory('array'), help='a container for various quantities') - spec.expose_outputs(cls._next_workchain) + spec.output( + 'eos', valid_type=DataFactory('array'), help='a list containing the cell volumes and total energies' + ) + spec.output( + 'eos_minimum', + valid_type=DataFactory('dict'), + help='a dictionary containing the cell volume at energy minimum' + ) def initialize(self): """Initialize the eos workchain.""" @@ -82,8 +90,8 @@ # Define the context inputs self.ctx.inputs = AttributeDict() - # Define container to store quantities that is extracted in each step - self.ctx.quantities_container = [] + # Define the total energies list + self.ctx.total_energies = [] def _init_inputs(self): """Initialize inputs.""" @@ -125,7 +133,7 @@ self.ctx.inputs.structure = self.ctx.structures.pop(item) # Make sure we do not have any floating dict (convert to Dict etc.) - self.ctx.inputs = prepare_process_inputs(self.ctx.inputs) + self.ctx.inputs = prepare_process_inputs(self.ctx.inputs, namespaces=['dynamics']) def run_next_workchain(self): """ @@ -170,35 +178,74 @@ return self.ctx.exit_code - def extract_quantities(self): - """Extract the quantities you want and store them in the container.""" - - # What happens in this routine needs modifications by you. - # Typically you get the output of the called workchain by doing - # workchain = self.ctx.workchains[-1] - # some_output_quantity = workchain.outputs.some_output_quantity - - # An example which stores nonsense. - self.ctx.quantities_container.append([self.ctx.iteration, f'Some quantity for iteration: {self.ctx.iteration}']) + def extract_volume_and_energy(self): + """Extract the cell volume and total energy for this structure.""" + + workchain = self.ctx.workchains[-1] + + # Fetch the total energy + misc = workchain.outputs.misc.get_dict() + total_energy = misc['total_energies']['energy_extrapolated'] + + # Fetch the volume + volume = self.ctx.inputs.structure.get_cell_volume() + + # Store both in a list + self.ctx.total_energies.append([volume, total_energy]) def finalize(self): """ Finalize the workchain. - Take the quantity container and set it as an output of this workchain. + Take the total energies container and set is as an output of this workchain. """ # Due to data provenance we cannot return AiiDA data containers that have # not been passed through a calcfunction, workfunction or a workchain. Create this now. - quantities_container = store_quantities(DataFactory('list')(list=self.ctx.quantities_container)) + total_energies = store_total_energies(DataFactory('list')(list=self.ctx.total_energies)) + + # Let us also try to find a better minimum, just as an example using the power of Python + energy = locate_minimum(total_energies) # And then store the output on the workchain - self.out('quantities', quantities_container) + self.out('eos', total_energies) + self.out('eos_minimum', energy) @calcfunction -def store_quantities(quantities_container): - """Stores the quantities to keep data provenance.""" - quantities_container_list = quantities_container.get_list() - quantities_container_array = DataFactory('array')() - quantities_container_array.set_array('quantities', np.array(quantities_container_list)) - return quantities_container_array +def store_total_energies(total_energies): + """Stores the total energies in ArrayData to keep data provenance.""" + total_energies_list = total_energies.get_list() + # Let us also sort by volume as we picked the entries in the structures by random + # above + total_energies_array = np.array(total_energies_list) + total_energies_array_sorted = total_energies_array[total_energies_array[:, 0].argsort()] + array_data = DataFactory('array')() + array_data.set_array('eos', total_energies_array_sorted) + + return array_data + + +@calcfunction +def locate_minimum(total_energies): + """Locate the volume with the lowest energy using interpolation.""" + total_energies_array = total_energies.get_array('eos') + volumes = total_energies_array[:, 0] + energies = total_energies_array[:, 1] + + # Establish some initial guess (take lowest energy point from the original dataset) + min_energy_guess_index = np.argmin(energies) + + # Create the function that can be used to extract interpolated values + # Using cubic interpolation here, which is not necessarly physically correct, + # only serves as an example. Please have a look at the theory behind more realiastic + # models that can be used to fit such data. + new_energies = interp1d(volumes, energies, kind='cubic') + + # Use minimize from scipy to locate the minimal point that can be ejected by the + # interpolation routines + minimized_point = minimize(new_energies, volumes[min_energy_guess_index], tol=1e-3) + + # Create a dictionary to house the results and return + dict_data = DataFactory('dict')(dict={'volume': minimized_point.x[0], 'energy': minimized_point.fun}) + + return dict_data
and save it as
eos.py
. Or you could also download it:$ wget https://github.com/aiida-vasp/aiida-vasp/raw/master/tutorials/eos.py
The majority of changes were related to being more specific, except two things:
The necessity of decorating the function that generates the output array containing the volume and total energies in a
calcfunction
. This is to preserve data provenance, otherwise we would not have known how the data was collected from each of the underlying workchains.The inclusion of a
calcfunction
that interpolates the calculated data to find a better estimate of the volume at the energy minima. The example uses a cubic fit, which is certainly not very physical and should not be used in production. It is only to show how simply it is to leverage the power of Python, NumPy, SciPy and AiiDA. This was decorated with acalcfunction
in order to make sure AiiDA can honor data provenance.
Next, download the launch script that is tailored to launch the workchain we have now developed:
$ wget https://github.com/aiida-vasp/aiida-vasp/raw/master/tutorials/run_fcc_si_workchain.py
Change the
options
andcode_string
as you did in previously.Now we need to make sure the daemon can pick up the workchain. We can do this by making sure the daemon sees the directory where
eos.py
andrun_fcc_si_workchain.py
is located. The simplest approach is to add the following, to your virtual environmentactivate
script (assuming you do not use Conda):$ echo "export PYTHONPATH=$PYTHONPATH:<yourdirectory>" >> ~/env/aiida-vasp/bin/activate
assuming
<yourdirectory>
is the directory containing theeos.py
andrun_fcc_si_workchain.py
files. The location of theactivate
is assumed from the previous steps in the tutorial. If you use Conda, please do:$ echo "export PYTHONPATH=$PYTHONPATH:<yourdirectory>" >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
Warning
Make sure you have (re)activated your AiiDA virtual environment and (re)started the AiiDA daemon before continuing. Otherwise, the updated
PYTHONPATH
variable will not be picked up by the daemon worker. You may want to double check by runningecho $PYTHONPATH
in the shell before starting the daemon withverdi daemon start
.Submit the workchain by running the call script:
$ python run_fcc_si_workchain.py
After a while we check the status:
$ verdi process list -a PK Created Process label Process State Process status ---- --------- -------------------- ----------------- ----------------------------------------------------------- 1721 22m ago EosWorkChain ⏹ Finished [0] 1722 22m ago VaspWorkChain ⏹ Finished [0] 1724 22m ago VaspCalculation ⏹ Finished [0] 1728 19m ago VaspWorkChain ⏹ Finished [0] 1730 19m ago VaspCalculation ⏹ Finished [0] 1734 17m ago VaspWorkChain ⏹ Finished [0] 1736 17m ago VaspCalculation ⏹ Finished [0] 1740 15m ago VaspWorkChain ⏹ Finished [0] 1742 15m ago VaspCalculation ⏹ Finished [0] 1746 13m ago VaspWorkChain ⏹ Finished [0] 1748 13m ago VaspCalculation ⏹ Finished [0] 1752 10m ago VaspWorkChain ⏹ Finished [0] 1754 10m ago VaspCalculation ⏹ Finished [0] 1758 8m ago VaspWorkChain ⏹ Finished [0] 1760 8m ago VaspCalculation ⏹ Finished [0] 1764 6m ago VaspWorkChain ⏹ Finished [0] 1766 6m ago VaspCalculation ⏹ Finished [0] 1770 4m ago VaspWorkChain ⏹ Finished [0] 1772 3m ago VaspCalculation ⏹ Finished [0] 1777 1m ago store_total_energies ⏹ Finished [0] 1779 1m ago locate_minimum ⏹ Finished [0] Total results: 165 Report: last time an entry changed state: 1m ago (at 17:18:23 on 2022-12-21) Report: Using 2% of the available daemon worker slots.
As you can see, seven VASP workchain and VASP calculation were executed, one for each supplied volume. Also, there is a separate entry for the storage of the total energies, which also performs a sort. The location of the minima is also listed as a separate process as we decorated that function with a
calcfunction
decorator.Let us have a look at the output of
EosWorkChain
:$ verdi process show 1721 Property Value ----------- ------------------------------------ type EosWorkChain state Finished [0] pk 1721 uuid 69e9a920-c783-4f61-ac73-349b6e19059d label description ctime 2022-12-21 16:58:07.145071+01:00 mtime 2022-12-21 17:18:23.379756+01:00 Inputs PK Type ------------------ ---- ------------- structures silicon_at_3_5 1703 StructureData silicon_at_3_6 1704 StructureData silicon_at_3_7 1705 StructureData silicon_at_3_8 1706 StructureData silicon_at_3_9 1707 StructureData silicon_at_4_0 1708 StructureData silicon_at_4_1 1709 StructureData silicon_at_4_2 1710 StructureData silicon_at_4_3 1711 StructureData clean_workdir 1720 Bool code 818 InstalledCode kpoints 1712 KpointsData max_iterations 1719 Int options 1716 Dict parameters 1713 Dict potential_family 1714 Str potential_mapping 1715 Dict settings 1717 Dict verbose 1718 Bool Outputs PK Type ----------- ---- --------- eos 1778 ArrayData eos_minimum 1780 Dict Called PK Type -------- ---- -------------------- CALL 1722 VaspWorkChain CALL 1728 VaspWorkChain CALL 1734 VaspWorkChain CALL 1740 VaspWorkChain CALL 1746 VaspWorkChain CALL 1752 VaspWorkChain CALL 1758 VaspWorkChain CALL 1764 VaspWorkChain CALL 1770 VaspWorkChain CALL 1777 store_total_energies CALL 1779 locate_minimum Log messages --------------------------------------------- There are 9 log messages for this calculation Run 'verdi process report 1721' to see them
Inspect the total energies versus volume:
$ verdi data core.array show 1777 { "eos": [ [ 10.71875, -4.42341939 ], [ 11.664, -4.66006381 ], [ 12.66325, -4.79595554 ], [ 13.718, -4.86303429 ], [ 14.82975, -4.87588353 ], [ 16.0, -4.8481407 ], [ 17.23025, -4.78451926 ], [ 18.522, -4.69228837 ], [ 19.87675, -4.58122058 ] ] }
And the located minimum:
$ verdi data core.dict show 1780 { "energy": -4.8769540208841, "volume": 14.559367617229 }
That concludes this tutorial. We hope at this point you have now realized that AiiDA-VASP seems somewhat useful and that you would like to continue to learn more, maybe even start to write your own Workflows or Workchains. You might have noticed when running this workflow that the each volume was running sequentially and was a bit concerned about that being not so efficient as there is no data sharing between the different volume runs. And indeed you are right. The next tutorial will show how this can be addressed.