6. Designing a workflow¶
This tutorial focuses on how to approach designing, developing, and finally launching a workflow.
As an example, we will calculate the bulk modulus of wurtzite-type SiC.
The tutorial is divided into two parts. In the first part we prepare mentally by identifying the different steps we need to perform in order to calculate the bulk modulus using VASP. In the second part we write a simple script that depends on AiiDA and AiiDA-VASP to perform the same calculation.
Also, note that throughout this tutorial we try to motivate the reader to write most of this themselves rather than just downloading the scripts and executing them. This is good training practice. However, in case you need or want them, there is also a link to the scripts used in this tutorial; for this reason, they are mentioned at the end. But please try not to be tempted to download them right away.
1. Flow of work¶
Let us first start by sketching the outline of the steps needed to investigate a property, phenomenon, or something else. These steps define the flow of work to be performed, or the workflow.
Here we take a rather simple example to illustrate how to approach this problem, such that it becomes easier to write up the workflow in AiiDA later. A good mental picture is that we want to bridge the gap between two concepts: the scientific workflow and the workflow as code. The first is how you would approach a problem from the scientific side, which is not something you can directly execute on a computer, while the second is how the problem can actually be solved on a computer.
To obtain the bulk modulus we typically need to perform several different calculations. Broadly, we would like to follow these steps:
1. Prepare the inputs, or starting point. In this case the crystal structure is the most important input; here, it is wurtzite-type SiC.
2. Relax the crystal structure.
3. Wait until (2) finishes and verify the results.
4. Create two structures at fixed volumes of +/- 1% with respect to the relaxed structure obtained in (3).
5. Relax the shape of the structures generated in (4).
6. Wait until (5) finishes and verify the results.
7. Compute the bulk modulus as a post-processing step by \(K \simeq -V_0\frac{\Delta P}{\Delta V}\), using the previous outputs as inputs.
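The final step above can be sketched directly in plain Python. This is a minimal sketch with illustrative input numbers (two cell volumes in \(\unicode{x212B}^3\) and the corresponding pressures in kbar); the function name is hypothetical:

```python
def bulk_modulus(v_small, p_small, v_big, p_big):
    """Estimate the bulk modulus K ~ -V0 * dP/dV in GPa.

    Volumes in Angstrom^3, pressures in kbar; dividing by 10
    converts kbar to GPa.
    """
    v0 = (v_small + v_big) / 2.0  # midpoint volume V0
    dp = p_big - p_small
    dv = v_big - v_small
    return -v0 * dp / dv / 10.0

# Illustrative values only (they happen to match the worked VASP
# example later in this tutorial).
print(bulk_modulus(41.59, 21.85, 42.43, -20.80))
```

With these illustrative inputs the estimate comes out at roughly 213 GPa.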
Let us now try to perform these steps exclusively using VASP. The steps above are of course general and could be executed with any code that provides the functionality to honor the sub-steps.
2. Manual calculation¶
1. Preparing input files¶
We will utilize the following crystal structure for wurtzite-type SiC; for VASP this means using the following POSCAR file:
wurtzite-type SiC
   1.0000000000
     3.0920000000    0.0000000000    0.0000000000
    -1.5460000000    2.6777505485    0.0000000000
     0.0000000000    0.0000000000    5.0730000000
Si C
2 2
Direct
  0.3333333333  0.6666666667  0.0000000000
  0.6666666667  0.3333333333  0.5000000000
  0.3333333333  0.6666666667  0.3758220000
  0.6666666667  0.3333333333  0.8758220000
We also need to prepare a suitable KPOINTS file, here:
# Half grid shift along c*
0
Gamma
6 6 4
0.000000000 0.000000000 0.500000000
Then, we need the INCAR file to tell VASP to relax the structure:
EDIFF = 1e-08
EDIFFG = -1e-05
ENCUT = 500
IALGO = 38
IBRION = 2
ISIF = 3
ISMEAR = 0
LCHARG = .FALSE.
LREAL = .FALSE.
LWAVE = .FALSE.
NELM = 100
NSW = 100
NELMIN = 5
PREC = Accurate
SIGMA = 0.01
You also need to assemble the potential file for SiC by concatenating the recommended PBE.54 potential files for Si and C into a suitable POTCAR. If in doubt about these steps, please have a look at the VASP lectures, VASP tutorials, VASP howtos, VASP tutorials using notebooks or VASP videos, or ask experienced VASP users. AiiDA-VASP is not a substitute for knowing how to use VASP; its intent is to make using VASP more efficient and reproducible, in addition to opening up new areas of application. Create a job file, or another way to execute VASP in your environment, and continue to the next step.
2. Perform the relaxation and 3. Wait for it to finish¶
Execute VASP using the INCAR, POSCAR, KPOINTS and POTCAR prepared in the previous step. Inspect the resulting relaxed structure, which is housed in the created CONTCAR file. Its content should be something like:
wurtzite-type SiC
   1.00000000000000
     3.0920838394862775    0.0000000000000000    0.0000000000000000
    -1.5460419197431388    2.6778231556249570   -0.0000000000000000
    -0.0000000000000000   -0.0000000000000000    5.0729992295718249
Si C
2 2
Direct
  0.3333333332999970  0.6666666667000030 -0.0000493014454436
  0.6666666667000030  0.3333333332999970  0.4999506985545563
  0.3333333332999970  0.6666666667000030  0.3758713014454431
  0.6666666667000030  0.3333333332999970  0.8758713014454431

  0.00000000E+00  0.00000000E+00  0.00000000E+00
  0.00000000E+00  0.00000000E+00  0.00000000E+00
  0.00000000E+00  0.00000000E+00  0.00000000E+00
  0.00000000E+00  0.00000000E+00  0.00000000E+00
4. Create two structures with decreased and increased volume¶
We now need to create two sets of VASP inputs.
Create a new POSCAR using the CONTCAR from the previous step, but modify it: on the second line, replace the scaling factor of 1.0 with one representing a reduced volume. We will reduce the volume to 99% of the original, meaning the scaling factor should be set to \(0.99^{1/3}\) = 0.9966554934125964. Save this as POSCAR_SMALLER.
Create yet another POSCAR using the same CONTCAR. Again modify the second line, this time to a larger scaling factor representing a 1% volume increase, meaning the scaling factor should be set to \(1.01^{1/3}\) = 1.0033222835420892. Save this as POSCAR_BIGGER.
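The scaling factors are simply the cube roots of the target volume ratios, since the scaling factor on the second line of POSCAR scales all three lattice vectors uniformly and thus scales the volume by its cube. A quick sanity check:

```python
# A uniform lattice scaling s changes the cell volume by s**3,
# so a target volume ratio r requires s = r ** (1/3).
for ratio in (0.99, 1.01):
    scale = ratio ** (1.0 / 3.0)
    print(f'{ratio:.2f} of the volume -> scaling factor {scale:.16f}')
```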
5. Perform the relaxation of modified volumes and 6. Wait for it to finish¶
We will now run VASP on the two modified structures, with a decreased and increased volume respectively, and inspect the resulting stress. When performing these relaxations, we want to keep the cell volume fixed, since we have modified it manually.
Modify the INCAR and set ISIF = 4 to ensure VASP does not relax the cell volume. Execute VASP using the previous KPOINTS and POTCAR, and the newly generated INCAR and POSCAR_SMALLER. Inspect the vasprun.xml file to find something like:
<varray name="stress" >
 <v>      21.84700539      -0.00000000       0.00000000 </v>
 <v>       0.00000000      21.84700539      -0.00000000 </v>
 <v>       0.00000000       0.00000000      21.84572011 </v>
</varray>
for the stress on the last ionic step. Note that the volume of the cell is 41.59 \(\unicode{x212B}^3\). Keep these results at hand for later.
Execute VASP yet again using the previous INCAR, KPOINTS and POTCAR, but now use POSCAR_BIGGER. Inspect the new vasprun.xml file to find something like:
<varray name="stress" >
 <v>     -20.80399703       0.00000000       0.00000000 </v>
 <v>       0.00000000     -20.80399703       0.00000000 </v>
 <v>       0.00000000       0.00000000     -20.80406790 </v>
</varray>
for the stress on the last ionic step. Note that the volume of the cell is 42.43 \(\unicode{x212B}^3\). Keep these results at hand for later.
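Rather than reading the stress off by eye, the last stress tensor can be extracted with Python's standard library XML parser. This is a sketch that only assumes the varray layout shown above, fed here from an inline string rather than a full vasprun.xml:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the relevant part of a vasprun.xml file.
EXAMPLE = """<modeling>
<varray name="stress" >
 <v>     -20.80399703       0.00000000       0.00000000 </v>
 <v>       0.00000000     -20.80399703       0.00000000 </v>
 <v>       0.00000000       0.00000000     -20.80406790 </v>
</varray>
</modeling>"""


def last_stress(xml_text):
    """Return the last stress tensor (kbar) as a list of three rows."""
    root = ET.fromstring(xml_text)
    varrays = root.findall(".//varray[@name='stress']")
    # vasprun.xml contains one stress varray per ionic step; take the last.
    return [[float(x) for x in v.text.split()] for v in varrays[-1].findall('v')]


stress = last_stress(EXAMPLE)
# The pressure is minus the mean of the diagonal (normal) components.
pressure = sum(stress[i][i] for i in range(3)) / 3.0
print(pressure)
```

For a real run you would read the file contents first, e.g. `last_stress(open('vasprun.xml').read())`.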
7. Calculate the bulk modulus¶
The bulk modulus can now be calculated from these results as:
$ verdi shell
In [1]: - ( ( (42.43 + 41.59) / 2 ) * ( (-20.80 - 21.85) / (42.43 - 41.59) ) ) / 10
Out[1]: 213.3007738095248
We thus obtain the bulk modulus of ~213 GPa for this calculation.
3. Calculations using AiiDA¶
If there is any intention to perform bulk modulus calculations in a repeated and robust manner, the workflow above should be defined more formally. This is where AiiDA comes into play with its concept of workflows. Let us now try to perform the same calculation with assistance from AiiDA.
We start by making the calling script, which you can get by running:
$ wget https://raw.githubusercontent.com/aiida-vasp/aiida-vasp/master/tutorials/run_bulk_modulus.py
and would look something like this:
from time import sleep
import numpy as np
from aiida.common.extendeddicts import AttributeDict
from aiida.engine import submit
from aiida.manage.configuration import load_profile
from aiida.orm import Bool, Code, Float, Group, Int, QueryBuilder, Str, WorkChainNode, load_group
from aiida.plugins import DataFactory, WorkflowFactory
load_profile()
def get_structure_SiC():
    """Set up SiC wurtzite cell.

    wurtzite-type SiC
       1.0000000000
         3.0920000000    0.0000000000    0.0000000000
        -1.5460000000    2.6777505485    0.0000000000
         0.0000000000    0.0000000000    5.0730000000
    Si C
    2 2
    Direct
      0.3333333333  0.6666666667  0.0000000000
      0.6666666667  0.3333333333  0.5000000000
      0.3333333333  0.6666666667  0.3758220000
      0.6666666667  0.3333333333  0.8758220000
    """
    StructureData = DataFactory('structure')
    a = 3.092
    c = 5.073
    lattice = [[a, 0, 0], [-a / 2, a / 2 * np.sqrt(3), 0], [0, 0, c]]
    structure = StructureData(cell=lattice)
    for pos_direct, symbol in zip(
        ([1. / 3, 2. / 3, 0], [2. / 3, 1. / 3, 0.5],
         [1. / 3, 2. / 3, 0.375822], [2. / 3, 1. / 3, 0.875822]),
        ('Si', 'Si', 'C', 'C')):
        pos_cartesian = np.dot(pos_direct, lattice)
        structure.append_atom(position=pos_cartesian, symbols=symbol)
    return structure
def launch_aiida_relax_shape(structure, code_string, options, label):
    """Launch a relaxation of shape, but not volume."""
    Dict = DataFactory('dict')
    KpointsData = DataFactory('array.kpoints')
    # Set INCAR flags
    base_incar_dict = {
        'incar': {
            'PREC': 'Accurate',
            'EDIFF': 1e-8,
            'NELMIN': 5,
            'NELM': 100,
            'ENCUT': 500,
            'IALGO': 38,
            'ISMEAR': 0,
            'SIGMA': 0.01,
            'LREAL': False,
            'LCHARG': False,
            'LWAVE': False
        }
    }
    # Set code, potentials to use and options
    base_config = {
        'code_string': code_string,
        # Use the same PBE.54 potential family as in the full relaxation
        'potential_family': 'PBE.54',
        'potential_mapping': {
            'Si': 'Si',
            'C': 'C'
        },
        'options': options
    }
    # Make sure some things are parsed
    base_parser_settings = {'add_energies': True, 'add_forces': True, 'add_stress': True}
    # Set the provided code string
    code = Code.get_from_string(base_config['code_string'])
    # Which workchain to use
    workchain = WorkflowFactory('vasp.relax')
    # Construct the builder
    builder = workchain.get_builder()
    builder.code = code
    builder.parameters = Dict(dict=base_incar_dict)
    builder.structure = structure
    builder.settings = Dict(dict={'parser_settings': base_parser_settings})
    builder.potential_family = Str(base_config['potential_family'])
    builder.potential_mapping = Dict(dict=base_config['potential_mapping'])
    kpoints = KpointsData()
    kpoints.set_kpoints_mesh([6, 6, 4], offset=[0, 0, 0.5])
    builder.kpoints = kpoints
    builder.options = Dict(dict=base_config['options'])
    builder.metadata.label = label
    builder.metadata.description = label
    builder.clean_workdir = Bool(False)
    relax = AttributeDict()
    relax.perform = Bool(True)
    relax.force_cutoff = Float(1e-5)
    relax.positions = Bool(True)
    relax.shape = Bool(True)
    relax.volume = Bool(False)
    builder.relax = relax
    builder.verbose = Bool(True)
    # Submit the calculation
    node = submit(builder)
    return node
def launch_aiida_full_relax(structure, code_string, options, label):
    """Launch a relaxation of everything."""
    Dict = DataFactory('dict')
    KpointsData = DataFactory('array.kpoints')
    # Set INCAR flags
    base_incar_dict = {
        'incar': {
            'PREC': 'Accurate',
            'EDIFF': 1e-8,
            'NELMIN': 5,
            'NELM': 100,
            'ENCUT': 500,
            'IALGO': 38,
            'ISMEAR': 0,
            'SIGMA': 0.01,
            'LREAL': False,
            'LCHARG': False,
            'LWAVE': False
        }
    }
    # Set code, potentials to use and options
    base_config = {
        'code_string': code_string,
        'potential_family': 'PBE.54',
        'potential_mapping': {
            'Si': 'Si',
            'C': 'C'
        },
        'options': options
    }
    # Make sure some things are parsed
    base_parser_settings = {'add_energies': True, 'add_forces': True, 'add_stress': True}
    # Set the provided code string
    code = Code.get_from_string(base_config['code_string'])
    # Which workchain to use
    workchain = WorkflowFactory('vasp.relax')
    # Construct the builder
    builder = workchain.get_builder()
    builder.code = code
    builder.parameters = Dict(dict=base_incar_dict)
    builder.structure = structure
    builder.settings = Dict(dict={'parser_settings': base_parser_settings})
    builder.potential_family = Str(base_config['potential_family'])
    builder.potential_mapping = Dict(dict=base_config['potential_mapping'])
    kpoints = KpointsData()
    kpoints.set_kpoints_mesh([6, 6, 4], offset=[0, 0, 0.5])
    builder.kpoints = kpoints
    builder.options = Dict(dict=base_config['options'])
    builder.metadata.label = label
    builder.metadata.description = label
    builder.clean_workdir = Bool(False)
    relax = AttributeDict()
    relax.perform = Bool(True)
    relax.force_cutoff = Float(1e-5)
    relax.positions = Bool(True)
    relax.shape = Bool(True)
    relax.volume = Bool(True)
    relax.steps = Int(100)
    builder.relax = relax
    builder.verbose = Bool(True)
    # Submit the calculation
    node = submit(builder)
    return node
def main(code_string, options, group_name, sleep_seconds=30):
    group = load_group(label=group_name)
    if not group.is_empty:
        # Make sure we do not already have members in the group
        print('The given group already has nodes. Please remove them or give a new group name.')
        exit(1)
    # Initial relaxation
    structure = get_structure_SiC()
    node_relax = launch_aiida_full_relax(structure, code_string, options, 'SiC VASP calc to relax volume')
    group.add_nodes(node_relax)
    print(f'Relaxing starting cell: {structure.cell}')
    while True:
        if node_relax.is_terminated:
            break
        print('Waiting for relaxation of starting cell to be done.')
        sleep(sleep_seconds)
    if node_relax.is_finished_ok:
        for strain, label in zip((0.99, 1.01), ('reduced', 'increased')):
            structure_clone = node_relax.outputs.relax.structure.clone()
            ase_structure = structure_clone.get_ase()
            # Make sure we also update atomic positions after scaling the cell
            ase_structure.set_cell(ase_structure.cell * strain**(1.0 / 3.0), scale_atoms=True)
            StructureData = DataFactory('structure')
            structure = StructureData(ase=ase_structure)
            node = launch_aiida_relax_shape(
                structure, code_string, options, f'SiC VASP relax shape at {label} volume ({strain:f})'
            )
            group.add_nodes(node)
            print(f'Relaxation positions of scaled cell: {ase_structure.cell}')
    else:
        print('Relaxation failed.')
        exit(1)
    # Now make sure we wait until all calcs have been terminated
    while True:
        sleep(sleep_seconds)
        print('Waiting for all remaining relaxations to be done.')
        node_terminated = [node.is_terminated for node in group.nodes]
        if all(node_terminated):
            break
    # Do one final check to make sure all calcs have a zero exit status
    for node in group.nodes:
        if not node.is_finished_ok:
            print(
                f'Node with id: {node.pk} has an exit status: {node.exit_status} and exit message: {node.exit_message}'
            )
            exit(1)
def calc_bulk_modulus(group_name):
    stresses = []
    volumes = []
    for label in ('reduced', 'increased'):
        qb = QueryBuilder()
        qb.append(Group, filters={'label': group_name}, tag='group')
        qb.append(WorkChainNode, with_group='group', filters={'label': {'ilike': '%' + label + '%'}})
        node = qb.first()[0]
        stresses.append(np.trace(node.outputs.stress.get_array('final')) / 3)
        volumes.append(np.linalg.det(node.inputs.structure.cell))
    d_s = stresses[1] - stresses[0]
    d_v = volumes[1] - volumes[0]
    v0 = (volumes[0] + volumes[1]) / 2.0
    bulk_modulus = -d_s / d_v * v0
    print(f'Bulk modulus: {bulk_modulus / 10.0} GPa')
if __name__ == '__main__':
    # The code string is chosen from the output of 'verdi code list'
    code_string = 'vasp@mycluster'
    # Set the options
    options = {
        'resources': {
            'num_machines': 1,
            'num_mpiprocs_per_machine': 8
        },
        'account': '',
        'qos': '',
        'max_memory_kb': 2000000,
        'max_wallclock_seconds': 1800
    }
    # Here we assume the group "Bulk_modulus_SiC_test" has already been created, but is empty
    group_name = 'Bulk_modulus_SiC_test'
    # Perform the necessary calculations and put them in the created group
    main(code_string, options, group_name)
    # Calculate the bulk modulus by querying the group for the necessary input
    calc_bulk_modulus(group_name)
Notice a few things:
We set builder.clean_workdir = Bool(False). This makes sure that the folder used on the defined computer (usually remotely on a cluster) for the VASP calculations is not removed after a successful calculation (removal is usually the default). One can then, after a VaspCalculation is finished, execute verdi calcjob gotocomputer <PK>, where <PK> is the id given to the VaspCalculation process you want to inspect. This takes you directly to the submission folder, where the input and output files can be directly inspected. This setting is rather useful for debugging and verifying that your input and output files are as expected, regardless of the plugin.
When we use submit, the process node is returned immediately and we pass it to group.add_nodes(). This does however not mean that the node represents a finalized and successful calculation; the process continues in the background. Hence, we need to explicitly check that the calculations have finished, here in a separate step using the node property is_terminated, and later also verify that the calculation was successful, i.e. that the node does not carry a non-zero exit code, using the node property is_finished_ok. Only when all calculations are in a successfully finished state can we proceed to calculate the bulk modulus.
This bulk modulus script assumes the AiiDA group named Bulk_modulus_SiC_test already exists. This group is created by:
$ verdi group create "Bulk_modulus_SiC_test"
And we can use:
$ verdi group list
to inspect the present groups.
We only perform a rudimentary search of the group to find the nodes that contain the results we need to calculate the bulk modulus. Hence, we cannot run this script a second time, as we would then pick up the old members of the group and print the bulk modulus based on those. In some cases this is desirable, since reusing previous calculations means we do not have to recalculate them, but for the sake of this example it can be a bit confusing. Hence, we check at the beginning of main that the group Bulk_modulus_SiC_test is empty and stop if it is not. If rerunning the script, one can delete the group by first getting the id with verdi group list, locating the <PK>, and then issuing verdi group delete <PK>, followed by a new verdi group create "Bulk_modulus_SiC_test". Then the script can be launched again.
Modify the run_bulk_modulus.py script to your setup. If in doubt, go back to the previous tutorials. In essence, you need to adjust the defined code, the potential family you have uploaded potentials into, and the resources to be used where the code is installed. When that is done, execute the script:
$ python run_bulk_modulus.py
Relaxing starting cell: [[3.092, 0.0, 0.0], [-1.546, 2.6777505485015, 0.0], [0.0, 0.0, 5.073]]
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Relaxation positions of scaled cell: Cell([[3.081742345228316, 0.0, 0.0], [-1.540871172614158, 2.668867162801478, 0.0], [0.0, 0.0, 5.056032550657371]])
Relaxation positions of scaled cell: Cell([[3.102356619252392, 0.0, 0.0], [-1.551178309626196, 2.686719647813093, 0.0], [0.0, 0.0, 5.0898531718508595]])
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Bulk modulus: 213.2516947999474 GPa
Depending on your setup it will take some minutes before the bulk modulus is printed, assuming your submitted calculations do not queue and start immediately. As we can see, the result is similar to the one we obtained using the manual method above. Obviously, if we need to redo this calculation, either with different input parameters and/or other input structures, there would be little overhead involved and we could immediately benefit from it in our daily workflows. However, we can do better.
4. Writing bulk modulus workchain¶
In the previous section, the initial relaxation and the two volume-restricted relaxations were performed
independently. The resulting nodes were simply grouped, and we calculated the
bulk modulus by fetching these nodes afterwards. This
means the workflow defined in the main
method is not very transparent, and we had to introduce while
loops and put the code to sleep for some time between iterations. This works, but it is
not very elegant. In fact, we would also like our workflow to formally follow the steps of the flow of work we defined at the top.
Doing so makes it easier both to design new workflows and to understand existing ones. Furthermore,
the modular design makes reuse easier.
The question is thus: can we improve the somewhat large script introduced in the previous section? The answer is a definitive yes. AiiDA has the concept of a workchain, which is basically a container for a workflow, or for a component of a workflow. It is the modular concept in AiiDA that gives us enough freedom to dissect and combine workchains into basic and complex workflows. It is apparent that we would like to write each workchain to be as modular and reusable as possible, such that ultimately several workchains can be cherry-picked into a bigger composition, some kind of master workchain that solves a given problem. In doing so, we should keep as close as possible to the standards and expectations of the domain, so that others can rapidly relate to and understand what the workflow and its workchains are doing.
Note
Additional workchains not bundled with AiiDA-VASP have to be exposed as a
Python module to be visible to the AiiDA daemon, i.e. they should be available in the $PYTHONPATH
of the daemon. The easiest way
to achieve this is usually to make a small plugin that acts as a container for the installation and documentation
of the workchain, or set of workchains. This can then be installed (with pip
for instance) in
any AiiDA environment where you want to utilize the workchain(s).
So, let us now try to preserve the workflow described at the top more formally in code as well.
The challenge is then to write a suitable workchain that represents this workflow. It turns out that
the migration from the run_bulk_modulus.py script above is rather straightforward. In the process we will
also try to develop the workchain in a way that makes it easy to maintain, develop, and share. In
addition to transferring the run script developed above into a more robust workflow, we will at the same time
look into:
Using the AiiDA plugin cookiecutter to prepare the structure of the plugin, which will be a very simple one in our case. It will contain the workchain that defines our workflow, which we will call from a suitable run script.
Note
AiiDA is designed to be very modular, and by making our single bulk modulus workflow a plugin of its own, we make sure it can also be installed easily and correctly by many users. The overhead of doing this is well worth it, and it sounds more daunting than it is in practice. You will also benefit from this yourself.
Writing documentation.
Note
Having at least basic documentation in place is very important, not only for others, but also for yourself. Everyone forgets, and you forget more quickly than you think. Also notice that it makes sense to be as explicit as possible when writing the workchain code. This makes it much easier for new people to inspect and understand what the code actually does.
Enabling easy installation, which makes it possible to share the plugin with others.
Okay, let us now start. We already have a pretty good mental picture of what we need to calculate and what to extract from the calculations. Buckle up, this is going to be a very compact introduction to writing plugins and workchains following good practices.
1. Generate the plugin structure and basics¶
We will now start by utilizing the AiiDA plugin cookiecutter. If you are not familiar with the cookiecutter principle, it is a way to quickly get a baseline structure in place for modular code pieces or extensions, which is perfect for AiiDA plugins.
Make sure you have your AiiDA Python environment enabled.
Go to some location which will house the folder and Git repository of the bulk modulus plugin we will develop.
Install cookiecutter:
$ pip install cookiecutter
Run the cookiecutter and answer the questions accordingly:
$ cookiecutter https://github.com/aiidateam/aiida-plugin-cutter.git
plugin_name [aiida-diff]: aiida-vasp-bm
module_name [aiida_vasp_bm]:
short_description [AiiDA demo plugin that wraps the `diff` executable for computing the difference between two files.]: A workflow for AiiDA-VASP that calculates the bulk modulus.
entry_point_prefix [vasp_bm]:
github_user [aiidateam]: <yourgithubusername>
repo_name [aiida-vasp-bm]:
contact_email []: <yourcontactemail>
version [0.1.0a0]:
author [The AiiDA Team]: The AiiDA-VASP developer team
aiida_min_version [2.0]: 2.2
year [2022]: 2023
Running black on aiida-vasp-bm
All done! ✨ 🍰 ✨
14 files left unchanged.
### IMPORTANT###
Register your plugin NOW by making a pull request to the AiiDA Plugin Registry at https://aiidateam.github.io/aiida-registry/
where <yourgithubusername> is your username on GitHub, assuming you eventually want to publish the workflow there. A contact email <yourcontactemail> should also be provided.
Enter the resulting aiida-vasp-bm folder and inspect it. The first thing you should do is read the README.md file to learn what the AiiDA plugin cookiecutter provides and how to further configure it. Out of the box, the following is provided:
A license file. MIT is provided out of the box.
The pyproject.toml file. It contains information about the plugin, its dependencies etc. This also enables us to install the plugin with e.g. pip.
A basic example calculation (aiida_vasp_bm/calculations.py::DiffCalculation) and a basic parser (aiida_vasp_bm/parsers.py::DiffParser). In addition, a few command line examples are given in aiida_vasp_bm/cli.py. None of this is relevant for us; we will instead reuse the VaspCalculation in the AiiDA-VASP plugin, the AiiDA-VASP parsers, and its command line infrastructure.
Basic tests. Testing your code is important, so a testing framework is also provided to test the example calculation and cli in the tests folder.
Documentation. A sample using Sphinx that can already be compiled is found in the docs folder. In there, the web documentation can be built by running make html.
2. Adapting the generated plugin template¶
Adapting the cookiecutter output from the previous step to our specific needs involves removing and modifying existing files,
as well as adding the file containing our workchain.
Since what we need is a very simple plugin, basically only containing the workchain housing our workflow and some light
documentation, we will now remove what we do not need from the output of the cookiecutter command in the
previous step. Please remove:
The tests folder in the root directory. We would generally advise against deprioritizing testing, but as stated above, testing workchains that use external codes is rather complicated and is not the topic of this tutorial. Also remove everything inside the examples folder.
The calculations.py, cli.py, parsers.py and helpers.py files and the data folder from the aiida_vasp_bm folder.
The conftest.py from the root directory.
The testing section from the [project.optional-dependencies] section in pyproject.toml. Also remove the project.entry-points."aiida.data", project.entry-points."aiida.parsers", project.entry-points."aiida.calculations" and project.entry-points."aiida.cmdline.data" sections. Furthermore, since the AiiDA-VASP plugin depends on AiiDA, we can remove the AiiDA dependency and add the proper version of AiiDA-VASP to the dependencies section so that it now reads:
dependencies = [
    "voluptuous",
    "aiida-vasp"
]
and the project.optional-dependencies so that it reads:
[project.optional-dependencies]
testing = [
    "aiida-vasp[tests]"
]
pre-commit = [
    "aiida-vasp[pre-commit]"
]
docs = [
    "aiida-vasp[pre-commit]"
]
By changing this, we make sure the dependencies of this plugin follow what is given in the latest AiiDA-VASP version. We also add what is called an entry point for our workchain:
[project.entry-points."aiida.workflows"]
"vasp.bm" = "aiida_vasp_bm.workchains.bulkmodulus:BulkModulusWorkChain"
which makes it possible later to use the WorkflowFactory function of AiiDA to load our workchain in the calling script using the vasp.bm key. Furthermore, make sure requires-python = ">=3.8" and remove 37 under the testenv:py entry, which is also something we require in AiiDA-VASP. The pyproject.toml should now look something like:
[build-system]
# build the package with [flit](https://flit.readthedocs.io)
requires = ["flit_core >=3.4,<4"]
build-backend = "flit_core.buildapi"

[project]
# See https://www.python.org/dev/peps/pep-0621/
name = "aiida-vasp-bm"
dynamic = ["version"]  # read from aiida_vasp_bm/__init__.py
description = "A workflow for AiiDA-VASP that calculates the bulk modulus."
authors = [{name = "The AiiDA-VASP developer team", email = "<youremailaddress>"}]
readme = "README.md"
license = {file = "LICENSE"}
classifiers = [
    "Programming Language :: Python",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: MIT License",
    "Natural Language :: English",
    "Development Status :: 3 - Alpha",
    "Framework :: AiiDA"
]
keywords = ["aiida", "plugin"]
requires-python = ">=3.8"
dependencies = [
    "aiida-vasp",
    "voluptuous"
]

[project.urls]
Source = "https://github.com/<yourgithubusername>/aiida-vasp-bm"

[project.entry-points."aiida.workflows"]
"vasp.bm" = "aiida_vasp_bm.workchains.bulkmodulus:BulkModulusWorkChain"

[project.optional-dependencies]
testing = [
    "aiida-vasp[tests]"
]
pre-commit = [
    "aiida-vasp[pre-commit]"
]
docs = [
    "aiida-vasp[pre-commit]"
]

[tool.flit.module]
name = "aiida_vasp_bm"

[tool.pylint.format]
max-line-length = 125

[tool.pylint.messages_control]
disable = [
    "too-many-ancestors",
    "invalid-name",
    "duplicate-code",
]

[tool.pytest.ini_options]
# Configuration for [pytest](https://docs.pytest.org)
python_files = "test_*.py example_*.py"
filterwarnings = [
    "ignore::DeprecationWarning:aiida:",
    "ignore:Creating AiiDA configuration folder:",
    "ignore::DeprecationWarning:plumpy:",
    "ignore::DeprecationWarning:yaml:",
]

[tool.coverage.run]
# Configuration of [coverage.py](https://coverage.readthedocs.io)
# reporting which lines of your plugin are covered by tests
source = ["aiida_vasp_bm"]

[tool.isort]
# Configuration of [isort](https://isort.readthedocs.io)
line_length = 120
force_sort_within_sections = true
sections = ['FUTURE', 'STDLIB', 'THIRDPARTY', 'AIIDA', 'FIRSTPARTY', 'LOCALFOLDER']
known_aiida = ['aiida']

[tool.tox]
legacy_tox_ini = """
[tox]
envlist = py38

[testenv]
usedevelop = True

[testenv:py{38,39,310}]
description = Run the test suite against a python version
extras = testing
commands = pytest {posargs}

[testenv:pre-commit]
description = Run the pre-commit checks
extras = pre-commit
commands = pre-commit run {posargs}

[testenv:docs]
description = Build the documentation
extras = docs
commands = sphinx-build -nW --keep-going -b html {posargs} docs/source docs/build/html
commands_post = echo "open file://{toxinidir}/docs/build/html/index.html"
"""
We now need to add the actual workchain to the plugin.
First get the workchain file by running:
$ wget https://raw.githubusercontent.com/aiida-vasp/aiida-vasp-bm/master/aiida_vasp_bm/workchains/bulkmodulus.py
and put this file in the aiida_vasp_bm/workchains folder (matching the module path aiida_vasp_bm.workchains.bulkmodulus used in the entry point). It should look something like:
import numpy as np
from aiida.orm import Bool
from aiida.plugins import DataFactory, WorkflowFactory
from aiida.engine import ToContext, WorkChain, calcfunction
from aiida.common.extendeddicts import AttributeDict

Dict = DataFactory('dict')
Float = DataFactory('float')
KpointsData = DataFactory("array.kpoints")


@calcfunction
def get_strained_structure(structure, strain):
    new_structure = structure.clone()
    new_structure.set_cell(
        np.array(new_structure.cell) * strain.value ** (1.0 / 3))
    return new_structure


@calcfunction
def calculate_bulk_modulus(stress_minus, stress_plus,
                           structure_minus, structure_plus):
    stresses = []
    volumes = []
    for stress in (stress_minus, stress_plus):
        stresses.append(np.trace(stress.get_array('final')) / 3)
    for structure in (structure_minus, structure_plus):
        volume = np.linalg.det(structure.cell)
        volumes.append(volume)
    d_s = stresses[1] - stresses[0]
    d_v = volumes[1] - volumes[0]
    v0 = (volumes[0] + volumes[1]) / 2
    bulk_modulus = - d_s / d_v * v0 / 10  # GPa
    return Float(bulk_modulus)


class BulkModulusWorkChain(WorkChain):
    """WorkChain to compute bulk modulus using VASP."""

    _next_workchain_string = 'vasp.relax'
    _next_workchain = WorkflowFactory(_next_workchain_string)

    @classmethod
    def define(cls, spec):
        super(BulkModulusWorkChain, cls).define(spec)
        spec.expose_inputs(cls._next_workchain)
        spec.outline(
            cls.initialize,
            cls.run_relax,
            cls.create_two_structures,
            cls.run_two_volumes,
            cls.calc_bulk_modulus,
        )
        spec.output('bulk_modulus', valid_type=Float)

    def initialize(self):
        self.report("initialize")
        self.ctx.inputs = AttributeDict()
        self.ctx.inputs.update(self.exposed_inputs(self._next_workchain))

    def run_relax(self):
        self.report("run_relax")
        Workflow = WorkflowFactory('vasp.relax')
        builder = Workflow.get_builder()
        for key in self.ctx.inputs:
            builder[key] = self.ctx.inputs[key]
        if 'label' in self.ctx.inputs.metadata:
            label = self.ctx.inputs.metadata['label'] + " relax"
            builder.metadata['label'] = label
        if 'description' in self.ctx.inputs.metadata:
            description = self.ctx.inputs.metadata['description'] + " relax"
            builder.metadata['description'] = description
        future = self.submit(builder)
        return ToContext(relax=future)

    def create_two_structures(self):
        assert self.ctx.relax.is_finished_ok
        self.report("create_two_structures")
        for strain, name in zip((0.99, 1.01), ('reduced', 'increased')):
            structure = get_strained_structure(
                self.ctx.relax.outputs.relax.structure, Float(strain))
            structure.label = name
            self.ctx['structure_%s' % name] = structure

    def run_two_volumes(self):
        self.report("run_two_volumes")
        for strain, future_name in zip((0.99, 1.01), ('reduced', 'increased')):
            Workflow = WorkflowFactory('vasp.relax')
            builder = Workflow.get_builder()
            for key in self.ctx.inputs:
                builder[key] = self.ctx.inputs[key]
            if 'label' in self.ctx.inputs.metadata:
                label = self.ctx.inputs.metadata['label'] + " " + future_name
                builder.metadata['label'] = label
            if 'description' in self.ctx.inputs.metadata:
                description = self.ctx.inputs.metadata['description']
                description += " " + future_name
                builder.metadata['description'] = description
            builder.structure = self.ctx['structure_%s' % future_name]
            relax = AttributeDict()
            relax.perform = Bool(True)
            relax.force_cutoff = Float(1e-8)
            relax.positions = Bool(True)
            relax.shape = Bool(True)
            relax.volume = Bool(False)
            relax.convergence_on = Bool(False)
            builder.relax = relax
            future = self.submit(builder)
            self.to_context(**{future_name: future})

    def calc_bulk_modulus(self):
        assert self.ctx.reduced.is_finished_ok
        assert self.ctx.increased.is_finished_ok
        self.report("calc_bulk_modulus")
        bulk_modulus = calculate_bulk_modulus(
            self.ctx.reduced.outputs.stress,
            self.ctx.increased.outputs.stress,
            self.ctx.reduced.inputs.structure,
            self.ctx.increased.inputs.structure)
        bulk_modulus.label = "Bulk modulus in GPa"
        self.out('bulk_modulus', bulk_modulus)
        self.report('finished bulk modulus workflow')
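As a quick sanity check of the strain logic in get_strained_structure: scaling every cell vector by strain**(1/3) scales the cell volume by exactly the strain factor, since the volume is the determinant of the cell matrix. This standalone snippet (plain Python, not part of the plugin) verifies that:

```python
# Standalone check: scaling cell vectors by s**(1/3) scales the volume by s.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# The wurtzite SiC cell used in this tutorial
cell = [[3.092, 0.0, 0.0],
        [-1.546, 2.6777505485, 0.0],
        [0.0, 0.0, 5.073]]

strain = 0.99
factor = strain ** (1.0 / 3)
scaled = [[x * factor for x in row] for row in cell]

ratio = det3(scaled) / det3(cell)
print(round(ratio, 10))  # → 0.99
```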
The workflow is the same as what we implemented in run_bulk_modulus.py
, but here we have increased the
verbosity and tried to be more explicit, in keeping with a typical implementation of a workchain in AiiDA.
Before continuing, you should read up on how workchains are written in AiiDA. The main functionality of our workchain,
which calculates the bulk modulus, is housed in the class BulkModulusWorkChain
, which inherits from
the general WorkChain
class of AiiDA. Consult the AiiDA documentation on writing workchains for more details on its structure,
but summarized very briefly: in the define
method of the BulkModulusWorkChain
class we declare what is supplied as input, what is going to be done, or executed,
and what is returned as output. As you noticed
from the run_bulk_modulus.py
script, we eventually called the RelaxWorkChain
class of the AiiDA-VASP plugin multiple
times. We need to do the same here. The call to spec.expose_inputs()
configures the BulkModulusWorkChain
to expect the
same inputs as those defined in another workchain, in this case the workchain we will call later, the RelaxWorkChain
. In
addition, we specify an output required for this particular workchain, the bulk_modulus
, which must be
a simple float datatype.
Note
Again, remember that AiiDA has wrappers for the usual Python datatypes, including float
, represented as Float
.
This is because datatypes in AiiDA often need to be passed around between processes and must obey data provenance
principles. They are therefore defined as nodes, which gives them additional functionality over the basic datatypes, such as
the ability to be stored, locked and linked. As you may already have noticed from previous tutorials, AiiDA
also supplies other dedicated data types. All of these can be used as input or output of workchains.
Finally, we define what is to be done in the outline
section. The outline is sequential and refers to methods defined on
this class further down in the file. It contains the following functions:

initialize
    Initialize whatever needs initializing.
run_relax
    Run a relaxation calculation to fully relax the initial crystal structure.
create_two_structures
    Create two structures with a +/- 1% change in volume from the relaxed structure obtained in the previous step.
run_two_volumes
    Run two relaxation calculations at fixed volume to relax the shape and positions of the structures created in the previous step.
calc_bulk_modulus
    Compute the bulk modulus using the formula \(K \simeq -V_0 \frac{\Delta P}{\Delta V}\), where the pressure \(P \equiv \mathrm{Tr}(\sigma)/3\) follows the VASP sign convention and \(\sigma\) is the stress tensor.
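To make the last step concrete, here is a small standalone numerical check of that formula with toy numbers (not actual VASP output). The division by 10 mirrors the one in calculate_bulk_modulus, which converts the kBar stresses VASP reports into GPa:

```python
# Toy check of K = -V0 * dP/dV using made-up numbers (not VASP output).
# Mean stresses are taken in kBar, as VASP reports them; /10 converts to GPa.

volumes = [99.0, 101.0]    # Angstrom^3: the -1% and +1% volumes
stresses = [20.0, -20.0]   # kBar: mean stress Tr(sigma)/3 at each volume

d_s = stresses[1] - stresses[0]
d_v = volumes[1] - volumes[0]
v0 = (volumes[0] + volumes[1]) / 2
bulk_modulus = -d_s / d_v * v0 / 10  # GPa

print(bulk_modulus)  # → 200.0
```

With these numbers the finite difference recovers a bulk modulus of 200 GPa, the value the toy stresses were constructed from.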
You can inspect the rest of the BulkModulusWorkChain
and verify that it largely
takes the same inputs and produces the same outputs as the manual approach and the run_bulk_modulus.py
script above.
At the end of this section, we would like to raise a few notes:
Note
Since the workchain definition is sequential, the calc_bulk_modulus
function will not be executed until the previous steps
are completed.
Note
Data created inside a workchain should be registered so that data provenance is kept. For simple functions this can
be done using the calcfunction concept and decorator in AiiDA. For this purpose, the
strained crystal structures are created in get_strained_structure
and the bulk modulus is calculated in
calculate_bulk_modulus
, both of which are decorated with @calcfunction
. These functions are called from
the BulkModulusWorkChain
when needed. The calcfunction concept is a very compact and slim way to pass tracked
input and output to and from a function, thus honoring data provenance. For more complex functions, it makes sense to
consider porting the function to a dedicated workchain instead.
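For readers unfamiliar with decorators, the calcfunction idea can be caricatured in plain Python as a wrapper that records every call, together with its inputs and output, before returning the result. This is only a conceptual sketch under our own simplified assumptions; the real @calcfunction additionally creates nodes and links in the AiiDA provenance graph:

```python
import functools

PROVENANCE = []  # stand-in for the AiiDA provenance graph (sketch only)

def toy_calcfunction(func):
    """Record inputs and output of each call (conceptual sketch, not AiiDA)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Register the call so inputs and output remain traceable
        PROVENANCE.append({'function': func.__name__,
                           'inputs': (args, kwargs),
                           'output': result})
        return result
    return wrapper

@toy_calcfunction
def scale_volume(volume, strain):
    return volume * strain

new_volume = scale_volume(100.0, 0.99)
print(new_volume)                  # → 99.0
print(PROVENANCE[0]['function'])   # → scale_volume
```

The key point is that the decorated function behaves exactly as before, while every invocation leaves a record linking inputs to outputs.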
Note
We have utilized ToContext
and to_context
, which ensure that the workflow does not continue until
the processes registered in them have finished. Please consult the AiiDA documentation on the to context topic to learn more.
3. Creating the launch script¶
The launch script will define our input structure and launch the BulkModulusWorkChain
workchain which
will contain the bulk_modulus
output node that we can inspect when it is finalized. Obtain the launch script:
$ wget https://raw.githubusercontent.com/aiida-vasp/aiida-vasp-bm/master/examples/run_vasp_bm.py
It should look something like:
import numpy as np
from aiida.common.extendeddicts import AttributeDict
from aiida.manage.configuration import load_profile
from aiida.orm import Bool, Str, Code, Int, Float, WorkChainNode, QueryBuilder, Group
from aiida.plugins import DataFactory, WorkflowFactory
from aiida.engine import submit

load_profile()

Dict = DataFactory('dict')
KpointsData = DataFactory("array.kpoints")


def launch_aiida_bulk_modulus(structure, code_string, options,
                              label="VASP bulk modulus calculation"):
    incar_dict = {
        'incar': {
            'PREC': 'Accurate',
            'EDIFF': 1e-8,
            'NELMIN': 5,
            'NELM': 100,
            'ENCUT': 500,
            'IALGO': 38,
            'ISMEAR': 0,
            'SIGMA': 0.01,
            'LREAL': False,
            'LCHARG': False,
            'LWAVE': False
        }
    }

    kpoints = KpointsData()
    kpoints.set_kpoints_mesh([6, 6, 4], offset=[0, 0, 0.5])

    potential_family = 'PBE.54'
    potential_mapping = {'Si': 'Si', 'C': 'C'}

    parser_settings = {'add_energies': True,
                       'add_forces': True,
                       'add_stress': True}

    code = Code.get_from_string(code_string)
    Workflow = WorkflowFactory('vasp.bm')
    builder = Workflow.get_builder()
    builder.code = code
    builder.parameters = Dict(dict=incar_dict)
    builder.structure = structure
    builder.settings = Dict(dict={'parser_settings': parser_settings})
    builder.potential_family = Str(potential_family)
    builder.potential_mapping = Dict(dict=potential_mapping)
    builder.kpoints = kpoints
    builder.options = Dict(dict=options)
    builder.metadata.label = label
    builder.metadata.description = label
    builder.clean_workdir = Bool(False)
    relax = AttributeDict()
    relax.perform = Bool(True)
    relax.force_cutoff = Float(1e-8)
    relax.steps = Int(100)
    relax.positions = Bool(True)
    relax.shape = Bool(True)
    relax.volume = Bool(True)
    builder.relax = relax
    builder.verbose = Bool(True)

    node = submit(builder)
    return node


def get_structure_SiC():
    """Set up SiC wurtzite cell

    wurtzite-type SiC
       1.0000000000
         3.0920000000    0.0000000000    0.0000000000
        -1.5460000000    2.6777505485    0.0000000000
         0.0000000000    0.0000000000    5.0730000000
     Si    C
      2    2
    Direct
       0.3333333333  0.6666666667  0.0000000000
       0.6666666667  0.3333333333  0.5000000000
       0.3333333333  0.6666666667  0.3758220000
       0.6666666667  0.3333333333  0.8758220000

    """
    StructureData = DataFactory('structure')
    a = 3.092
    c = 5.073
    lattice = [[a, 0, 0],
               [-a / 2, a / 2 * np.sqrt(3), 0],
               [0, 0, c]]
    structure = StructureData(cell=lattice)
    for pos_direct, symbol in zip(
            ([1. / 3, 2. / 3, 0],
             [2. / 3, 1. / 3, 0.5],
             [1. / 3, 2. / 3, 0.375822],
             [2. / 3, 1. / 3, 0.875822]), ('Si', 'Si', 'C', 'C')):
        pos_cartesian = np.dot(pos_direct, lattice)
        structure.append_atom(position=pos_cartesian, symbols=symbol)
    return structure


def main(code_string, options):
    structure = get_structure_SiC()
    node = launch_aiida_bulk_modulus(structure, code_string, options,
                                     label="SiC VASP bulk modulus calculation")
    print('Launched workchain node: ', node)


if __name__ == '__main__':
    # code_string is chosen from the output of 'verdi code list'
    code_string = 'vasp@mycluster'

    # Set the options
    options = {
        'resources': {
            'num_machines': 1,
            'num_mpiprocs_per_machine': 8
        },
        'account': '',
        'qos': '',
        'max_memory_kb': 2000000,
        'max_wallclock_seconds': 1800
    }

    # Run workflow
    main(code_string, options)
Place this launch script in the examples
directory of the plugin.
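As a quick consistency check of get_structure_SiC: for a hexagonal cell the volume is a²c√3/2, which for the lattice parameters above should come out close to 42 Å³. You can verify this without AiiDA (standalone snippet):

```python
import math

# Wurtzite SiC lattice parameters used in get_structure_SiC
a = 3.092  # Angstrom
c = 5.073  # Angstrom

# Hexagonal cell volume: |(a1 x a2) . a3| = a^2 * c * sqrt(3) / 2
volume = a * a * c * math.sqrt(3) / 2
print(round(volume, 2))  # → 42.0
```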
We also have to install the plugin we have created, such that the AiiDA daemon can pick it up. It is convenient
to install it in what we call editable mode. Doing this makes other packages, like AiiDA, aware of our new plugin
and workchain, while maintaining the convenience of being able to quickly modify the code and relaunch. Assuming you
are now in the root folder of the aiida-vasp-bm
plugin (the folder containing the aiida_vasp_bm
folder and the pyproject.toml
file), execute:
$ pip install -e .
which will install the plugin in editable mode.
Warning
Please remember to restart the daemon if you change any code that the AiiDA daemon should have access to. This typically
goes for all editable installations of Python packages (the ones installed with the -e
option for pip
). It is
very easy to miss, and during heavy development it sometimes makes sense to restart the daemon from the launch script
so that you do not forget. Beware that if you forget this, it can cause a significant time drain debugging something that
might already be working just fine.
Again, tailor the launch script to your needs as we have now done a few times, then launch it. Note the id <PK>
that is printed. Monitor its
progress with e.g. verdi process report <PK>
. When done, fetch the id <PK'>
of its bulk_modulus
output from
verdi process show <PK>
and inspect it:
$ verdi node attributes <PK'>
PK: 6700
{
"value": 213.25056684995
}
or check it in Python with e.g.:
$ verdi shell
In [1]: node = load_node(<PK>)
In [2]: node.outputs.bulk_modulus.value
Out[2]: 213.25056684995
Note
The basic data types, meaning the ones that are not listed under verdi data
(in this case Float
), can
be inspected with verdi node attributes
from the command line. From Python, the entry point for inspecting them
is the same.
As we can see, this yields the same bulk modulus as before. We have now made a new plugin containing a workflow that we can
easily share with others and use to calculate the bulk modulus in a consistent manner for many different input structures. We can
also include the BulkModulusWorkChain
in other workflows if we need the bulk modulus as a sub-component of
a larger calculation.