6. Designing a workflow

This tutorial focuses on how to approach designing, developing, and finally launching a workflow.

As an example, we will calculate the bulk modulus of wurtzite-type SiC.

The tutorial is divided into two parts. In the first part we identify the different steps we need to perform in order to calculate the bulk modulus using VASP directly. In the second part we write a simple script that depends on AiiDA and AiiDA-VASP to perform the same calculation.

Also note that throughout this tutorial we encourage you to write most of this yourself rather than just downloading the scripts and executing them; this is good training practice. However, in case you need or want them, the scripts used in this tutorial are linked at the end. Please try not to be tempted to download them right away.

1. Flow of work

Let us first start by sketching the outline of the steps needed to investigate a property, a phenomenon or something else. These steps then typically define the flow of work to be performed, or the workflow.

Here we take a rather simple example to illustrate how to approach this problem, so that it becomes easier to write up the workflow in AiiDA later. A good mental picture is that we want to bridge the gap between two concepts: the scientific workflow, i.e. how you would approach a problem from the scientific side, which is not something you can directly execute on a computer, and the workflow as code, i.e. how it can be performed on a computer.

To obtain the bulk modulus we typically need to perform several different calculations. Broadly, we would like to follow these steps:

  1. First we need to prepare the inputs, or starting point. In this case the crystal structure is the most important input; here, the wurtzite-type SiC.

  2. Relax the crystal structure.

  3. Wait until (2) finishes and verify the results.

  4. Create two structures at fixed volumes of +/- 1% of the volume of the relaxed structure obtained in step (3).

  5. Relax the shape of the structures generated in step (4).

  6. Wait until (5) finishes and verify the results.

  7. Compute the bulk modulus as a post-processing step from \(K \simeq -V_0\frac{\Delta P}{\Delta V}\), using the previous outputs as inputs.

Let us now perform these steps manually using VASP. The steps above are of course general and could be executed with any code that provides the functionality to honor the sub-steps.
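Before turning to VASP specifics, the seven steps can be condensed into a runnable sketch. The helper functions below (prepare_structure, relax and the mock linear pressure) are purely illustrative placeholders, not part of VASP or AiiDA; they only show the control flow we will formalize later:

```python
# A runnable sketch of the seven steps above. The helper names and the
# mock relaxation are placeholders, not a real API.

def prepare_structure():
    # Step 1: the starting crystal structure, reduced here to its volume (in Angstrom^3)
    return {'volume': 42.0}

def relax(structure, fixed_volume=False):
    # Steps 2-3 and 5-6: run a relaxation and wait for it to finish.
    # The mock pressure is linear in the volume deviation, for illustration only.
    pressure = -2100.0 * (structure['volume'] - 42.0) / 42.0
    return {'volume': structure['volume'], 'pressure': pressure}

def run_workflow():
    relaxed = relax(prepare_structure())  # steps 2 and 3
    runs = []
    for strain in (0.99, 1.01):  # step 4: -/+ 1% volume
        scaled = {'volume': relaxed['volume'] * strain}
        runs.append(relax(scaled, fixed_volume=True))  # steps 5 and 6
    # Step 7: K ~ -V0 * dP / dV
    small, big = runs
    v0 = (small['volume'] + big['volume']) / 2
    d_p = big['pressure'] - small['pressure']
    d_v = big['volume'] - small['volume']
    return -v0 * d_p / d_v

print(run_workflow())
```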

2. Manual calculation

1. Preparing input files

  1. We will utilize the following crystal structure for wurtzite-type SiC, for VASP this means using the following POSCAR file:

    wurtzite-type SiC
      1.0000000000
      3.0920000000   0.0000000000   0.0000000000
     -1.5460000000   2.6777505485   0.0000000000
      0.0000000000   0.0000000000   5.0730000000
    Si    C
        2     2
    Direct
      0.3333333333   0.6666666667   0.0000000000
      0.6666666667   0.3333333333   0.5000000000
      0.3333333333   0.6666666667   0.3758220000
      0.6666666667   0.3333333333   0.8758220000
    
  2. We also need to prepare a suitable KPOINTS file, here:

    # Half grid shift along c*
    0
    Gamma
                6             6             4
      0.000000000   0.000000000   0.500000000
    
  3. Then, we need the INCAR file to tell VASP to relax the structure:

    EDIFF = 1e-08
    EDIFFG = -1e-05
    ENCUT = 500
    IALGO = 38
    IBRION = 2
    ISIF = 3
    ISMEAR = 0
    LCHARG = .FALSE.
    LREAL = .FALSE.
    LWAVE = .FALSE.
    NELM = 100
    NSW = 100
    NELMIN = 5
    PREC = Accurate
    SIGMA = 0.01
    
  4. You also need to assemble the potential file for SiC by concatenating the recommended PBE.54 potential files for Si and C into a suitable POTCAR. If in doubt about these steps, please have a look at the VASP lectures, tutorials, howtos, notebook tutorials or videos, or ask experienced VASP users. AiiDA-VASP is not a substitute for knowing how to use VASP; its intent is to make using VASP more efficient and reproducible, in addition to opening up new areas of application.

  5. Create a jobfile, or another way to execute VASP in your environment and continue to the next step.
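As an illustration, a minimal job file could look like the following. This is only a sketch assuming a Slurm scheduler; the module name, resource request and VASP executable name are placeholders that depend entirely on your cluster:

```shell
#!/bin/bash
#SBATCH --job-name=SiC-relax
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=00:30:00

# Load your cluster's VASP installation (module name is a placeholder)
module load vasp

# Run the standard VASP executable in the folder containing
# INCAR, POSCAR, KPOINTS and POTCAR
mpirun vasp_std
```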

2. Perform the relaxation and 3. Wait for it to finish

  1. Execute VASP using the prepared INCAR, POSCAR, KPOINTS and POTCAR from the previous step.

  2. Inspect the resulting relaxed structure that will be housed in the created CONTCAR file. Its content should be something like:

    wurtzite-type SiC
       1.00000000000000
         3.0920838394862775    0.0000000000000000    0.0000000000000000
        -1.5460419197431388    2.6778231556249570   -0.0000000000000000
        -0.0000000000000000   -0.0000000000000000    5.0729992295718249
       Si   C
         2     2
    Direct
      0.3333333332999970  0.6666666667000030 -0.0000493014454436
      0.6666666667000030  0.3333333332999970  0.4999506985545563
      0.3333333332999970  0.6666666667000030  0.3758713014454431
      0.6666666667000030  0.3333333332999970  0.8758713014454431
    
      0.00000000E+00  0.00000000E+00  0.00000000E+00
      0.00000000E+00  0.00000000E+00  0.00000000E+00
      0.00000000E+00  0.00000000E+00  0.00000000E+00
      0.00000000E+00  0.00000000E+00  0.00000000E+00
    

4. Create two structures with decreased and increased volume

We now need to create two sets of VASP inputs.

  1. Create a new POSCAR from the CONTCAR of the previous step, but modify it: replace the scaling factor of 1.0 on the second line with one that gives a reduced volume. We will reduce the volume to 99% of the original, meaning that the scaling factor should be set to \(0.99^{1/3}\) = 0.9966554934125964. Save this to POSCAR_SMALLER.

  2. Create yet another POSCAR from the same CONTCAR. Again modify the second line, this time to a larger scaling factor representing a 1% volume increase, meaning that the scaling factor should be set to \(1.01^{1/3}\) = 1.0033222835420892. Save this to POSCAR_BIGGER.
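The scaling factors are easy to verify: the POSCAR scaling factor multiplies the lattice vectors, so a volume change by a factor v corresponds to a scaling factor of v^(1/3). A quick check in Python:

```python
# The POSCAR scaling factor multiplies the lattice vectors, so a volume
# scaling of v corresponds to a length scaling factor of v**(1/3).
smaller = 0.99 ** (1.0 / 3.0)  # 1% volume decrease
bigger = 1.01 ** (1.0 / 3.0)   # 1% volume increase
print(smaller, bigger)
```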

5. Perform the relaxation of modified volumes and 6. Wait for it to finish

We will now run VASP on the two modified structures with a decreased and increased volume, respectively, and inspect the resulting stress. When performing these relaxations, we want to keep the cell volume fixed, since we have modified it manually.

  1. Modify the INCAR and set ISIF=4 to ensure VASP does not relax the cell volume.

  2. Execute VASP using the previous KPOINTS and POTCAR, and newly generated INCAR and POSCAR_SMALLER. Inspect the vasprun.xml file to find something like:

    <varray name="stress" >
     <v>      21.84700539      -0.00000000       0.00000000 </v>
     <v>       0.00000000      21.84700539      -0.00000000 </v>
     <v>       0.00000000       0.00000000      21.84572011 </v>
    </varray>
    

    for the stress on the last ionic step. Note that the volume of the cell is 41.59 \(\unicode{x212B}^3\). Keep these results at hand for later.

  3. Execute VASP yet again using the previous INCAR, KPOINTS and POTCAR, but use now POSCAR_BIGGER. Inspect again the new vasprun.xml file to find something like:

    <varray name="stress" >
     <v>     -20.80399703       0.00000000       0.00000000 </v>
     <v>       0.00000000     -20.80399703       0.00000000 </v>
     <v>       0.00000000       0.00000000     -20.80406790 </v>
    </varray>
    

    for the stress on the last ionic step. Note that the volume of the cell is 42.43 \(\unicode{x212B}^3\). Keep these results at hand for later.

7. Calculate the bulk modulus

The bulk modulus can now be calculated from these results as:

$ verdi shell
In [1]: - ( ( (42.43 + 41.59) / 2  ) * ( (-20.80 - 21.85) / (42.43 - 41.59) ) ) / 10
Out[1]: 213.3007738095248

We thus obtain the bulk modulus of ~213 GPa for this calculation.
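The same arithmetic can be written as a small self-contained Python snippet, using the stress tensors and volumes from the two runs above. The mean normal stress (trace divided by three) serves as the pressure in \(K \simeq -V_0\frac{\Delta P}{\Delta V}\), and since VASP reports stress in kbar we divide by 10 to obtain GPa:

```python
# Stress tensor diagonals (kbar) from the last ionic step of the two runs
stress_smaller = (21.84700539, 21.84700539, 21.84572011)
stress_bigger = (-20.80399703, -20.80399703, -20.80406790)

# The mean normal stress (trace / 3) serves as the pressure in K ~ -V0 dP/dV
p_smaller = sum(stress_smaller) / 3
p_bigger = sum(stress_bigger) / 3

v_smaller, v_bigger = 41.59, 42.43  # cell volumes in Angstrom^3
v0 = (v_smaller + v_bigger) / 2
# VASP reports stress in kbar; dividing by 10 converts to GPa
bulk_modulus = -v0 * (p_bigger - p_smaller) / (v_bigger - v_smaller) / 10
print(f'{bulk_modulus:.1f} GPa')
```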

3. Calculations using AiiDA

If there is any intention to perform bulk modulus calculations in a repeated and robust manner, the workflow above should be defined more formally. This is where AiiDA and its concept of workflows come into play. Let us now try to perform the same calculation with assistance from AiiDA.

We start by making the calling script, which you can get by running:

$ wget https://raw.githubusercontent.com/aiida-vasp/aiida-vasp/master/tutorials/run_bulk_modulus.py

and would look something like this:

from time import sleep

import numpy as np

from aiida.common.extendeddicts import AttributeDict
from aiida.engine import submit
from aiida.manage.configuration import load_profile
from aiida.orm import Bool, Code, Float, Group, Int, QueryBuilder, Str, WorkChainNode, load_group
from aiida.plugins import DataFactory, WorkflowFactory

load_profile()


def get_structure_SiC():
    """Set up SiC wurtzite cell

    wurtzite-type SiC
      1.0000000000
      3.0920000000   0.0000000000   0.0000000000
     -1.5460000000   2.6777505485   0.0000000000
      0.0000000000   0.0000000000   5.0730000000
    Si    C
        2     2
    Direct
      0.3333333333   0.6666666667   0.0000000000
      0.6666666667   0.3333333333   0.5000000000
      0.3333333333   0.6666666667   0.3758220000
      0.6666666667   0.3333333333   0.8758220000

    """

    StructureData = DataFactory('structure')
    a = 3.092
    c = 5.073
    lattice = [[a, 0, 0], [-a / 2, a / 2 * np.sqrt(3), 0], [0, 0, c]]
    structure = StructureData(cell=lattice)
    for pos_direct, symbol in zip(([1. / 3, 2. / 3, 0], [2. / 3, 1. / 3, 0.5],
                                   [1. / 3, 2. / 3, 0.375822], [2. / 3, 1. / 3, 0.875822]),
                                  ('Si', 'Si', 'C', 'C')):
        pos_cartesian = np.dot(pos_direct, lattice)
        structure.append_atom(position=pos_cartesian, symbols=symbol)
    return structure


def launch_aiida_relax_shape(structure, code_string, options, label):
    """Launch a relaxation of shape, but not volume."""

    Dict = DataFactory('dict')
    KpointsData = DataFactory('array.kpoints')

    # Set INCAR flags
    base_incar_dict = {
        'incar': {
            'PREC': 'Accurate',
            'EDIFF': 1e-8,
            'NELMIN': 5,
            'NELM': 100,
            'ENCUT': 500,
            'IALGO': 38,
            'ISMEAR': 0,
            'SIGMA': 0.01,
            'LREAL': False,
            'LCHARG': False,
            'LWAVE': False
        }
    }

    # Set code, potentials to use and options
    base_config = {
        'code_string': code_string,
        'potential_family': 'PBE.54',
        'potential_mapping': {
            'Si': 'Si',
            'C': 'C'
        },
        'options': options
    }

    # Make sure some things are parsed
    base_parser_settings = {'add_energies': True, 'add_forces': True, 'add_stress': True}

    # Set the provided code string
    code = Code.get_from_string(base_config['code_string'])

    # Which workchain to use
    workchain = WorkflowFactory('vasp.relax')

    # Construct the builder
    builder = workchain.get_builder()
    builder.code = code
    builder.parameters = Dict(dict=base_incar_dict)
    builder.structure = structure
    builder.settings = Dict(dict={'parser_settings': base_parser_settings})
    builder.potential_family = Str(base_config['potential_family'])
    builder.potential_mapping = Dict(dict=base_config['potential_mapping'])
    kpoints = KpointsData()
    kpoints.set_kpoints_mesh([6, 6, 4], offset=[0, 0, 0.5])
    builder.kpoints = kpoints
    builder.options = Dict(dict=base_config['options'])
    builder.metadata.label = label
    builder.metadata.description = label
    builder.clean_workdir = Bool(False)
    relax = AttributeDict()
    relax.perform = Bool(True)
    relax.force_cutoff = Float(1e-5)
    relax.positions = Bool(True)
    relax.shape = Bool(True)
    relax.volume = Bool(False)
    builder.relax = relax
    builder.verbose = Bool(True)

    # Submit calc
    node = submit(builder)
    return node


def launch_aiida_full_relax(structure, code_string, options, label):
    """Launch a relaxation of everything."""
    Dict = DataFactory('dict')
    KpointsData = DataFactory('array.kpoints')

    # Set INCAR flags
    base_incar_dict = {
        'incar': {
            'PREC': 'Accurate',
            'EDIFF': 1e-8,
            'NELMIN': 5,
            'NELM': 100,
            'ENCUT': 500,
            'IALGO': 38,
            'ISMEAR': 0,
            'SIGMA': 0.01,
            'LREAL': False,
            'LCHARG': False,
            'LWAVE': False
        }
    }

    # Set code, potentials to use and options
    base_config = {
        'code_string': code_string,
        'potential_family': 'PBE.54',
        'potential_mapping': {
            'Si': 'Si',
            'C': 'C'
        },
        'options': options
    }

    # Make sure some things are parsed
    base_parser_settings = {'add_energies': True, 'add_forces': True, 'add_stress': True}

    # Set the provided code string
    code = Code.get_from_string(base_config['code_string'])

    # Which workchain to use
    workchain = WorkflowFactory('vasp.relax')

    # Construct the builder
    builder = workchain.get_builder()
    builder.code = code
    builder.parameters = Dict(dict=base_incar_dict)
    builder.structure = structure
    builder.settings = Dict(dict={'parser_settings': base_parser_settings})
    builder.potential_family = Str(base_config['potential_family'])
    builder.potential_mapping = Dict(dict=base_config['potential_mapping'])
    kpoints = KpointsData()
    kpoints.set_kpoints_mesh([6, 6, 4], offset=[0, 0, 0.5])
    builder.kpoints = kpoints
    builder.options = Dict(dict=base_config['options'])
    builder.metadata.label = label
    builder.metadata.description = label
    builder.clean_workdir = Bool(False)
    relax = AttributeDict()
    relax.perform = Bool(True)
    relax.force_cutoff = Float(1e-5)
    relax.positions = Bool(True)
    relax.shape = Bool(True)
    relax.volume = Bool(True)
    relax.steps = Int(100)
    builder.relax = relax
    builder.verbose = Bool(True)

    # Submit the calc
    node = submit(builder)
    return node


def main(code_string, options, group_name, sleep_seconds=30):
    group = load_group(label=group_name)
    if not group.is_empty:
        # Make sure we do not already have members in the group
        print('The given group already has nodes. Please remove them or give a new group name.')
        exit(1)

    # Initial relaxation
    structure = get_structure_SiC()
    node_relax = launch_aiida_full_relax(structure, code_string, options, 'SiC VASP calc to relax volume')
    group.add_nodes(node_relax)
    print(f'Relaxing starting cell: {structure.cell}')

    while True:
        if node_relax.is_terminated:
            break
        print('Waiting for relaxation of starting cell to be done.')
        sleep(sleep_seconds)

    if node_relax.is_finished_ok:
        for strain, label in zip((0.99, 1.01), ('reduced', 'increased')):
            structure_clone = node_relax.outputs.relax.structure.clone()
            ase_structure = structure_clone.get_ase()
            # Make sure we also update atomic positions after scaling the cell
            ase_structure.set_cell(ase_structure.cell * strain**(1.0 / 3.0), scale_atoms=True)
            StructureData = DataFactory('structure')
            structure = StructureData(ase=ase_structure)
            node = launch_aiida_relax_shape(
                structure, code_string, options, f'SiC VASP relax shape at {label} volume ({strain:f})'
            )
            group.add_nodes(node)
            print(f'Relaxation positions of scaled cell: {ase_structure.cell}')
    else:
        print('Relaxation failed.')
        exit(1)

    # Now make sure we wait until all calcs have been terminated
    while True:
        sleep(sleep_seconds)
        print('Waiting for all remaining relaxations to be done.')
        nodes_terminated = []
        for node in group.nodes:
            nodes_terminated.append(node.is_terminated)
        if all(nodes_terminated):
            break

    # Do one final check to make sure all calcs have a zero exit status
    for node in group.nodes:
        if not node.is_finished_ok:
            print(
                f'Node with id: {node.pk} has an exit status: {node.exit_status} and exit message: {node.exit_message}'
            )
            exit(1)


def calc_bulk_modulus(group_name):
    stresses = []
    volumes = []
    for label in ('reduced', 'increased'):
        qb = QueryBuilder()
        qb.append(Group, filters={'label': group_name}, tag='group')
        qb.append(WorkChainNode, with_group='group', filters={'label': {'ilike': '%' + label + '%'}})
        node = qb.first()[0]
        stresses.append(np.trace(node.outputs.stress.get_array('final')) / 3)
        volumes.append(np.linalg.det(node.inputs.structure.cell))

    d_s = stresses[1] - stresses[0]
    d_v = volumes[1] - volumes[0]
    v0 = (volumes[0] + volumes[1]) / 2.0
    bulk_modulus = -d_s / d_v * v0

    print(f'Bulk modulus: {bulk_modulus / 10.0} GPa')


if __name__ == '__main__':
    # The code string is chosen from the output of 'verdi code list'
    code_string = 'vasp@mycluster'

    # Set the options
    options = {
        'resources': {
            'num_machines': 1,
            'num_mpiprocs_per_machine': 8
        },
        'account': '',
        'qos': '',
        'max_memory_kb': 2000000,
        'max_wallclock_seconds': 1800
    }

    # Here we assume the group "Bulk_modulus_SiC_test" has already been created and is empty
    group_name = 'Bulk_modulus_SiC_test'

    # Perform necessary calculations and put them in the created group
    main(code_string, options, group_name)

    # Calculate the bulk modulus by querying the group for the necessary input
    calc_bulk_modulus(group_name)

Notice a few things:

  1. We set builder.clean_workdir = Bool(False). This makes sure that the folder used for the VASP calculations on the defined computer (usually a remote cluster) is not removed after a successful calculation (removal is usually the default). After a VaspCalculation is finished, one can then execute verdi calcjob gotocomputer <PK>, where <PK> is the id of the VaspCalculation process you want to inspect. This takes you directly to the submission folder, where the input and output files can be inspected. This setting is rather useful for debugging and for verifying that your input and output files are as expected, regardless of the plugin.

  2. When we use submit, the process node is returned immediately, and it is this node we pass to group.add_nodes(). This does not mean that the node represents a finalized and successful calculation; the process continues in the background. Hence, we need to explicitly check that the calculations are finished, here in a separate step using the node property is_terminated, and later also verify that each calculation was successful, i.e. that the node does not carry a non-zero exit code, using the node property is_finished_ok. Only when all calculations are in a successful, finished state can we proceed to calculate the bulk modulus.

  3. This bulk modulus script assumes the AiiDA group named Bulk_modulus_SiC_test already exists. This group is created by:

    $ verdi group create "Bulk_modulus_SiC_test"
    

    And we can use:

    $ verdi group list
    

    to inspect the present groups.

  4. We only perform a rudimentary search of the group to find the nodes that contain the outputs we need to calculate the bulk modulus. Hence, we cannot run this script a second time, as we would then pick up the old members of the group and print the bulk modulus from those. In some cases this is desirable, meaning we reuse previous calculations so that we do not have to recalculate them, but for the sake of this example it can be a bit confusing. Hence, we check at the beginning of main that the group Bulk_modulus_SiC_test is empty and stop if it is not. If rerunning the script, one can delete the group by first getting the id with verdi group list, locating the <PK>, and then issuing verdi group delete <PK>, followed by a new verdi group create "Bulk_modulus_SiC_test". Then the script can be launched again.

Modify the run_bulk_modulus.py script to your setup. If in doubt, go back to the previous tutorials. In essence, you need to adjust the defined code, the potential family you have uploaded potentials into, and the resources to be used where the code is installed. When that is done, execute the script:

$ python run_bulk_modulus.py
Relaxing starting cell: [[3.092, 0.0, 0.0], [-1.546, 2.6777505485015, 0.0], [0.0, 0.0, 5.073]]
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Waiting for relaxation of starting cell to be done.
Relaxation positions of scaled cell: Cell([[3.081742345228316, 0.0, 0.0], [-1.540871172614158, 2.668867162801478, 0.0], [0.0, 0.0, 5.056032550657371]])
Relaxation positions of scaled cell: Cell([[3.102356619252392, 0.0, 0.0], [-1.551178309626196, 2.686719647813093, 0.0], [0.0, 0.0, 5.0898531718508595]])
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Waiting for all remaining relaxations to be done.
Bulk modulus: 213.2516947999474 GPa

Depending on your setup, it will take some minutes before the bulk modulus is printed, assuming your submitted calculations do not queue and start immediately. As we can see, the result is similar to the one we obtained using the manual method above. Obviously, if we need to redo this calculation, with different input parameters and/or other input structures, there would be little overhead involved and we could immediately benefit from it in our daily workflows. However, we can do better.

4. Writing bulk modulus workchain

In the previous section, the initial relaxation and the two volume-restricted relaxations are performed independently. The resulting nodes are simply grouped, and we calculate the bulk modulus by fetching these nodes later. This means the workflow defined in the main method is not very transparent, and we have to introduce while loops and put the code to sleep for some time before continuing each iteration. This works, but it is not very elegant. We would also like our workflow to formally follow the steps of the flow of work we defined at the top. Doing so makes it easier both to design new workflows and to understand existing ones. Furthermore, the modular design makes reuse easier.

The question is thus: can we improve the somewhat large script introduced in the previous section? The answer is a definitive yes. AiiDA has the concept of a WorkChain, which is basically a container for a workflow, or for a component of a workflow. It is the modular concept in AiiDA that gives us enough freedom to dissect and combine workchains into basic and complex workflows. We would thus like to write workchains that are as modular and reusable as possible, such that ultimately several workchains can be cherry-picked into a bigger composition that solves a given problem. In doing so, we should stay as close as possible to the standards and expectations of the domain, so that others can rapidly relate to and understand what the workflow and its workchains are doing.
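Schematically, a workchain pairs a specification of its inputs and outputs with an outline of steps. The snippet below is only pseudocode based on AiiDA's general WorkChain pattern; the class, the exposed inputs and the step names are illustrative and will be developed properly in the following sections:

```python
# Schematic pseudocode only, not runnable as-is; the step names are
# illustrative and map onto the flow of work defined at the top.
class BulkModulusWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.expose_inputs(RelaxWorkChain)  # step 1: structure and settings
        spec.outline(
            cls.run_relax,                  # steps 2 and 3
            cls.create_two_structures,      # step 4
            cls.run_two_volumes,            # steps 5 and 6
            cls.calc_bulk_modulus,          # step 7
        )
        spec.output('bulk_modulus')
```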

Note

Additional workchains not bundled with AiiDA-VASP have to be exposed as a Python module to be visible to the AiiDA daemon, i.e. they should be available in the $PYTHONPATH of the daemon. The easiest way to achieve this is usually to make a small plugin that acts as a container for the installation and documentation of the workchain, or set of workchains. This can then be installed (with pip, for instance) in any AiiDA environment where you want to utilize the workchain(s).

So, let us now try to preserve the workflow described at the top more formally in code as well. The challenge is then to write a suitable workchain that represents this workflow. It turns out that the migration from the run_bulk_modulus.py script above is rather straightforward. In the process, we will also try to develop the workchain in a way that makes it easy to maintain, develop and share. In addition to transferring the run script developed above into a more robust workflow, we will at the same time look into:

  1. Using the AiiDA plugin cookie cutter to prepare the structure of the plugin, which will be a very simple one in our case. The workchain will define our workflow, and we will call it from a suitable run script.

    Note

    AiiDA is designed to be very modular, and by making our single bulk modulus workflow a plugin of its own, we make sure it can be installed easily and correctly by many users. The overhead of doing this is well worth it and sounds more daunting than it is in practice. You will also benefit from this yourself.

  2. Writing documentation.

    Note

    Having at least basic documentation in place is very important, not only for others, but also for yourself. Everyone forgets, and you forget more quickly than you think. Also notice that it makes sense to be as explicit as possible when writing the workchain code. That way, it is much easier for new people to inspect and understand what the code actually does.

  3. Enabling possibilities to share the plugin with others due to easy install.

Okay, let us now start. We already have a pretty good mental picture of what we need to calculate and what to extract from the calculations. Buckle up: this is going to be a very compact introduction to how you can write plugins and workchains following good practices.

1. Generate the plugin structure and basics

We start by utilizing the AiiDA plugin cookie cutter. If you are not familiar with the cookie cutter principle, it is a way to quickly put a baseline structure in place for modular code pieces or extensions, which is perfect for AiiDA plugins.

  1. Make sure you have your AiiDA Python environment enabled.

  2. Go to some location which will house the folder and Git repository of the bulk modulus plugin we will develop.

  3. Install cookiecutter:

    $ pip install cookiecutter
    
  4. Run the cookiecutter and answer the questions accordingly:

    $ cookiecutter https://github.com/aiidateam/aiida-plugin-cutter.git
    plugin_name [aiida-diff]: aiida-vasp-bm
    module_name [aiida_vasp_bm]:
    short_description [AiiDA demo plugin that wraps the `diff` executable for computing the difference between two files.]: A workflow for AiiDA-VASP that calculates the bulk modulus.
    entry_point_prefix [vasp_bm]:
    github_user [aiidateam]: <yourgithubusername>
    repo_name [aiida-vasp-bm]:
    contact_email []: <yourcontactemail>
    version [0.1.0a0]:
    author [The AiiDA Team]: The AiiDA-VASP developer team
    aiida_min_version [2.0]: 2.2
    year [2022]: 2023
    Running black on aiida-vasp-bm
    All done! ✨ 🍰 ✨
    14 files left unchanged.
    
    ### IMPORTANT###
    Register your plugin NOW by making a pull request to the AiiDA Plugin Registry at
    
    https://aiidateam.github.io/aiida-registry/
    

    where <yourgithubusername> is your username on GitHub, assuming you eventually want to publish the workflow there. Also, a contact email <yourcontactemail> should be provided.

    Enter the resulting aiida-vasp-bm folder and inspect it. The first thing you should do is to read the README.md file to know what the AiiDA plugin cookie cutter provides and how to further configure it. Out of the box, the following is provided:

    • A license file. MIT is provided out of the box.

    • The pyproject.toml file. Contains information about the plugin, its dependencies etc. This also enables us to install the plugin, e.g. with pip.

    • A basic example calculation (aiida_vasp_bm/calculations.py::DiffCalculation) and a basic parser (aiida_vasp_bm/parsers.py::DiffParser). In addition, a few command line examples are given in aiida_vasp_bm/cli.py. None of this is relevant for us; we will instead reuse the VaspCalculation in the AiiDA-VASP plugin, the AiiDA-VASP parsers and its command line infrastructure.

    • Basic tests. Testing your code is important, so a testing framework is also provided to test the example calculation and cli in the tests folder.

    • Documentation. A sample using Sphinx that already can be compiled is found in the docs folder. In there, the web documentation can be built by running make html.

2. Adapting the generated plugin template

Adapting the cookiecutter output from the previous step to our specific needs involves removing and modifying existing files, as well as adding the file containing our workchain.

Since what we need is a very simple plugin, basically only containing the workchain housing our workflow, some light documentation etc., we will now remove what we do not need from the output of the cookiecutter command in the previous step. Please remove:

  • The tests folder in the root directory. We would generally advise against deprioritizing testing but, as stated above, testing workchains which use external codes is rather complicated and is not the topic of this tutorial. Also remove everything inside the examples folder.

  • The calculations.py, cli.py, parsers.py and helpers.py file and the data folder from the aiida_vasp_bm folder.

  • The conftest.py from the root directory.

  • Remove the testing section from the [project.optional-dependencies] section in pyproject.toml. Also remove the project.entry-points."aiida.data", project.entry-points."aiida.parsers", project.entry-points."aiida.calculations" and project.entry-points."aiida.cmdline.data" sections. Furthermore, since the AiiDA-VASP plugin already depends on AiiDA, we can remove the AiiDA dependency and instead add the proper version of AiiDA-VASP to the dependencies section so that it now reads:

    dependencies = [
      "voluptuous",
      "aiida-vasp"
    ]
    

    and modify the project.optional-dependencies section so that it reads:

    [project.optional-dependencies]
    testing = [
      "aiida-vasp[tests]"
    ]
    pre-commit = [
      "aiida-vasp[pre-commit]"
    ]
    docs = [
      "aiida-vasp[pre-commit]"
    ]
    

    By changing this, we make sure the dependencies of this plugin follow what is given in the latest AiiDA-VASP version. We also add what is called an entry point for our workchain:

    [project.entry-points."aiida.workflows"]
    "vasp.bm" = "aiida_vasp_bm.workchains.bulkmodulus:BulkModulusWorkChain"
    

    which makes it possible later to use the WorkflowFactory function of AiiDA to load our workchain in the calling script using the vasp.bm key.

    Furthermore, make sure requires-python = ">=3.8", which is also what AiiDA-VASP requires, and remove 37 under the testenv:py entry. The pyproject.toml should now look something like:

    [build-system]
    # build the package with [flit](https://flit.readthedocs.io)
    requires = ["flit_core >=3.4,<4"]
    build-backend = "flit_core.buildapi"
    
    [project]
    # See https://www.python.org/dev/peps/pep-0621/
    name = "aiida-vasp-bm"
    dynamic = ["version"]  # read from aiida_vasp_bm/__init__.py
    description = "A workflow for AiiDA-VASP that calculates the bulk modulus."
    authors = [{name = "The AiiDA-VASP developer team", email = "<youremailaddress>"}]
    readme = "README.md"
    license = {file = "LICENSE"}
    classifiers = [
        "Programming Language :: Python",
        "Intended Audience :: Science/Research",
        "License :: OSI Approved :: MIT License",
        "Natural Language :: English",
        "Development Status :: 3 - Alpha",
        "Framework :: AiiDA"
    ]
    keywords = ["aiida", "plugin"]
    requires-python = ">=3.8"
    dependencies = [
        "aiida-vasp",
        "voluptuous"
    ]
    
    [project.urls]
    Source = "https://github.com/<yourgithubusername>/aiida-vasp-bm"
    
    [project.entry-points."aiida.workflows"]
    "vasp.bm" = "aiida_vasp_bm.workchains.bulkmodulus:BulkModulusWorkChain"   
    
    [project.optional-dependencies]
    testing = [
        "aiida-vasp[tests]"
    ]
    pre-commit = [
        "aiida-vasp[pre-commit]"
    ]
    docs = [
        "aiida-vasp[pre-commit]"
    ]
    
    [tool.flit.module]
    name = "aiida_vasp_bm"
    
    [tool.pylint.format]
    max-line-length = 125
    
    [tool.pylint.messages_control]
    disable = [
        "too-many-ancestors",
        "invalid-name",
        "duplicate-code",
    ]
    
    [tool.pytest.ini_options]
    # Configuration for [pytest](https://docs.pytest.org)
    python_files = "test_*.py example_*.py"
    filterwarnings = [
        "ignore::DeprecationWarning:aiida:",
        "ignore:Creating AiiDA configuration folder:",
        "ignore::DeprecationWarning:plumpy:",
        "ignore::DeprecationWarning:yaml:",
    ]
    
    [tool.coverage.run]
    # Configuration of [coverage.py](https://coverage.readthedocs.io)
    # reporting which lines of your plugin are covered by tests
    source=["aiida_vasp_bm"]
    
    [tool.isort]
    # Configuration of [isort](https://isort.readthedocs.io)
    line_length = 120
    force_sort_within_sections = true
    sections = ['FUTURE', 'STDLIB', 'THIRDPARTY', 'AIIDA', 'FIRSTPARTY', 'LOCALFOLDER']
    known_aiida = ['aiida']
    
    [tool.tox]
    legacy_tox_ini = """
    [tox]
    envlist = py38
    
    [testenv]
    usedevelop=True
    
    [testenv:py{38,39,310}]
    description = Run the test suite against a python version
    extras = testing
    commands = pytest {posargs}
    
    [testenv:pre-commit]
    description = Run the pre-commit checks
    extras = pre-commit
    commands = pre-commit run {posargs}
    
    [testenv:docs]
    description = Build the documentation
    extras = docs
    commands = sphinx-build -nW --keep-going -b html {posargs} docs/source docs/build/html
    commands_post = echo "open file://{toxinidir}/docs/build/html/index.html"
    """
    

We now need to add the actual workchain to the plugin.

First get the workchain file by running:

$ wget https://raw.githubusercontent.com/aiida-vasp/aiida-vasp-bm/master/aiida_vasp_bm/workchains/bulkmodulus.py

and put this file in the aiida_vasp_bm folder. It should look something like:

import numpy as np
from aiida.orm import Bool
from aiida.plugins import DataFactory, WorkflowFactory
from aiida.engine import ToContext, WorkChain, calcfunction
from aiida.common.extendeddicts import AttributeDict

Dict = DataFactory('dict')
Float = DataFactory('float')
KpointsData = DataFactory("array.kpoints")

@calcfunction
def get_strained_structure(structure, strain):
    new_structure = structure.clone()
    new_structure.set_cell(
        np.array(new_structure.cell) * strain.value ** (1.0 / 3))
    return new_structure


@calcfunction
def calculate_bulk_modulus(stress_minus, stress_plus,
                           structure_minus, structure_plus):
    stresses = []
    volumes = []
    for stress in (stress_minus, stress_plus):
        stresses.append(np.trace(stress.get_array('final')) / 3)
    for structure in (structure_minus, structure_plus):
        volume = np.linalg.det(structure.cell)
        volumes.append(volume)
    d_s = stresses[1] - stresses[0]
    d_v = volumes[1] - volumes[0]
    v0 = (volumes[0] + volumes[1]) / 2
    bulk_modulus = - d_s / d_v * v0 / 10  # GPa
    return Float(bulk_modulus)


class BulkModulusWorkChain(WorkChain):
    """WorkChain to compute bulk modulus using VASP."""

    _next_workchain_string = 'vasp.relax'
    _next_workchain = WorkflowFactory(_next_workchain_string)

    @classmethod
    def define(cls, spec):
        super(BulkModulusWorkChain, cls).define(spec)
        spec.expose_inputs(cls._next_workchain)
        spec.outline(
            cls.initialize,
            cls.run_relax,
            cls.create_two_structures,
            cls.run_two_volumes,
            cls.calc_bulk_modulus,
        )
        spec.output('bulk_modulus', valid_type=Float)

    def initialize(self):
        self.report("initialize")
        self.ctx.inputs = AttributeDict()
        self.ctx.inputs.update(self.exposed_inputs(self._next_workchain))

    def run_relax(self):
        self.report("run_relax")
        Workflow = WorkflowFactory('vasp.relax')
        builder = Workflow.get_builder()
        for key in self.ctx.inputs:
            builder[key] = self.ctx.inputs[key]
        if 'label' in self.ctx.inputs.metadata:
            label = self.ctx.inputs.metadata['label'] + " relax"
            builder.metadata['label'] = label
        if 'description' in self.ctx.inputs.metadata:
            description = self.ctx.inputs.metadata['description'] + " relax"
            builder.metadata['description'] = description
        future = self.submit(builder)
        return ToContext(relax=future)

    def create_two_structures(self):
        assert self.ctx.relax.is_finished_ok
        self.report("create_two_structures")
        for strain, name in zip((0.99, 1.01), ('reduced', 'increased')):
            structure = get_strained_structure(
                self.ctx.relax.outputs.relax.structure, Float(strain))
            structure.label = name
            self.ctx['structure_%s' % name] = structure

    def run_two_volumes(self):
        self.report("run_two_volumes")
        for strain, future_name in zip((0.99, 1.01), ('reduced', 'increased')):
            Workflow = WorkflowFactory('vasp.relax')
            builder = Workflow.get_builder()
            for key in self.ctx.inputs:
                builder[key] = self.ctx.inputs[key]
            if 'label' in self.ctx.inputs.metadata:
                label = self.ctx.inputs.metadata['label'] + " " + future_name
                builder.metadata['label'] = label
            if 'description' in self.ctx.inputs.metadata:
                description = self.ctx.inputs.metadata['description']
                description += " " + future_name
                builder.metadata['description'] = description
            builder.structure = self.ctx['structure_%s' % future_name]
            relax = AttributeDict()
            relax.perform = Bool(True)
            relax.force_cutoff = Float(1e-8)
            relax.positions = Bool(True)
            relax.shape = Bool(True)
            relax.volume = Bool(False)
            relax.convergence_on = Bool(False)
            builder.relax = relax
            future = self.submit(builder)
            self.to_context(**{future_name: future})

    def calc_bulk_modulus(self):
        assert self.ctx.reduced.is_finished_ok
        assert self.ctx.increased.is_finished_ok
        self.report("calc_bulk_modulus")
        bulk_modulus = calculate_bulk_modulus(
            self.ctx.reduced.outputs.stress,
            self.ctx.increased.outputs.stress,
            self.ctx.reduced.inputs.structure,
            self.ctx.increased.inputs.structure)
        bulk_modulus.label = "Bulk modulus in GPa"
        self.out('bulk_modulus', bulk_modulus)
        self.report('finished bulk modulus workflow')

The workflow is the same as what we implemented in run_bulk_modulus.py, but here we have increased the verbosity and tried to be more explicit, to comply with a typical implementation of a workchain in AiiDA. Before continuing, you should read up on how to write workchains in AiiDA.

The main functionality of our workchain, which calculates the bulk modulus, is housed in the class BulkModulusWorkChain, which inherits from the general WorkChain class of AiiDA. Consult the AiiDA documentation on writing workchains for more details on its structure, but summarized very briefly: in the define method of the BulkModulusWorkChain class we declare what is supplied as input, what is executed, and what is given as output. As you noticed from the run_bulk_modulus.py script, we eventually called the RelaxWorkChain class of the AiiDA-VASP plugin multiple times, and we need to do the same here. The spec.expose_inputs() call configures the BulkModulusWorkChain to expect the same inputs as those defined in another workchain, in this case the workchain we will call later, the RelaxWorkChain. In addition, we specify an output specific to this workchain, bulk_modulus, which is required to be a simple float data type.

Note

Again, remember that AiiDA has wrappers for the usual Python data types, including float, represented as Float. This is because in AiiDA, data often needs to be passed around while obeying data provenance principles. The wrappers are therefore defined as nodes, which gives them additional functionality over the basic data types, such as the ability to be stored, locked and linked. As you may have noticed in previous tutorials, AiiDA also supplies other dedicated data types. All of these can be used as inputs or outputs of workchains.

Finally, we define what is to be done in the outline section. This is sequential and refers to methods defined on this class further down in the file. It contains the following steps:

  1. initialize Initialize whatever needs initializing.

  2. run_relax Run a relaxation calculation to fully relax the initial crystal structure.

  3. create_two_structures Create two structures with a +/- 1% change in the volume from the relaxed structure obtained at the previous step.

  4. run_two_volumes Run two relaxation calculations to relax the shape and positions of the structures created at the previous step at fixed volume.

  5. calc_bulk_modulus Compute the bulk modulus using the formula \(K \simeq -V_0 \frac{\Delta P}{\Delta V}\), where the pressure \(P \equiv \mathrm{Tr}(\sigma)/3\) follows the VASP convention where \(\sigma\) is the stress tensor.
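To make the arithmetic of the last step concrete, here is a hedged, pure-Python version of the finite-difference expression used in calculate_bulk_modulus; the stress and volume numbers below are made up for illustration and are not VASP output:

```python
def bulk_modulus_gpa(stress_minus, stress_plus, volume_minus, volume_plus):
    """Finite-difference bulk modulus K ~ -V0 * dP/dV.

    The stresses are Tr(sigma)/3 in kbar (VASP convention), hence the
    division by 10 to convert to GPa, as in calculate_bulk_modulus.
    """
    d_s = stress_plus - stress_minus
    d_v = volume_plus - volume_minus
    v0 = (volume_minus + volume_plus) / 2
    return -d_s / d_v * v0 / 10


# Illustrative numbers for a +/-1% volume change around ~42 A^3.
print(bulk_modulus_gpa(21.3, -21.3, 41.6, 42.4))  # ~ 223.65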

You can otherwise inspect the details of the BulkModulusWorkChain and verify that it largely takes the same inputs and gives the same outputs as the manual approach and the run_bulk_modulus.py script above. At the end of this section, we would like to raise a few notes:

Note

Since the workchain outline is sequential, the calc_bulk_modulus function will not be executed until the previous steps are completed.

Note

Data created inside a workchain should be registered so that data provenance is kept. For simple functions, this can be done using the calcfunction concept and decorator in AiiDA. For this purpose, the strained crystal structures are created in get_strained_structure and the bulk modulus is calculated in calculate_bulk_modulus, both of which are decorated with @calcfunction. These functions are called from the BulkModulusWorkChain when needed. The calcfunction concept is a very compact and slim way to pass tracked inputs and outputs to and from a function, thus honoring data provenance. For more complex functions, it makes sense to instead consider porting the function to a dedicated workchain.
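As a hedged, plain-Python illustration of what get_strained_structure does (using nested lists instead of an AiiDA StructureData node), scaling every lattice vector by strain**(1/3) changes the cell volume by exactly the strain factor:

```python
def scale_cell(cell, volume_strain):
    """Scale all lattice vectors by volume_strain**(1/3), so the cell
    volume changes by the factor volume_strain (e.g. 0.99 -> -1%)."""
    factor = volume_strain ** (1.0 / 3.0)
    return [[component * factor for component in vector] for vector in cell]


# The SiC cell from this tutorial, scaled to 99% of its volume.
cell = [[3.092, 0.0, 0.0],
        [-1.546, 2.6777505485, 0.0],
        [0.0, 0.0, 5.073]]
reduced = scale_cell(cell, 0.99)
```

In the workchain proper the same scaling runs through structure.clone() and set_cell() so that the new structure is a fresh node whose provenance links back to the relaxed one.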

Note

We have utilized ToContext and to_context, which ensure that the workflow does not continue until the processes registered in them are done. Please consult the AiiDA documentation on the to context topic to learn more.

3. Creating the launch script

The launch script will define our input structure and launch the BulkModulusWorkChain, which, when finalized, will contain the bulk_modulus output node that we can inspect. Obtain the launch script:

$ wget https://raw.githubusercontent.com/aiida-vasp/aiida-vasp-bm/master/examples/run_vasp_bm.py

It should look something like:

import numpy as np
from aiida.common.extendeddicts import AttributeDict
from aiida.manage.configuration import load_profile
from aiida.orm import Bool, Str, Code, Int, Float, WorkChainNode, QueryBuilder, Group
from aiida.plugins import DataFactory, WorkflowFactory
from aiida.engine import submit
load_profile()

Dict = DataFactory('dict')
KpointsData = DataFactory("array.kpoints")

def launch_aiida_bulk_modulus(structure, code_string, options,
                              label="VASP bulk modulus calculation"):
    incar_dict = {
        'incar': {
            'PREC': 'Accurate',
            'EDIFF': 1e-8,
            'NELMIN': 5,
            'NELM': 100,
            'ENCUT': 500,
            'IALGO': 38,
            'ISMEAR': 0,
            'SIGMA': 0.01,
            'LREAL': False,
            'LCHARG': False,
            'LWAVE': False
        }
    }

    kpoints = KpointsData()
    kpoints.set_kpoints_mesh([6, 6, 4], offset=[0, 0, 0.5])

    potential_family = 'PBE.54'
    potential_mapping = {'Si': 'Si', 'C': 'C'}

    parser_settings = {'add_energies': True,
                       'add_forces': True,
                       'add_stress': True}

    code = Code.get_from_string(code_string)
    Workflow = WorkflowFactory('vasp.bm')
    builder = Workflow.get_builder()
    builder.code = code
    builder.parameters = Dict(dict=incar_dict)
    builder.structure = structure
    builder.settings = Dict(dict={'parser_settings': parser_settings})
    builder.potential_family = Str(potential_family)
    builder.potential_mapping = Dict(dict=potential_mapping)
    builder.kpoints = kpoints
    builder.options = Dict(dict=options)
    builder.metadata.label = label
    builder.metadata.description = label
    builder.clean_workdir = Bool(False)
    relax = AttributeDict()
    relax.perform = Bool(True)
    relax.force_cutoff = Float(1e-8)
    relax.steps = Int(100)
    relax.positions = Bool(True)
    relax.shape = Bool(True)
    relax.volume = Bool(True)
    builder.relax = relax
    builder.verbose = Bool(True)

    node = submit(builder)
    return node


def get_structure_SiC():
    """Set up SiC wurtzite cell

    wurtzite-type SiC
      1.0000000000
      3.0920000000   0.0000000000   0.0000000000
     -1.5460000000   2.6777505485   0.0000000000
      0.0000000000   0.0000000000   5.0730000000
    Si    C
        2     2
    Direct
      0.3333333333   0.6666666667   0.0000000000
      0.6666666667   0.3333333333   0.5000000000
      0.3333333333   0.6666666667   0.3758220000
      0.6666666667   0.3333333333   0.8758220000

    """

    StructureData = DataFactory('structure')
    a = 3.092
    c = 5.073
    lattice = [[a, 0, 0],
               [-a / 2, a / 2 * np.sqrt(3), 0],
               [0, 0, c]]
    structure = StructureData(cell=lattice)
    for pos_direct, symbol in zip(
            ([1. / 3, 2. / 3, 0],
             [2. / 3, 1. / 3, 0.5],
             [1. / 3, 2. / 3, 0.375822],
             [2. / 3, 1. / 3, 0.875822]), ('Si', 'Si', 'C', 'C')):
        pos_cartesian = np.dot(pos_direct, lattice)
        structure.append_atom(position=pos_cartesian, symbols=symbol)
    return structure

def main(code_string, options):
    structure = get_structure_SiC()
    node = launch_aiida_bulk_modulus(structure, code_string, options,
                                     label="SiC VASP bulk modulus calculation")
    print('Launched workchain node: ', node)

if __name__ == '__main__':
    # Code_string is chosen from output of the list given by 'verdi code list'
    code_string = 'vasp@mycluster'

    # Set the options
    options = {
        'resources': {
            'num_machines': 1,
            'num_mpiprocs_per_machine': 8
        },
        'account': '',
        'qos': '',
        'max_memory_kb': 2000000,
        'max_wallclock_seconds': 1800
    }

    # Run workflow
    main(code_string, options)

Place this launch script in the examples directory of the plugin.
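As a quick sanity check of get_structure_SiC, you can verify with nothing but the standard library that the hexagonal lattice built from a and c reproduces the POSCAR header quoted in its docstring:

```python
import math

# Rebuild the wurtzite lattice from the same a and c as in get_structure_SiC.
a, c = 3.092, 5.073
lattice = [[a, 0.0, 0.0],
           [-a / 2, a / 2 * math.sqrt(3), 0.0],
           [0.0, 0.0, c]]

# The y component of the second lattice vector matches the POSCAR value.
print(round(lattice[1][1], 10))  # 2.6777505485
```

This kind of cross-check against the docstring is cheap insurance when hand-transcribing structures into launch scripts.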

We also have to install the plugin we have created so that the AiiDA daemon can pick it up. It is convenient to install it in what we call editable mode. This makes it possible to utilize the installation framework, so that other packages like AiiDA become aware of our new plugin and workchain, while maintaining the convenience of being able to quickly modify the code and relaunch. Assuming you are now in the root folder of the aiida-vasp-bm plugin (the folder containing the aiida_vasp_bm folder and the pyproject.toml file), execute:

$ pip install -e .

which will install the plugin in editable mode.

Warning

Please remember to restart the daemon if you change any code that the AiiDA daemon should have access to. This applies to all editable installations (the ones installed with the -e option of pip) of Python packages. It is very easy to miss, and during heavy development it sometimes makes sense to restart the daemon from the launch script so you do not forget. If you do forget, it can cause a significant time drain on debugging something that might already be working just fine.

Again, tailor the launch script to your needs as we have now done a few times, and launch it. Note the id <PK> printed. Monitor its progress with e.g. verdi process report <PK>. When done, fetch the id <PK'> of its bulk_modulus output from verdi process show <PK> and inspect it:

$ verdi node attributes <PK'>
PK: 6700
{
    "value": 213.25056684995
}

or check it in Python with e.g.:

$ verdi shell
In [1]: node = load_node(<PK>)

In [2]: node.outputs.bulk_modulus.value
Out[2]: 213.25056684995

Note

The basic data types, meaning the ones that are not listed under verdi data (in this case Float), can be inspected with verdi node attributes from the command line. From Python, the way to inspect them is the same as for other nodes.

As we can see, it yields the same bulk modulus as before. We have now made a new plugin and a workflow that we can easily share with others and use to calculate the bulk modulus in a consistent manner for many different input structures. We can also include the BulkModulusWorkChain in other workflows if we need the bulk modulus as a sub-component there.

4. Sharing your workflow

In order to increase uptake and open for improvements and suggestions from the field, we suggest you consider making your workflow publicly accessible. There are mainly two ways:

  • Contribute to the AiiDA-VASP plugin such that your developed workflow eventually is part of the AiiDA-VASP distribution.

  • Publish it on GitHub or another standardized framework that supports publishing code using Git.

For the first option, please open an issue on our AiiDA-VASP repository and explain what you have done, and we will take it from there. Since showing the code is usually the best way to explain what you have done, the second option is highly useful in any case.

In order to upload your plugin to GitHub, first make sure you have an account and that your GitHub account is reflected in <yourgithubusername> in the pyproject.toml. Also make sure to update the author information in README.md, pyproject.toml and the documentation folder docs. Please make sure everyone that should be credited is in fact credited. Then create a new project with the name <projectname> on GitHub and perform the following steps in the plugin root folder (here aiida-vasp-bm for this example, though it is rather unlikely you want to upload this particular plugin):

$ git init
$ git add -A
$ git commit -m "Initial commit"
$ git branch -M main
$ git remote add origin git@github.com:<yourgithubusername>/<projectname>.git
$ git push -u origin main

Your repository, plugin and workflow should now, if you marked the project as Public during creation, be available to everyone at https://github.com/<yourgithubusername>/aiida-vasp-bm (for this particular example).