3. Equilibrium volume of FCC Si, take 2

Here we will continue where the previous tutorial left off. In particular, we will now create a workchain that executes the VASP workchain (which basically runs VASP) for each volume and stores the results. At the end we will even use a few Python functions to locate the minimum and store that as well. We hope that after this tutorial you will start to appreciate the convenience of AiiDA-VASP. As before, we assume you have completed the steps of the previous tutorial.

In order to make it as easy as possible, we have created a cookie cutter recipe for creating AiiDA-VASP workchains. It is located in cookiecutters/workchains at the root of the repository. If you have not cloned the repository, you can download only the cookiecutters folder using:

%/$ svn checkout https://github.com/aiida-vasp/aiida-vasp/branches/develop/cookiecutters

What we want to do is to create a workchain that calculates an equation of state, in this case the total energy as a function of volume. For simplicity we will call the workchain Eos. It is always recommended to dissect the operations we will perform into parts before writing the workchain. In this way, it is possible to reuse larger parts of the workchain at a later point. It is also easier to understand what is going on just by inspecting the definitions of the inputs, steps and outputs of the workchain. Please have a look at Workchains and the references therein before continuing.

  1. Make sure you have cookiecutter installed:

    %/$ pip install cookiecutter
    
  2. Now we simply execute the cookiecutter using the workchains folder as the template:

    %/$ cookiecutter cookiecutters/workchains
    workchain_name [testchain]: eos
    workchain_docstring [This workchain is a testchain]: This workchain will accept a dictionary of structures and extract the total energies for each structure. The data is saved and the energy minimum is calculated and stored.
    next_workchain_to_call [vasp.vasp]:
    
  3. A new folder eos was created, which contains the eos.py workchain file. Inspect it. Perhaps the most important part of a workchain is the define function, which, as its name implies, defines the workchain. For this particular case it looks like:

        @classmethod
        def define(cls, spec):
            super(EosWorkChain, cls).define(spec)
            spec.expose_inputs(cls._next_workchain, exclude=['structure'])
            spec.input_namespace('structures', valid_type=DataFactory('structure'), dynamic=True, help='a dictionary of structures to use')
            spec.exit_code(0, 'NO_ERROR', message='the sun is shining')
            spec.exit_code(420, 'ERROR_NO_CALLED_WORKCHAIN', message='no called workchain detected')
            spec.exit_code(500, 'ERROR_UNKNOWN', message='unknown error detected in the eos workchain')
    
            spec.outline(
                cls.initialize,
                while_(cls.run_next_workchains)(
                    cls.init_next_workchain,
                    cls.run_next_workchain,
                    cls.verify_next_workchain,
                    cls.extract_quantities
                ),
                cls.finalize
            )  # yapf: disable
    
            spec.output('quantities', valid_type=DataFactory('array'), help='a container for various quantities')
            spec.expose_outputs(cls._next_workchain)
    
    

    We have defined some inputs:

    • All inputs defined in _next_workchain, which in this case is the VASP workchain. We can include those very easily by using expose_inputs. Notice that structure is excluded, since the workchain sets it itself for each iteration. You can of course expose inputs from other workchains, as long as AiiDA can find them.

    • A dictionary structures containing as many structures as you want to run. Notice that the data type of each entry is StructureData.

    • Some generic exit codes.

    Then we have defined some outputs:

    • We attach all outputs of _next_workchain, i.e. the VASP workchain, to this workchain (basically, links are created to avoid storing the data twice). Similar to the inputs, expose_outputs can be leveraged for this.
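
    To get a feel for how these exposed ports are consumed inside the workchain, consider the following condensed sketch. It is not a verbatim copy of the generated file (the bodies are simplified and the method that attaches the outputs is hypothetical); it only illustrates the pattern: the exposed inputs are fetched with exposed_inputs, while the outputs of a finished child workchain are attached with out_many and exposed_outputs.

        def _init_inputs(self):
            """Collect the inputs exposed from the VASP workchain (structure is set per iteration)."""
            self.ctx.inputs.update(self.exposed_inputs(self._next_workchain))

        def attach_exposed_outputs(self):
            """Attach the outputs of the last called VASP workchain to this workchain."""
            workchain = self.ctx.workchains[-1]
            self.out_many(self.exposed_outputs(workchain, self._next_workchain))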

    Finally, there is the outline which tells the workchain which functions to run. In this case we have:

    • A function that initializes what we need, initialize.

    • A while loop that iterates over all the structures.

    • An init_next_workchain function that initializes what we need for the next workchain at the current iteration (here we make sure the specific structure is set).

    • A run_next_workchain, which basically executes the VASP workchain.

    • A verify_next_workchain, which verifies that a valid VASP workchain was called and inherits any exit code set by that VASP workchain.

    • An extract_quantities, which extracts the quantities of interest after each called workchain has finished (we will adapt this step below).

    • A finalize which stores the results.
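
    To make the outline more concrete, here is a condensed sketch of what the loop steps typically contain. This is only an illustration, not a verbatim copy of the generated file; in particular, exactly how the next structure is selected is an assumption (random and append_ are imported at the top of the generated file):

        def run_next_workchains(self):
            """Continue as long as there are structures left to calculate."""
            return not self.ctx.is_finished

        def init_next_workchain(self):
            """Set the inputs for the workchain of the current iteration."""
            self.ctx.iteration += 1
            # Pick an arbitrary entry from the remaining structures and use it as the input structure
            key = random.choice(list(self.ctx.structures.keys()))
            self.ctx.inputs.structure = self.ctx.structures.pop(key)

        def run_next_workchain(self):
            """Submit the VASP workchain with the current set of inputs."""
            running = self.submit(self._next_workchain, **self.ctx.inputs)
            self.to_context(workchains=append_(running))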

  4. The workchain is not yet complete. We need to extract the total energies and make the general workchain more specific. Please apply the following changes to the generated workchain:

    --- /home/docs/checkouts/readthedocs.org/user_builds/aiida-vasp/checkouts/latest/tutorials/eos_base.py
    +++ /home/docs/checkouts/readthedocs.org/user_builds/aiida-vasp/checkouts/latest/tutorials/eos.py
    @@ -10,6 +10,8 @@
     # pylint: disable=attribute-defined-outside-init
     import random
     import numpy as np
    +from scipy.optimize import minimize
    +from scipy.interpolate import interp1d
     
     from aiida.common.extendeddicts import AttributeDict
     from aiida.engine import calcfunction, WorkChain, while_, append_
    @@ -47,13 +49,13 @@
                     cls.init_next_workchain,
                     cls.run_next_workchain,
                     cls.verify_next_workchain,
    -                cls.extract_quantities
    +                cls.extract_volume_and_energy
                 ),
                 cls.finalize
             )  # yapf: disable
     
    -        spec.output('quantities', valid_type=DataFactory('array'), help='a container for various quantities')
    -        spec.expose_outputs(cls._next_workchain)
    +        spec.output('eos', valid_type=DataFactory('array'), help='a list containing the cell volumes and total energies')
    +        spec.output('eos_minimum', valid_type=DataFactory('dict'), help='a dictionary containing the cell volume at energy minimum')
     
         def initialize(self):
             """Initialize the eos workchain."""
    @@ -73,13 +75,13 @@
             self.ctx.is_finished = False
     
             # Define an iteration index
    -        self.ctx.interation = 0
    +        self.ctx.iteration = 0
     
             # Define the context inputs
             self.ctx.inputs = AttributeDict()
     
    -        # Define container to store quantities that is extracted in each step
    -        self.ctx.quantities_container = []
    +        # Define the total energies list
    +        self.ctx.total_energies = []
     
         def _init_inputs(self):
             """Initialize inputs."""
    @@ -101,7 +103,7 @@
         def init_next_workchain(self):
             """Initialize the next workchain."""
     
    -        # Elavate iteraetion index
    +        # Elevate iteration index
             self.ctx.iteration += 1
     
             # Check that the context inputs exists
    @@ -156,42 +158,81 @@
                 self.report('The called {}<{}> returned a non-zero exit status. '
                             'The exit status {} is inherited'.format(workchain.__class__.__name__, workchain.pk, self.ctx.exit_code))
     
    -        # Stop further executions of workchains if there are no more structure
    +        # Stop further execution of workchains if there are no more structure
             # entries in the structures dictionary
             if not self.ctx.structures:
                 self.ctx.is_finished = True
     
             return self.ctx.exit_code
     
    -    def extract_quantities(self):
    -        """Extract the quantities you want and store them in the container."""
    -
    -        # What happens in this routine needs modifications by you.
    -        # Typically you get the output of the called workchain by doing
    -        # workchain = self.ctx.workchains[-1]
    -        # some_output_quantity = workchain.outputs.some_output_quantity
    -
    -        # An example which stores nonsense.
    -        self.ctx.quantities_container.append([self.ctx.iteration, 'Some quantity for iteration: {}'.format(self.ctx.iteration)])
    +    def extract_volume_and_energy(self):
    +        """Extract the cell volume and total energy for this structure."""
    +
    +        workchain = self.ctx.workchains[-1]
    +
    +        # Fetch the total energy
    +        misc = workchain.outputs.misc.get_dict()
    +        total_energy = misc['total_energies']['energy_extrapolated']
    +
    +        # Fetch the volume
    +        volume = self.ctx.inputs.structure.get_cell_volume()
    +
    +        # Store both in a list
    +        self.ctx.total_energies.append([volume, total_energy])
     
         def finalize(self):
             """
             Finalize the workchain.
     
    -        Take the quantity container and set it as an output of this workchain.
    +        Take the total energies container and set it as an output of this workchain.
             """
             # Due to data provenance we cannot return AiiDA data containers that have
             # not been passed through a calcfunction, workfunction or a workchain. Create this now.
    -        quantities_container = store_quantities(DataFactory('list')(list=self.ctx.quantities_container))
    +        total_energies = store_total_energies(DataFactory('list')(list=self.ctx.total_energies))
    +
    +        # Let us also try to find a better minimum, just as an example using the power of Python
    +        energy = locate_minimum(total_energies)
     
             # And then store the output on the workchain
    -        self.out('quantities', quantities_container)
    +        self.out('eos', total_energies)
    +        self.out('eos_minimum', energy)
     
     
     @calcfunction
    -def store_quantities(quantities_container):
    -    """Stores the quantities to keep data provenance."""
    -    quantities_container_list = quantities_container.get_list()
    -    quantities_container_array = DataFactory('array')()
    -    quantities_container_array.set_array('quantities', np.array(quantities_container_list))
    -    return quantities_container_array
    +def store_total_energies(total_energies):
    +    """Stores the total energies in ArrayData to keep data provenance."""
    +    total_energies_list = total_energies.get_list()
    +    # Let us also sort by volume as we picked the entries in the structures at random
    +    # above
    +    total_energies_array = np.array(total_energies_list)
    +    total_energies_array_sorted = total_energies_array[total_energies_array[:, 0].argsort()]
    +    array_data = DataFactory('array')()
    +    array_data.set_array('eos', total_energies_array_sorted)
    +
    +    return array_data
    +
    +
    +@calcfunction
    +def locate_minimum(total_energies):
    +    """Locate the volume with the lowest energy using interpolation."""
    +    total_energies_array = total_energies.get_array('eos')
    +    volumes = total_energies_array[:, 0]
    +    energies = total_energies_array[:, 1]
    +
    +    # Establish some initial guess (take lowest energy point from the original dataset)
    +    min_energy_guess_index = np.argmin(energies)
    +
    +    # Create the function that can be used to extract interpolated values
    +    # Using cubic interpolation here, which is not necessarily physically correct;
    +    # it only serves as an example. Please have a look at the theory behind more realistic
    +    # models that can be used to fit such data.
    +    new_energies = interp1d(volumes, energies, kind='cubic')
    +
    +    # Use minimize from scipy to locate the minimum of the function produced by the
    +    # interpolation routines
    +    minimized_point = minimize(new_energies, volumes[min_energy_guess_index], tol=1e-3)
    +
    +    # Create a dictionary to house the results and return
    +    dict_data = DataFactory('dict')(dict={'volume': minimized_point.x[0], 'energy': minimized_point.fun})
    +
    +    return dict_data
    

    and save it as eos.py. Alternatively, you can download the finished file directly:

    %/$ wget https://github.com/aiida-vasp/aiida-vasp/raw/develop/tutorials/eos.py
    

    The majority of the changes simply make the workchain more specific. Two points deserve a comment:

    • The function that generates the output array containing the volumes and total energies has to be decorated as a calcfunction. This preserves data provenance; otherwise we would not know how the data was collected from each of the underlying workchains.

    • The inclusion of a calcfunction that interpolates the calculated data to find an even better estimate of the volume at the energy minimum. The example uses a cubic fit, which is certainly not very physical and should not be used in production; it only shows how simple it is to leverage the power of Python, NumPy and SciPy (a sketch of a more physical fit follows below). Again, this was decorated as a calcfunction in order to make sure AiiDA can preserve the data provenance.
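
    As an illustration of what a more physical treatment could look like, the cubic interpolation in locate_minimum could be replaced by a fit to, for instance, the Birch-Murnaghan equation of state. The following sketch is not part of the tutorial files and the function names are our own; it only shows how such a fit could be done with curve_fit from SciPy:

        import numpy as np
        from scipy.optimize import curve_fit

        def birch_murnaghan_energy(volume, e0, v0, b0, b0_prime):
            """Birch-Murnaghan equation of state giving the total energy as a function of volume."""
            eta = (v0 / volume)**(2.0 / 3.0)
            return e0 + 9.0 * v0 * b0 / 16.0 * ((eta - 1.0)**3 * b0_prime + (eta - 1.0)**2 * (6.0 - 4.0 * eta))

        def fit_birch_murnaghan(volumes, energies):
            """Fit the equation of state and return the parameters e0, v0, b0 and b0_prime."""
            # Start from the raw data point with the lowest energy and a rough guess for b0 (in eV/Angstrom^3)
            initial_guess = [energies.min(), volumes[np.argmin(energies)], 0.5, 4.0]
            parameters, _ = curve_fit(birch_murnaghan_energy, volumes, energies, p0=initial_guess)
            return parameters

    The fitted v0 and e0 would then take the place of the interpolated volume and energy in the dictionary that locate_minimum returns.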

  5. Next, download the launch script that is tailored to launch the workchain we have now developed (a sketch of what such a script typically contains follows the command):

    %/$ wget https://github.com/aiida-vasp/aiida-vasp/raw/develop/tutorials/run_fcc_si_workchain.py
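
    The essential part of such a launch script is building the structures dictionary, one StructureData per lattice constant, and passing it along with the remaining inputs when submitting the workchain. A minimal sketch of this pattern is shown below; it is not a verbatim copy of run_fcc_si_workchain.py, the code label is a placeholder and the omitted inputs have to be set up as in the previous tutorial:

        import numpy as np

        from aiida import load_profile
        from aiida.common.extendeddicts import AttributeDict
        from aiida.engine import submit
        from aiida.orm import Code
        from aiida.plugins import DataFactory

        from eos import EosWorkChain

        load_profile()

        def get_structure(lattice_constant):
            """Build a primitive FCC silicon cell for the given lattice constant (in Angstrom)."""
            structure_data_cls = DataFactory('structure')
            cell = lattice_constant * np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
            structure = structure_data_cls(cell=cell)
            structure.append_atom(position=(0.0, 0.0, 0.0), symbols='Si')
            return structure

        # One structure per lattice constant, e.g. silicon_at_3_5, silicon_at_3_6, ..., silicon_at_4_3
        structures = {}
        for lattice_constant in np.linspace(3.5, 4.3, 9):
            name = 'silicon_at_{:.1f}'.format(lattice_constant).replace('.', '_')
            structures[name] = get_structure(lattice_constant)

        inputs = AttributeDict()
        inputs.structures = structures
        inputs.code = Code.get_from_string('vasp@mycluster')  # replace with your code_string
        # ... kpoints, parameters, options, settings, potential_family, potential_mapping and
        # verbose are set up exactly as in the previous tutorial
        submit(EosWorkChain, **inputs)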
    
  6. Change the options and code_string as you did previously.

    Warning

    Make sure you have activated your AiiDA virtual environment and that the AiiDA daemon is running before continuing.

  7. Now we need to make sure the daemon can pick up the workchain. We do this by making sure the daemon sees the directory where eos.py and run_fcc_si_workchain.py are located. The simplest approach is to add the following to your virtual environment activate script (assuming you do not use Conda):

    $ echo "export PYTHONPATH=$PYTHONPATH:<yourdirectory>" >> ~/env/aiida-vasp/bin/activate
    

    assuming <yourdirectory> is the directory containing the eos.py and run_fcc_si_workchain.py files. The location of the activate script is assumed from the previous steps in the tutorial. Remember to source the activate script again and, if the daemon was already running, restart it with verdi daemon restart so that the updated PYTHONPATH takes effect. If you use Conda, please do:

    % echo "export PYTHONPATH=$PYTHONPATH:<yourdirectory>" >> $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
    
  8. Submit the workchain by running the call script:

    %/$ python run_fcc_si_workchain.py
    
  9. After a while we check the status:

    %/$ verdi process list -a
    PK  Created    Process label                 Process State      Process status
    ------  ---------  ----------------------------  -----------------  -----------------------------------------------------------
    103573  14m ago    EosWorkChain                  ⏹ Finished [0]
    103574  14m ago    VaspWorkChain                 ⏹ Finished [0]
    103575  14m ago    VaspCalculation               ⏹ Finished [0]
    103579  13m ago    VaspWorkChain                 ⏹ Finished [0]
    103580  13m ago    VaspCalculation               ⏹ Finished [0]
    103584  12m ago    VaspWorkChain                 ⏹ Finished [0]
    103585  12m ago    VaspCalculation               ⏹ Finished [0]
    103589  11m ago    VaspWorkChain                 ⏹ Finished [0]
    103590  11m ago    VaspCalculation               ⏹ Finished [0]
    103594  10m ago    VaspWorkChain                 ⏹ Finished [0]
    103595  10m ago    VaspCalculation               ⏹ Finished [0]
    103599  9m ago     VaspWorkChain                 ⏹ Finished [0]
    103600  9m ago     VaspCalculation               ⏹ Finished [0]
    103604  8m ago     VaspWorkChain                 ⏹ Finished [0]
    103605  8m ago     VaspCalculation               ⏹ Finished [0]
    103609  7m ago     VaspWorkChain                 ⏹ Finished [0]
    103610  7m ago     VaspCalculation               ⏹ Finished [0]
    103614  5m ago     VaspWorkChain                 ⏹ Finished [0]
    103615  5m ago     VaspCalculation               ⏹ Finished [0]
    103620  4m ago     store_total_energies          ⏹ Finished [0]
    103622  4m ago     locate_minimum                ⏹ Finished [0]
    

    As you can see, nine VASP calculations were performed, one for each supplied structure (volume). There are also separate entries for the storage of the total energies, which also performs a sort, and for locating the minimum. In this way, all generated results are trackable and data provenance is preserved.

  10. Let us have a look at the output of EosWorkChain:

    %/$ verdi process show 103573
    Property       Value
    -------------  ------------------------------------
    type           WorkChainNode
    pk             103573
    uuid           1055b7ac-e02e-4510-9ed8-62177a9c2bd1
    label
    description
    ctime          2019-10-08 12:50:46.739221+00:00
    mtime          2019-10-08 13:00:47.530855+00:00
    process state  Finished
    exit status    0
    computer       [6] mycluster
    
    Inputs              PK      Type
    ------------------  ------  -------------
    structures
        silicon_at_4_3  103563  StructureData
        silicon_at_4_2  103562  StructureData
        silicon_at_4_1  103561  StructureData
        silicon_at_4_0  103560  StructureData
        silicon_at_3_9  103559  StructureData
        silicon_at_3_8  103558  StructureData
        silicon_at_3_7  103557  StructureData
        silicon_at_3_6  103556  StructureData
        silicon_at_3_5  103555  StructureData
    clean_workdir       103572  Bool
    code                101271  Code
    kpoints             103564  KpointsData
    max_iterations      103571  Int
    options             103568  Dict
    parameters          103565  Dict
    potential_family    103566  Str
    potential_mapping   103567  Dict
    settings            103569  Dict
    verbose             103570  Bool
    
    Outputs          PK  Type
    -----------  ------  ---------
    eos          103621  ArrayData
    eos_minimum  103623  Dict
    
    Called        PK  Type
    --------  ------  ----------------
    CALL      103622  CalcFunctionNode
    CALL      103620  CalcFunctionNode
    CALL      103614  WorkChainNode
    CALL      103609  WorkChainNode
    CALL      103604  WorkChainNode
    CALL      103599  WorkChainNode
    CALL      103594  WorkChainNode
    CALL      103589  WorkChainNode
    CALL      103584  WorkChainNode
    CALL      103579  WorkChainNode
    CALL      103574  WorkChainNode
    
    Log messages
    ---------------------------------------------
    There are 9 log messages for this calculation
    Run 'verdi process report 103573' to see them
    
  11. Inspect the total energies versus volume:

    %/$ verdi data array show 103621
    {
    "eos": [
       [
           10.71875,
           -4.42341934
       ],
       [
           11.664,
           -4.66006377
       ],
       [
           12.66325,
           -4.79595549
       ],
       [
           13.718,
           -4.86303425
       ],
       [
           14.82975,
           -4.87588342
       ],
       [
           16.0,
           -4.8481406
       ],
       [
           17.23025,
           -4.78451894
       ],
       [
           18.522,
           -4.69228806
       ],
       [
           19.87675,
           -4.58122037
       ]
     ]
    }
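
    If you want to analyse or plot these data further, you can fetch the array directly from the node in a small script (or in a verdi shell, in which case load_profile can be skipped). A minimal sketch, assuming matplotlib is available and replacing the PK with the one from your own run:

        import matplotlib.pyplot as plt
        from aiida import load_profile
        from aiida.orm import load_node

        load_profile()

        eos = load_node(103621).get_array('eos')  # PK of the ArrayData node from your run
        plt.plot(eos[:, 0], eos[:, 1], 'o-')
        plt.xlabel('Cell volume [Å³]')
        plt.ylabel('Total energy [eV]')
        plt.show()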
    
  12. And the located minimum:

    %/$ verdi data dict show 103623
    {
        "energy": -4.8769539175,
        "volume": 14.55935661641
    }
    

That concludes this tutorial. We hope that at this point you find AiiDA-VASP rather useful and that you would like to continue to learn more, maybe even start to write your own Workflows or Workchains.