6. Where is my data?

So far, we have followed the tutorials step by step, and everything might seem very clear. However, after running your first VASP calculation outside of a tutorial, you might have a few questions. A typical one is: where is my data, and how can I interact with it? In this tutorial we will try to shed some light on this and how you can approach this from AiiDA and the

As you might have noticed, we have now populated the database with the silicon structure multiple times. Let us instead try to load some of the structures that are already present, in order to save coding, reuse previous results and save data storage.

To illustrate this we need a finished VaspWorkChain, and we will use the one from the previous tutorial. You can choose any VaspWorkChain you want, but notice that the inputs and outputs might be slightly different. If you are confused, please complete the previous tutorial and continue to use its results, as we do here.

One of the first questions asked by users who are new to the plugin but experienced with VASP is: where can I find my original files? Notice first that the default configuration of the plugin removes the downloaded files after successful parsing unless it is explicitly told not to do so, or told to keep specific files. Let us first investigate how we can tailor this. It is rather simple.

First, we always try to retrieve the files listed in _ALWAYS_RETRIEVE_LIST in calcs/vasp.py, which currently are CONTCAR, OUTCAR, vasprun.xml, vasp_output and any wannier90* file. The default setting settings['ALWAYS_STORE'] = True makes sure these files are always kept after a successful parsing step, which also results in a successful VaspCalculation. One can specify settings['ALWAYS_STORE'] = False to delete these files after a successful parsing step if one only wants to keep the parsed data, which are typically stored as AiiDA data nodes on the output of the process node.

In addition, it is possible to fine-tune this. For instance, one might want to also download additional files, like the CHGCAR. This can be done by specifying settings['ADDITIONAL_RETRIEVE_LIST'] = ['CHGCAR'] or settings['ADDITIONAL_RETRIEVE_TEMPORARY_LIST'] = ['CHGCAR']. Entries in the former are stored by default after parsing, while entries in the latter are deleted, regardless of the value of ALWAYS_STORE.
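The retrieval options described above can be collected in a plain Python dictionary before they are handed to the workchain. The following is a minimal sketch: the helper `build_retrieve_settings` and its argument names are hypothetical and only serve as illustration, and WAVECAR is used merely as an example of a file one might want to retrieve temporarily. Only the three keys themselves are taken from the text above.

```python
def build_retrieve_settings(always_store=True, stored=(), temporary=()):
    """Collect the retrieval-related settings keys in a plain dict.

    Hypothetical helper for illustration, it is not part of the plugin.
    """
    settings = {'ALWAYS_STORE': bool(always_store)}
    if stored:
        # Extra files that are kept after parsing.
        settings['ADDITIONAL_RETRIEVE_LIST'] = list(stored)
    if temporary:
        # Extra files that are retrieved for parsing and deleted afterwards.
        settings['ADDITIONAL_RETRIEVE_TEMPORARY_LIST'] = list(temporary)
    return settings


# Keep the default files, store CHGCAR as well, and fetch WAVECAR only temporarily.
settings = build_retrieve_settings(stored=['CHGCAR'], temporary=['WAVECAR'])
print(settings)
```

Such a dictionary would typically be wrapped in a Dict node and passed as the settings input of the workchain, in the same way the settings input was prepared in the previous tutorials. How exactly it should be merged with other settings you already use there is left open in this sketch.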

With this in mind, let us interact with the data and see how this manifests itself for our chosen example.

Let us first inspect the outputs of our previous VaspWorkChain:

```
$ verdi process show 2431
Property     Value
-----------  ------------------------------------
type         VaspWorkChain
state        Finished [0]
pk           2431
uuid         40ce7bd6-cd38-405e-951e-c56251a0cf1b
label
description
ctime        2022-12-22 11:17:25.967623+01:00
mtime        2022-12-22 11:19:39.816274+01:00

Inputs               PK  Type
-----------------  ----  -------------
clean_workdir      2430  Bool
code                818  InstalledCode
kpoints            2422  KpointsData
max_iterations     2429  Int
options            2426  Dict
parameters         2423  Dict
potential_family   2424  Str
potential_mapping  2425  Dict
settings           2427  Dict
structure          1529  StructureData
verbose            2428  Bool

Outputs          PK  Type
-------------  ----  ----------
dos            2436  ArrayData
misc           2437  Dict
remote_folder  2434  RemoteData
retrieved      2435  FolderData

Called          PK  Type
------------  ----  ---------------
iteration_01  2433  VaspCalculation

Log messages
---------------------------------------------
There are 3 log messages for this calculation
Run 'verdi process report 2431' to see them
```

Pay particular attention to the outputs, especially retrieved, which is of type FolderData. Let us now inspect it.

We can use the verdi node repo command. Let us first check what is in the folder:
```
$ verdi node repo ls 2435
CONTCAR
DOSCAR
EIGENVAL
OUTCAR
_scheduler-stderr.txt
_scheduler-stdout.txt
vasp_output
vasprun.xml
```
As we can see, these are the default files listed in _ALWAYS_RETRIEVE_LIST. In addition, there are the scheduler standard stream files, which are added by AiiDA.

Let us have a look at the content of, for instance, CONTCAR:

```
$ verdi node repo cat 2435 CONTCAR
# Compound: Si. Old comment: silicon_at_
   1.0000000000000000
     1.9500000000000000    1.9500000000000000    0.0000000000000000
     0.0000000000000000    1.9500000000000000    1.9500000000000000
     1.9500000000000000    0.0000000000000000    1.9500000000000000
   Si
     1
Direct
  0.0000000000000000  0.0000000000000000  0.0000000000000000

  0.00000000E+00  0.00000000E+00  0.00000000E+00
```

If you want, this can be piped to a file and displayed using regular tools:

```
$ verdi node repo cat 2435 CONTCAR > /tmp/contcar
$ more /tmp/contcar
# Compound: Si. Old comment: silicon_at_
   1.0000000000000000
     1.9500000000000000    1.9500000000000000    0.0000000000000000
     0.0000000000000000    1.9500000000000000    1.9500000000000000
     1.9500000000000000    0.0000000000000000    1.9500000000000000
   Si
     1
Direct
  0.0000000000000000  0.0000000000000000  0.0000000000000000

  0.00000000E+00  0.00000000E+00  0.00000000E+00
```

So getting to your files requires a bit more typing than working with folders and files in the traditional way, but this only holds for simple one-off examples. Once the workflow becomes more involved and the nesting of folders more complicated, the typing quickly becomes more compact with AiiDA. The main benefit, of course, lies in everything else that comes along with it.

Inspecting data, or working with it programmatically in general, is also very easy using the verdi shell, which gives you access to an IPython instance where most of the needed AiiDA functionality is loaded for you. Launch the verdi shell:
```
$ verdi shell
```
Then we load the node:
```
In [1]: node = load_node(2435)
```
And inspect the objects residing in the retrieved folder:
```
In [2]: node.base.repository.list_object_names()
Out[2]:
['CONTCAR',
'OUTCAR',
'_scheduler-stderr.txt',
'_scheduler-stdout.txt',
'vasp_output',
'vasprun.xml']
```
As before, these are the default files listed in _ALWAYS_RETRIEVE_LIST, in addition to the scheduler files.
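When many calculations need to be checked, comparing such a listing against the files you expect can be scripted. The sketch below assumes only the `base.repository.list_object_names()` call used above. The function name `missing_files` is hypothetical and chosen for this illustration.

```python
def missing_files(folder_node, expected):
    """Return the names in ``expected`` that are not stored on the node.

    ``folder_node`` is assumed to be a node such as the retrieved FolderData,
    exposing ``base.repository.list_object_names()``.
    """
    present = set(folder_node.base.repository.list_object_names())
    return [name for name in expected if name not in present]


# In the verdi shell, where load_node is already available, one could call:
# node = load_node(2435)
# missing_files(node, ['CONTCAR', 'OUTCAR', 'vasprun.xml', 'CHGCAR'])
```

With the listing shown above, the call in the comment would be expected to report only CHGCAR as missing, since we did not ask for it to be retrieved. The same FolderData node can also be reached from the workchain itself through `load_node(2431).outputs.retrieved`.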

Note

For most commands, tab completion is available, so you can write node. and then press tab to check which methods (with parentheses) or attributes (no parentheses) are available on the node. Notice, however, that most of the useful methods and attributes are now placed in sub-namespaces under base; see the documentation on the namespace change for more details.

We can now inspect the content of these files:
```
In [3]: node.base.repository.get_object_content('CONTCAR')
Out[3]: '# Compound: Si. Old comment: silicon_at_\n   1.0000000000000000     \n     1.9500000000000000    1.9500000000000000    0.0000000000000000\n     0.0000000000000000    1.9500000000000000    1.9500000000000000\n     1.9500000000000000    0.0000000000000000    1.9500000000000000\n   Si\n     1\nDirect\n  0.0000000000000000  0.0000000000000000  0.0000000000000000\n\n  0.00000000E+00  0.00000000E+00  0.00000000E+00\n'
```
The content is available as a string. We can of course also dump it to a file:
```
In [4]: with open('/tmp/contcar', 'w') as fo:
...:     fo.write(node.base.repository.get_object_content('CONTCAR'))
...:
```
Exit the verdi shell by typing exit and issue:
```
$ more /tmp/contcar
```
and there you again see the CONTCAR from the VASP calculation.
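The same pattern extends to the whole retrieved folder. The following sketch writes every object of the node to a directory on disk, using only the two repository calls shown above. The function name `dump_folder` and the target path are hypothetical, and the sketch assumes a flat folder that contains text files only, which is the case for the listing in this tutorial.

```python
from pathlib import Path


def dump_folder(folder_node, target_dir):
    """Write every top-level object stored on the node into ``target_dir``.

    Assumes a flat repository containing text files only.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    repository = folder_node.base.repository
    written = []
    for name in repository.list_object_names():
        path = target / name
        # get_object_content returns the file content as a string.
        path.write_text(repository.get_object_content(name))
        written.append(path)
    return written


# In the verdi shell one could then call, for instance:
# dump_folder(load_node(2435), '/tmp/retrieved_2435')
```

The function returns the paths it wrote, so the result can be inspected directly or handed on to other tools.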
