Difference between revisions of "Quick start"

Revision as of 08:05, 1 July 2020

This is a quick start guide to fitting GAP potentials using an atomic database and some combination of descriptors. You can do this with QUIP/GAP alone, or add TurboGAP support for faster and more accurate many-body descriptors of the SOAP type. In addition, you can use your GAP to run molecular dynamics simulations with LAMMPS.

Prerequisites

Required

  • You need either your own or a preexisting atomic database. If you are generating your own, you (usually) need a DFT package that can predict energies and forces for given atomic configurations. Examples of such packages are VASP, GPAW, Quantum Espresso, ABINIT, FHI AIMS, CASTEP, CP2K, etc.
  • You need a working QUIP and GAP installation. QUIP is free, and GAP is free for non-commercial research. Below are some quick instructions on how to get going with this.

Optional

Optional packages are not required, but they make building GAP potentials easier and broaden what you can do with them. Don't be lazy: install them (we will use them at our discretion throughout the guide below).

  • The Atomic Simulation Environment (ASE) is not required, but it is always a useful addition to the atomistic toolset, and particularly useful for generating the XYZ database files used by QUIP/GAP.
  • A TurboGAP installation, if you want to use Miguel Caro's SOAP-type many-body descriptors, which are faster and more accurate than the default SOAP descriptors. In addition, TurboGAP supports SOAP compression (making it even faster). TurboGAP cannot (currently) be used on its own to train a GAP: it is used as a library in combination with QUIP/GAP. TurboGAP is free for non-commercial research.
  • A LAMMPS installation, if you want to use your GAP potentials to run large-scale molecular dynamics simulations. A custom installation of LAMMPS with QUIP support is required (more details below).


Building a database

The most essential requirement to train an ML potential, be it GAP or something else, is training data. For QUIP, a database of atomic configurations is an extended XYZ file with atomic structures (i.e., atomic coordinates), accompanied by energies, forces and/or virials. A single XYZ file, e.g., database.xyz, contains a concatenation of individual atomic structures. Which kinds of configurations it contains, and how well they span the regions of configuration space that are relevant for our potential, will be absolutely critical in determining the accuracy of our trained GAP.
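To make the file layout concrete, here is a minimal sketch that formats a single extended XYZ frame by hand, using only the Python standard library. The helper name `format_extxyz_frame` is hypothetical, the numbers are made up, and the key names (`energy`, `forces`, `config_type`) are assumptions that follow what ASE usually writes. In practice you would let ASE write these files for you.

```python
def format_extxyz_frame(symbols, positions, forces, cell, energy, config_type):
    """Return one extended XYZ frame (hypothetical helper, for illustration only)."""
    # The Lattice key holds the three cell vectors, one after the other
    lattice = " ".join(f"{x:.8f}" for row in cell for x in row)
    comment = (
        f'Lattice="{lattice}" '
        "Properties=species:S:1:pos:R:3:forces:R:3 "
        f"energy={energy:.8f} config_type={config_type} "
        'pbc="T T T"'
    )
    # Line 1: number of atoms; line 2: key=value metadata; then one line per atom
    lines = [str(len(symbols)), comment]
    for s, p, f in zip(symbols, positions, forces):
        numbers = " ".join(f"{x:16.8f}" for x in (*p, *f))
        lines.append(f"{s:<2} {numbers}")
    return "\n".join(lines) + "\n"


# Made-up numbers, only to show the layout (these are not DFT results)
frame = format_extxyz_frame(
    symbols=["P", "P"],
    positions=[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    forces=[[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]],
    cell=[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
    energy=-1.0,
    config_type="dimer",
)
print(frame, end="")
```

A database file is then just many such frames written one after another.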

Good databases contain a combination of crystalline and distorted structures, dimers, trimers, surfaces, liquids, etc. Our "training set" may contain a lot of structures, on the order of thousands of simulation boxes and hundreds of thousands of unique atomic environments. Not all of these are used for the kernel ridge regression during production: only the sparse set is. However, the full training set is used to find the optimal regression coefficients.
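The roles of the sparse set and the full training set can be summarized with the usual sparse Gaussian process expressions. This is a sketch in assumed notation, not taken from the QUIP/GAP source: M is the number of sparse environments, N the number of training data points, y the vector of training energies, forces and virials, Σ the diagonal matrix of regularization parameters, and K_MN is understood to include the linear map from local atomic energies to the observed quantities.

```latex
% Prediction for a new environment q uses only the M sparse environments q_s:
\varepsilon(\mathbf{q}) = \sum_{s=1}^{M} \alpha_s \, k(\mathbf{q}, \mathbf{q}_s)

% The coefficients are fitted against all N training data points:
\boldsymbol{\alpha} = \left[ \mathbf{K}_{MM}
    + \mathbf{K}_{MN} \boldsymbol{\Sigma}^{-1} \mathbf{K}_{NM} \right]^{-1}
    \mathbf{K}_{MN} \boldsymbol{\Sigma}^{-1} \mathbf{y}
```

The cost of a prediction therefore scales with M rather than N, which is why a large training set does not slow down production runs.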

As an example, let us use GPAW, in combination with ASE, to compute energies and forces for a bunch of phosphorus structures. Note that this is just to get to grips with the workflow; you can generate the training data with another DFT code (although ASE is always handy for parsing the output of the calculation). E.g., if you use VASP, you can import the OUTCAR with ASE (atoms = read("OUTCAR")) and the energy, forces, etc., will be imported into the ASE Atoms() object.

But let us get back to generating a (rather crappy) database of P structures with GPAW. The code below computes the energy and forces for an isolated atom, assigns it the label "isolated_atom" and writes it to file; it then does the same for 20 configurations along the P2 dissociation curve (label "dimer"), and finally for 50 very stupidly constructed bulk configurations in periodic boundary conditions (label "random"). Note that we are using an LCAO basis set since we want to do this fast, and we are not leaving a lot of vacuum around the non-periodic structures. If you hope to publish your database, you should fix this and other problems with this example.

import os

import numpy as np
from ase import Atoms
from ase.io import write
from gpaw import GPAW

# Make sure the output directory exists before writing to it
os.makedirs("database", exist_ok=True)


# Compute an isolated P atom
if 1:
    print("Doing isolated atom...")
    atoms = Atoms("P", positions=[[0, 0, 0]], pbc=False)
    atoms.center(vacuum=4.)
    calc = GPAW(xc="PBE", txt="gpaw.out", spinpol=True,
                mode="lcao", basis="dzp")
    atoms.calc = calc
    # Run the calculation so that energy and forces are attached to atoms
    e = atoms.get_potential_energy(force_consistent=True)
    f = atoms.get_forces()
    atoms.info["config_type"] = "isolated_atom"
    write("database/isolated_atom.xyz", atoms)


# Do a dimer curve
if 1:
    distances = np.arange(1., 5., 0.2)
    for i, d in enumerate(distances):
        print("Doing %i/%i dimer configurations..." % (i + 1, len(distances)))
        atoms = Atoms("P2", positions=[[0, 0, 0], [d, 0, 0]], pbc=False)
        atoms.center(vacuum=4.)
        calc = GPAW(xc="PBE", txt="gpaw.out", mode="lcao", basis="dzp")
        atoms.calc = calc
        e = atoms.get_potential_energy(force_consistent=True)
        f = atoms.get_forces()
        atoms.info["config_type"] = "dimer"
        write("database/dimer_%i.xyz" % i, atoms)


# Do some highly distorted bulk structures in cubic boxes
# From high to low density, 10 structures at each density
# with 8 atoms in each of them
if 1:
    n = 0
    nstruc = 10
    natoms = 8
    box_sizes = np.arange(3.5, 6., 0.5)
    ncalc = nstruc * len(box_sizes)
    for L in box_sizes:
        cell = [L, L, L]
        # We add nstruc structures at this density
        for i in range(nstruc):
            # Start each structure from a fresh single-atom cell, so that
            # all nstruc structures are different
            atoms = Atoms("P", positions=[[0, 0, 0]], cell=cell, pbc=True)
            # Add atoms at random positions until there are natoms of them;
            # in the smallest boxes this may take many attempts
            while len(atoms) < natoms:
                j = len(atoms)
                atom_too_close = True
                while atom_too_close:
                    pos = L * np.random.sample(3)
                    atoms_temp = atoms + Atoms("P", positions=[pos])
                    atom_too_close = False
                    # Reject the new atom if it is closer than 1.5 Angstrom
                    # to any existing atom (minimum image convention)
                    for k in range(j):
                        d = atoms_temp.get_distance(k, j, mic=True)
                        if d < 1.5:
                            atom_too_close = True
                            break
                atoms = atoms_temp.copy()
            # Do the calculation
            calc = GPAW(xc="PBE", txt="gpaw.out", mode="lcao", basis="dzp")
            atoms.calc = calc
            print("Doing %i/%i random configurations..." % (n + 1, ncalc))
            e = atoms.get_potential_energy(force_consistent=True)
            f = atoms.get_forces()
            atoms.info["config_type"] = "random"
            write("database/random_%i.xyz" % n, atoms)
            n += 1
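The random bulk structures are built by rejection sampling: a trial position is drawn uniformly in the box and accepted only if it is at least 1.5 Å from every atom already placed, measured with the minimum image convention. The sketch below isolates that idea using only the Python standard library, so you can experiment with it without ASE or GPAW. The function names and the `max_tries` safeguard are illustrative assumptions, not part of the script above.

```python
import math
import random


def mic_distance(a, b, box):
    """Distance between points a and b in a cubic box of side `box`,
    using the minimum image convention."""
    total = 0.0
    for x, y in zip(a, b):
        d = (x - y) % box      # fold the separation into [0, box)
        d = min(d, box - d)    # pick the nearest periodic image
        total += d * d
    return math.sqrt(total)


def random_structure(natoms, box, dmin, rng, max_tries=100000):
    """Place `natoms` points at random in the box, rejecting any trial
    position closer than `dmin` to an already accepted one."""
    positions = []
    tries = 0
    while len(positions) < natoms:
        tries += 1
        if tries > max_tries:
            # Dense boxes can jam: no free spot is left for the next atom
            raise RuntimeError("could not place all atoms; box too dense?")
        trial = [box * rng.random() for _ in range(3)]
        if all(mic_distance(trial, p, box) >= dmin for p in positions):
            positions.append(trial)
    return positions


rng = random.Random(0)  # fixed seed so the sketch is reproducible
positions = random_structure(natoms=8, box=5.0, dmin=1.5, rng=rng)
for p in positions:
    print("P %10.5f %10.5f %10.5f" % tuple(p))
```

The attempt limit matters for small boxes: the denser the box, the more likely it is that the already placed atoms leave no room for the next one, in which case a loop without such a limit never finishes.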

We have given a label ("config_type") to structures generated in different ways because this will allow us to provide per-configuration regularization parameters, a nice and very useful feature of QUIP/GAP. Importantly, you should put all of the individual XYZ files into a single "database" XYZ file, which is easily achieved by concatenating all of them. For example, given the files generated above, you could do (adjusting the ranges as needed):

cat database/isolated_atom.xyz > database.xyz
for i in $(seq 0 1 19); do cat database/dimer_${i}.xyz >> database.xyz; done
for i in $(seq 0 1 49); do cat database/random_${i}.xyz >> database.xyz; done
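If you prefer to stay in Python, the same concatenation can be sketched with the standard library alone, together with a quick check of how many frames carry each "config_type" label. The function names are hypothetical, and the parser assumes that each frame's comment line contains an unquoted `config_type=label` entry, as in the files written above.

```python
import glob
from collections import Counter


def build_database(pattern, output):
    """Concatenate every XYZ file matching `pattern` into `output`."""
    files = sorted(glob.glob(pattern))
    with open(output, "w") as out:
        for name in files:
            with open(name) as f:
                text = f.read()
            # Make sure each file ends with a newline before the next one starts
            out.write(text if text.endswith("\n") else text + "\n")
    return files


def count_config_types(database):
    """Count frames per config_type label in a concatenated XYZ file."""
    counts = Counter()
    with open(database) as f:
        lines = f.read().splitlines()
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip stray blank lines between frames
            continue
        natoms = int(lines[i])      # first line of a frame: number of atoms
        comment = lines[i + 1]      # second line: key=value metadata
        label = "unlabeled"
        for token in comment.split():
            if token.startswith("config_type="):
                label = token.split("=", 1)[1]
        counts[label] += 1
        i += natoms + 2             # jump to the start of the next frame
    return counts
```

Calling `build_database("database/*.xyz", "database.xyz")` and then `count_config_types("database.xyz")` should, for the files generated above, report 1 "isolated_atom", 20 "dimer" and 50 "random" frames. Note that the files are concatenated in alphabetical order, which differs from the shell loop but does not matter for the fit.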

Installing TurboGAP (optional step)

If you want to be able to use TurboGAP functionality, e.g., soap_turbo descriptors (which are faster and more accurate than regular soap), you need to build the TurboGAP library to be used in combination with QUIP/GAP. Note that you need to have BLAS/LAPACK libraries installed locally.
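As a rough first check for BLAS/LAPACK, the sketch below asks Python's standard `ctypes.util.find_library` whether shared libraries with the usual names can be located. The helper name and the list of library names are assumptions. A negative result is not conclusive: static libraries, vendor libraries with other names, and libraries made available through environment modules may not be found this way.

```python
from ctypes.util import find_library


def find_linear_algebra_libs(names=("blas", "lapack", "openblas")):
    """Map each library name to what the system lookup returns
    (a file name or path, or None if nothing was found)."""
    return {name: find_library(name) for name in names}


for name, location in find_linear_algebra_libs().items():
    print(f"{name:10s} {location or 'not found on the default search path'}")
```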

To proceed with the installation, clone the TurboGAP repo:

# Create and cd into a directory
mkdir -p /myapps/turbogap
cd /myapps/turbogap

# Clone the turbogap development repo
git clone https://github.com/mcaroba/turbogap.git turbogap_dev

# cd and make the code (you may want to edit the Makefile, load compiler modules, etc., depending on the specifics of your system)
cd turbogap_dev
make

If make finishes without errors, the TurboGAP binaries and libraries should now be built.

Installing QUIP/GAP

Fitting a GAP potential

Testing your potential with QUIP

Installing LAMMPS (optional step)

Running molecular dynamics with LAMMPS (optional step)