Quick start
This is a quick start guide to fitting GAP potentials using an atomic database and some combination of descriptors. You can do this with QUIP/GAP alone, or add TurboGAP support for faster and more accurate many-body descriptors of the SOAP type. You can also use your GAP to run molecular dynamics simulations with LAMMPS.
Prerequisites
Required
- You need either your own or a preexisting atomic database. If you are generating your own, you will usually need a DFT package that can predict energies and forces for given atomic configurations. Examples of such packages are VASP, GPAW, Quantum ESPRESSO, ABINIT, FHI-aims, CASTEP, CP2K, etc.
- You need a working QUIP and GAP installation. QUIP is free, and GAP is free for non-commercial research. Below are some quick instructions on how to get going with this.
Optional
Optional packages are not required, but they expand the ease and scope of making and using GAP potentials. Don't be lazy: install them (we will use them at our discretion throughout the guide below).
- The Atomic Simulation Environment (ASE) is not required, but it is always a useful addition to the atomistic toolset, and particularly useful for generating the XYZ database files used by QUIP/GAP.
- A TurboGAP installation, if you want to use Miguel Caro's SOAP-type many-body descriptors, which are faster and more accurate than the default SOAP descriptors. In addition, TurboGAP supports SOAP compression (making it even faster). TurboGAP cannot (currently) be used on its own to train a GAP: it is used as a library in combination with QUIP/GAP. TurboGAP is free for non-commercial research.
- A LAMMPS installation, if you want to use your GAP potentials to run large-scale molecular dynamics simulations. A custom installation of LAMMPS with QUIP support is required (more details below).
Building a database
The most essential requirement to train an ML potential, be it GAP or something else, is training data. For QUIP, a database of atomic configurations is an extended XYZ file with atomic structures (i.e., atomic coordinates), accompanied by energies, forces and/or virials. A single XYZ file, e.g., database.xyz, contains a concatenation of individual atomic structures. Which kinds of configurations it contains, and how well they span the regions of configuration space that are relevant for our potential, will be absolutely critical in determining the accuracy of our trained GAP.
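To make the file layout concrete, here is a minimal sketch in plain Python that formats one structure as an extended XYZ frame and concatenates several frames into a single database.xyz. The helper names (format_extxyz_frame, write_database) are hypothetical, the key names energy and forces are assumptions that must match whatever names you later tell the fitting code to look for, and the numbers in the usage part are placeholders, not DFT results. In practice ASE can write these files for you; this sketch only shows what ends up in the file.

```python
def format_extxyz_frame(symbols, positions, cell, energy, forces):
    """Return one extended XYZ frame (atom count, header, one line per atom)."""
    if not (len(symbols) == len(positions) == len(forces)):
        raise ValueError("symbols, positions and forces must have equal length")
    # The three lattice vectors are written row by row as nine numbers.
    lattice = " ".join(f"{x:.8f}" for row in cell for x in row)
    # The header declares the per-atom columns and the per-structure data.
    # "energy" and "forces" are assumed key names (see the text above).
    header = (
        f'Lattice="{lattice}" '
        "Properties=species:S:1:pos:R:3:forces:R:3 "
        f'energy={energy:.8f} pbc="T T T"'
    )
    lines = [str(len(symbols)), header]
    for sym, pos, frc in zip(symbols, positions, forces):
        numbers = " ".join(f"{v:16.8f}" for v in (*pos, *frc))
        lines.append(f"{sym:<2s} {numbers}")
    return "\n".join(lines) + "\n"


def write_database(frames, filename="database.xyz"):
    """Concatenate all frames into a single extended XYZ file."""
    with open(filename, "w") as handle:
        for frame in frames:
            handle.write(format_extxyz_frame(**frame))


if __name__ == "__main__":
    # Placeholder phosphorus dimer in a 10 Angstrom cubic box (made-up values).
    dimer = {
        "symbols": ["P", "P"],
        "positions": [(0.0, 0.0, 0.0), (0.0, 0.0, 2.2)],
        "cell": [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)],
        "energy": -1.0,
        "forces": [(0.0, 0.0, 0.1), (0.0, 0.0, -0.1)],
    }
    write_database([dimer, dimer])
```

Each frame starts with the number of atoms, followed by a header line with the lattice and the per-structure quantities, and then one line per atom. A virial tensor, if available, would go into the header as an additional nine-number quoted entry.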
Good databases contain a combination of crystalline and distorted structures, dimers, trimers, surfaces, liquids, etc. Our "training set" may contain a lot of structures, on the order of thousands of simulation boxes and hundreds of thousands of unique atomic environments. Not all of these will be used for the kernel-ridge regression during production; only the sparse set will be used for that. However, the full training set will be used to find the optimal regression coefficients.
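The split between the sparse set and the full training set can be written down in a simplified form. The following is a sketch in the standard sparse Gaussian process notation, not necessarily the exact expressions implemented in QUIP/GAP; in particular, it ignores that the observed energies, forces and virials are sums and derivatives of local atomic contributions. The symbols are assumptions introduced here: M sparse environments with descriptors d_s, N training data points y with noise (regularization) matrix Sigma, and a kernel k.

```latex
% Production: the prediction only involves the M sparse environments
E(\mathbf{d}_*) = \sum_{s=1}^{M} \alpha_s \, k(\mathbf{d}_*, \mathbf{d}_s)

% Training: the coefficients are fitted against all N training data points
\boldsymbol{\alpha} =
  \left( \mathbf{K}_{MM}
       + \mathbf{K}_{MN} \boldsymbol{\Sigma}^{-1} \mathbf{K}_{NM} \right)^{-1}
  \mathbf{K}_{MN} \boldsymbol{\Sigma}^{-1} \mathbf{y}
```

Here K_MM contains the kernels between sparse environments and K_MN = K_NM^T the kernels between sparse environments and the training data. The cost of a prediction scales with M, while the quality of the coefficients benefits from all N data points.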
As an example, let us use GPAW, in combination with ASE, to compute energies, forces and the virial tensor for a bunch of phosphorus structures.