Difference between revisions of "Quick start"

Revision as of 11:35, 30 June 2020

This is a quick start guide to fitting GAP potentials using an atomic database and some combination of descriptors. You can do this with QUIP/GAP alone, or add TurboGAP support for faster and more accurate many-body descriptors of the SOAP type. In addition, you can also use your GAP to run molecular dynamics simulations on LAMMPS.

Prerequisites

Required

You need either your own or a preexisting atomic database. If you are generating your own, you need (usually) a DFT package that can predict energy and forces for given atomic configurations. Examples of such packages are VASP, GPAW, Quantum Espresso, ABINIT, FHI AIMS, CASTEP, CP2K, etc.

You need a working QUIP and GAP installation. QUIP is for free, and GAP is for free for non-commercial research. Below are some quick instructions on how to get going with this.

Optional

Optional packages are not required but then make the expand on the ease/scope of making and using GAP potentials. Don't be lazy and install them (we will use them at discretation throughout the guide below).

The Atomic Simulation Environment (ASE) is not required but it is always a useful addition to to atomistic toolset, and particularly useful to generate the XYZ database files used by QUIP/GAP.

A TurboGAP installation if you want to use Miguel Caro's SOAP-type many-body descriptors, which are faster and more accurate than the default SOAP descriptors. In addition, TurboGAP supports SOAP compression (making it even faster). TurboGAP cannot (currently) be used on its own to training a GAP: it is used as a library in combination with QUIP/GAP. TurboGAP is free for non-commercial research.

A LAMMPS installation, if you want to use your GAP potentials to run large-scaling molecular dynamics simulations. A custom installation of LAMMPS with QUIP support is required (more details below).

Building a data base

The most essential requirement to train a ML potential, be it GAP or something else, is training data. For QUIP, a database of atomic configurations is an extended XYZ file with atomic structures (i.e., atomic coordinates), accompanied by energies, forces and/or virials. A single XYZ file, e.g., database.xyz, contains a concatenation of individual atomic structures. What kinds of configurations are contained here and how well they span the regions of configuration that are relevant for our potential will be absolutely critical in determining the accuracy of our trained GAP.

Good databases contain a combination of crystalline and distorted structures, dimer, trimers, surfaces, liquids, etc. Our "training set" may contain a lot of structures, of the order of thousands of simulation boxes and hundreds of thousands of unique atomic environments. Not all of these will be used for the kernel-ridge regression during production, only the sparse set will be used for that; however, the full training set will be used to find the optimal regression coefficients.

As an example, let us use GPAW, in combination with ASE, to compute energies and forces for a bunch of phosphorus structures. Note that this is just to get to grips with the workflow; you can generate the training data with another DFT code (although ASE is always handy for parsing the output of the calculation). E.g., if you use VASP, you can import the OUTCAR with ASE (atoms = read("OUTCAR")) and the energy, forces, etc., will be imported to the ASE Atoms() object.

But let us get back to generating a (rather crappy) database of P structures with GPAW. The code below will compute the energy for an isolated atom, assign it label "isolated_atom" and write it to file; then, for 20 configurations corresponding to the P₂ dissociation curve; then, 50 very stupidly constructed bulk configurations in periodic boundary conditions. Note that we are using an LCAO basis set since we want to do this fast, and we are not leaving a lot of vacuum around the non-periodic structure. If you hope to publish your database, you should fix this and other problems with this example.

import numpy as np
from ase import Atoms
from ase.io import read,write
from gpaw import GPAW


# Compute an isolated P atom
if 1:
    print("Doing isolated atom...")
    atoms = Atoms("P", positions=[[0,0,0]], pbc=False)
    atoms.center(vacuum=4.)
    calc = GPAW(xc="PBE", txt="gpaw.out", spinpol=True,
                mode="lcao", basis="dzp")
    atoms.set_calculator(calc)
    e = atoms.get_potential_energy(force_consistent=True)
    f = atoms.get_forces()
    atoms.info["config_type"] = "isolated_atom"
    write("database/isolated_atom.xyz", atoms)


# Do a dimer curve
if 1:
    list = np.arange(1., 5., 0.2)
    for i in range(0, len(list)):
        print("Doing %i/%i dimer configurations..." % (i, len(list)))
        d = list[i]
        atoms = Atoms("P2", positions = [[0,0,0], [d,0,0]], pbc=False)
        atoms.center(vacuum = 4.)
        calc = GPAW(xc="PBE", txt="gpaw.out", mode="lcao", basis="dzp")
        atoms.set_calculator(calc)
        e = atoms.get_potential_energy(force_consistent=True)
        f = atoms.get_forces()
        atoms.info["config_type"] = "dimer"
        write("database/dimer_%i.xyz" % i, atoms)


# Do some highly distorted bulk structures in cubic boxes
# From high to low density, 10 structures at each density
# with 8 atoms in each of them
if 1:
    n = 0
    nstruc = 10
    natoms = 8
    list = np.arange(3.5, 6., 0.5)
    ncalc = nstruc*len(list)
    for L in list:
        cell = [L, L, L]
        atoms = Atoms("P", positions=[[0,0,0]], cell = cell, pbc=True)
#       We add nstruc structures
        for i in range(0,nstruc):
#           natoms atoms to each structure
            while len(atoms) < natoms:
                j = len(atoms)
                atom_too_close = True
                while atom_too_close:
                    pos = L * np.random.sample(3)
                    atoms_temp = atoms + Atoms("P", positions=[pos])
                    atom_too_close = False
                    for k in range(0,j):
                        d = atoms_temp.get_distance(k, j, mic=True)
                        if d < 1.5:
                            atom_too_close = True
                            break
                atoms = atoms_temp.copy()
#           Do the calculation
            calc = GPAW(xc="PBE", txt="gpaw.out", mode="lcao", basis="dzp")
            atoms.set_calculator(calc)
            print("Doing %i/%i random configurations..." % (n, ncalc))
            e = atoms.get_potential_energy(force_consistent=True)
            f = atoms.get_forces()
            atoms.info["config_type"] = "random"
            write("database/random_%i.xyz" % n, atoms)
            n += 1

@@ Line 31: / Line 31: @@
 But let us get back to generating a (rather crappy) database of P structures with GPAW. The code below will compute the energy for an isolated atom, assign it label "isolated_atom" and write it to file; then, for 20 configurations corresponding to the P<sub>2</sub> dissociation curve; then, 50 very stupidly constructed bulk configurations in periodic boundary conditions. Note that we are using an LCAO basis set since we want to do this fast, and we are not leaving a lot of vacuum around the non-periodic structure. If you hope to publish your database, you should fix this and other problems with this example.
-<source lang="python">
+<syntaxhighlight lang="python">
 import numpy as np
 from ase import Atoms
@@ Line 105: / Line 105: @@
              write("database/random_%i.xyz" % n, atoms)
              n += 1
-</source>
+</syntaxhighlight>
 == Installing TurboGAP (optional step) ==

Difference between revisions of "Quick start"

Revision as of 11:35, 30 June 2020

Contents

Prerequisites

Required

Optional

Building a data base

Installing TurboGAP (optional step)

Installing QUIP/GAP

Fitting a GAP potential

Testing your potential with QUIP

Installing LAMMPS (optional step)

Running molecular dynamics with LAMMPS (optional step)

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools