What is TurboGAP ?

TurboGAP is a Fortran atomistic code which uses Gaussian Approximation Potentials (GAPs).
Its specialty are SOAP turbo descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
It's (probably!) the fastest SOAP prediction code, which will be made even faster with GPU acceleration (ongoing with CSC).

Through GAP/SOAP predictions, one can perform simulations with ab-initio accuracy at scale
- Molecular Dynamics
- Grand-Canonical Monte-Carlo simulations
- Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
- Prediction/optimization of experimental observables (e.g. X-Ray photoelectron spectra (Golze et al. 2022; Zarrouk et al. 2024), X-ray/Neutron Diffraction and more to come!).

Motivation for structural inference

We typically want to predict structures, to make sense of experimental data.
We usually pray structures have experimental agreement (a “bottom-up” approach).
What if we can promote agreement with experimental data?
Reverse Monte-Carlo techniques match experimental data to atomic structure (“top-down” approach).
- Downsides: “By-hand” constraints
- Simple experimental observables.
- Unphysical structures/artifacts (Opletal et al. 2017)

Need efficient structure search to find agreement with experimental data and ab-inito data.

With CO GAP + Core electron binding energy SOAP model we will perform modified (Grand-canonical) Monte-Carlo to produce experimentally consistent, low energy structures of oxygenated amorphous carbon.

We will do simulations similar to the above, which is from our JACS paper (Zarrouk et al. 2024).

Theory and Methods

Gaussian Approximation Potentials (GAP)

(Bartók et al. 2010; Bartók, Kondor, and Csányi 2013; Deringer et al. 2021)

Assumption: Total energy is sum of local energies. [math]\displaystyle{ E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)) }[/math]

[math]\displaystyle{ \begin{align} \varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) = E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\ & + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i) + \sum_{r \lt r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber \end{align} }[/math]

Local energy: Sum of 2b, 3b and many-body terms.
Descriptor [math]\displaystyle{ \xi / \boldsymbol{\xi} }[/math].: e.g [math]\displaystyle{ \{\xi^{\text{2b}}\} = \{r_{ij}\} }[/math], [math]\displaystyle{ \{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \} }[/math]
Local environment: A set of descriptors.

Using descriptors to predict quantities

Kernel/basis functions [math]\displaystyle{ k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j) }[/math] measure the similarity between descriptors.
e.g Exponential 2b kernel: [math]\displaystyle{ k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/2\sigma^2 ) }[/math]
Gaussian Process Regression (kernel matrix algebra on training set) → fitting coefficients [math]\displaystyle{ \{\alpha\} }[/math], (Deringer et al. 2021)
e.g. 2b model trained with [math]\displaystyle{ N_s }[/math] descriptors [math]\displaystyle{ \{\xi\} \in \{\xi_i, \ldots, \xi_{Ns}\} }[/math], the energy for atom with [math]\displaystyle{ M }[/math] pairwise distances in local environment: [math]\displaystyle{ E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) }[/math]

SOAP (many-body) descriptor

SOAP descriptor: [math]\displaystyle{ \boldsymbol{\xi}_i(\{\mathbf{r}\}_{ \lt r_{\rm cut}} ) }[/math], represents environment of an atom (Bartók et al. 2010).
Kernel: [math]\displaystyle{ k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta} }[/math] measures similarity of environments.
Fitting coefficients: [math]\displaystyle{ \alpha_j }[/math] result from Gaussian Process Regression.
Prediction: [math]\displaystyle{ E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij} }[/math]
For specifics of SOAP turbo, refer to Miguel's paper (Caro 2019)

How do we model XPS?

XPS (X-ray photoelectron spectroscopy) spectra measure core electron binding energies.
Core electron binding energies depend on atomic environment.
SOAP turbo model trained on DFT ([math]\displaystyle{ \Delta\text{KS} }[/math]) data with a [math]\displaystyle{ GW }[/math] correction on top (Golze et al. 2022).
[math]\displaystyle{ \text{CEBE} = ( \Delta KS^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved} ) }[/math]
Thermal and instrumental broadening accounted for by a Gaussian with [math]\displaystyle{ \sigma = 0.4\text{ eV} }[/math]
Can analyse environments which make up spectra.

Prediction of Local Properties

We can predict an arbitrary number of local properties which use SOAP turbo descriptors using TurboGAP.
/One can get these for “free”/ if local property models are trained using the same descriptors as atomic energies/forces.
Hirshfeld volumes

gives vdW dispersion via Tkatchenko-Scheffler. [math]\displaystyle{ E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} }[/math]

Core-electron binding energies

give X-ray Photoelectron Spectroscopy prediction. [math]\displaystyle{ g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) }[/math]

How do we marry experimental data and GAP?

Generalized Hamiltonian within Grand-Canonical Monte-Carlo scheme.
Vary oxygen content with chemical potential, to see motif differences.

[math]\displaystyle{ \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left( -\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm {pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right) }[/math]

[math]\displaystyle{ \tilde{E} = E_{\rm GAP} + E_{\rm spectra} }[/math]

[math]\displaystyle{ E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2 }[/math]

A simple squared difference, which is a penalty with increases with spectra dissimilarity and [math]\displaystyle{ \gamma }[/math] is an energy scale.
We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

We use the standard acceptance criteria for adding/removing/moving a particle

[math]\displaystyle{ \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1, \frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) - \tilde{E}(N) - \mu}{k_BT} \right\} \right] }[/math]

[math]\displaystyle{ \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1, \frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) - \tilde{E}(N) + \mu}{k_BT} \right\} \right] }[/math]

[math]\displaystyle{ \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ - \frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right] }[/math]

Usage

Installing TurboGAP

We can install the TurboGAP code, found here https://github.com/mcaroba/turbogap through recursively cloning the main branch

git clone --recursive https://github.com/mcaroba/turbogap.git

Then we edit this line in the Makefile to match our system architecture.

include makefiles/Makefile.Ubuntu_gfortran_mpi

Then, we make

make

and then export the path in .bashrc file

export PATH="~/turbogap/bin:$PATH"

What you need for a calculation

gap_files/ directory: The directory which contains your *.gap, with alphas (fitting coefficients) and sparse set descriptor files.

Potentials are available from TurboGAP wiki https://turbogap.fi/wiki/index.php/Potentials Conversion script from gap_fit .xml to .gap can be used turbogap/tools/make_gap_files.py

input file: The file which tells the code what to do
*.xyz: An extended xyz format file which contains positions/lattice/velocity information.

Adding a local property model for CEBE prediction

  gap_beg soap_turbo
  n_species = 2
  species = C O
  central_species = 1
... some params ...
  zeta = 4
  delta = 0.1
  desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
  alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
  compress_mode = "trivial"
  has_local_properties = .true.
  n_local_properties = 1
  local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
  local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
  local_property_labels = 'core_electron_be'
  local_property_zetas = 2
  local_property_deltas = 1.0
  local_property_v0s = 290.0
  gap_end

Conversion of `gap_fit` `.xml` files to `.gap` format

Then we convert the *.xml files which result from gap_fit into the required format for TurboGAP.
This is done using the convenience script in turbogap/tools/make_gap_files.py

python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
              {local_prop1.xml} {local_prop_label1} \
              {local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
        core_electron_be_co.xml core_electron_be

This creates a directory gap_files/ which has
1. alphas_{descriptor}.dat which are fitting coefficients for a particular descriptor
2. qs_{descriptor}.dat which are the sparse set descriptors, used for prediction.
3. alphas_{local_property}.dat and qs_{local_property}.dat for local property prediction.

Running the Tutorial

Navigate to turbogap/tutorials/xps_optimizaton
Run bash run.sh {calc_type} where {calc_type} is either:
- prediction
- standard_gcmc
- xps_optimization_gcmc
Note that these simulations have a high chemical potential and (have high [math]\displaystyle{ \gamma }[/math] factors), these are just to give a result in a reasonable amount of time. Take care with real simulations.

Core electron binding energy prediction

[mpirun -np N] turbogap predict

We have in our input file

! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.

Output:

trajectory_out.xyz: which contains atomic positions, local energy and local property arrays for each structure.

 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
   Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429  energy_2b=-124.73820332 energy_3b=0.55261463 \
   energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
   virial="..." stress="..." volume=5372.640346
   C              4.39625950       7.79414272      11.91703476     \
                  -0.00176074      -0.00403026       0.00040104    \
                  -8.32055957 \ # Local energy prediction
                  283.83841418  # CEBE Prediction

Withiout specifying Exp. data, Turbogap does not predict the spectra.
XPS spectra can simply be calculated from core_electron_be in trajectory_out.xyz.

(Grand-Canonical) Monte-Carlo

We use mc mode

[mpirun -np N] turbogap mc

e.g For a [math]\displaystyle{ \mu V T }[/math] simulation of system in oxygen environment

! Monte-carlo options
mc_nsteps = 1000                         # Number of Monte-Carlo Steps
n_mc_types = 3                           # Number of MC move types
mc_types = 'move' 'insertion' 'removal'  # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5                        # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1              # Number of chemical potentials to add
mc_mu = 0                # Chemical potential(s) [eV]
mc_species = 'O'         # GCMC species types
mc_min_dist = 0.1        # GCMC minimum insertion distance

Output:

mc.log

# mc_istep  mc_move  accepted  E_trial              E_current             E_exp_trial          E_exp_current  N_tot_trial  N_mc_species_trial
       1       move    F       -4321.10376531       -4321.58260476         272.97693282         272.83692252      512  O        0
       2       move    F       -4321.10017027       -4321.58260476         272.68730901         272.83692252      512  O        0
       3  insertion    F       -4319.14082561       -4321.58260476         269.49191203         272.83692252      513  O        1

mc_all.xyz: xyz file written to every write_xyz = N steps.
mc_current.xyz: xyz file saving current accepted config written to every write_xyz = N steps.

If we run bash run.sh standard_gcmc (may take a few minutes), we can see a prediction of the XPS with oxygen content

We see there is a large amount of oxygen inserted due to the large chemical potential for this simulation. This shifts the xps to large binding core electron binding energies, and there is no agreement the final structure with the experimental XPS.

Experimental Observable Optimization

[mpirun -np N] turbogap mc/md

We can specify experimental observables to predict and optimize them using MC or MD. [math]\displaystyle{ E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 }[/math]

! Experimental data options
n_exp = 1                                  # Number of experimental observables
exp_labels = 'xps'                         # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat'  # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501                        # Number of interpolations samples over the experimental range.
exp_energy_scales = 100.0                    # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4

Output:

xps_prediction.dat: Data file written every write_xyz = N steps. Separate predictions separated by blank lines.
xps_exp.dat: The experimental data fed into TurboGAP.

If we run bash run.sh xps_optimization_gcmc (may take a few minutes), we can see a prediction of the XPS with oxygen content

We can observe the significant effect of the energy scale here. The experimental energy decreases quickly, with a concurrent increase in the energy per atom. After spectral agreement, the system is allowed to relax.

Multiple Observables

This is not in the tutorial files, but TurboGAP allows for we can optimization multiple observables.
- X-ray diffraction
- Neutron diffraction
- Pair distribution function
- XPS
Here we show a fictitious example of XPS and XRD optimization.

# Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2                  # Number of Exp. observables
exp_labels = 'xps' 'xrd'   # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0  # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy

Applications

Experimental Observable prediction/optimization

Deconvolution analysis

Use subgraph isomorphisms of bonding networks to discern motifs present.

↑ Experimental XPS deconvolution (Santini et al. 2015)

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The generation of a-COx here was done by physical vapor deposition of graphitic carbon in an oxygen environment. They see a reduction in the number of sp2 bonds. They also claim there is a complete transformation of sp2 to sp3 that is reversible with a short annealing at 500C.

↑ Our XPS deconvolution (Santini et al. 2015)

However from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimentalists see only sp2 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

We see with oxygen content that sp2 motifs are shifted significantly, even past the sp3 reference. Due to the presence of oxygen, there is a shift of sp2 motifs which are not accounted for by the experimental deconvolution references, as these are fixed. This is why the experimental deconvolution did not give what we expect: sp2 carbon that composes the primary XPS peak.

Summary

Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ab-inito and experimental data.
We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
We can use multiple experimental observables.
Finding structures which match experimental data allows us to understand specific experimental results.
Deconvolve experimental XPS exactly.
All of this is implemented in the TurboGAP code.

References

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” Physical Review B 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” Physical Review Letters 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” Physical Review B 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” Chemical Reviews 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” Chemistry of Materials 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of \[math]\displaystyle{ \mathrm\\\C\\\\\_\60\\ }[/math].” Physical Review B 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” Journal of Computational Chemistry 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” Nature Communications 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” Chemistry of Materials 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” Journal of the American Chemical Society 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

Contents

What is TurboGAP ?

Motivation for structural inference

Theory and Methods

Gaussian Approximation Potentials (GAP)

Using descriptors to predict quantities

SOAP (many-body) descriptor

How do we model XPS?

Prediction of Local Properties

How do we marry experimental data and GAP?

Usage

Installing TurboGAP

What you need for a calculation

Adding a local property model for CEBE prediction

Conversion of `gap_fit` `.xml` files to `.gap` format

Running the Tutorial

Core electron binding energy prediction

(Grand-Canonical) Monte-Carlo

Experimental Observable Optimization

Multiple Observables

Applications

Experimental Observable prediction/optimization

Deconvolution analysis

Summary

References

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools

XPS-optimized oxygenated amorphous carbon

Contents

What is TurboGAP ?

Motivation for structural inference

Theory and Methods

Gaussian Approximation Potentials (GAP)

Using descriptors to predict quantities

SOAP (many-body) descriptor

How do we model XPS?

Prediction of Local Properties

How do we marry experimental data and GAP?

Usage

Installing TurboGAP

What you need for a calculation

Adding a local property model for CEBE prediction

Conversion of gap_fit *.xml files to *.gap format

Running the Tutorial

Core electron binding energy prediction

(Grand-Canonical) Monte-Carlo

Experimental Observable Optimization

Multiple Observables

Applications

Experimental Observable prediction/optimization

Deconvolution analysis

Summary

References

Navigation menu

Search

Conversion of `gap_fit` `.xml` files to `.gap` format