XPS-optimized oxygenated amorphous carbon
Contents
- 1 What is TurboGAP ?
- 2 Motivation for structural inference
- 3 Theory and Methods
- 4 Usage
- 5 Applications
- 6 Summary
- 7 References
What is TurboGAP ?
- TurboGAP is a
Fortran
atomistic code which uses Gaussian Approximation Potentials (GAPs). - Its specialty are SOAP turbo descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
- It's (probably!) the fastest SOAP prediction code, which will be made even faster with GPU acceleration (ongoing with CSC).
- Through GAP/SOAP predictions, one can perform simulations with ab-initio accuracy at scale
- Molecular Dynamics
- Grand-Canonical Monte-Carlo simulations
- Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
- Prediction/optimization of experimental observables (e.g. X-Ray photoelectron spectra (Golze et al. 2022; Zarrouk et al. 2024), X-ray/Neutron Diffraction and more to come!).
Motivation for structural inference
- We typically want to predict structures, to make sense of experimental data.
- We usually pray structures have experimental agreement (a “bottom-up” approach).
- What if we can promote agreement with experimental data?
- Reverse Monte-Carlo techniques match experimental data to atomic structure (“top-down” approach).
- Downsides: “By-hand” constraints
- Simple experimental observables.
- Unphysical structures/artifacts (Opletal et al. 2017)
Need efficient structure search to find agreement with experimental data and ab-inito data.
With CO GAP + Core electron binding energy SOAP model we will perform modified (Grand-canonical) Monte-Carlo to produce experimentally consistent, low energy structures of oxygenated amorphous carbon.
We will do simulations similar to the above, which is from our JACS paper (Zarrouk et al. 2024).
Theory and Methods
Gaussian Approximation Potentials (GAP)
(Bartók et al. 2010; Bartók, Kondor, and Csányi 2013; Deringer et al. 2021)
- Assumption: Total energy is sum of local energies. [math]\displaystyle{ E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)) }[/math]
[math]\displaystyle{ \begin{align} \varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) = E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\ & + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i) + \sum_{r \lt r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber \end{align} }[/math]
- Local energy
- Sum of 2b, 3b and many-body terms.
- Descriptor [math]\displaystyle{ \xi / \boldsymbol{\xi} }[/math].
- e.g [math]\displaystyle{ \{\xi^{\text{2b}}\} = \{r_{ij}\} }[/math], [math]\displaystyle{ \{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \} }[/math]
- Local environment
- A set of descriptors.
Using descriptors to predict quantities
- Kernel/basis functions [math]\displaystyle{ k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j) }[/math] measure the similarity between descriptors.
- e.g Exponential 2b kernel: [math]\displaystyle{ k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/2\sigma^2 ) }[/math]
- Gaussian Process Regression (kernel matrix algebra on training set) → fitting coefficients [math]\displaystyle{ \{\alpha\} }[/math], (Deringer et al. 2021)
- e.g. 2b model trained with [math]\displaystyle{ N_s }[/math] descriptors [math]\displaystyle{ \{\xi\} \in \{\xi_i, \ldots, \xi_{Ns}\} }[/math], the energy for atom with [math]\displaystyle{ M }[/math] pairwise distances in local environment: [math]\displaystyle{ E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) }[/math]
SOAP (many-body) descriptor
- SOAP descriptor: [math]\displaystyle{ \boldsymbol{\xi}_i(\{\mathbf{r}\}_{ \lt r_{\rm cut}} ) }[/math], represents environment of an atom (Bartók et al. 2010).
- Kernel: [math]\displaystyle{ k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta} }[/math] measures similarity of environments.
- Fitting coefficients: [math]\displaystyle{ \alpha_j }[/math] result from Gaussian Process Regression.
- Prediction: [math]\displaystyle{ E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij} }[/math]
- For specifics of SOAP turbo, refer to Miguel's paper (Caro 2019)
How do we model XPS?
- XPS (X-ray photoelectron spectroscopy) spectra measure core electron binding energies.
- Core electron binding energies depend on atomic environment.
- SOAP turbo model trained on DFT ([math]\displaystyle{ \Delta\text{KS} }[/math]) data with a [math]\displaystyle{ GW }[/math] correction on top (Golze et al. 2022).
- [math]\displaystyle{ \text{CEBE} = ( \Delta KS^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved} ) }[/math]
- Thermal and instrumental broadening accounted for by a Gaussian with [math]\displaystyle{ \sigma = 0.4\text{ eV} }[/math]
- Can analyse environments which make up spectra.
Prediction of Local Properties
- We can predict an arbitrary number of local properties which use SOAP turbo descriptors using TurboGAP.
- /One can get these for “free”/ if local property models are trained using the same descriptors as atomic energies/forces.
- Hirshfeld volumes
- gives vdW dispersion via Tkatchenko-Scheffler. [math]\displaystyle{ E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} }[/math]
- Core-electron binding energies
- give X-ray Photoelectron Spectroscopy prediction. [math]\displaystyle{ g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) }[/math]
How do we marry experimental data and GAP?
- Generalized Hamiltonian within Grand-Canonical Monte-Carlo scheme.
- Vary oxygen content with chemical potential, to see motif differences.
[math]\displaystyle{ \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left( -\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm {pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right) }[/math]
[math]\displaystyle{ \tilde{E} = E_{\rm GAP} + E_{\rm spectra} }[/math]
[math]\displaystyle{ E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2 }[/math]
A simple squared difference, which is a penalty with increases with spectra dissimilarity and [math]\displaystyle{ \gamma }[/math] is an energy scale.
We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.
- We use the standard acceptance criteria for adding/removing/moving a particle
[math]\displaystyle{ \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1, \frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) - \tilde{E}(N) - \mu}{k_BT} \right\} \right] }[/math]
[math]\displaystyle{ \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1, \frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) - \tilde{E}(N) + \mu}{k_BT} \right\} \right] }[/math]
[math]\displaystyle{ \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ - \frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right] }[/math]
Usage
Installing TurboGAP
We can install the TurboGAP code, found here https://github.com/mcaroba/turbogap through recursively cloning the main branch
git clone --recursive https://github.com/mcaroba/turbogap.git
Then we edit this line in the Makefile
to match our system architecture.
include makefiles/Makefile.Ubuntu_gfortran_mpi
Then, we make
make
and then export the path in .bashrc
file
export PATH="~/turbogap/bin:$PATH"
What you need for a calculation
gap_files/
directory- The directory which contains your
*.gap
, with alphas (fitting coefficients) and sparse set descriptor files.
- Potentials are available from TurboGAP wiki https://turbogap.fi/wiki/index.php/Potentials
- Conversion script from
gap_fit .xml
to.gap
can be usedturbogap/tools/make_gap_files.py
input
file- The file which tells the code what to do
*.xyz
- An extended xyz format file which contains positions/lattice/velocity information.
Adding a local property model for CEBE prediction
gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
Conversion of gap_fit
*.xml
files to *.gap
format
- Then we convert the
*.xml
files which result fromgap_fit
into the required format for TurboGAP. - This is done using the convenience script in
turbogap/tools/make_gap_files.py
python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
- This creates a directory
gap_files/
which hasalphas_{descriptor}.dat
which are fitting coefficients for a particular descriptorqs_{descriptor}.dat
which are the sparse set descriptors, used for prediction.alphas_{local_property}.dat
andqs_{local_property}.dat
for local property prediction.
Running the Tutorial
- Navigate to
turbogap/tutorials/xps_optimizaton
- Run
bash run.sh {calc_type}
where{calc_type}
is either:prediction
standard_gcmc
xps_optimization_gcmc
- Note that these simulations have a high chemical potential and (have high [math]\displaystyle{ \gamma }[/math] factors), these are just to give a result in a reasonable amount of time. Take care with real simulations.
Core electron binding energy prediction
[mpirun -np N] turbogap predict
- We have in our
input
file
! Species-specific info atoms_file = 'atoms.xyz' pot_file = 'gap_files/CO.gap' n_species = 2 species = C O masses = 12.01 15.99 e0 = -.16138053 0.
Output:
trajectory_out.xyz
- which contains atomic positions, local energy and local property arrays for each structure.
512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
- Withiout specifying Exp. data, Turbogap does not predict the spectra.
- XPS spectra can simply be calculated from
core_electron_be
intrajectory_out.xyz
.
(Grand-Canonical) Monte-Carlo
- We use
mc
mode
[mpirun -np N] turbogap mc
- e.g For a [math]\displaystyle{ \mu V T }[/math] simulation of system in oxygen environment
! Monte-carlo options mc_nsteps = 1000 # Number of Monte-Carlo Steps n_mc_types = 3 # Number of MC move types mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md) mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves ! - GCMC Options n_mc_mu = 1 # Number of chemical potentials to add mc_mu = 0 # Chemical potential(s) [eV] mc_species = 'O' # GCMC species types mc_min_dist = 0.1 # GCMC minimum insertion distance
Output:
mc.log
# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
mc_all.xyz
- xyz file written to every
write_xyz = N
steps. mc_current.xyz
- xyz file saving current accepted config written to every
write_xyz = N
steps.
- If we run
bash run.sh standard_gcmc
(may take a few minutes), we can see a prediction of the XPS with oxygen content
We see there is a large amount of oxygen inserted due to the large chemical potential for this simulation. This shifts the xps to large binding core electron binding energies, and there is no agreement the final structure with the experimental XPS.
Experimental Observable Optimization
[mpirun -np N] turbogap mc/md
- We can specify experimental observables to predict and optimize them using MC or MD. [math]\displaystyle{ E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 }[/math]
! Experimental data options n_exp = 1 # Number of experimental observables exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf) exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.) exp_n_samples = 501 # Number of interpolations samples over the experimental range. exp_energy_scales = 100.0 # Energy scale (gamma) [eV] # XPS smearing xps_sigma = 0.4
Output:
xps_prediction.dat
- Data file written every
write_xyz = N
steps. Separate predictions separated by blank lines. xps_exp.dat
- The experimental data fed into TurboGAP.
- If we run
bash run.sh xps_optimization_gcmc
(may take a few minutes), we can see a prediction of the XPS with oxygen content
We can observe the significant effect of the energy scale here. The experimental energy decreases quickly, with a concurrent increase in the energy per atom. After spectral agreement, the system is allowed to relax.
Multiple Observables
- This is not in the tutorial files, but TurboGAP allows for we can optimization multiple observables.
- X-ray diffraction
- Neutron diffraction
- Pair distribution function
- XPS
- Here we show a fictitious example of XPS and XRD optimization.
# Partial PDFs needed for XRD calculation pair_distribution_n_samples = 301 r_range_min = 0.1 r_range_max = 13.5 pair_distribution_rcut = 14.1 do_xrd = .true. # Experimental Data Specification n_exp = 2 # Number of Exp. observables exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf) exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat' exp_n_samples = 301 301 # Number of samples for the experimental data exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable # If using monte-carlo, the experimental energies are added directly to the local energy
Applications
Experimental Observable prediction/optimization
Deconvolution analysis
- Use subgraph isomorphisms of bonding networks to discern motifs present.
↑ Experimental XPS deconvolution (Santini et al. 2015)
We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The generation of a-COx here was done by physical vapor deposition of graphitic carbon in an oxygen environment. They see a reduction in the number of sp2 bonds. They also claim there is a complete transformation of sp2 to sp3 that is reversible with a short annealing at 500C.
↑ Our XPS deconvolution (Santini et al. 2015)
However from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimentalists see only sp2 with a simple computational experiment.
We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.
We see with oxygen content that sp2 motifs are shifted significantly, even past the sp3 reference. Due to the presence of oxygen, there is a shift of sp2 motifs which are not accounted for by the experimental deconvolution references, as these are fixed. This is why the experimental deconvolution did not give what we expect: sp2 carbon that composes the primary XPS peak.
Summary
- Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ab-inito and experimental data.
- We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
- We can use multiple experimental observables.
- Finding structures which match experimental data allows us to understand specific experimental results.
- Deconvolve experimental XPS exactly.
- All of this is implemented in the TurboGAP code.
References
Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” Physical Review B 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.
Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” Physical Review Letters 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.
Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” Physical Review B 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.
Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” Chemical Reviews 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.
El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.
Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” Chemistry of Materials 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.
Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of \[math]\displaystyle{ \mathrm\\\C\\\\\_\60\\ }[/math].” Physical Review B 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.
Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” Journal of Computational Chemistry 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.
Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” Nature Communications 6 (1): 8600. https://doi.org/10.1038/ncomms9600.
Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” Chemistry of Materials 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.
Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” Journal of the American Chemical Society 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.