TurboGAP - User contributions [en]

Structural Inference Options

2026-03-22T14:59:28Z

Tigany Zarrouk:

Structural inference of experimental data (of the types that can be simulated by [[Experimental Observable Options]]) can be performed with TurboGAP via Reverse Monte-Carlo (RMC) or Molecular Augmented Dynamics (MAD).

<pre>
n_exp = 1 # Number of experimental observables we wish
# the structure to replicate
exp_labels = 'xps' # Labels of experimental observables
# (currently limited to xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_exp.dat' # Experimental data
exp_n_samples = 501 # Number of samples for linear interpolation
# of experimental data (needed if data is not
# on a uniform grid), this number should be
# greater than the number of data points in
# the exp. data file.
exp_energy_scales = 10.0 # The energy scale (gamma) for exp. potential.
exp_energy_scales_beg = 10.0 # The initial energy scale
exp_energy_scales_end = 100.0 # The final energy scale

</pre>
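
The <code>exp_n_samples</code> comment above refers to linearly interpolating experimental data onto a uniform grid. The following is a minimal sketch of what such a resampling step looks like, written in plain Python for illustration. It is not TurboGAP's own routine: the function name <code>resample_uniform</code> and the example spectrum are hypothetical, and the sketch assumes the data are sorted by strictly increasing abscissa.

```python
from bisect import bisect_right


def resample_uniform(x, y, n_samples):
    """Linearly interpolate (x, y) onto n_samples evenly spaced points.

    x must be strictly increasing; the output grid spans [x[0], x[-1]].
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid, values = [], []
    for i in range(n_samples):
        xi = x[0] + i * step
        # Index of the left end of the interval containing xi, clamped so
        # that the last grid point uses the final interval.
        j = min(max(bisect_right(x, xi) - 1, 0), len(x) - 2)
        t = (xi - x[j]) / (x[j + 1] - x[j])
        grid.append(xi)
        values.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, values


# Hypothetical, non-uniformly spaced spectrum (energy, intensity).
energies = [280.0, 281.0, 283.0, 284.0]
intensities = [0.0, 1.0, 3.0, 2.0]
grid, values = resample_uniform(energies, intensities, 5)
```

Choosing a sample count larger than the number of points in the data file, as the comment recommends, keeps the uniform grid at least as fine as the original data on average.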

Structural Inference Options

2026-03-22T14:58:44Z

Tigany Zarrouk: Created page with "Structural inference of experimental data (of the types that can be simulated by Experimental Observable Options can be performed with TurboGAP via Reverse Monte-Carlo (RM..."

Structural inference of experimental data (of the types that can be simulated by [[Experimental Observable Options]]) can be performed with TurboGAP via Reverse Monte-Carlo (RMC) or Molecular Augmented Dynamics (MAD).

<pre>
n_exp = 1 # Number of experimental observables we wish
# the structure to replicate
exp_labels = 'xps' # Labels of experimental observables
# (currently limited to xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_exp.dat' # Experimental data
exp_n_samples = 501 # Number of samples for linear interpolation
# of experimental data (needed if data is not on
# a uniform grid), this number should be greater
# than the number of data points in the exp data file.
exp_energy_scales = 10.0 # The energy scale (gamma) for exp. potential.
exp_energy_scales_beg = 10.0 # The initial energy scale
exp_energy_scales_end = 100.0 # The final energy scale

</pre>

Experimental Observable Options

2026-03-22T14:53:56Z

Tigany Zarrouk:

Options for predicting experimental observables are found below.

Currently implemented observables are pair distribution functions, powder x-ray diffraction, powder neutron diffraction and x-ray photoelectron spectra.

<pre>
do_xrd = .true. # Do X-Ray diffraction prediction
q_range_min = 1.0 # -> Range for the XRD/structure factor
# calculation: q = 4 pi sin( theta )
# / lambda, where theta is the half
# angle of diffraction
q_range_max = 10.0 # -> Range - " -
write_xrd = .true. # -> Write out xrd pattern
xrd_output = 'q*F(q)' # -> Output the XRD pattern as the direct
# Fourier transform of G(r), the reduced
# PDF (this can be 'F(q)'/'i(q)' or
# the full xrd intensity 'xrd')

do_pair_distribution = .true. # Calculate the XRD from the pair
# distribution function, so it scales
# linearly with the number of atoms
pair_distribution_kde_sigma = 0.1 # -> Use Gaussian Kernel Density Estimate
# of width 0.1A to smooth out,
# accounting for thermal broadening
pair_distribution_partial = .true. # -> Calculate partial pair-distribution functions
pair_distribution_rcut = 10.6 # -> Cutoff partial pair distribution
r_range_min = 0.1 # -> Range for the PDF calculation
r_range_max = 10.0 # -> Range - " -
write_pair_distribution = .true. # -> Write out pair distribution functions

do_structure_factor = .true. # Use (raw, non-scattering factor corrected)
# (partial) structure factor(s) for calculations
structure_factor_from_pdf = .true. # -> Fourier transform the pair distribution
# functions to obtain the uncorrected structure
# factors, which when corrected give the XRD pattern.
structure_factor_window = .true. # -> Use a multiplicative "windowing" function
# (sin(pi r / r_cut)/(pi r / r_cut)) in the Fourier
# transform of pdf to minimize high frequency
# artifacts resulting from the finite range
# Fourier transform.
write_structure_factor = .true. # -> Write out structure factors

do_xps = .true. # Do x-ray photoelectron spectroscopy (XPS)
# prediction if a model has been specified
# in the .gap file
xps_e_min = 280. # -> Minimum of range for XPS prediction
xps_e_max = 300. # -> Maximum of range for XPS prediction
xps_n_samples = 301 # -> Number of samples for XPS prediction
</pre>
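
Two formulas appear in the comments above: the scattering vector magnitude q = 4 pi sin(theta) / lambda, and the windowing function sin(pi r / r_cut)/(pi r / r_cut), which is often called the Lorch function. The sketch below evaluates both in plain Python so that values for <code>q_range_min</code> and <code>q_range_max</code> can be related to measured diffraction angles. The function names are hypothetical and the code is an illustration of the formulas, not TurboGAP's internal implementation; it assumes the wavelength is given in Angstrom, so that q comes out in inverse Angstrom.

```python
import math


def q_from_two_theta(two_theta_deg, wavelength):
    """Return q = 4 pi sin(theta) / lambda, with theta half the diffraction angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def window(r, r_cut):
    """Return sin(pi r / r_cut) / (pi r / r_cut), taking the r -> 0 limit as 1."""
    x = math.pi * r / r_cut
    return 1.0 if x == 0.0 else math.sin(x) / x


# Cu K-alpha1 wavelength in Angstrom, used here only as an illustrative value.
wavelength = 1.5406
q = q_from_two_theta(30.0, wavelength)

# The window decays smoothly from 1 at r = 0 to 0 at r = r_cut.
weights = [window(r, 10.6) for r in (0.0, 5.3, 10.6)]
```

Because the window reaches zero at r = r_cut, the pair distribution function is tapered smoothly instead of being truncated abruptly, which is what suppresses the high-frequency ripples in the transform.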

Experimental Observable Options

2026-03-22T14:53:14Z

Tigany Zarrouk:

Options for predicting experimental observables are found below.

Currently implemented observables are pair distribution functions, powder x-ray diffraction, powder neutron diffraction and x-ray photoelectron spectra.

<pre>
do_xrd = .true. # Do X-Ray diffraction prediction
q_range_min = 1.0 # -> Range for the XRD/structure factor
# calculation: q = 4 pi sin( theta )
# / lambda, where theta is the half
# angle of diffraction
q_range_max = 10.0 # -> Range - " -
write_xrd = .true. # -> Write out xrd pattern
xrd_output = 'q*F(q)' # -> Output the XRD pattern as the direct
# Fourier transform of G(r), the reduced
# PDF (this can be 'F(q)'/'i(q)' or
# the full xrd intensity 'xrd')

do_pair_distribution = .true. # Calculate the XRD from the pair
# distribution function, so it scales
# linearly with the number of atoms
pair_distribution_kde_sigma = 0.1 # -> Use Gaussian Kernel Density Estimate
# of width 0.1A to smooth out,
# accounting for thermal broadening
pair_distribution_partial = .true. # -> Calculate partial pair-distribution functions
pair_distribution_rcut = 10.6 # -> Cutoff partial pair distribution
r_range_min = 0.1 # -> Range for the PDF calculation
r_range_max = 10.0 # -> Range - " -
write_pair_distribution = .true. # -> Write out pair distribution functions

do_structure_factor = .true. # Use (raw, non-scattering factor corrected)
# (partial) structure factor(s) for calculations
structure_factor_from_pdf = .true. # -> Fourier transform the pair distribution
# functions to obtain the uncorrected structure
# factors, which when corrected give the XRD pattern.
structure_factor_window = .true. # -> Use a multiplicative "windowing" function
# (sin(pi r / r_cut)/(pi r / r_cut)) in the Fourier
# transform of pdf to minimize high frequency
# artifacts resulting from the finite range
# Fourier transform.
write_structure_factor = .true. # -> Write out structure factors

do_xps = .true. # Do x-ray photoelectron spectroscopy (XPS) prediction
# if a model has been specified in the .gap file
xps_e_min = 280. # -> Minimum of range for XPS prediction
xps_e_max = 300. # -> Maximum of range for XPS prediction
xps_n_samples = 301 # -> Number of samples for XPS prediction
</pre>

Experimental Observable Options

2026-03-22T14:52:36Z

Tigany Zarrouk:

Options for predicting experimental observables are found below.

Currently implemented observables are pair distribution functions, powder x-ray diffraction, powder neutron diffraction and x-ray photoelectron spectra.

<pre>
do_xrd = .true. # Do X-Ray diffraction prediction
q_range_min = 1.0 # -> Range for the XRD/structure factor
# calculation: q = 4 pi sin( theta )
# / lambda, where theta is the half
# angle of diffraction
q_range_max = 10.0 # -> Range - " -
write_xrd = .true. # -> Write out xrd pattern
xrd_output = 'q*F(q)' # -> Output the XRD pattern as the direct
# Fourier transform of G(r), the reduced
# PDF (this can be 'F(q)'/'i(q)' or
# the full xrd intensity 'xrd')

do_pair_distribution = .true. # Calculate the XRD from the pair
# distribution function, so it scales
# linearly with the number of atoms
pair_distribution_kde_sigma = 0.1 # -> Use Gaussian Kernel Density Estimate
# of width 0.1A to smooth out,
# accounting for thermal broadening
pair_distribution_partial = .true. # -> Calculate partial pair-distribution functions
pair_distribution_rcut = 10.6 # -> Cutoff partial pair distribution
r_range_min = 0.1 # -> Range for the PDF calculation
r_range_max = 10.0 # -> Range - " -
write_pair_distribution = .true. # -> Write out pair distribution functions

do_structure_factor = .true. # Use (raw, non-scattering factor corrected)
# (partial) structure factor(s) for calculations
structure_factor_from_pdf = .true. # -> Fourier transform the pair distribution
# functions to obtain the uncorrected structure
# factors, which when corrected give the XRD pattern.
structure_factor_window = .true. # -> Use a multiplicative "windowing" function
# (sin(pi r / r_cut)/(pi r / r_cut)) in the Fourier
# transform of pdf to minimize high frequency
# artifacts resulting from the finite range
# Fourier transform.
write_structure_factor = .true. # -> Write out structure factors

do_xps = .true. # Do x-ray photoelectron spectroscopy (XPS) prediction if
# a model has been specified in the .gap file
xps_e_min = 280. # Minimum of range for XPS prediction
xps_e_max = 300. # Maximum of range for XPS prediction
xps_n_samples = 301 # Number of samples for XPS prediction
</pre>

Experimental Observable Options

2026-03-22T14:52:07Z

Tigany Zarrouk:

Options for predicting experimental observables are found below.

Currently implemented observables are pair distribution functions, powder x-ray diffraction, powder neutron diffraction and x-ray photoelectron spectra.

<pre>
do_xrd = .true. # Do X-Ray diffraction prediction
q_range_min = 1.0 # -> Range for the XRD/structure factor
# calculation: q = 4 pi sin( theta )
# / lambda, where theta is the half
# angle of diffraction
q_range_max = 10.0 # -> Range - " -
write_xrd = .true. # -> Write out xrd pattern
xrd_output = 'q*F(q)' # -> Output the XRD pattern as the direct
# Fourier transform of G(r), the reduced
# PDF (this can be 'F(q)'/'i(q)' or
# the full xrd intensity 'xrd')

do_pair_distribution = .true. # Calculate the XRD from the pair
# distribution function, so it scales
# linearly with the number of atoms
pair_distribution_kde_sigma = 0.1 # -> Use Gaussian Kernel Density Estimate
# of width 0.1A to smooth out,
# accounting for thermal broadening
pair_distribution_partial = .true. # -> Calculate partial pair-distribution functions
pair_distribution_rcut = 10.6 # -> Cutoff partial pair distribution
r_range_min = 0.1 # -> Range for the PDF calculation
r_range_max = 10.0 # -> Range - " -
write_pair_distribution = .true. # -> Write out pair distribution functions

do_structure_factor = .true. # Use (raw, non-scattering factor corrected)
# (partial) structure factor(s) for calculations
structure_factor_from_pdf = .true. # -> Fourier transform the pair distribution
# functions to obtain the uncorrected structure
# factors, which when corrected give the XRD pattern.
structure_factor_window = .true. # -> Use a multiplicative "windowing" function
# (sin(pi r / r_cut)/(pi r / r_cut)) in the Fourier
# transform of pdf to minimize high frequency
# artifacts resulting from the finite range Fourier
# transform.
write_structure_factor = .true. # -> Write out structure factors

do_xps = .true. # Do x-ray photoelectron spectroscopy (XPS) prediction if
# a model has been specified in the .gap file
xps_e_min = 280. # Minimum of range for XPS prediction
xps_e_max = 300. # Maximum of range for XPS prediction
xps_n_samples = 301 # Number of samples for XPS prediction
</pre>

Experimental Observable Options

2026-03-22T14:50:35Z

Tigany Zarrouk:

Options for predicting experimental observables are found below.

Currently implemented observables are pair distribution functions, powder x-ray diffraction, powder neutron diffraction and x-ray photoelectron spectra.

<pre>
do_xrd = .true. # Do X-Ray diffraction prediction
q_range_min = 1.0 # -> Range for the XRD/structure factor
# calculation: q = 4 pi sin( theta )
# / lambda, where theta is the half
# angle of diffraction
q_range_max = 10.0 # -> Range - " -
write_xrd = .true. # -> Write out xrd pattern
xrd_output = 'q*F(q)' # -> Output the XRD pattern as the direct
# Fourier transform of G(r), the reduced
# PDF (this can be 'F(q)'/'i(q)' or
# the full xrd intensity 'xrd')

do_pair_distribution = .true. # Calculate the XRD from the pair
# distribution function, so it scales
# linearly with the number of atoms
pair_distribution_kde_sigma = 0.1 # -> Use Gaussian Kernel Density Estimate
# of width 0.1A to smooth out,
# accounting for thermal broadening
pair_distribution_partial = .true. # -> Calculate partial pair-distribution functions
pair_distribution_rcut = 10.6 # -> Cutoff partial pair distribution
r_range_min = 0.1 # -> Range for the PDF calculation
r_range_max = 10.0 # -> Range - " -
write_pair_distribution = .true. # -> Write out pair distribution functions

do_structure_factor = .true. # Use (raw, non-scattering factor corrected)
# (partial) structure factor(s) for calculations
structure_factor_from_pdf = .true. # -> Fourier transform the pair distribution
# functions to obtain the uncorrected structure
# factors, which when corrected give the XRD pattern.
structure_factor_window = .true. # -> Use a multiplicative "windowing" function
# (sin(pi r / r_cut)/(pi r / r_cut)) in the Fourier
# transform of pdf to minimize high frequency
# artifacts resulting from the finite range Fourier
# transform.
write_structure_factor = .true. # -> Write out structure factors

do_xps = .true.
xps_e_min = 280.
xps_e_max = 300.
xps_n_samples = 301
</pre>

Experimental Observable Options

2026-03-22T14:39:59Z

Tigany Zarrouk: Created page with "Options for predicting experimental observables are found below. Currently implemented observables are pair distribution functions, powder x-ray diffraction, powder neutron..."

Documentation

2026-03-22T14:36:26Z

Tigany Zarrouk: /* Special Features */

'''TurboGAP''' is a program and associated collection of routines designed for carrying out atomistic calculations based on machine learning interatomic potentials. This page deals with the technical aspects of using '''TurboGAP'''; to learn more about the underlying theory, check the [[GAP theory]] page. Since it is often easier to learn by example, make sure to take a look at the [[tutorials]] to familiarize yourself with '''TurboGAP'''.

== Calculation mode ==

There are three basic modes for running a '''TurboGAP''' calculation, <code>turbogap predict</code>, <code>turbogap md</code> and <code>turbogap mc</code>. They are invoked by simply typing <code>turbogap predict</code>, <code>turbogap md</code> or <code>turbogap mc</code> in the command line or a bash script (e.g., to run MD in parallel on 8 CPU cores: <code>mpirun -np 8 turbogap md</code>). All execution modes require an <code>input</code> file with '''TurboGAP''' options, a <code>gap_files</code> directory with the GAP potential to be used in the calculation, and an XYZ file in ASE's extended XYZ format with atomic positions, lattice vectors and chemical species information (for MD, also atomic velocities are needed).

=== turbogap predict ===

<code>turbogap predict</code> performs a single-point calculation (i.e., the atomic positions are ''not'' updated during the simulation) for [[total energy]], [[local energy]], [[forces]] and [[virial pressure]]. When available for the specific potential, it can also perform a [[Hirshfeld volume]] prediction. If the atoms file contains more than one configuration, in the form of concatenated individual atomic structures, '''TurboGAP''' will perform predictions for all of them.

=== turbogap md ===

<code>turbogap md</code> performs [[molecular dynamics]] (default) or [[energy minimization]] according to the [[MD options|options]] specified in the <code>input</code> file. Currently only Velocity-Verlet MD and gradient descent energy minimization are supported in this mode. Monte Carlo simulations are run with the separate <code>turbogap mc</code> mode, and we expect to add support for other simulation protocols in the near future. To choose between different methods to propagate the atomic positions, take a look at the <code>[[optimize]]</code> keyword. If there is more than one atomic structure in the XYZ file, <code>turbogap md</code> will use the first one as the starting point. Note how this differs from the behavior of <code>turbogap predict</code>, where single-point calculations are performed for ''all'' the structures in the XYZ file.

=== turbogap mc ===
<code>turbogap mc</code> performs (Grand-Canonical) [[Monte-Carlo]] simulations. These can be (NVT), (NPT), (mu VT) or (mu PT). Hybrid MC (using molecular dynamics to produce a trial move) can be performed, as can relaxation after specific trial moves. The user can specify a large number of move types. For reference on the specification see [[Monte-Carlo]]. The outputs are <code>mc.log</code> (the log file), <code>mc_all.xyz</code> (all of the accepted MC steps), <code>mc_trial.xyz</code> (an .xyz file containing a trial move) and <code>mc_current.xyz</code> (the current accepted step). The .xyz files are written every <code>write_xyz=N</code> steps.

== Files ==

=== Input file (input) ===

The <code>input</code> file contains the keywords that tell '''TurboGAP''' how to perform the single-point or MD calculation requested by the user. A minimal <code>input</code> file (without MD options) contains only information about the structure XYZ file, the location of the potential, and chemical species. An example looks like this:

! Species-specific info
[[atoms_file]] = 'atoms.xyz'
[[pot_file]] = 'gap_files/cho.gap'
[[n_species]] = 3
[[species]] = H C O
[[masses]] = 1.01 12.01 16.00 ! this is optional for single point, for MD TurboGAP will try to get them from a database if not provided
[[e0]] = 0. 0. 0. ! this is optional, to specify per-species energy offsets

For a single-point <code>turbogap predict</code> calculation, something like the above is all that is needed. For running MD and other specialized simulations one needs to additionally specify the appropriate keywords. Check [[MD options]] for a complete list.

=== Atoms file (*.xyz) ===

The atoms file is an atomic structure file in ASE's [https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extxyz extended XYZ] format. '''TurboGAP''' (currently) works exclusively in periodic boundary conditions; this must be taken into consideration when simulating molecular systems or surfaces (i.e., that an appropriate amount of vacuum is present). The format of the XYZ file must conform to the following:

Number_of_atoms
Comment line including Lattice="ax ay az bx by bz cx cy cz" and Properties=species:S:1:pos:R:3[:vel:R:3]
Atom_name_1 posx posy posz (velx vely velz)
Atom_name_2 posx posy posz (velx vely velz)
...
Atom_name_nat posx posy posz (velx vely velz)

where the velocity information is needed for MD ('''TurboGAP''' will generate random velocities if not provided). The positions must be in units of Angstrom, the velocities in Angstrom/fs and the masses in amu. '''TurboGAP''' XYZ reading adheres strictly to extXYZ format, with "species" (S:1), "pos" (R:3), "vel" (R:3), "fix_atom" (S:3, with values F or T allowed) and "mass" (R:1) read from the Properties attribute. "positions", "velocities", "fix_atoms" and "masses" are used as synonyms for "pos", "vel", "fix_atom" and "mass", respectively.
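
The layout described above can be produced with a few lines of plain Python. The sketch below is an illustration only: the helper name <code>format_extxyz</code> and the example structure are hypothetical, and in practice a library such as ASE can write the same format. It emits only the "species", "pos" and optional "vel" properties.

```python
def format_extxyz(lattice, species, positions, velocities=None):
    """Build one extended-XYZ frame as a string.

    lattice: three lattice vectors (a, b, c), each a 3-sequence in Angstrom.
    positions: one 3-sequence per atom, in Angstrom.
    velocities: optional, one 3-sequence per atom, in Angstrom/fs.
    """
    if len(species) != len(positions):
        raise ValueError("species and positions must have the same length")
    if velocities is not None and len(velocities) != len(positions):
        raise ValueError("velocities and positions must have the same length")
    lattice_str = " ".join(f"{x:.8f}" for vec in lattice for x in vec)
    properties = "species:S:1:pos:R:3"
    if velocities is not None:
        properties += ":vel:R:3"
    lines = [str(len(species)),
             f'Lattice="{lattice_str}" Properties={properties}']
    for i, (name, pos) in enumerate(zip(species, positions)):
        columns = list(pos)
        if velocities is not None:
            columns += list(velocities[i])
        lines.append(name + " " + " ".join(f"{x:.8f}" for x in columns))
    return "\n".join(lines) + "\n"


# Hypothetical two-atom molecule in a 10 Angstrom cubic box; the box supplies
# the vacuum padding needed under periodic boundary conditions.
frame = format_extxyz(
    lattice=[(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)],
    species=["C", "O"],
    positions=[(5.0, 5.0, 5.0), (5.0, 5.0, 6.13)],
)
print(frame, end="")
```

The returned string can be written to the file named by <code>atoms_file</code>; several frames can be concatenated for use with <code>turbogap predict</code>.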

=== Potential directory (gap_files/) ===

The [[Potentials|GAP potential files]] are usually put into a subdirectory under your working directory named <code>gap_files</code>. This subdirectory contains a bunch of files generated with QUIP's <code>gap_fit</code> program, with XML extension, as well as a number of other files. The XML files are often enough to run a [[Running a GAP calculation with QUIP|GAP calculation with QUIP]], with the notable exception of potentials with [[vdW corrections]], which might need some preprocessing before they can be used with QUIP. The <code>*.gap</code> file tells '''TurboGAP''' how to use the different files to run a GAP calculation. When using these files with '''TurboGAP''' you do not need to worry about preprocessing; they are ready to go.

=== Output files ===

Besides standard output (basic messages, progress bar, etc.) that you get printed to stdout, '''TurboGAP''' produces one or two output files, depending on whether you are running a static calculation (<code>turbogap predict</code>) or molecular dynamics (<code>turbogap md</code>). Output file <code>[[trajectory_out.xyz]]</code> is always written out, and it contains atomic positions, predicted energy and forces, etc. Output file <code>[[thermo.log]]</code> is only written when doing MD, and it contains basic thermodynamic information (energy, temperature, pressure, etc.). One can control the frequency with which each file is written (for MD only) with <code>[[write_xyz]]</code> and <code>[[write_thermo]]</code>, respectively. For finer control over which properties are written and which are not, refer to [[writeouts]]. <code>turbogap mc</code> mode writes <code>mc_all.xyz</code>, which contains all accepted Monte-Carlo moves, <code>mc_current.xyz</code>, which contains the last accepted Monte-Carlo move, for ease of restarting, and <code>mc.log</code>, which contains information on the Monte-Carlo trials done during the simulation.

== Special Features ==

TurboGAP has the capability to predict an arbitrary number of local properties.

Local Hirshfeld volumes can be predicted given an appropriate model, which can be used for including van der Waals interactions via a Tkatchenko-Scheffler formalism, or many-body dispersion.

TurboGAP can also perform experimental observable predictions (pair-distribution functions, x-ray diffraction, neutron diffraction and x-ray photoelectron spectra - the latter via Gaussian Process Prediction of local core-electron binding energies). The keyword options can be found in [[Experimental Observable Options]].

Structural inference of experimental data of all the above types can be invoked by Reverse Monte-Carlo (RMC) or Molecular Augmented Dynamics (MAD). This performs a multi-objective optimization of the interatomic potential energy and experimental agreement. Relevant options are found in [[Experimental Observable Options]] and [[Structural Inference Options]].

== Parallel support ==

Parallel support in '''TurboGAP''' is provided specifically via MPI. To build the '''TurboGAP''' code with MPI support you need an MPI-enabled Fortran compiler. '''TurboGAP''' is routinely tested and works reliably with the <code>gfortran</code> MPI wrapper, usually called <code>mpif90</code>. It should also be possible to build '''TurboGAP''' with Intel's <code>mpifort</code>, but we do not usually test the code with it. Note that the BLAS/LAPACK libraries used by '''TurboGAP''' should be compiled with the same compiler suite used to build '''TurboGAP''', to ensure compatibility. Also note that OpenMP support can be available from BLAS/LAPACK. In that case, hybrid MPI/OpenMP '''TurboGAP''' execution can be achieved, although be mindful that OpenMP acceleration can only be exploited for energy and force evaluation, not descriptor construction. We recommend running '''TurboGAP''' with exclusive MPI parallelization, and our tests showed MPI performance to be superior to hybrid MPI/OpenMP. Since system architecture and the details of the potential might affect said performance, run your own tests to evaluate whether you gain a speedup from BLAS/LAPACK's threading support.

Documentation

2026-03-22T14:35:52Z

Tigany Zarrouk: /* Special Features */

'''TurboGAP''' is a program and associated collection of routines designed for carrying out atomistic calculations based on machine learning interatomic potentials. This page deals with the technical aspects of using '''TurboGAP'''; to learn more about the underlying theory, check the [[GAP theory]] page. Since it is often easier to learn by example, make sure to take a look at the [[tutorials]] to familiarize yourself with '''TurboGAP'''.

== Calculation mode ==

There are three basic modes for running a '''TurboGAP''' calculation, <code>turbogap predict</code>, <code>turbogap md</code> and <code>turbogap mc</code>. They are invoked by simply typing <code>turbogap predict</code>, <code>turbogap md</code> or <code>turbogap mc</code> in the command line or a bash script (e.g., to run MD in parallel on 8 CPU cores: <code>mpirun -np 8 turbogap md</code>). All execution modes require an <code>input</code> file with '''TurboGAP''' options, a <code>gap_files</code> directory with the GAP potential to be used in the calculation, and an XYZ file in ASE's extended XYZ format with atomic positions, lattice vectors and chemical species information (for MD, also atomic velocities are needed).

=== turbogap predict ===

<code>turbogap predict</code> performs a single-point calculation (i.e., the atomic positions are ''not'' updated during the simulation) for [[total energy]], [[local energy]], [[forces]] and [[virial pressure]]. When available for the specific potential, it can also perform a [[Hirshfeld volume]] prediction. If the atoms file contains more than one configuration, in the form of concatenated individual atomic structures, '''TurboGAP''' will perform predictions for all of them.

=== turbogap md ===

<code>turbogap md</code> performs [[molecular dynamics]] (default) or [[energy minimization]] according to the [[MD options|options]] specified in the <code>input</code> file. Currently only Velocity-Verlet MD and gradient descent energy minimization are supported in this mode. Monte Carlo simulations are run with the separate <code>turbogap mc</code> mode, and we expect to add support for other simulation protocols in the near future. To choose between different methods to propagate the atomic positions, take a look at the <code>[[optimize]]</code> keyword. If there is more than one atomic structure in the XYZ file, <code>turbogap md</code> will use the first one as the starting point. Note how this differs from the behavior of <code>turbogap predict</code>, where single-point calculations are performed for ''all'' the structures in the XYZ file.

=== turbogap mc ===
<code>turbogap mc</code> performs (Grand-Canonical) [[Monte-Carlo]] simulations. These can be (NVT), (NPT), (mu VT) or (mu PT). Hybrid MC (using molecular dynamics to produce a trial move) can be performed, as can relaxation after specific trial moves. The user can specify a large number of move types. For reference on the specification see [[Monte-Carlo]]. The outputs are <code>mc.log</code> (the log file), <code>mc_all.xyz</code> (all of the accepted MC steps), <code>mc_trial.xyz</code> (an .xyz file containing a trial move) and <code>mc_current.xyz</code> (the current accepted step). The .xyz files are written every <code>write_xyz=N</code> steps.

== Files ==

=== Input file (input) ===

The <code>input</code> file contains the keywords that tell '''TurboGAP''' how to perform the single-point or MD calculation requested by the user. A minimal <code>input</code> file (without MD options) contains only information about the structure XYZ file, the location of the potential, and chemical species. An example looks like this:

! Species-specific info
[[atoms_file]] = 'atoms.xyz'
[[pot_file]] = 'gap_files/cho.gap'
[[n_species]] = 3
[[species]] = H C O
[[masses]] = 1.01 12.01 16.00 ! this is optional for single point, for MD TurboGAP will try to get them from a database if not provided
[[e0]] = 0. 0. 0. ! this is optional, to specify per-species energy offsets

For a single-point <code>turbogap predict</code> calculation, something like the above is all that is needed. For running MD and other specialized simulations one needs to additionally specify the appropriate keywords. Check [[MD options]] for a complete list.

=== Atoms file (*.xyz) ===

The atoms file is an atomic structure file in ASE's [https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extxyz extended XYZ] format. '''TurboGAP''' (currently) works exclusively in periodic boundary conditions; this must be taken into consideration when simulating molecular systems or surfaces (i.e., that an appropriate amount of vacuum is present). The format of the XYZ file must conform to the following:

Number_of_atoms
Comment line including Lattice="ax ay az bx by bz cx cy cz" and Properties=species:S:1:pos:R:3[:vel:R:3]
Atom_name_1 posx posy posz (velx vely velz)
Atom_name_2 posx posy posz (velx vely velz)
...
Atom_name_nat posx posy posz (velx vely velz)

where the velocity information is needed for MD ('''TurboGAP''' will generate random velocities if not provided). The positions must be in units of Angstrom, the velocities in Angstrom/fs and the masses in amu. '''TurboGAP''' XYZ reading adheres strictly to extXYZ format, with "species" (S:1), "pos" (R:3), "vel" (R:3), "fix_atom" (S:3, with values F or T allowed) and "mass" (R:1) read from the Properties attribute. "positions", "velocities", "fix_atoms" and "masses" are used as synonyms for "pos", "vel", "fix_atom" and "mass", respectively.

=== Potential directory (gap_files/) ===

The [[Potentials|GAP potential files]] are usually put into a subdirectory under your working directory named <code>gap_files</code>. This subdirectory contains a bunch of files generated with QUIP's <code>gap_fit</code> program, with XML extension, as well as a number of other files. The XML files are often enough to run a [[Running a GAP calculation with QUIP|GAP calculation with QUIP]], with the notable exception of potentials with [[vdW corrections]], which might need some preprocessing before they can be used with QUIP. The <code>*.gap</code> file tells '''TurboGAP''' how to use the different files to run a GAP calculation. When using these files with '''TurboGAP''' you do not need to worry about preprocessing; they are ready to go.

=== Output files ===

Besides standard output (basic messages, progress bar, etc.) that you get printed to stdout, '''TurboGAP''' produces one or two output files, depending on whether you are running a static calculation (<code>turbogap predict</code>) or molecular dynamics (<code>turbogap md</code>). Output file <code>[[trajectory_out.xyz]]</code> is always written out, and it contains atomic positions, predicted energy and forces, etc. Output file <code>[[thermo.log]]</code> is only written when doing MD, and it contains basic thermodynamic information (energy, temperature, pressure, etc.). One can control the frequency with which each file is written (for MD only) with <code>[[write_xyz]]</code> and <code>[[write_thermo]]</code>, respectively. For finer control over which properties are written and which are not, refer to [[writeouts]]. <code>turbogap mc</code> mode writes <code>mc_all.xyz</code>, which contains all accepted Monte-Carlo moves, <code>mc_current.xyz</code>, which contains the last accepted Monte-Carlo move, for ease of restarting, and <code>mc.log</code>, which contains information on the Monte-Carlo trials done during the simulation.

== Special Features ==

TurboGAP has the capability to predict an arbitrary number of local properties.

Local Hirshfeld volumes can be predicted given an appropriate model, which can be used for including van der Waals interactions via a Tkatchenko-Scheffler formalism, or many-body dispersion.

TurboGAP can also perform experimental observable predictions (pair-distribution functions, x-ray diffraction, neutron diffraction and x-ray photoelectron spectra - the latter via Gaussian Process Prediction of local core-electron binding energies).

Structural inference of experimental data of all the above types can be invoked by Reverse Monte-Carlo (RMC) or Molecular Augmented Dynamics (MAD). This performs a multi-objective optimization of the interatomic potential energy and experimental agreement. Relevant options are found in [[Experimental Observable Options]] and [[Structural Inference Options]].

== Parallel support ==

Parallel support in '''TurboGAP''' is provided specifically via MPI. To build the '''TurboGAP''' code with MPI support you need an MPI-enabled Fortran compiler. '''TurboGAP''' is routinely tested and works reliably with the <code>gfortran</code> MPI wrapper, usually called <code>mpif90</code>. It should also be possible to build '''TurboGAP''' with Intel's <code>mpifort</code>, but we do not usually test the code with it. Note that the BLAS/LAPACK libraries used by '''TurboGAP''' should be compiled with the same compiler suite used to build '''TurboGAP''', to ensure compatibility. Also note that OpenMP support can be available from BLAS/LAPACK. In that case, hybrid MPI/OpenMP '''TurboGAP''' execution can be achieved, although be mindful that OpenMP acceleration can only be exploited for energy and force evaluation, not descriptor construction. We recommend running '''TurboGAP''' with exclusive MPI parallelization, and our tests showed MPI performance to be superior to hybrid MPI/OpenMP. Since system architecture and the details of the potential might affect said performance, run your own tests to evaluate whether you gain a speedup from BLAS/LAPACK's threading support.

Documentation

2026-03-22T14:34:54Z

Tigany Zarrouk:

'''TurboGAP''' is a program and associated collection of routines designed for carrying out atomistic calculations based on machine learning interatomic potentials. This page deals with the technical aspects of using '''TurboGAP'''; to learn more about the underlying theory, check the [[GAP theory]] page. Since it is often easier to learn by example, make sure to take a look at the [[tutorials]] to familiarize yourself with '''TurboGAP'''.

== Calculation mode ==

There are three basic modes for running a '''TurboGAP''' calculation, <code>turbogap predict</code>, <code>turbogap md</code> and <code>turbogap mc</code>. They are invoked by simply typing <code>turbogap predict</code>, <code>turbogap md</code> or <code>turbogap mc</code> in the command line or a bash script (e.g., to run MD in parallel on 8 CPU cores: <code>mpirun -np 8 turbogap md</code>). All execution modes require an <code>input</code> file with '''TurboGAP''' options, a <code>gap_files</code> directory with the GAP potential to be used in the calculation, and an XYZ file in ASE's extended XYZ format with atomic positions, lattice vectors and chemical species information (for MD, also atomic velocities are needed).

=== turbogap predict ===

<code>turbogap predict</code> performs a single-point calculation (i.e., the atomic positions are ''not'' updated during the simulation) of the [[total energy]], [[local energy]], [[forces]] and [[virial pressure]]. When available for the specific potential, it can also perform a [[Hirshfeld volume]] prediction. If the atoms file contains more than one configuration, in the form of concatenated individual atomic structures, '''TurboGAP''' will perform predictions for all of them.

=== turbogap md ===

<code>turbogap md</code> performs [[molecular dynamics]] (default) or [[energy minimization]] according to the [[MD options|options]] specified in the <code>input</code> file. Currently only Velocity-Verlet MD and gradient descent energy minimization are supported. Monte Carlo simulations are run through the separate <code>turbogap mc</code> mode; we expect to add support for other simulation protocols in the near future. To choose between different methods to propagate the atomic positions, take a look at the <code>[[optimize]]</code> keyword. If there is more than one atomic structure in the XYZ file, <code>turbogap md</code> will use the first one as the starting point. Note how this differs from the behavior of <code>turbogap predict</code>, where single-point calculations are performed for ''all'' the structures in the XYZ file.

=== turbogap mc ===
<code>turbogap mc</code> performs (Grand-Canonical) [[Monte-Carlo]] simulations in the NVT, NPT, mu VT or mu PT ensembles. Hybrid MC (using molecular dynamics to produce a trial move) can be performed, as can relaxation after specific trial moves. The user can specify a large number of move types; see [[Monte-Carlo]] for their specification. The outputs are <code>mc.log</code> (the log file), <code>mc_all.xyz</code> (all of the accepted MC steps), <code>mc_trial.xyz</code> (the latest trial move) and <code>mc_current.xyz</code> (the current accepted step). The .xyz files are written every <code>write_xyz = N</code> steps.

== Files ==

=== Input file (input) ===

The <code>input</code> file contains the keywords that tell '''TurboGAP''' how to perform the single-point or MD calculation requested by the user. A minimal <code>input</code> file (without MD options) contains only information about the structure XYZ file, the location of the potential, and chemical species. An example looks like this:

! Species-specific info
[[atoms_file]] = 'atoms.xyz'
[[pot_file]] = 'gap_files/cho.gap'
[[n_species]] = 3
[[species]] = H C O
[[masses]] = 1.01 12.01 16.00 ! this is optional for single point, for MD TurboGAP will try to get them from a database if not provided
[[e0]] = 0. 0. 0. ! this is optional, to specify per-species energy offsets

For a single-point <code>turbogap predict</code> calculation, something like the above is all that is needed. For running MD and other specialized simulations one needs to additionally specify the appropriate keywords. Check [[MD options]] for a complete list.
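
Before launching a calculation it can be handy to sanity-check the <code>input</code> file from a script. The following is a minimal sketch, not '''TurboGAP''''s own parser: it assumes one <code>keyword = value(s)</code> pair per line, <code>!</code> starting a comment (as in the example above), and whitespace-separated values. The function name <code>parse_input</code> is hypothetical, and a <code>!</code> inside a quoted string is not handled.

```python
def parse_input(text):
    """Parse 'key = v1 v2 ... ! comment' lines into {key: [v1, v2, ...]}.

    Illustrative helper only; values are kept as strings with quotes removed.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop everything after a comment marker
        if not line or "=" not in line:
            continue  # skip blank lines and pure comments
        key, value = line.split("=", 1)
        options[key.strip()] = [tok.strip("'\"") for tok in value.split()]
    return options


example = """! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/cho.gap'
n_species = 3
species = H C O
masses = 1.01 12.01 16.00 ! optional for single point
"""

opts = parse_input(example)
# A simple consistency check: n_species should match the species list.
assert int(opts["n_species"][0]) == len(opts["species"])
```

The same dictionary can then be inspected for any other keyword, e.g. <code>opts["pot_file"][0]</code>.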

=== Atoms file (*.xyz) ===

The atoms file is an atomic structure file in ASE's [https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extxyz extended XYZ] format. '''TurboGAP''' (currently) works exclusively in periodic boundary conditions; this must be taken into consideration when simulating molecular systems or surfaces (i.e., that an appropriate amount of vacuum is present). The format of the XYZ file must conform to the following:

Number_of_atoms
Comment line including Lattice="ax ay az bx by bz cx cy cz" and Properties=species:S:1:pos:R:3[:vel:R:3]
Atom_name_1 posx posy posz (velx vely velz)
Atom_name_2 posx posy posz (velx vely velz)
...
Atom_name_nat posx posy posz (velx vely velz)

where the velocity information is needed for MD ('''TurboGAP''' will generate random velocities if not provided). The positions must be in units of Angstrom, the velocities in Angstrom/fs and the masses in amu. '''TurboGAP''' XYZ reading adheres strictly to extXYZ format, with "species" (S:1), "pos" (R:3), "vel" (R:3), "fix_atom" (S:3, with values F or T allowed) and "mass" (R:1) read from the Properties attribute. "positions", "velocities", "fix_atoms" and "masses" are used as synonyms for "pos", "vel", "fix_atom" and "mass", respectively.
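
The layout above can be produced with a few lines of scripting. The following is a minimal sketch under stated assumptions: the helper name <code>write_extxyz</code> and the fixed eight-decimal formatting are illustrative choices, positions are taken to be in Angstrom and velocities in Angstrom/fs as stated above, and only the <code>species</code>, <code>pos</code> and optional <code>vel</code> properties are written. In practice a library such as ASE can write this format for you.

```python
def write_extxyz(symbols, positions, lattice, velocities=None):
    """Return one extended-XYZ frame as a string (illustrative helper).

    symbols:    list of element names, one per atom
    positions:  list of [x, y, z] in Angstrom
    lattice:    three lattice vectors [[ax, ay, az], [bx, by, bz], [cx, cy, cz]]
    velocities: optional list of [vx, vy, vz] in Angstrom/fs
    """
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")
    props = "species:S:1:pos:R:3"
    if velocities is not None:
        if len(velocities) != len(symbols):
            raise ValueError("velocities must have one entry per atom")
        props += ":vel:R:3"

    lattice_str = " ".join(f"{x:.8f}" for row in lattice for x in row)
    lines = [str(len(symbols)), f'Lattice="{lattice_str}" Properties={props}']
    for i, (sym, pos) in enumerate(zip(symbols, positions)):
        fields = [sym] + [f"{x:.8f}" for x in pos]
        if velocities is not None:
            fields += [f"{v:.8f}" for v in velocities[i]]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# A CO molecule in a 10 Angstrom cubic box (vacuum padding for periodic boundaries).
frame = write_extxyz(
    ["C", "O"],
    [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]],
    [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
)
```

Several frames can be concatenated into a single file, which is the multi-configuration case that <code>turbogap predict</code> processes in full.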

=== Potential directory (gap_files/) ===

The [[Potentials|GAP potential files]] are usually put into a subdirectory under your working directory named <code>gap_files</code>. This subdirectory contains a set of files with XML extension, generated with QUIP's <code>gap_fit</code> program, as well as a number of other files. The XML files are often enough to run a [[Running a GAP calculation with QUIP|GAP calculation with QUIP]], with the notable exception of potentials with [[vdW corrections]], which might need some preprocessing before they can be used with QUIP. The <code>*.gap</code> file tells '''TurboGAP''' how to use the different files to run a GAP calculation. When using these files with '''TurboGAP''' you do not need to worry about preprocessing; they are ready to go.

=== Output files ===

Besides the standard output (basic messages, progress bar, etc.) printed to stdout, '''TurboGAP''' produces one or two output files, depending on whether you are running a static calculation (<code>turbogap predict</code>) or molecular dynamics (<code>turbogap md</code>). Output file <code>[[trajectory_out.xyz]]</code> is always written out, and it contains atomic positions, predicted energy and forces, etc. Output file <code>[[thermo.log]]</code> is only written when doing MD, and it contains basic thermodynamic information (energy, temperature, pressure, etc.). One can control the frequency with which each file is written (for MD only) with <code>[[write_xyz]]</code> and <code>[[write_thermo]]</code>, respectively. For finer control over which properties are written and which are not, refer to [[writeouts]]. The <code>turbogap mc</code> mode writes <code>mc_all.xyz</code>, which contains all accepted Monte-Carlo moves; <code>mc_current.xyz</code>, which contains the last accepted Monte-Carlo move, for ease of restarting; and <code>mc.log</code>, which contains information on the Monte-Carlo trials performed during the simulation.

== Special Features ==

TurboGAP can predict an arbitrary number of local properties.

Local Hirshfeld volumes can be predicted given an appropriate model; these can be used to include van der Waals interactions via the Tkatchenko-Scheffler formalism or many-body dispersion.

TurboGAP can also predict experimental observables (pair-distribution functions, X-ray diffraction, neutron diffraction and X-ray photoelectron spectra, the latter via Gaussian process prediction of local core-electron binding energies).

Structural inference of experimental data of all the above types can be invoked by Reverse Monte-Carlo (RMC) or Molecular Augmented Dynamics (MAD). This performs a multi-objective optimization of the interatomic potential energy and experimental agreement. Relevant options are found in [[Structural Inference Options]].

== Parallel support ==

Parallel support in '''TurboGAP''' is provided via MPI. To build '''TurboGAP''' with MPI support you need an MPI-enabled Fortran compiler. '''TurboGAP''' is routinely tested and works reliably with the <code>gfortran</code> MPI wrapper, usually called <code>mpif90</code>. It should also be possible to build '''TurboGAP''' with Intel's <code>mpifort</code>, but we do not usually test the code with it. Note that the BLAS/LAPACK libraries used by '''TurboGAP''' should be compiled with the same compiler suite used to build '''TurboGAP''', to ensure compatibility. Also note that OpenMP support may be available through BLAS/LAPACK. In that case, hybrid MPI/OpenMP execution is possible, but be mindful that OpenMP acceleration can only be exploited for energy and force evaluation, not descriptor construction. We recommend running '''TurboGAP''' with pure MPI parallelization; in our tests, MPI performance was superior to hybrid MPI/OpenMP. Since the system architecture and the details of the potential might affect performance, run your own tests to evaluate whether you gain any speedup from BLAS/LAPACK's threading support.

Documentation

2026-03-22T14:21:31Z

Tigany Zarrouk: /* Output files */

'''TurboGAP''' is a program and associated collection of routines designed for carrying out atomistic calculations based on machine learning interatomic potentials. This page deals with the technical aspects of using '''TurboGAP'''; to learn more about the underlying theory, check the [[GAP theory]] page. Since it is often easier to learn by example, make sure to take a look at the [[tutorials]] to familiarize yourself with '''TurboGAP'''.

== Calculation mode ==

There are three basic modes for running a '''TurboGAP''' calculation, <code>turbogap predict</code>, <code>turbogap md</code> and <code>turbogap mc</code>. They are invoked by simply typing <code>turbogap predict</code>, <code>turbogap md</code> or <code>turbogap mc</code> in the command line or a bash script (e.g., to run MD in parallel on 8 CPU cores: <code>mpirun -np 8 turbogap md</code>). All execution modes require an <code>input</code> file with '''TurboGAP''' options, a <code>gap_files</code> directory with the GAP potential to be used in the calculation, and an XYZ file in ASE's extended XYZ format with atomic positions, lattice vectors and chemical species information (for MD, also atomic velocities are needed).

=== turbogap predict ===

<code>turbogap predict</code> performs single-point calculation (i.e., the atomic positions are ''not'' updated during the simulation) for [[total energy]], [[local energy]], [[forces]] and [[virial pressure]]. When available for the specific potential, it can also perform a [[Hirshfeld volume]] prediction. If the atoms file contains more than one configuration, in the form of concatenated individual atomic structures, '''TurboGAP''' will perform predictions for all of them.

=== turbogap md ===

<code>turbogap md</code> performs [[molecular dynamics]] (default) or [[energy minimization]] according to the [[MD options|options]] specified in the <code>input</code> file. Currently only Velocity-Verlet MD and gradient descent energy minimization are supported. We expect to add support for Monte Carlo and other simulation protocols in the near future. To choose between different methods to propagate the atomic positions, take a look at the <code>[[optimize]]</code> keyword. If there are more than one atomic structures in the XYZ file, <code>turbogap md</code> will use the first one as starting point. Note how this differs from the behavior of <code>turbogap predict</code>, where single-point calculations are performed for ''all'' the structures in the XYZ file.

=== turbogap mc ===
<code>turbogap mc</code> performs (Grand-Canonical) [[Monte-Carlo]] simulations. These can be (NVT), (NPT), (mu VT) or (mu PT). Hybrid MC (using molecular dynamics to produce a trial move) can be performed as well as relaxation after specific trial moves. The user can specify a large number of move types. For reference on the specification see [[Monte-Carlo]]. The outputs are <code>mc.log</code> (the log file), <code>mc_all.xyz</code> all of the accepted MC steps, <code>mc_trial.xyz</code> which is an .xyz containing a trial move and <code>mc_current.xyz</code> which is the current accepted step. The .xyz files are written every <code>write_xyz=N</code> steps.

== Files ==

=== Input file (input) ===

The <code>input</code> file contains the keywords that tell '''TurboGAP''' how to perform the single-point or MD calculation requested by the user. A minimal <code>input</code> file (without MD options) contains only information about the structure XYZ file, the location of the potential, and chemical species. An example looks like this:

! Species-specific info
[[atoms_file]] = 'atoms.xyz'
[[pot_file]] = 'gap_files/cho.gap'
[[n_species]] = 3
[[species]] = H C O
[[masses]] = 1.01 12.01 16.00 ! this is optional for single point, for MD TurboGAP will try to get them from a database if not provided
[[e0]] = 0. 0. 0. ! this is optional, to specify per-species energy offsets

For a single-point <code>turbogap predict</code> calculation, something like the above is all that is needed. For running MD and other specialized simulations one needs to additionally specify the appropriate keywords. Check [[MD options]] for a complete list.

=== Atoms file (*.xyz) ===

The atoms file is an atomic structure file in ASE's [https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extxyz extended XYZ] format. '''TurboGAP''' (currently) works exclusively in periodic boundary conditions; this must be taken into consideration when simulating molecular systems or surfaces (i.e., that an appropriate amount of vacuum is present). The format of the XYZ file must conform to the following:

Number_of_atoms
Comment line including Lattice="ax ay az bx by bz cx cy cz" and Properties=species:S:1:pos:R:3[:vel:R:3]
Atom_name_1 posx posy posz (velx vely velz)
Atom_name_2 posx posy posz (velx vely velz)
...
Atom_name_nat posx posy posz (velx vely velz)

where the velocity information is needed for MD ('''TurboGAP''' will generate random velocities if not provided). The positions must be in units of Angstrom, the velocities in Angstrom/fs and the masses in amu. '''TurboGAP''' XYZ reading adheres strictly to extXYZ format, with "species" (S:1), "pos" (R:3), "vel" (R:3), "fix_atom" (S:3, with values F or T allowed) and "mass" (R:1) read from the Properties attribute. "positions", "velocities", "fix_atoms" and "masses" are used as synonyms for "pos", "vel", "fix_atom" and "mass", respectively.

=== Potential directory (gap_files/) ===

The [[Potentials|GAP potential files]] are usually put into a subdirectory under your working directory named <code>gap_files</code>. This subdirectory contains a set of files with XML extension, generated with QUIP's <code>gap_fit</code> program, as well as a number of other files. The XML files are often enough to run a [[Running a GAP calculation with QUIP|GAP calculation with QUIP]], with the notable exception of potentials with [[vdW corrections]], which might need some preprocessing before they can be used with QUIP. The <code>*.gap</code> file tells '''TurboGAP''' how to use the different files to run a GAP calculation. When using these files with '''TurboGAP''' you do not need to worry about preprocessing; they are ready to go.

=== Output files ===

Besides standard output (basic messages, progress bar, etc.) that you get printed to stdout, '''TurboGAP''' produces one or two output files, depending on whether you are running a static calculation (<code>turbogap predict</code>) or molecular dynamics (<code>turbogap md</code>). Output file <code>[[trajectory_out.xyz]]</code> is always written out, and it contains atomic positions, predicted energy and forces, etc. Output file <code>[[thermo.log]]</code> is only written when doing MD, and it contains basic thermodynamic information (energy, temperature, pressure, etc.). One can control the frequency with which each file is written (for MD only) with <code>[[write_xyz]]</code> and <code>[[write_thermo]]</code>, respectively. For finer control over which properties are written and which are not, refer to [[writeouts]]. <code>turbogap mc</code> mode writes <code>mc_all.xyz</code>, which contains all accepted monte-carlo moves, <code>mc_current.xyz</code> which contains the last accepted monte-carlo move, for ease of restarting, and <code>mc.log</code> which contains information of the monte-carlo trials done during the simulation.

== Parallel support ==

Parallel support in '''TurboGAP''' is provided specifically via MPI. To build the '''TurboGAP''' code with MPI support you need an MPI-enabled Fortran compiler. '''TurboGAP''' is routinely tested and works reliably with the <code>gfortran</code> MPI wrapper, usually called <code>mpif90</code>. It should also be possible to build '''TurboGAP''' with Intel's <code>mpifort</code>, but we do not usually test the code with it. Note that the BLAS/LAPACK libraries used by '''TurboGAP''' should be compiled with the same compiler suite used to build '''TurboGAP''', to ensure compatibility. Also note that OpenMP support can be available from BLAS/LAPACK. In that case, hybrid MPI/OpenMP '''TurboGAP''' execution can be achieved, although be mindful that OpenMP acceleration can only be exploited for energy and force evaluation, not descriptor construction. We recommend to run '''TurboGAP''' with exclusive MPI parallelization, and our tests showed MPI performance to be superior to hybrid MPI/OpenMP. Since system architecture and the details of the potential might affect said performance, run your own tests to evaluate whether you gain speed up from BLAS/LAPACK's threading support.

Documentation

2026-03-22T14:18:50Z

Tigany Zarrouk: /* Calculation mode */

'''TurboGAP''' is a program and associated collection of routines designed for carrying out atomistic calculations based on machine learning interatomic potentials. This page deals with the technical aspects of using '''TurboGAP'''; to learn more about the underlying theory, check the [[GAP theory]] page. Since it is often easier to learn by example, make sure to take a look at the [[tutorials]] to familiarize yourself with '''TurboGAP'''.

== Calculation mode ==

There are three basic modes for running a '''TurboGAP''' calculation, <code>turbogap predict</code>, <code>turbogap md</code> and <code>turbogap mc</code>. They are invoked by simply typing <code>turbogap predict</code>, <code>turbogap md</code> or <code>turbogap mc</code> in the command line or a bash script (e.g., to run MD in parallel on 8 CPU cores: <code>mpirun -np 8 turbogap md</code>). All execution modes require an <code>input</code> file with '''TurboGAP''' options, a <code>gap_files</code> directory with the GAP potential to be used in the calculation, and an XYZ file in ASE's extended XYZ format with atomic positions, lattice vectors and chemical species information (for MD, also atomic velocities are needed).

=== turbogap predict ===

<code>turbogap predict</code> performs single-point calculation (i.e., the atomic positions are ''not'' updated during the simulation) for [[total energy]], [[local energy]], [[forces]] and [[virial pressure]]. When available for the specific potential, it can also perform a [[Hirshfeld volume]] prediction. If the atoms file contains more than one configuration, in the form of concatenated individual atomic structures, '''TurboGAP''' will perform predictions for all of them.

=== turbogap md ===

<code>turbogap md</code> performs [[molecular dynamics]] (default) or [[energy minimization]] according to the [[MD options|options]] specified in the <code>input</code> file. Currently only Velocity-Verlet MD and gradient descent energy minimization are supported. We expect to add support for Monte Carlo and other simulation protocols in the near future. To choose between different methods to propagate the atomic positions, take a look at the <code>[[optimize]]</code> keyword. If there are more than one atomic structures in the XYZ file, <code>turbogap md</code> will use the first one as starting point. Note how this differs from the behavior of <code>turbogap predict</code>, where single-point calculations are performed for ''all'' the structures in the XYZ file.

=== turbogap mc ===
<code>turbogap mc</code> performs (Grand-Canonical) [[Monte-Carlo]] simulations. These can be (NVT), (NPT), (mu VT) or (mu PT). Hybrid MC (using molecular dynamics to produce a trial move) can be performed as well as relaxation after specific trial moves. The user can specify a large number of move types. For reference on the specification see [[Monte-Carlo]]. The outputs are <code>mc.log</code> (the log file), <code>mc_all.xyz</code> all of the accepted MC steps, <code>mc_trial.xyz</code> which is an .xyz containing a trial move and <code>mc_current.xyz</code> which is the current accepted step. The .xyz files are written every <code>write_xyz=N</code> steps.

== Files ==

=== Input file (input) ===

The <code>input</code> file contains the keywords that tell '''TurboGAP''' how to perform the single-point or MD calculation requested by the user. A minimal <code>input</code> file (without MD options) contains only information about the structure XYZ file, the location of the potential, and chemical species. An example looks like this:

! Species-specific info
[[atoms_file]] = 'atoms.xyz'
[[pot_file]] = 'gap_files/cho.gap'
[[n_species]] = 3
[[species]] = H C O
[[masses]] = 1.01 12.01 16.00 ! this is optional for single point, for MD TurboGAP will try to get them from a database if not provided
[[e0]] = 0. 0. 0. ! this is optional, to specify per-species energy offsets

For a single-point <code>turbogap predict</code> calculation, something like the above is all that is needed. For running MD and other specialized simulations one needs to additionally specify the appropriate keywords. Check [[MD options]] for a complete list.

=== Atoms file (*.xyz) ===

The atoms file is an atomic structure file in ASE's [https://wiki.fysik.dtu.dk/ase/ase/io/formatoptions.html#extxyz extended XYZ] format. '''TurboGAP''' (currently) works exclusively in periodic boundary conditions; this must be taken into consideration when simulating molecular systems or surfaces (i.e., that an appropriate amount of vacuum is present). The format of the XYZ file must conform to the following:

Number_of_atoms
Comment line including Lattice="ax ay az bx by bz cx cy cz" and Properties=species:S:1:pos:R:3[:vel:R:3]
Atom_name_1 posx posy posz (velx vely velz)
Atom_name_2 posx posy posz (velx vely velz)
...
Atom_name_nat posx posy posz (velx vely velz)

where the velocity information is needed for MD ('''TurboGAP''' will generate random velocities if not provided). The positions must be in units of Angstrom, the velocities in Angstrom/fs and the masses in amu. '''TurboGAP''' XYZ reading adheres strictly to extXYZ format, with "species" (S:1), "pos" (R:3), "vel" (R:3), "fix_atom" (S:3, with values F or T allowed) and "mass" (R:1) read from the Properties attribute. "positions", "velocities", "fix_atoms" and "masses" are used as synonyms for "pos", "vel", "fix_atom" and "mass", respectively.

=== Potential directory (gap_files/) ===

The [[Potentials|GAP potential files]] are usually put into a subdirectory under your working directory named <code>gap_files</code>. This subdirectory contains a set of files with XML extension, generated with QUIP's <code>gap_fit</code> program, as well as a number of other files. The XML files are often enough to run a [[Running a GAP calculation with QUIP|GAP calculation with QUIP]], with the notable exception of potentials with [[vdW corrections]], which might need some preprocessing before they can be used with QUIP. The <code>*.gap</code> file tells '''TurboGAP''' how to use the different files to run a GAP calculation. When using these files with '''TurboGAP''' you do not need to worry about preprocessing; they are ready to go.

=== Output files ===

Besides standard output (basic messages, progress bar, etc.) that you get printed to stdout, '''TurboGAP''' produces one or two output files, depending on whether you are running a static calculation (<code>turbogap predict</code>) or molecular dynamics (<code>turbogap md</code>). Output file <code>[[trajectory_out.xyz]]</code> is always written out, and it contains atomic positions, predicted energy and forces, etc. Output file <code>[[thermo.log]]</code> is only written when doing MD, and it contains basic thermodynamic information (energy, temperature, pressure, etc.). One can control the frequency with which each file is written (for MD only) with <code>[[write_xyz]]</code> and <code>[[write_thermo]]</code>, respectively. For finer control over which properties are written and which are not, refer to [[writeouts]].

== Parallel support ==

Parallel support in '''TurboGAP''' is provided specifically via MPI. To build the '''TurboGAP''' code with MPI support you need an MPI-enabled Fortran compiler. '''TurboGAP''' is routinely tested and works reliably with the <code>gfortran</code> MPI wrapper, usually called <code>mpif90</code>. It should also be possible to build '''TurboGAP''' with Intel's <code>mpifort</code>, but we do not usually test the code with it. Note that the BLAS/LAPACK libraries used by '''TurboGAP''' should be compiled with the same compiler suite used to build '''TurboGAP''', to ensure compatibility. Also note that OpenMP support can be available from BLAS/LAPACK. In that case, hybrid MPI/OpenMP '''TurboGAP''' execution can be achieved, although be mindful that OpenMP acceleration can only be exploited for energy and force evaluation, not descriptor construction. We recommend to run '''TurboGAP''' with exclusive MPI parallelization, and our tests showed MPI performance to be superior to hybrid MPI/OpenMP. Since system architecture and the details of the potential might affect said performance, run your own tests to evaluate whether you gain speed up from BLAS/LAPACK's threading support.

Monte-Carlo

2026-03-02T11:50:20Z

Tigany Zarrouk: /* Monte-Carlo options */

= Running Monte-Carlo =
To perform a (Grand-Canonical) Monte-Carlo simulation we must run '''TurboGAP''' in <code>mc</code> mode:

<syntaxhighlight lang="bash">turbogap mc
</syntaxhighlight>
The following files are written every <code>write_xyz = N</code> steps:

* <code>mc.log</code>: the log of the Monte-Carlo trials,
** format: <code>mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)</code>
* <code>mc_all.xyz</code>: an appended file which contains all accepted moves
* <code>mc_trial.xyz</code>: a single configuration which contains the trial move
* <code>mc_current.xyz</code>: a single configuration which contains the current accepted configuration
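
The <code>mc.log</code> column layout lends itself to simple post-processing, for example to monitor acceptance rates per move type. The following is a minimal sketch that assumes whitespace-separated columns in exactly the documented order, a single GCMC species (one trailing count column), and an <code>accepted</code> flag written as a Fortran-style logical such as <code>T</code>/<code>F</code>. The class <code>McLogRow</code> and both function names are hypothetical; check the sketch against an actual <code>mc.log</code> before relying on it.

```python
from dataclasses import dataclass


@dataclass
class McLogRow:
    step: int
    move: str
    accepted: bool
    e_trial: float
    e_current: float
    n_sites: int
    n_gcmc_species: int


# Spellings treated as "accepted"; the real flag format is an assumption.
_TRUE = {"t", "true", ".true.", "1", "y", "yes"}


def parse_mc_log(text):
    """Turn mc.log text into a list of McLogRow, skipping unparsable lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # blank line, header, or unexpected layout
        rows.append(
            McLogRow(
                step=int(fields[0]),
                move=fields[1],
                accepted=fields[2].lower() in _TRUE,
                e_trial=float(fields[3]),
                e_current=float(fields[4]),
                n_sites=int(fields[5]),
                n_gcmc_species=int(fields[6]),
            )
        )
    return rows


def acceptance_by_move(rows):
    """Fraction of accepted trials for each move type."""
    totals = {}
    for row in rows:
        accepted, tried = totals.get(row.move, (0, 0))
        totals[row.move] = (accepted + int(row.accepted), tried + 1)
    return {move: acc / tried for move, (acc, tried) in totals.items()}


sample = """1 move T -10.5 -10.0 64 8
2 insertion F -9.0 -10.5 65 9
3 move F -10.1 -10.5 64 8
"""
rates = acceptance_by_move(parse_mc_log(sample))
```

For a real run, read the file with <code>open("mc.log").read()</code> and pass the text to <code>parse_mc_log</code>.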

= Monte-Carlo Move Types =
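We can specify a number of Monte-Carlo trial move types, including molecular dynamics (hybrid Monte-Carlo) as a trial move. These include (with the corresponding <code>mc_types</code> value):
* Displacement (<code>'move'</code>)
* Swap (<code>'swap'</code>, to swap atoms in the simulation box)
* Volume (<code>'volume'</code>, for NPT)
* Insertion (<code>'insertion'</code>, for mu VT)
* Removal (<code>'removal'</code>, for mu VT)
* MD (<code>'md'</code>, for hybrid Monte-Carlo)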

= Minimal Input File =
<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99

mc_nsteps = 5000 ! Number of mc trial steps to be performed

n_mc_types = 3 ! Number of mc trial types
mc_types = 'move' 'insertion' 'removal' ! MC types can be: 'insertion' 'removal' 'md' 'swap' 'move'
! 'volume'
mc_acceptance = 1 1 1 ! Ratios for choosing the respective trial moves (all equally likely here)

mc_move_max = 0.2 ! Maximum distance for the move of a particular particle

n_mc_mu = 1
mc_mu = 0.0 ! gcmc: Chemical potential [eV], using a large one here
mc_species = 'O' ! gcmc: species to insert / remove
mc_min_dist = 0.1 ! gcmc: minimum distance between particles for insertion

! Note: The following files are written every write_xyz steps
! mc.log: self-explanatory,
! format: mc_step mc_move accepted E_trial E_current N_sites(trial)
! N_gcmc_species(trial)
! mc_all.xyz: an appended file which contains all accepted moves
! mc_trial.xyz: a single configuration which contains the trial move
! mc_current.xyz: a single configuration which contains the current accepted configuration

write_xyz = 200
</pre>
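
The <code>mc_acceptance</code> values in the input above are ratios for choosing between trial move types, so <code>1 1 1</code> makes the three moves equally likely. As a minimal sketch of that interpretation (assuming the ratios are simply normalized to probabilities; the function names are hypothetical and this is not '''TurboGAP''''s own code):

```python
import random


def move_probabilities(mc_types, mc_acceptance):
    """Normalize selection ratios into a probability per trial move type."""
    if len(mc_types) != len(mc_acceptance):
        raise ValueError("mc_types and mc_acceptance must have the same length")
    total = sum(mc_acceptance)
    if total <= 0:
        raise ValueError("the ratios must sum to a positive number")
    return {move: weight / total for move, weight in zip(mc_types, mc_acceptance)}


def pick_move(mc_types, mc_acceptance, rng=random):
    """Draw one trial move type with probability proportional to its ratio."""
    return rng.choices(mc_types, weights=mc_acceptance, k=1)[0]


types = ["move", "insertion", "removal"]
probs = move_probabilities(types, [1, 1, 1])  # each move has probability 1/3
trial = pick_move(types, [1, 1, 1], rng=random.Random(0))
```

With <code>mc_types = 'volume' 'move'</code> and <code>mc_acceptance = 2 1</code>, the same normalization gives volume moves a probability of 2/3 and displacement moves 1/3.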


= Monte-Carlo options =

{| class="wikitable"
|-
! Keyword
! Definition
! Optional
! Type
! Default
! Used when
! Example
|-
| <code>mc_nsteps</code>
| Number of MC steps
| N
| Int
| 0
| turbogap mc
| <code>mc_nsteps = 1000</code>
|-
| <code>n_mc_types</code>
| Number of MC types
| N
| Int
| 0
| turbogap mc
| <code>n_mc_types = 2</code>
|-
| <code>mc_types</code>
| Types of MC trials
| N
| Str(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_types = 'volume' 'move'</code>
|-
| <code>mc_acceptance</code>
| Ratios of MC trial moves
| N
| Int(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_acceptance = 2 1</code>
|-
| <code>n_mc_swaps</code>
| Number of swap pairs (e.g. C O is one pair)
| Y
| Int
| 0
| 'swap' in <code>mc_types</code>
| <code>n_mc_swaps = 2</code>
|-
| <code>mc_swaps</code>
| Swap species pairs
| Y
| Str(s)
| None
| 'swap' in <code>mc_types</code>
| <code>mc_swaps = 'C' 'O' 'N' 'C'</code>
|-
| <code>mc_move_max</code>
| Maximum displacement [A]
| Y
| Float
| 1.0
| 'move' in <code>mc_types</code>
| <code>mc_move_max = 0.5</code>
|-
| <code>mc_lnvol_max</code>
| Log volume max for volume moves.
| Y
| Float
| 0.01
| 'volume' in <code>mc_types</code>
| <code>mc_lnvol_max = 0.02</code>
|-
| <code>mc_min_dist</code>
| Minimum distance for insertion [A]
| Y
| Float
| 0.2
| 'insertion' in <code>mc_types</code>
| <code>mc_min_dist = 0.1</code>
|-
| <code>mc_max_dist</code>
| Maximum distance for insertion [A]
| Y
| Float
| 10000000.0
| 'insertion' in <code>mc_types</code>
| <code>mc_max_dist = 5.0</code>
|-
| <code>n_mc_mu</code>
| Number of chemical potentials/gcmc species
| Y
| Int
| 1
| 'insertion'/'removal' in <code>mc_types</code>
| <code>n_mc_mu = 2</code>
|-
| <code>mc_mu</code>
| Chemical potential(s) [eV]
| Y
| Float(s)
| 0.0
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_mu = -5.16 -2.25</code>
|-
| <code>mc_species</code>
| GCMC species
| Y
| Str(s)
| None
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_species = 'O' 'H'</code>
|-
| <code>mc_relax</code>
| Relax MC trials prior to acc. evaluation
| Y
| Bool
| .false.
| turbogap mc
| <code>mc_relax = .true.</code>
|-
| <code>n_mc_relax_after</code>
| Number of specific trials to relax
| Y
| Int
| 0
| <code>mc_relax = .true.</code>
| <code>n_mc_relax_after = 1</code>
|-
| <code>mc_relax_after</code>
| Relax specific MC trial types
| Y
| Str(s)
| None
| <code>n_mc_relax_after > 0</code>
| <code>mc_relax_after = 'volume'</code>
|-
| <code>mc_nrelax</code>
| Number of relaxation steps after trial
| Y
| Int
| 0
| <code>mc_relax = .true.</code>
| <code>mc_nrelax = 50</code>
|-
| <code>mc_relax_opt</code>
| Optimisation for relaxing after steps
| Y
| Str
| 'gd'
| <code>mc_relax = .true.</code>
| <code>mc_relax_opt = 'gd-box-ortho'</code>
|-
| <code>mc_hamiltonian</code>
| Use NVE for 'md' trials
| Y
| Bool
| .false.
| 'md' in <code>mc_types</code>
| <code>mc_hamiltonian = .true.</code>
|-
| <code>mc_hybrid_opt</code>
| Optimisation for 'md' steps
| Y
| Str
| 'vv'
| 'md' in <code>mc_types</code>
| <code>mc_hybrid_opt = 'vv'</code>
|-
| <code>mc_write_xyz</code>
| Debug: read and write to files every step
| Y
| Bool
| .false.
| <code>turbogap mc</code>
| <code>mc_write_xyz = .true.</code>
|}

Monte-Carlo

2025-01-06T11:44:14Z

Tigany Zarrouk: /* Monte-Carlo options */

= Running Monte-Carlo =
To perform a (Grand-Canonical) Monte-Carlo simulation we must run '''TurboGAP''' in <code>mc</code> mode:

<syntaxhighlight lang="bash">turbogap mc
</syntaxhighlight>
The following files are written every <code>N</code> steps, where <code>N</code> is set by <code>write_xyz = N</code>:

* <code>mc.log</code>: self-explanatory,
** format: <code>mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)</code>
* <code>mc_all.xyz</code>: an appended file which contains all accepted moves
* <code>mc_trial.xyz</code>: a single configuration which contains the trial move
* <code>mc_current.xyz</code>: a single configuration, the most recently accepted one
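
The <code>mc.log</code> columns listed above are enough to estimate how often each trial move type is accepted. The following is a minimal sketch, not part of TurboGAP: it assumes whitespace-separated columns in the order given above and that the <code>accepted</code> column is written as a logical such as <code>T</code>/<code>F</code>. The sample lines are made up for illustration.

```python
from collections import defaultdict

def acceptance_by_move(lines):
    """Return {move_type: (n_accepted, n_trials)} from mc.log-style lines."""
    truthy = {"t", "true", ".true.", "1"}  # assumed spellings of an accepted move
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        # Skip blank lines and any header whose first field is not an integer step
        if len(fields) < 7 or not fields[0].lstrip("-").isdigit():
            continue
        move = fields[1]
        accepted = fields[2].lower() in truthy
        counts[move][1] += 1
        counts[move][0] += int(accepted)
    return {move: tuple(c) for move, c in counts.items()}

# Made-up sample in the documented column order:
# mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)
sample = """\
1 move T -7012.31 -7012.31 1000 0
2 insertion F -7010.02 -7012.31 1001 1
3 insertion T -7015.77 -7015.77 1001 1
4 removal F -7011.10 -7015.77 1000 0
""".splitlines()

rates = acceptance_by_move(sample)
for move, (n_acc, n_try) in rates.items():
    print(f"{move:10s} {n_acc}/{n_try} accepted")
```

For a real run, the same function can be fed an open file handle, e.g. <code>acceptance_by_move(open("mc.log"))</code>, after checking how the <code>accepted</code> column is actually written.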

= Monte-Carlo Move Types =
We can specify a number of Monte-Carlo trial move types, including doing molecular dynamics (hybrid Monte-Carlo) as trial moves. These include
* Displacement
* Swap (to swap atoms in the simulation box)
* Volume (for NPT)
* Insertion (for mu VT)
* Removal (for mu VT)
* MD (for hybrid Monte-Carlo)

= Minimal Input File =
<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99

mc_nsteps = 5000 ! Number of mc trial steps to be performed

n_mc_types = 3 ! Number of mc trial types
mc_types = 'move' 'insertion' 'removal' ! MC types can be: 'insertion' 'removal' 'md' 'swap' 'move'
! 'volume'
mc_acceptance = 1 1 1 ! Ratios for choosing the respective trial moves (all equally likely here)

mc_move_max = 0.2 ! Maximum distance for the move of a particular particle

n_mc_mu = 1
mc_mu = 0.0 ! gcmc: Chemical potential [eV], using a large one here
mc_species = 'O' ! gcmc: species to insert / remove
mc_min_dist = 0.1 ! gcmc: minimum distance between particles for insertion

! Note: The following files are written to every write_xyz steps
! mc.log: self-explanatory,
! format: mc_step mc_move accepted E_trial E_current N_sites(trial)
! N_gcmc_species(trial)
! mc_all.xyz: an appended file which contains all accepted moves
! mc_trial.xyz: a single configuration which contains the trial move
! mc_current.xyz: a single configuration which contains the current accepted

write_xyz = 200
</pre>
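The <code>mc_acceptance</code> ratios in the input above are relative weights for choosing the trial move type. As a rough illustration (an assumption about the selection logic, not TurboGAP's actual implementation), a weighted random choice over <code>mc_types</code> could look like this:

```python
import random

def choose_move(mc_types, mc_acceptance, rng=random):
    """Pick one trial-move type with probability proportional to its ratio."""
    if len(mc_types) != len(mc_acceptance):
        raise ValueError("mc_types and mc_acceptance must have the same length")
    return rng.choices(mc_types, weights=mc_acceptance, k=1)[0]

mc_types = ["move", "insertion", "removal"]
mc_acceptance = [1, 1, 1]

# Normalised selection probabilities implied by the ratios
total = sum(mc_acceptance)
probabilities = {t: w / total for t, w in zip(mc_types, mc_acceptance)}
print(probabilities)

# One draw with a seeded generator, so the example is reproducible
print(choose_move(mc_types, mc_acceptance, random.Random(0)))
```

With ratios <code>2 1</code> for <code>'volume' 'move'</code>, the same arithmetic gives selection probabilities of 2/3 and 1/3.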


= Monte-Carlo options =

{| class="wikitable"
|-
! Keyword
! Definition
! Optional
! Type
! Default
! Used when
! Example
|-
| <code>mc_nsteps</code>
| Number of MC steps
| N
| Int
| 0
| turbogap mc
| <code>mc_nsteps = 1000</code>
|-
| <code>n_mc_types</code>
| Number of MC types
| N
| Int
| 0
| turbogap mc
| <code>n_mc_types = 2</code>
|-
| <code>mc_types</code>
| Types of MC trials
| N
| Str(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_types = 'volume' 'move'</code>
|-
| <code>mc_acceptance</code>
| Ratios of MC trial moves
| N
| Int(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_acceptance = 2 1</code>
|-
| <code>n_mc_swaps</code>
| Number of swap pairs (e.g. C O is one pair)
| Y
| Int
| 0
| 'swap' in <code>mc_types</code>
| <code>n_mc_swaps = 2</code>
|-
| <code>mc_swaps</code>
| Swap species pairs
| Y
| Strs
| None
| 'swap' in <code>mc_types</code>
| <code>mc_swaps = 'C' 'O' 'N' 'C'</code>
|-
| <code>mc_move_max</code>
| Maximum displacement [A]
| Y
| Float
| 1.0
| 'move' in <code>mc_types</code>
| <code>mc_move_max = 0.5</code>
|-
| <code>mc_lnvol_max</code>
| Log volume max for volume moves.
| Y
| Float
| 0.01
| 'volume' in <code>mc_types</code>
| <code>mc_lnvol_max = 0.02</code>
|-
| <code>mc_min_dist</code>
| Minimum distance for insertion [A]
| Y
| Float
| 0.2
| 'insertion' in <code>mc_types</code>
| <code>mc_min_dist = 0.1</code>
|-
| <code>n_mc_mu</code>
| Number of chemical potentials/gcmc species
| Y
| Int
| 1
| 'insertion'/'removal' in <code>mc_types</code>
| <code>n_mc_mu = 2</code>
|-
| <code>mc_mu</code>
| Chemical potential(s) [eV]
| Y
| Float
| 0.0
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_mu = -5.16 -2.25</code>
|-
| <code>mc_species</code>
| GCMC species
| Y
| Str
| None
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_species = 'O' 'H'</code>
|-
| <code>mc_relax</code>
| Relax MC trials prior to acc. evaluation
| Y
| Bool
| .false.
| turbogap mc
| <code>mc_relax = .true.</code>
|-
| <code>n_mc_relax_after</code>
| Number of specific trials to relax
| Y
| Int
| 0
| <code>mc_relax = .true.</code>
| <code>n_mc_relax_after = 1</code>
|-
| <code>mc_relax_after</code>
| Relax specific MC trial types
| Y
| Str(s)
| None
| <code>n_mc_relax_after > 0</code>
| <code>mc_relax_after = 'volume'</code>
|-
| <code>mc_nrelax</code>
| Number of relaxation steps after trial
| Y
| Int
| 0
| <code>mc_relax = .true.</code>
| <code>mc_nrelax = 50</code>
|-
| <code>mc_relax_opt</code>
| Optimisation for relaxing after steps
| Y
| Str
| 'gd'
| <code>mc_relax = .true.</code>
| <code>mc_relax_opt = 'gd-box-ortho'</code>
|-
| <code>mc_hamiltonian</code>
| Use NVE for 'md' trials
| Y
| Bool
| .false.
| 'md' in <code>mc_types</code>
| <code>mc_hamiltonian = .true.</code>
|-
| <code>mc_hybrid_opt</code>
| Optimisation for 'md' steps
| Y
| Str
| 'vv'
| 'md' in <code>mc_types</code>
| <code>mc_hybrid_opt = 'vv'</code>
|-
| <code>mc_write_xyz</code>
| Debug: read and write to files every step
| Y
| Bool
| .false.
| <code>turbogap mc</code>
| <code>mc_write_xyz = .true.</code>
|}
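
Several options in the table must agree in length with a counter keyword (for example <code>mc_types</code> with <code>n_mc_types</code>, and two species per pair in <code>mc_swaps</code>). The sketch below is a hypothetical pre-flight check written for this page, not TurboGAP's own input reader. The rules are inferred from the table and its examples, and the parser only assumes the <code>key = value ! comment</code> layout used in the minimal input file.

```python
import shlex

def parse_input(text):
    """Parse 'key = value ! comment' lines into {key: [tokens]}."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()          # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = shlex.split(value)    # handles quoted strings
    return options

def check_mc_options(options):
    """Return a list of human-readable consistency problems."""
    problems = []
    n_types = int(options.get("n_mc_types", ["0"])[0])
    for key in ("mc_types", "mc_acceptance"):
        if len(options.get(key, [])) != n_types:
            problems.append(f"{key} should have n_mc_types = {n_types} entries")
    types = options.get("mc_types", [])
    if {"insertion", "removal"} & set(types):
        n_mu = int(options.get("n_mc_mu", ["1"])[0])
        for key in ("mc_mu", "mc_species"):
            # Only check keys that are present; absent keys fall back to defaults
            if key in options and len(options[key]) != n_mu:
                problems.append(f"{key} should have n_mc_mu = {n_mu} entries")
    if "swap" in types:
        n_swaps = int(options.get("n_mc_swaps", ["0"])[0])
        if len(options.get("mc_swaps", [])) != 2 * n_swaps:
            problems.append("mc_swaps should list two species per swap pair")
    return problems

example = """
n_mc_types = 3
mc_types = 'move' 'insertion' 'removal'
mc_acceptance = 1 1   ! one ratio missing
n_mc_mu = 1
mc_mu = 0.0
mc_species = 'O'
"""
problems = check_mc_options(parse_input(example))
print(problems)
```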

Creating oxygenated amorphous carbon

2024-11-14T13:22:51Z

Tigany Zarrouk: /* GCMC in TurboGAP */

This tutorial will focus on using molecular dynamics and Grand-Canonical Monte-Carlo (GCMC) simulations to determine equilibrium structures of oxygenated amorphous carbon using '''TurboGAP'''.

This tutorial is found on the '''TurboGAP''' wiki: turbogap.fi (-> tutorials -> Creating oxygenated amorphous carbon) https://turbogap.fi/wiki/index.php/Creating_oxygenated_amorphous_carbon.

The structure of this tutorial is as follows:

# Create an amorphous carbon structure using molecular dynamics, via a melt-quench procedure.
# Perform a standard GCMC calculation to populate the structure with oxygen.
# Perform hybrid Monte-Carlo/MD simulations to relax the system.


= Introduction =


== What is '''TurboGAP'''? ==

TurboGAP is a code used to simulate Machine-Learned Potentials, specifically, Gaussian Approximation Potentials.

It has numerous selling points:

# It is ''fast''.
#* It uses '''soap turbo''' descriptors, which are both faster and more accurate than your typical SOAP expansion (also found in QUIP). See the original paper by Miguel Caro for more details: [https://doi.org/10.1103/PhysRevB.100.024112 Optimizing many-body atomic descriptors for enhanced computational performance of machine learning based interatomic potentials]
#* MPI parallelised, (overlapping domain decomposition currently being developed with help of CSC).
#* GPU implementation in progress (with CSC support too).
# It can perform not only typical molecular statics (with/without box relaxation) and dynamics (<math display="inline">NVT</math> / <math display="inline">NPT</math>), but also ''Grand-Canonical Monte-Carlo'' simulations.
#* Grand-Canonical Monte Carlo (<math display="inline">\mu VT</math>), with (<math display="inline">NVT</math>) / (<math display="inline">NPT</math>) move types available.
#* Adaptive time-scale MD (by Uttiyoarnab Saha).
# Prediction of an arbitrary number of local properties.
#* ML van der Waals (by prediction of local Hirshfeld volumes) using Tkatchenko-Scheffler [https://doi.org/10.1103/PhysRevB.104.054106 Machine learning force fields based on local parametrization of dispersion interactions]
#* Heikki Muhli has developed ''Many-Body Dispersion'' capability, with multiple optimisations.
#* Max Veit is developing ''electrostatics''.
# Sneak Peek!: we have added the capability to predict/simulate numerous types of experimental data (ML XPS/XRD) and can allow them to influence simulation. (Talk to Tigany Zarrouk/look out for the papers when they come out on arXiv)!


== Installing TurboGAP ==


=== For the MLIP workshop ===

If you have a CSC account and can ssh into Mahti/Puhti there is no need to install anything. TurboGAP is installed in the path (on Mahti / Puhti)

<syntaxhighlight lang="bash">/projappl/project_2008666/turbogap
</syntaxhighlight>
The tutorial is in

<syntaxhighlight lang="bash">cd /projappl/project_2008666/turbogap/tutorials/creating_a-COx
</syntaxhighlight>
Copy the directory <code>creating_a-COx</code> to wherever you want to do the simulations and then check the project in <code>creating_a-COx/sample_submit_script.sh</code>, change it from

<syntaxhighlight lang="bash">#SBATCH --account=project_
</syntaxhighlight>
to

<syntaxhighlight lang="bash">#SBATCH --account=project_2008666
</syntaxhighlight>
This tutorial depends on <code>ase</code>, so install it as follows (after loading the python module):

<syntaxhighlight lang="bash">module load python-data/3.9-22.04
pip install ase --user
</syntaxhighlight>
Each of the simulations should take ~5-8 minutes.


=== Installation ===

To install TurboGAP please run

<syntaxhighlight lang="bash">git clone --recursive http://github.com/mcaroba/turbogap.git /your/turbogap/source/directory
</syntaxhighlight>
Where /your/turbogap/source/directory is the directory where you're putting the TurboGAP source code. To build the TurboGAP binary and library, you need to select the options that best match your architecture, by editing this line in the Makefile with one of the names of the corresponding makefiles in turbogap/makefiles:

<syntaxhighlight lang="bash">include makefiles/Makefile.Ubuntu_gfortran_mpi
</syntaxhighlight>
Then just run <code>make</code>

<syntaxhighlight lang="bash">make
</syntaxhighlight>
Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to wherever you want to do the simulations.


=== Running this tutorial on a cluster ===

* Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to where you want to run.
* Change <code>sample_submit_script.sh</code> to reflect the type of job scheduler you use (here, it's slurm), the modules you've loaded for <code>turbogap</code> and python, and change the <code>PATH</code> environment variable to where you've installed <code>turbogap/bin</code>.
* Make sure the project account is correct!
* Load the python module you will use, and make sure <code>ase</code> is installed by running <code>pip install ase {--user}</code>.
* Change the <code>srun turbogap</code> commands in the <code>script_*.sh</code> files to the standard for your cluster (e.g. <code>mpirun -np $N turbogap</code>).
* In each of the numbered directories (<code>1.</code>, <code>2.</code>, etc.), run the numbered scripts in order with bash, e.g. <code>bash 1.run_randomise.sh</code>, starting each script only after the preceding job has finished.


=== Running this tutorial locally ===

* Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to where you want to run.
* Make sure the python package <code>ase</code> is installed by running <code>pip install ase {--user}</code>.
* Edit the convenience script <code>change_to_local.sh</code> and run with bash.
* Change the path environment variable in <code>sample_submit_script.sh</code>.


== Note: how to make a potential work in '''TurboGAP''' ==

Potentials trained with [https://github.com/libatoms/QUIP libAtoms] (the xml files) must be converted to <code>*.gap</code> files before '''TurboGAP''' can use them. The conversion can be run with

<syntaxhighlight lang="bash">python3 /path/turbogap/tools/quip_to_xml/make_gap_files.py your_potential.xml your_potential.gap {your_hirshfeld.xml}
</syntaxhighlight>

= Tutorial =


== Create Amorphous Carbon ==

Here, we perform molecular dynamics simulations to form amorphous carbon from diamond. To do this, we use a simple melt-quench procedure, which is modified from the paper of Wang ''et al.'' to create amorphous carbon [https://doi.org/10.1021/acs.chemmater.1c03279 Structure and Pore Size Distribution in Nanoporous Carbon].

This is also similar to what is done in other tutorials [https://turbogap.fi/wiki/index.php/Graphitization_simulation_with_van_der_Waals_corrections Graphitization simulation with van der Waals corrections] [https://turbogap.fi/wiki/index.php/Generating_amorphous_silicon_from_quenching_simulations Generating amorphous silicon from quenching simulations].

The procedure we will follow is:

# We heat up the diamond to 9000K, thereby randomizing the structure.
# We quench to 1000K (actual temp used is 3500K in the real paper).
# We anneal (partially graphitize) the structure at 1000K, to allow the carbon bonds a chance to reform.
# Relax the structure, allowing both atomic positions and cell vectors to relax.

To run these calculations, do

<syntaxhighlight lang="bash">cd 1.make_amorphous_carbon
bash 1.run_randomise.sh
# Wait for it to finish

bash 2.run_quench.sh
# Wait for it to finish

bash 3.run_anneal.sh
# Wait for it to finish

bash 4.run_relax.sh
# Wait for it to finish
</syntaxhighlight>

=== 1. Randomise ===

First, we create a diamond structure using ASE, changing the volume to achieve a given density.

<syntaxhighlight lang="bash">sim_name="md_diamond_randomise"

input_atoms="atoms.xyz"
output_atoms="atoms_randomise.xyz"

ln -sf ../gap_files ./

# 1. Create diamond structure (1000 atoms in atoms.xyz file)
echo "> Running: python create_diamond.py"
python3 create_diamond.py

cp diamond.xyz $input_atoms
</syntaxhighlight>
Note: In <code>create_diamond.py</code> we make 1000 atoms. You can change this to a smaller number, if you want things to run faster.

<syntaxhighlight lang="python">atoms *= (3,3,3)
</syntaxhighlight>
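The volume change needed to reach a target density follows from simple arithmetic: the box volume is the total mass divided by the density. The sketch below is an illustration only and is not the contents of <code>create_diamond.py</code>. The function names are hypothetical, and the density of 3.51 g/cm<sup>3</sup> (diamond) is used purely as an example value.

```python
AVOGADRO = 6.02214076e23      # particles per mole
CM3_TO_A3 = 1.0e24            # cubic angstroms per cubic centimetre

def cubic_box_length(n_atoms, density_g_cm3, molar_mass=12.01):
    """Edge length [A] of a cubic box holding n_atoms at the given mass density."""
    mass_g = n_atoms * molar_mass / AVOGADRO
    volume_a3 = mass_g / density_g_cm3 * CM3_TO_A3
    return volume_a3 ** (1.0 / 3.0)

def scale_factor(current_length, n_atoms, density_g_cm3, molar_mass=12.01):
    """Isotropic factor by which cell vectors and positions would be scaled."""
    return cubic_box_length(n_atoms, density_g_cm3, molar_mass) / current_length

length = cubic_box_length(1000, 3.51)
print(f"box length: {length:.2f} A")
print(f"scale factor from a 20 A box: {scale_factor(20.0, 1000, 3.51):.4f}")
```

For 1000 carbon atoms at diamond density this gives a box of roughly 17.8 Å. In ASE, a factor like this can be applied with <code>atoms.set_cell(new_cell, scale_atoms=True)</code>, which rescales the atomic positions together with the cell.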
Then we create the input file for '''TurboGAP''' in <code>input</code>.

<pre class="conf">! Species-specific info
atoms_file = '${input_atoms}' ! Input file
pot_file = 'gap_files/CO.gap' ! path to gap_files
n_species = 2 ! > Actually the number of species in atoms.xyz is 1 (C),
! but we will add oxygen in future simulations
species = C O
masses = 12.01 15.99

! MD options
md_nsteps = 5000 ! Number of MD steps (5ps randomise - actual time in paper is 20ps)
md_step = 1 ! MD timestep [fs]
thermostat = bussi ! Either bussi / berendsen

t_beg = 9000 ! Initial temperature [K]
t_end = 9000 ! Final temperature [K]
tau_t = 100. ! Time constant [fs]

! Output
write_thermo = 1 ! Write thermodynamic information every step
! (Step, Time, Temp, Kin_E, Pot_E, Pres)

write_xyz = 200 ! Write extended xyz trajectory (trajectory_out.xyz) every 200 steps

! > Predicted local properties are in the xyz, such as the local energy
! and if specified, hirshfeld volumes, core electron binding energies etc
</pre>
We run MD/relaxation simulations using the <code>md</code> option

<syntaxhighlight lang="bash">srun turbogap md
</syntaxhighlight>
This simulation outputs a few files:

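# <code>trajectory_out.xyz</code> is the extended xyz of the trajectory, written every <code>write_xyz</code> steps.
# <code>thermo.log</code> logs the thermodynamic information of the MD simulation (Step, Time, Temp, Kin_E, Pot_E, Pres), written every <code>write_thermo</code> steps.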


=== 2. Quench ===

Here, we cool the system.

The only things that need to change in the input file (if you rebel against using the provided scripts) are the temperatures and the number of MD steps.

<pre class="conf">! MD options
md_nsteps = 5500 ! 5.5ps quench
md_step = 1.
thermostat = bussi
t_beg = 9000 ! Quenching from 9000K
t_end = 1000 ! to 1000K
</pre>
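The quench settings above fix the average cooling rate. As a small worked example (assuming the target temperature is interpolated linearly between <code>t_beg</code> and <code>t_end</code> over <code>md_nsteps</code>, which is an assumption about the thermostat schedule rather than something stated here), the rate and the target temperature at any step can be computed as follows. The function names are hypothetical.

```python
def ramp_temperature(step, md_nsteps, t_beg, t_end):
    """Target temperature [K] at a given step, assuming a linear ramp."""
    return t_beg + (t_end - t_beg) * step / md_nsteps

def quench_rate(md_nsteps, md_step_fs, t_beg, t_end):
    """Average cooling rate in K/ps (positive when cooling)."""
    duration_ps = md_nsteps * md_step_fs / 1000.0
    return (t_beg - t_end) / duration_ps

halfway = ramp_temperature(2750, 5500, 9000, 1000)
rate = quench_rate(5500, 1.0, 9000, 1000)
print(f"target temperature halfway through the quench: {halfway:.0f} K")
print(f"average cooling rate: {rate:.0f} K/ps")
```

Cooling by 8000 K over 5.5 ps corresponds to roughly 1450 K/ps, which is a very fast quench; halving the rate means doubling <code>md_nsteps</code> at fixed <code>md_step</code>.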

=== 3. Anneal ===

We can anneal the structure to graphitize it. The method demonstrated here differs from that of the paper (we are hardly graphitizing at 1000K and 10ps), but it's here to illustrate the use of a barostat.

The necessary changes to the input file are

<pre class="conf">! MD options
md_nsteps = 10000 ! Graphitization (actual time in the paper is 200ps)
md_step = 1.
barostat = berendsen ! Using barostat
t_beg = 1000
t_end = 1000
p_beg = 1.0 ! Initial pressure [bar]
p_end = 1.0 ! Final pressure [bar]
tau_t = 100.
</pre>

=== 4. Relax ===

Now we relax the structure. We can choose multiple options for this, but here we opt for relaxing the box and the positions.

<pre class="conf">! MD options
optimize = gd ! optimize option allows us to specify the type of relaxation
! > Can use "gd" (gradient descent)
! "gd-box-ortho" (gradient descent relaxing diagonal cell vector components)
! "gd-box" (gradient descent relaxing all cell vector components)

! e_tol = 1.d-6 ! Default energy/force tolerances used
! f_tol = 0.010
md_nsteps = 2000
</pre>
We can now look at the simulation results by running

<syntaxhighlight lang="bash">cat traj_* > all_traj.xyz
</syntaxhighlight>
and using your atom viewer of choice.

At the end, you should have a structure which is similar to this.

[[File:amorphous_carbon.png]]

You can look at the local energies predicted by the GAP (here I use <code>ovito</code>).

* Open structure in <code>ovito</code>
* Use modifier "Create Bonds"
* Use modifier "Color Coding"
** Change "Input Property" to "Potential Energy" (it might also be called "local energy").

[[File:amorphous_carbon_local_energy.png]]


== Perform Standard Grand-Canonical Monte-Carlo (GCMC) ==


=== Theory of GCMC ===

In a GCMC simulation, a system of interest is at fixed volume, allowed to thermalize by contact with a heat bath, and it can exchange particles with an infinite reservoir, forming a constant (<math display="inline">\mu,V,T</math>) ensemble.

We perform GCMC using a Markov Chain: starting from an initial pure a-C<math display="inline">_x</math> structure, we generate trial configurations by either randomly displacing a particular atom, or inserting/removing oxygen into/from a random position respectively. These trial configurations are either accepted or rejected using the standard acceptance criteria (see Frenkel 2002) for particle displacement/insertion/removal

<math display="block"> \mathrm{acc}( \mathrm{move}) = \mathrm{min}\biggl[1, \exp\left\{ -\beta ( E(\mathrm{trial}) - E(\mathrm{current}) ) \right\} \biggr] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\biggl[ 1, \frac{V}{\lambda^3 (N+1)} \exp\left\{ - \beta ( E(N+1) - E(N) - \mu ) \right\} \biggr] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\biggl[1, \frac{\lambda^3 N}{V} \exp\left\{ -\beta ( E(N-1) - E(N) + \mu ) \right\} \biggr] </math>

where <math display="inline">\lambda</math> is the thermal de Broglie wavelength, which is given by <math display="inline">\lambda = \sqrt{\frac{2\pi \hslash^2}{mk_BT}}</math>. We then repeat the procedure with the last accepted configuration until the maximum number of iterations has been reached.
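
The three acceptance criteria above translate directly into code. The sketch below is a stand-alone illustration, not TurboGAP source: function names are hypothetical, energies and <math display="inline">\mu</math> are taken in eV, volumes in Å<sup>3</sup> and masses in atomic mass units, and <code>n</code> is assumed to count the exchangeable particles before the trial. The probabilities are evaluated through their logarithm to avoid overflow.

```python
import math

K_B_EV = 8.617333262e-5       # Boltzmann constant [eV/K]
K_B_SI = 1.380649e-23         # Boltzmann constant [J/K]
PLANCK = 6.62607015e-34       # Planck constant h [J s]
AMU = 1.66053906660e-27       # atomic mass unit [kg]

def thermal_wavelength(mass_amu, temperature):
    """Thermal de Broglie wavelength [A]: h / sqrt(2 pi m k_B T)."""
    m = mass_amu * AMU
    return PLANCK / math.sqrt(2.0 * math.pi * m * K_B_SI * temperature) * 1.0e10

def _capped_exp(log_p):
    # min(1, exp(log_p)) without overflow for large positive log_p
    return math.exp(min(0.0, log_p))

def acc_move(e_trial, e_current, temperature):
    """Displacement move."""
    beta = 1.0 / (K_B_EV * temperature)
    return _capped_exp(-beta * (e_trial - e_current))

def acc_insert(e_new, e_old, n, volume, mu, mass_amu, temperature):
    """N -> N+1, with n the particle count before insertion."""
    beta = 1.0 / (K_B_EV * temperature)
    lam3 = thermal_wavelength(mass_amu, temperature) ** 3
    return _capped_exp(math.log(volume / (lam3 * (n + 1))) - beta * (e_new - e_old - mu))

def acc_remove(e_new, e_old, n, volume, mu, mass_amu, temperature):
    """N -> N-1, with n >= 1 the particle count before removal."""
    beta = 1.0 / (K_B_EV * temperature)
    lam3 = thermal_wavelength(mass_amu, temperature) ** 3
    return _capped_exp(math.log(lam3 * n / volume) - beta * (e_new - e_old + mu))

print(f"lambda(O, 300 K) = {thermal_wavelength(15.999, 300.0):.3f} A")
print(f"uphill move by 0.1 eV at 300 K: {acc_move(0.1, 0.0, 300.0):.4f}")
```

Because <math display="inline">\lambda</math> for oxygen at 300 K is a fraction of an ångström, the prefactor <math display="inline">V/\lambda^3(N+1)</math> is very large for typical simulation boxes, so insertion is strongly favoured unless the energy cost or a negative <math display="inline">\mu</math> compensates.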


=== GCMC in '''TurboGAP''' ===

Now we perform Grand-Canonical Monte-Carlo to obtain an oxygenated amorphous carbon structure. This is a <math display="inline">\mu VT</math> ensemble, hence we must specify the chemical potential, <math display="inline">\mu</math>, and the species which we want to insert: here, just oxygen.

The format is similar to above, but there are more options:

<pre class="conf">mc_nsteps = 10000 ! Number of mc trial steps to be performed

n_mc_types = 3 ! Number of mc trial types
mc_types = 'move' 'insertion' 'removal' ! MC types can be: 'insertion' 'removal' 'md' 'swap' 'move'
! 'volume'
mc_acceptance = 1 1 1 ! Ratios for choosing the respective trial moves (all equally likely here)

mc_move_max = 0.2 ! Maximum distance for the move of a particular particle

n_mc_mu = 1
mc_mu = 0.0 ! gcmc: Chemical potential [eV], using a large one here
mc_species = 'O' ! gcmc: species to insert / remove
mc_min_dist = 0.1 ! gcmc: minimum distance between particles for insertion
</pre>
To see all the options for Monte-Carlo, visit [https://turbogap.fi/wiki/index.php/Monte-Carlo Monte-Carlo].

To run it, we run '''TurboGAP''' in <code>mc</code> mode:

<syntaxhighlight lang="bash">srun turbogap mc
</syntaxhighlight>
To run these calculations, do

<syntaxhighlight lang="bash">cd ../2.standard_gcmc
bash 1.run_gcmc.sh
</syntaxhighlight>
The following files are written every <code>write_xyz</code> steps:

* <code>mc.log</code>: self-explanatory,
** format: <code>mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)</code>
* <code>mc_all.xyz</code>: an appended file which contains all accepted moves
* <code>mc_trial.xyz</code>: a single configuration which contains the trial move
* <code>mc_current.xyz</code>: a single configuration, the most recently accepted one

You should find a structure which looks like this (the last configuration in <code>mc_all.xyz</code>).

<div class="figure">

[[File:oxygenated_amorphous_carbon_gcmc.png]]

</div>
Here, we use a chemical potential of <math display="inline">\mu = 0.0</math> eV. You can experiment with different chemical potentials. For example, <math display="inline">\mu = -5.16</math> eV is related to half the binding energy of O<sub>2</sub> at 300K and 1atm. Try it for yourself (in your own time)!

For this short simulation and this size of box, we will not reach convergence of the oxygen content (converged MC simulations typically take on the order of <math display="inline">10^5</math> to <math display="inline">10^6</math> steps). We can run

<syntaxhighlight lang="bash">module load python-data/3.9
python analyse_O_content.py
</syntaxhighlight>
to see the oxygen content.

<div class="figure">

[[File:simple_O_content_monitor.png]]

</div>
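For readers who want to monitor the oxygen content themselves, the last column of <code>mc.log</code> can be followed over the run. The sketch below is not the tutorial's <code>analyse_O_content.py</code>. It is a hypothetical stand-in that assumes whitespace-separated columns in the documented order and an <code>accepted</code> column written as a logical such as <code>T</code>/<code>F</code>. Since the counts in the log refer to the trial configuration, only accepted trials update the running value.

```python
def oxygen_content(lines):
    """Return (steps, n_oxygen) for the current configuration after each logged trial."""
    truthy = {"t", "true", ".true.", "1"}  # assumed spellings of an accepted move
    steps, n_oxygen = [], []
    current = None
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # skip headers, comments and incomplete lines
        if fields[2].lower() in truthy:
            current = int(fields[6])   # accepted trial becomes the current state
        if current is not None:
            steps.append(int(fields[0]))
            n_oxygen.append(current)
    return steps, n_oxygen

# Made-up sample in the documented column order
sample = """\
1 move T -7012.31 -7012.31 1000 0
2 insertion F -7010.02 -7012.31 1001 1
3 insertion T -7015.77 -7015.77 1001 1
4 removal F -7011.10 -7015.77 1000 0
5 insertion T -7019.40 -7019.40 1002 2
""".splitlines()

steps, n_oxygen = oxygen_content(sample)
for step, n in zip(steps, n_oxygen):
    print(step, n)
```

A flat tail in this series over many thousands of steps is a simple sign that the oxygen content has stopped drifting.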
Here, we can see the local energy for the GCMC configuration.

<div class="figure">

[[File:oxygenated_amorphous_carbon_gcmc_local_energy.png]]

</div>


== Hamiltonian Monte-Carlo ==

Observing the local energies above, we notice that there are some rather high values. To relax them using MC, we can use Hamiltonian MC, which uses the results of NVE molecular dynamics as trial configurations for MC displacements. This gives very high acceptance rates in comparison to other move types, because the symplectic (velocity Verlet) integrator approximately conserves the total energy (hence the acceptance probability will be approximately 1).

<pre class="conf">md_nsteps = 20 ! specifying the number of steps for velocity verlet
md_step = 0.1 ! 0.1fs timestep

mc_nsteps = 50 ! Number of mc trial steps to be performed
n_mc_types = 1
mc_types = 'md'
mc_acceptance = 1 ! Relative rate of choosing the respective trial moves
mc_hamiltonian = .true. ! For Hamiltonian Monte-Carlo
! (NVE ensemble used to increase trial acceptance) for md
! move type
</pre>

<syntaxhighlight lang="bash">cd ../3.hamiltonian_mc
bash 1.run_hamiltonian_mc.sh
</syntaxhighlight>
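The reason Hamiltonian MC trials are accepted so often can be seen on a toy problem. The sketch below is a self-contained illustration on a one-dimensional harmonic oscillator in reduced units, not TurboGAP's implementation: momenta are redrawn from the Maxwell-Boltzmann distribution, a short velocity Verlet trajectory provides the trial, and the trial is accepted based on the change in total energy. All names and parameter values are chosen for illustration.

```python
import math
import random

def velocity_verlet(x, p, force, mass, dt, n_steps):
    """Integrate Hamilton's equations with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        p += 0.5 * dt * f          # half kick
        x += dt * p / mass         # drift
        f = force(x)
        p += 0.5 * dt * f          # half kick
    return x, p

def hmc_acceptance_rate(n_trials, dt, n_steps, kT=1.0, mass=1.0, k=1.0, seed=0):
    """Fraction of accepted Hamiltonian MC trials for U(x) = k x^2 / 2."""
    rng = random.Random(seed)
    potential = lambda x: 0.5 * k * x * x
    force = lambda x: -k * x
    x, accepted = 1.0, 0
    for _ in range(n_trials):
        p = rng.gauss(0.0, math.sqrt(mass * kT))       # fresh momentum
        h_old = potential(x) + p * p / (2.0 * mass)
        x_new, p_new = velocity_verlet(x, p, force, mass, dt, n_steps)
        h_new = potential(x_new) + p_new * p_new / (2.0 * mass)
        # Metropolis test on the total (kinetic + potential) energy change
        if rng.random() < math.exp(min(0.0, -(h_new - h_old) / kT)):
            x, accepted = x_new, accepted + 1
    return accepted / n_trials

rate = hmc_acceptance_rate(500, dt=0.1, n_steps=20)
print(f"acceptance rate: {rate:.3f}")
```

With a small timestep the integration error in the total energy is tiny, so almost every trial is accepted; a larger <code>dt</code> increases that error and with it the chance of rejection.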

== <math display="inline">NPT</math> using volume moves ==

We can do constant pressure Monte-Carlo by doing volume moves. In fact, we can mix volume moves and MD <math display="inline">NPT</math> moves to create trial configurations, if we so desire.

Looking at the <code>input</code> file, we see that we've specified a volume type MC move, for which the acceptance criterion is given by <math display="block"> \mathrm{acc}(V \rightarrow V') = \mathrm{min}\biggl[1, \exp\left\{ -\beta ( E(V') - E(V) + P(V'-V) - (N+1)\ln (V'/V)/\beta ) \right\} \biggr] </math>

<pre class="conf">mc_nsteps = 200

n_mc_types = 3
mc_types = 'move' 'volume' 'md' ! We now specify an MD move type for doing NPT

mc_acceptance = 1 1 1
mc_move_max = 0.2

mc_lnvol_max = 0.01 ! Maximum lnvol to modify the volume

! Specify MD configuration
t_beg = 300
t_end = 300
p_beg = 1.0
p_end = 1.0

md_nsteps = 30
md_step = 0.1 ! Reducing the timestep
barostat = berendsen
tau_t = 100.
</pre>
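As an illustration of a volume move, the sketch below proposes a new volume by a random step in <math display="inline">\ln V</math> bounded by <code>mc_lnvol_max</code> and evaluates the acceptance probability in the standard textbook form for such moves (Frenkel and Smit), with <math display="inline">P(V'-V)</math> and <math display="inline">-(N+1)\ln(V'/V)/\beta</math> inside the exponent. It is a sketch under stated assumptions, not TurboGAP code: function names are hypothetical, energies are in eV, volumes in Å<sup>3</sup> and pressure in bar.

```python
import math
import random

K_B_EV = 8.617333262e-5            # Boltzmann constant [eV/K]
BAR_A3_TO_EV = 6.241509074e-7      # 1 bar * 1 A^3 expressed in eV

def propose_volume(volume, lnvol_max, rng=random):
    """Random step in ln V with maximum size lnvol_max."""
    return math.exp(math.log(volume) + (2.0 * rng.random() - 1.0) * lnvol_max)

def acc_volume(e_new, e_old, v_new, v_old, n_atoms, pressure_bar, temperature):
    """Acceptance probability for V -> V' when sampling in ln V."""
    kT = K_B_EV * temperature
    work = pressure_bar * BAR_A3_TO_EV * (v_new - v_old)
    arg = (e_new - e_old + work) / kT - (n_atoms + 1) * math.log(v_new / v_old)
    return math.exp(min(0.0, -arg))   # min(1, exp(-arg)) without overflow

v_old = 1000.0
v_new = propose_volume(v_old, 0.01, random.Random(1))
print(f"trial volume: {v_new:.3f} A^3")
print(f"acceptance at equal energy: {acc_volume(0.0, 0.0, v_new, v_old, 100, 1.0, 300.0):.4f}")
```

At 1 bar the <math display="inline">P\Delta V</math> term is negligible on the eV scale for boxes of this size, so the outcome is dominated by the energy change and the <math display="inline">(N+1)\ln(V'/V)</math> term.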
To run it we do

<syntaxhighlight lang="bash">cd ../4.volume_mc
bash 1.run_volume.sh
</syntaxhighlight>
This gives an expansion of the volume.

Monte-Carlo

2024-07-17T08:56:11Z

Tigany Zarrouk: /* Monte-Carlo options */

= Running Monte-Carlo =
To perform a (Grand-Canonical) Monte-Carlo simulation we must run '''TurboGAP''' in <code>mc</code> mode:

<syntaxhighlight lang="bash">turbogap mc
</syntaxhighlight>
The following files are written every <code>N</code> steps, where <code>N</code> is set by <code>write_xyz = N</code>:

* <code>mc.log</code>: self-explanatory,
** format: <code>mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)</code>
* <code>mc_all.xyz</code>: an appended file which contains all accepted moves
* <code>mc_trial.xyz</code>: a single configuration which contains the trial move
* <code>mc_current.xyz</code>: a single configuration, the most recently accepted one

= Monte-Carlo Move Types =
We can specify a number of Monte-Carlo trial move types, including doing molecular dynamics (hybrid Monte-Carlo) as trial moves. These include
* Displacement
* Swap (to swap atoms in the simulation box)
* Volume (for NPT)
* Insertion (for mu VT)
* Removal (for mu VT)
* MD (for hybrid Monte-Carlo)

= Minimal Input File =
<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99

mc_nsteps = 5000 ! Number of mc trial steps to be performed

n_mc_types = 3 ! Number of mc trial types
mc_types = 'move' 'insertion' 'removal' ! MC types can be: 'insertion' 'removal' 'md' 'swap' 'move'
! 'volume'
mc_acceptance = 1 1 1 ! Ratios for choosing the respective trial moves (all equally likely here)

mc_move_max = 0.2 ! Maximum distance for the move of a particular particle

n_mc_mu = 1
mc_mu = 0.0 ! gcmc: Chemical potential [eV], using a large one here
mc_species = 'O' ! gcmc: species to insert / remove
mc_min_dist = 0.1 ! gcmc: minimum distance between particles for insertion

! Note: The following files are written to every write_xyz steps
! mc.log: self-explanatory,
! format: mc_step mc_move accepted E_trial E_current N_sites(trial)
! N_gcmc_species(trial)
! mc_all.xyz: an appended file which contains all accepted moves
! mc_trial.xyz: a single configuration which contains the trial move
! mc_current.xyz: a single configuration which contains the current accepted

write_xyz = 200
</pre>


= Monte-Carlo options =

{| class="wikitable"
|-
! Keyword
! Definition
! Optional
! Type
! Default
! Used when
! Example
|-
| <code>mc_nsteps</code>
| Number of MC steps
| N
| Int
| 0
| turbogap mc
| <code>mc_nsteps = 1000</code>
|-
| <code>n_mc_types</code>
| Number of MC types
| N
| Int
| 0
| turbogap mc
| <code>n_mc_types = 2</code>
|-
| <code>mc_types</code>
| Types of MC trials
| N
| Str(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_types = 'volume' 'move'</code>
|-
| <code>mc_acceptance</code>
| Ratios of MC trial moves
| N
| Int(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_acceptance = 2 1</code>
|-
| <code>n_mc_swaps</code>
| Number of swaps
| Y
| Int
| 0
| 'swap' in <code>mc_types</code>
| <code>n_mc_swaps = 2</code>
|-
| <code>mc_swaps</code>
| Swap species
| Y
| Strs
| None
| 'swap' in <code>mc_types</code>
| <code>mc_swaps = 'C' 'O' 'N' 'C'</code>
|-
| <code>mc_move_max</code>
| Maximum displacement [A]
| Y
| Float
| 1.0
| 'move' in <code>mc_types</code>
| <code>mc_move_max = 0.5</code>
|-
| <code>mc_lnvol_max</code>
| Log volume max for volume moves.
| Y
| Float
| 0.01
| 'volume' in <code>mc_types</code>
| <code>mc_lnvol_max = 0.02</code>
|-
| <code>mc_min_dist</code>
| Minimum distance for insertion [A]
| Y
| Float
| 0.2
| 'insertion' in <code>mc_types</code>
| <code>mc_min_dist = 0.1</code>
|-
| <code>n_mc_mu</code>
| Number of chemical potentials/gcmc species
| Y
| Int
| 1
| 'insertion'/'removal' in <code>mc_types</code>
| <code>n_mc_mu = 2</code>
|-
| <code>mc_mu</code>
| Chemical potential(s) [eV]
| Y
| Float
| 0.0
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_mu = -5.16 -2.25</code>
|-
| <code>mc_species</code>
| GCMC species
| Y
| Str
| None
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_species = 'O' 'H'</code>
|-
| <code>mc_relax</code>
| Relax MC trials prior to acc. evaluation
| Y
| Bool
| .false.
| turbogap mc
| <code>mc_relax = .true.</code>
|-
| <code>n_mc_relax_after</code>
| Number of specific trials to relax
| Y
| Int
| 0
| <code>mc_relax = .true.</code>
| <code>n_mc_relax_after = 1</code>
|-
| <code>mc_relax_after</code>
| Relax specific MC trial types
| Y
| Str(s)
| None
| <code>n_mc_relax_after > 0</code>
| <code>mc_relax_after = 'volume'</code>
|-
| <code>mc_nrelax</code>
| Number of relaxation steps after trial
| Y
| Int
| 0
| <code>mc_relax = .true.</code>
| <code>mc_nrelax = 50</code>
|-
| <code>mc_relax_opt</code>
| Optimisation for relaxing after steps
| Y
| Str
| 'gd'
| <code>mc_relax = .true.</code>
| <code>mc_relax_opt = 'gd-box-ortho'</code>
|-
| <code>mc_hamiltonian</code>
| Use NVE for 'md' trials
| Y
| Bool
| .false.
| 'md' in <code>mc_types</code>
| <code>mc_hamiltonian = .true.</code>
|-
| <code>mc_hybrid_opt</code>
| Optimisation for 'md' steps
| Y
| Str
| 'vv'
| 'md' in <code>mc_types</code>
| <code>mc_hybrid_opt = 'vv'</code>
|-
| <code>mc_write_xyz</code>
| Debug: read and write to files every step
| Y
| Bool
| .false.
| <code>turbogap mc</code>
| <code>mc_write_xyz = .true.</code>
|}

XPS-optimized oxygenated amorphous carbon

2024-06-19T10:26:09Z

Tigany Zarrouk: /* Motivation for structural inference */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]

We will do simulations similar to the above, which is from our JACS paper ([[#citeproc_bib_item_11|Zarrouk et al. 2024]]).


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Gaussian (squared-exponential) 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp\left( - (r_{i} - r_{j})^2/(2\sigma^2) \right)</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi\} = \{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
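
The 2b energy expression above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ''TurboGAP'' implementation: the function names <code>gaussian_kernel</code> and <code>energy_2b</code> and their arguments are hypothetical, and the sparse-set distances and coefficients would come from a trained model.

<syntaxhighlight lang="python">
import math


def gaussian_kernel(r_i, r_j, sigma):
    """Squared-exponential kernel between two scalar 2b descriptors (distances)."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))


def energy_2b(distances, sparse_distances, alphas, delta, sigma, e0=0.0):
    """E^2b = sum_i ( E0 + delta^2 * sum_j alpha_j * k(r_i, r_j) ).

    distances        : the M pairwise distances in the atom's local environment
    sparse_distances : the N_s sparse-set descriptors (hypothetical input)
    alphas           : the N_s fitting coefficients (hypothetical input)
    """
    total = 0.0
    for r_i in distances:
        k_sum = sum(a * gaussian_kernel(r_i, r_j, sigma)
                    for a, r_j in zip(alphas, sparse_distances))
        total += e0 + delta ** 2 * k_sum
    return total
</syntaxhighlight>

A distance that coincides with a sparse-set point contributes the full <math display="inline">\delta^2 \alpha_j</math>; distances far from every sparse point (relative to <math display="inline">\sigma</math>) contribute only <math display="inline">E^{2b}_0</math>.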


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For the specifics of SOAP turbo, see ([[#citeproc_bib_item_3|Caro 2019]]).
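
The many-body prediction above follows the same pattern with a dot-product kernel. The sketch below is illustrative only: <code>soap_kernel</code> and <code>predict_many_body</code> are hypothetical names, the descriptors are assumed to be already normalized vectors, and <code>zeta</code> is assumed to be a positive integer.

<syntaxhighlight lang="python">
def soap_kernel(xi_i, xi_j, zeta):
    """Dot-product kernel k_ij = (xi_i . xi_j)^zeta for two descriptor vectors."""
    dot = sum(a * b for a, b in zip(xi_i, xi_j))
    return dot ** zeta


def predict_many_body(xi, sparse_descriptors, alphas, delta, zeta, e0=0.0):
    """E^mb_i = E0 + delta^2 * sum_j alpha_j * k_ij over the sparse set."""
    k_sum = sum(a * soap_kernel(xi, q, zeta)
                for a, q in zip(alphas, sparse_descriptors))
    return e0 + delta ** 2 * k_sum
</syntaxhighlight>

An environment identical to a sparse-set entry gives a kernel value of 1 for that entry, while an orthogonal descriptor gives 0 and contributes nothing to the sum.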



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* ''TurboGAP'' can predict an arbitrary number of ''local properties'' from SOAP turbo descriptors.
* ''One can get these for “free”'' if the local property models are trained using the same descriptors as the atomic energies/forces.
*; ''Hirshfeld volumes''
*: give the vdW dispersion energy via the Tkatchenko-Scheffler (TS) scheme. <math display="block"> E_{\rm TS} = -\frac{1}{2} \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>
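
As a rough illustration of the XPS expression above, the following Python sketch broadens a list of predicted core-electron binding energies with Gaussians. The function name <code>xps_spectrum</code> is hypothetical, and <math display="inline">M</math> is assumed here to be the number of contributing atoms.

<syntaxhighlight lang="python">
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """g_xps(e) = (1/M) * sum_i exp(-(e - e_i)^2 / (2 sigma^2)).

    cebes    : predicted core-electron binding energies e_i [eV]
    energies : energy grid on which the spectrum is evaluated [eV]
    sigma    : Gaussian broadening [eV]
    """
    m = len(cebes)
    if m == 0:
        raise ValueError("at least one binding energy is required")
    return [
        sum(math.exp(-((e - c) ** 2) / (2.0 * sigma ** 2)) for c in cebes) / m
        for e in energies
    ]
</syntaxhighlight>

With a single binding energy the spectrum is one Gaussian of unit height centred on that energy, falling to <math display="inline">e^{-1/2}</math> at a distance of one <math display="inline">\sigma</math>.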


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>This is a simple squared-difference penalty that increases with spectral dissimilarity; <math display="inline">\gamma</math> is an energy scale.</li>
<li>The same approach applies not just to XPS but also to X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
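
The spectral penalty can be sketched numerically as follows. This is an assumption-laden illustration rather than the ''TurboGAP'' routine: <code>spectral_energy</code> is a hypothetical name, both spectra are assumed to be sampled on the same energy grid, and the integral is approximated with the trapezoidal rule.

<syntaxhighlight lang="python">
def spectral_energy(g_pred, g_exp, energies, gamma):
    """E_spectra = 0.5 * gamma * integral (g_pred - g_exp)^2 d(energy)."""
    if not (len(g_pred) == len(g_exp) == len(energies)):
        raise ValueError("spectra and energy grid must have the same length")
    squared_diff = [(p - q) ** 2 for p, q in zip(g_pred, g_exp)]
    # Trapezoidal rule; also valid for a non-uniform energy grid.
    integral = sum(
        0.5 * (squared_diff[k] + squared_diff[k + 1]) * (energies[k + 1] - energies[k])
        for k in range(len(energies) - 1)
    )
    return 0.5 * gamma * integral
</syntaxhighlight>

The penalty vanishes when the two spectra coincide and grows linearly with <math display="inline">\gamma</math>, which is why the choice of energy scale controls how strongly the experimental data steers the simulation.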

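* We use the standard acceptance criteria for adding/removing/moving a particle, where <math display="inline">V</math> is the volume, <math display="inline">\lambda</math> the thermal de Broglie wavelength, <math display="inline">\mu</math> the chemical potential and <math display="inline">N</math> the number of particles of the exchanged species: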

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
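
The three acceptance rules above can be written compactly in Python. This is a sketch under stated assumptions: the function names are hypothetical, all energies and <code>kT</code> share the same units, and the probabilities are evaluated in log space only to avoid overflow for very favourable moves.

<syntaxhighlight lang="python">
import math


def _capped(log_ratio):
    """Return min(1, exp(log_ratio)) without overflowing for large arguments."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def acc_insert(e_new, e_old, n, volume, lam, mu, kT):
    """Acceptance probability for N -> N+1, with e_new = E(N+1) and e_old = E(N)."""
    log_ratio = math.log(volume / (lam ** 3 * (n + 1))) - (e_new - e_old - mu) / kT
    return _capped(log_ratio)


def acc_remove(e_new, e_old, n, volume, lam, mu, kT):
    """Acceptance probability for N -> N-1, with e_new = E(N-1) and e_old = E(N)."""
    if n == 0:
        return 0.0  # nothing to remove
    log_ratio = math.log(lam ** 3 * n / volume) - (e_new - e_old + mu) / kT
    return _capped(log_ratio)


def acc_move(e_new, e_old, kT):
    """Metropolis acceptance probability for a displacement move."""
    return _capped(-(e_new - e_old) / kT)
</syntaxhighlight>

A trial configuration would then be accepted when a uniform random number in [0, 1) falls below the returned probability; here the energies are the generalized <math display="inline">\tilde{E}</math>, so the spectral penalty enters the acceptance directly.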



= Usage =


== Installing ''TurboGAP'' ==

The ''TurboGAP'' code is hosted at https://github.com/mcaroba/turbogap. We install it by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file:

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"  # "~" is not expanded inside double quotes
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* The conversion script <code>turbogap/tools/make_gap_files.py</code> converts <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code> format.
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
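
To compute a spectrum from <code>trajectory_out.xyz</code>, the <code>core_electron_be</code> column first has to be located from the <code>Properties</code> string. The Python sketch below does this for the first frame of an extended xyz file. It is a minimal illustration: the function names are hypothetical, the comment line is assumed to be a single line (the backslashes in the excerpt above are only line continuations for display), and only scalar properties are handled.

<syntaxhighlight lang="python">
import re


def property_column(comment_line, name):
    """Return the zero-based column index of a scalar property in an extxyz frame."""
    match = re.search(r'Properties="?([^\s"]+)"?', comment_line)
    if match is None:
        raise ValueError("no Properties entry found in comment line")
    fields = match.group(1).split(":")
    offset = 0
    # Properties is a sequence of name:type:count triples.
    for k in range(0, len(fields) - 2, 3):
        prop, count = fields[k], int(fields[k + 2])
        if prop == name:
            return offset
        offset += count
    raise KeyError(name)


def read_scalar_property(lines, name):
    """Read one scalar per-atom property from the first frame of an extxyz file."""
    n_atoms = int(lines[0].split()[0])
    column = property_column(lines[1], name)
    return [float(line.split()[column]) for line in lines[2:2 + n_atoms]]
</syntaxhighlight>

Typical usage would be <code>read_scalar_property(open('trajectory_out.xyz').read().splitlines(), 'core_electron_be')</code>. The resulting values, restricted to the species of interest, can then be broadened with Gaussians of width <code>xps_sigma</code> to obtain the spectrum.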


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log file with one line per Monte-Carlo step, ''e.g.''

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every N steps, where N is set by <code>write_xyz = N</code>.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every N steps, where N is set by <code>write_xyz = N</code>.
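
A quick way to monitor a run is to compute the acceptance ratio per move type from <code>mc.log</code>. The Python sketch below assumes the column layout of the excerpt above (step, move type, then a <code>T</code>/<code>F</code> acceptance flag); the function name <code>acceptance_by_move</code> is hypothetical.

<syntaxhighlight lang="python">
def acceptance_by_move(lines):
    """Return {move_type: accepted / attempted} from the lines of mc.log."""
    counts = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and the header
        cols = stripped.split()
        move, accepted = cols[1], cols[2] == "T"
        attempted, successes = counts.get(move, (0, 0))
        counts[move] = (attempted + 1, successes + int(accepted))
    return {move: ok / tried for move, (tried, ok) in counts.items()}
</syntaxhighlight>

For example, <code>acceptance_by_move(open('mc.log'))</code> returns a dictionary keyed by move type. A persistently low insertion ratio may suggest revisiting <code>mc_mu</code> or <code>mc_min_dist</code>.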

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]

We see that a large amount of oxygen is inserted because of the large chemical potential used in this simulation. This shifts the XPS to higher core-electron binding energies, and the final structure does not agree with the experimental XPS.



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every N steps, where N is set by <code>write_xyz = N</code>. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
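
Because successive predictions in <code>xps_prediction.dat</code> are separated by blank lines, they can be split into blocks for plotting. The sketch below is a hedged example: <code>read_blocks</code> is a hypothetical helper, and each non-blank line is assumed to hold whitespace-separated numbers (for instance an energy and an intensity), which should be checked against the actual file.

<syntaxhighlight lang="python">
def read_blocks(text):
    """Split blank-line-separated numeric data into a list of blocks of rows."""
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
            continue
        if stripped.startswith("#"):
            continue  # ignore comment lines, if any
        current.append(tuple(float(x) for x in stripped.split()))
    if current:
        blocks.append(current)
    return blocks
</syntaxhighlight>

The last block, <code>read_blocks(open('xps_prediction.dat').read())[-1]</code>, would then correspond to the most recent prediction and can be compared against the contents of <code>xps_exp.dat</code>.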

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]

We can observe the significant effect of the energy scale here. The experimental energy decreases quickly, with a concurrent increase in the energy per atom. After spectral agreement, the system is allowed to relax.



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows the optimization of multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The a-COx here was generated by physical vapor deposition of graphitic carbon in an oxygen environment. The authors see a reduction in the number of sp2 bonds. They also claim a complete transformation of sp2 to sp3 that is reversible with a short anneal at 500 °C.

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_11|Zarrouk et al. 2024]])

However, from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimentalists instead assign this peak to sp3 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, as the oxygen content increases, the sp2 motifs are shifted significantly, even past the sp3 reference. This oxygen-induced shift of the sp2 motifs is not accounted for by the experimental deconvolution references, as these are fixed. This is why the experimental deconvolution did not give what we expect: sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-19T10:25:47Z

Tigany Zarrouk: /* Motivation for structural inference */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: constraints must be imposed “by hand”.
** Limited to simple experimental observables.
** Can produce unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]]).

We need an efficient structure search that finds '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

With a CO GAP and a core-electron binding energy SOAP model, we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]
We will do simulations similar to the above, which is from our JACS paper ([[#citeproc_bib_item_11|Zarrouk et al. 2024]]).


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Gaussian (squared-exponential) 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp\left( - (r_{i} - r_{j})^2/(2\sigma^2) \right)</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi\} = \{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For the specifics of SOAP turbo, see ([[#citeproc_bib_item_3|Caro 2019]]).



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* ''TurboGAP'' can predict an arbitrary number of ''local properties'' from SOAP turbo descriptors.
* ''One can get these for “free”'' if the local property models are trained using the same descriptors as the atomic energies/forces.
*; ''Hirshfeld volumes''
*: give the vdW dispersion energy via the Tkatchenko-Scheffler (TS) scheme. <math display="block"> E_{\rm TS} = -\frac{1}{2} \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>This is a simple squared-difference penalty that increases with spectral dissimilarity; <math display="inline">\gamma</math> is an energy scale.</li>
<li>The same approach applies not just to XPS but also to X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>

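* We use the standard acceptance criteria for adding/removing/moving a particle, where <math display="inline">V</math> is the volume, <math display="inline">\lambda</math> the thermal de Broglie wavelength, <math display="inline">\mu</math> the chemical potential and <math display="inline">N</math> the number of particles of the exchanged species: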

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>



= Usage =


== Installing ''TurboGAP'' ==

The ''TurboGAP'' code is hosted at https://github.com/mcaroba/turbogap. We install it by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file:

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"  # "~" is not expanded inside double quotes
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* The conversion script <code>turbogap/tools/make_gap_files.py</code> converts <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code> format.
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written to every <code>write_xyz = N</code> steps.
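A quick way to inspect a run is to count accepted moves per move type from <code>mc.log</code>. The sketch below uses a hypothetical helper, <code>acceptance_by_move</code>, and assumes the column order of the header shown above and that accepted moves are flagged with <code>T</code> (the excerpt only shows <code>F</code>).

```python
from collections import defaultdict

def acceptance_by_move(log_lines):
    """Return {move_type: (n_accepted, n_total)} from mc.log-style lines."""
    counts = defaultdict(lambda: [0, 0])
    for line in log_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        move, accepted = fields[1], fields[2]
        counts[move][1] += 1
        if accepted == "T":  # assumption: "T" marks an accepted move
            counts[move][0] += 1
    return {move: tuple(c) for move, c in counts.items()}

# The three log lines quoted above
sample = """# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
""".splitlines()

stats = acceptance_by_move(sample)
for move, (n_acc, n_tot) in stats.items():
    print(f"{move}: {n_acc}/{n_tot} accepted")
```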

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]

A large amount of oxygen is inserted because of the large chemical potential used in this simulation. This shifts the XPS to higher core electron binding energies, and the final structure does not agree with the experimental XPS.



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
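Since successive predictions are separated by blank lines, the file can be split into one block per write step. The helper <code>split_blocks</code> below is hypothetical, the sample values are made up, and the sketch assumes two whitespace-separated columns per line (energy, intensity); check the actual file before relying on it.

```python
def split_blocks(text):
    """Split blank-line-separated blocks into lists of (x, y) float pairs."""
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
            continue
        if stripped.startswith("#"):
            continue  # skip comment lines, if any
        x, y = stripped.split()[:2]
        current.append((float(x), float(y)))
    if current:
        blocks.append(current)
    return blocks

# Made-up data: two predictions on a three-point grid
sample = """280.0 0.1
281.0 0.5
282.0 0.2

280.0 0.2
281.0 0.6
282.0 0.1
"""
blocks = split_blocks(sample)
last_prediction = blocks[-1]  # most recently written spectrum
```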

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]

We can observe the significant effect of the energy scale here. The experimental energy decreases quickly, with a concurrent increase in the energy per atom. After spectral agreement, the system is allowed to relax.



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables simultaneously:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>
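With several observables, each <code>exp_*</code> option needs one entry per observable. The following sketch checks that consistency before a run. It is not a ''TurboGAP'' tool: <code>parse_input</code> and <code>check_exp_options</code> are hypothetical helpers, and the parser assumes the simple <code>key = values</code> layout with <code>#</code> or <code>!</code> comments seen in the examples on this page.

```python
import shlex

def parse_input(text):
    """Parse 'key = value(s)  # comment' lines into {key: [values]}."""
    options = {}
    for line in text.splitlines():
        # Drop trailing comments (both '#' and '!' appear in the examples)
        line = line.split("#", 1)[0].split("!", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = shlex.split(value)  # handles quoted entries
    return options

def check_exp_options(options):
    """Report whether each exp_* list has exactly n_exp entries."""
    n_exp = int(options["n_exp"][0])
    keys = ["exp_labels", "exp_data_files", "exp_n_samples", "exp_energy_scales"]
    return {key: len(options[key]) == n_exp for key in keys}

sample = """
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable
"""
options = parse_input(sample)
report = check_exp_options(options)
```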

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The generation of a-COx here was done by physical vapor deposition of graphitic carbon in an oxygen environment. They see a reduction in the number of sp2 bonds. They also claim there is a complete transformation of sp2 to sp3 that is reversible with a short annealing at 500C.

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

However, from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimental analysis assigns the peak to sp3 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

With increasing oxygen content, the sp2 motifs shift significantly, even past the sp3 reference. The references used in the experimental deconvolution are fixed, so they do not account for this oxygen-induced shift of sp2 motifs. This is why the experimental deconvolution did not give what we expect: sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-19T10:25:33Z

Tigany Zarrouk:

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors, which are faster and more accurate than standard SOAP (Smooth Overlap of Atomic Positions) descriptors.
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]
We will run simulations similar to the one shown above, which is from our JACS paper ([[#citeproc_bib_item_11|Zarrouk et al. 2024]]).


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/2\sigma^2 )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math> ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
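The 2b prediction formula can be written out directly. The sketch below is a plain transcription of the equations in this list with made-up numbers. The function names are hypothetical, and a production code would evaluate this with optimized routines rather than Python loops.

```python
import math

def sq_exp_kernel(r_i, r_j, sigma):
    """k(r_i, r_j) = exp(-(r_i - r_j)^2 / (2 sigma^2))."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))

def energy_2b(distances, sparse_distances, alphas, sigma, delta, e0=0.0):
    """E_2b = sum_i (e0 + delta^2 * sum_j alpha_j * k(r_i, r_j)).

    distances:        the M pairwise distances in the atom's environment
    sparse_distances: the N_s training (sparse set) descriptors
    alphas:           the N_s fitting coefficients
    """
    total = 0.0
    for r_i in distances:
        kernel_sum = sum(
            a * sq_exp_kernel(r_i, r_j, sigma)
            for a, r_j in zip(alphas, sparse_distances)
        )
        total += e0 + delta ** 2 * kernel_sum
    return total

# Made-up example: one neighbor that sits exactly on one sparse point
example = energy_2b([1.0], [1.0], [2.0], sigma=0.5, delta=1.0)
```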


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
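The many-body prediction has the same structure, with a dot-product kernel raised to the power <math display="inline">\zeta</math>. In the sketch below the descriptor vectors, coefficients and function names are made up for illustration, and the descriptors are assumed to be normalized to unit length. The same form, with a different offset, is what a local property model such as the CEBE one would evaluate under these assumptions.

```python
def soap_kernel(xi_i, xi_j, zeta):
    """k_ij = (xi_i . xi_j)^zeta for unit-normalized descriptor vectors."""
    dot = sum(a * b for a, b in zip(xi_i, xi_j))
    return dot ** zeta

def predict_local(xi, sparse_set, alphas, zeta, delta, offset=0.0):
    """offset + delta^2 * sum_j alpha_j * k(xi, q_j) over the sparse set."""
    kernel_sum = sum(
        a * soap_kernel(xi, q, zeta) for a, q in zip(alphas, sparse_set)
    )
    return offset + delta ** 2 * kernel_sum

# Made-up two-dimensional "descriptors" and coefficients
sparse_set = [[1.0, 0.0], [0.0, 1.0]]
alphas = [0.5, 3.0]
value = predict_local([0.6, 0.8], sparse_set, alphas, zeta=2, delta=1.0, offset=290.0)
```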



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta\mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta\mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty that increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
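The penalty term is straightforward to evaluate numerically. The sketch below is a minimal illustration, assuming both spectra are sampled on the same uniform grid and using the trapezoidal rule; the name <code>spectra_energy</code> and the sample values are hypothetical, and the integration scheme is an assumption rather than a description of what ''TurboGAP'' does internally.

```python
def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """E_spectra = 0.5 * gamma * integral (g_pred - g_exp)^2 d_eps.

    Both spectra must be sampled on the same uniform grid with spacing
    d_eps; the integral is approximated with the trapezoidal rule.
    """
    sq = [(p - e) ** 2 for p, e in zip(g_pred, g_exp)]
    integral = d_eps * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return 0.5 * gamma * integral

# Made-up spectra on a three-point grid
g_exp = [0.0, 0.0, 0.0]
g_pred = [0.0, 1.0, 0.0]
penalty = spectra_energy(g_pred, g_exp, d_eps=0.5, gamma=10.0)
```

Identical spectra give zero penalty, and doubling <math display="inline">\gamma</math> doubles the energy cost of the same mismatch.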

* We use the standard acceptance criteria for adding/removing/moving a particle, with the energy replaced by <math display="inline">\tilde{E}</math>:

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
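The three criteria can be transcribed as follows. This is a sketch with hypothetical function names: energies are in eV, <math display="inline">\lambda</math> is taken to be the thermal de Broglie wavelength in the same length units as <math display="inline">V^{1/3}</math>, and <code>delta_e</code> stands for the change in <math display="inline">\tilde{E}</math>. Working with the logarithm avoids overflow for strongly favorable moves.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def _min_one_exp(log_value):
    """min(1, exp(log_value)) without overflow for large arguments."""
    return 1.0 if log_value >= 0.0 else math.exp(log_value)

def acc_insert(delta_e, mu, n, volume, wavelength, temperature):
    """Acceptance for N -> N+1; delta_e = E(N+1) - E(N)."""
    prefactor = volume / (wavelength ** 3 * (n + 1))
    return _min_one_exp(math.log(prefactor) - (delta_e - mu) / (K_B * temperature))

def acc_remove(delta_e, mu, n, volume, wavelength, temperature):
    """Acceptance for N -> N-1; delta_e = E(N-1) - E(N)."""
    prefactor = wavelength ** 3 * n / volume
    return _min_one_exp(math.log(prefactor) - (delta_e + mu) / (K_B * temperature))

def acc_move(delta_e, temperature):
    """Acceptance for a displacement; delta_e = E(new) - E(old)."""
    return _min_one_exp(-delta_e / (K_B * temperature))

# Made-up numbers: an uphill displacement of 0.1 eV at 300 K
p_move = acc_move(0.1, 300.0)
```

A trial move is then accepted if a uniform random number in [0, 1) is smaller than the returned probability.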



= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found here https://github.com/mcaroba/turbogap through ''recursively cloning'' the main branch

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in your <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse-set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* A conversion script from <code>gap_fit</code> <code>.xml</code> to <code>.gap</code> is available at <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Choose these parameters with care in real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without experimental data specified, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written to every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]

A large amount of oxygen is inserted because of the large chemical potential used in this simulation. This shifts the XPS to higher core electron binding energies, and the final structure does not agree with the experimental XPS.



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]

We can observe the significant effect of the energy scale here. The experimental energy decreases quickly, with a concurrent increase in the energy per atom. After spectral agreement, the system is allowed to relax.



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables simultaneously:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The generation of a-COx here was done by physical vapor deposition of graphitic carbon in an oxygen environment. They see a reduction in the number of sp2 bonds. They also claim there is a complete transformation of sp2 to sp3 that is reversible with a short annealing at 500C.

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

However, from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimental analysis assigns the peak to sp3 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

With increasing oxygen content, the sp2 motifs shift significantly, even past the sp3 reference. The references used in the experimental deconvolution are fixed, so they do not account for this oxygen-induced shift of sp2 motifs. This is why the experimental deconvolution did not give what we expect: sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-19T10:24:26Z

Tigany Zarrouk: /* Motivation for structural inference */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors, which are faster and more accurate than standard SOAP (Smooth Overlap of Atomic Positions) descriptors.
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually hope that the predicted structures agree with experiment (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]
We will run simulations similar to the one shown above, which is taken from our JACS paper ([[#citeproc_bib_item_11|Zarrouk et al. 2024]]).


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
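
The 2b prediction above can be sketched in a few lines of Python. This is only an illustration of the formula, not ''TurboGAP'''s Fortran implementation; the function names and the toy numbers are hypothetical.

<syntaxhighlight lang="python">
import math


def gaussian_kernel(r_i, r_j, sigma):
    """Similarity between two pair distances: exp(-(r_i - r_j)^2 / (2 sigma^2))."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))


def two_body_energy(distances, training_distances, alphas, delta, sigma, e0=0.0):
    """Sum the 2b contribution over the M pair distances of one atom."""
    energy = 0.0
    for r_i in distances:
        kernel_sum = sum(
            alpha * gaussian_kernel(r_i, r_j, sigma)
            for alpha, r_j in zip(alphas, training_distances)
        )
        energy += e0 + delta ** 2 * kernel_sum
    return energy


# Toy (hypothetical) model with three training distances, in Angstrom.
training = [1.2, 1.5, 2.0]
alphas = [-0.8, 0.3, 0.1]
local_environment = [1.45, 1.52]  # M = 2 pair distances
print(two_body_energy(local_environment, training, alphas, delta=0.5, sigma=0.3))
</syntaxhighlight>

The cost per atom scales as <math display="inline">M \times N_s</math> kernel evaluations, which is why a compact set of training descriptors matters for speed.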


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>
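
As an illustration of the XPS expression above, the following Python sketch broadens a list of predicted core-electron binding energies into a spectrum. It assumes that <math display="inline">M</math> is the number of contributing atoms (the text does not define it), and the function name and sample values are hypothetical.

<syntaxhighlight lang="python">
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """Gaussian-broadened spectrum g(eps) = (1/M) sum_i exp(-(eps - eps_i)^2 / (2 sigma^2)).

    cebes    : per-atom core-electron binding energies (eV)
    energies : energy grid on which to evaluate the spectrum (eV)
    sigma    : Gaussian broadening (eV)
    """
    m = len(cebes)  # assumption: M = number of contributing atoms
    return [
        sum(math.exp(-((eps - eps_i) ** 2) / (2.0 * sigma ** 2)) for eps_i in cebes) / m
        for eps in energies
    ]


# Hypothetical CEBEs for three carbon atoms and a uniform 280-290 eV grid.
grid = [280.0 + 0.02 * k for k in range(501)]
spectrum = xps_spectrum([283.8, 284.5, 286.1], grid)
peak_index = max(range(len(grid)), key=lambda k: spectrum[k])
print(f"peak at {grid[peak_index]:.2f} eV")
</syntaxhighlight>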


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>This is a simple squared-difference penalty which increases with spectral dissimilarity; <math display="inline">\gamma</math> is an energy scale.</li>
<li>This works not just for XPS but also for X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
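
The penalty <math display="inline">E_{\rm spectra}</math> can be evaluated numerically once both spectra live on the same uniform grid. The sketch below uses the trapezoidal rule, which is an assumption made for illustration; the quadrature used inside ''TurboGAP'' may differ, and the function name is hypothetical.

<syntaxhighlight lang="python">
def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """E_spectra = (gamma / 2) * integral of (g_pred - g_exp)^2 over the energy grid.

    g_pred, g_exp : spectra sampled on the same uniform grid
    d_eps         : grid spacing
    gamma         : energy scale of the penalty
    """
    if len(g_pred) != len(g_exp) or len(g_pred) < 2:
        raise ValueError("spectra must share a grid with at least two points")
    squared = [(p - q) ** 2 for p, q in zip(g_pred, g_exp)]
    # Trapezoidal rule on a uniform grid: end points carry half weight.
    integral = d_eps * (sum(squared) - 0.5 * (squared[0] + squared[-1]))
    return 0.5 * gamma * integral


print(spectra_energy([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], d_eps=0.5, gamma=10.0))
</syntaxhighlight>

Because the penalty is quadratic in the mismatch, doubling <math display="inline">\gamma</math> doubles the energetic cost of any given disagreement with experiment.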

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
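
The three acceptance rules can be written directly in Python. This is a sketch of the formulas above rather than ''TurboGAP'''s own code: the function names are hypothetical, energies are the generalized <math display="inline">\tilde{E}</math> in eV, and the probabilities are evaluated in log space to avoid overflow for very favourable moves.

<syntaxhighlight lang="python">
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def _acceptance(log_ratio):
    """min(1, exp(log_ratio)) without overflowing for large positive arguments."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def acc_insertion(e_new, e_old, mu, n, volume, wavelength, temperature):
    """Acceptance probability for N -> N+1 (n is the particle count before the move)."""
    prefactor = volume / (wavelength ** 3 * (n + 1))
    return _acceptance(math.log(prefactor) - (e_new - e_old - mu) / (K_B * temperature))


def acc_removal(e_new, e_old, mu, n, volume, wavelength, temperature):
    """Acceptance probability for N -> N-1 (n is the particle count before the move)."""
    prefactor = wavelength ** 3 * n / volume
    return _acceptance(math.log(prefactor) - (e_new - e_old + mu) / (K_B * temperature))


def acc_move(e_new, e_old, temperature):
    """Metropolis acceptance probability for a displacement move."""
    return _acceptance(-(e_new - e_old) / (K_B * temperature))


# A downhill displacement is always accepted; an uphill one only sometimes.
print(acc_move(-10.2, -10.0, 300.0), acc_move(-10.0, -10.05, 300.0))
</syntaxhighlight>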



= Usage =


== Installing ''TurboGAP'' ==

We can install ''TurboGAP'' by ''recursively cloning'' the main branch of the repository at https://github.com/mcaroba/turbogap

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* To convert <code>gap_fit</code> <code>.xml</code> output to <code>.gap</code>, use the script <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care with real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
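
To post-process the output, the <code>core_electron_be</code> column can be located from the <code>Properties</code> string of the extended xyz header. The sketch below is a hypothetical helper, not part of ''TurboGAP''; it reads the first frame only and assumes that each atom occupies a single line (the backslashes in the listing above are taken to be display line breaks).

<syntaxhighlight lang="python">
def property_columns(properties):
    """Map each name in 'name:type:count:...' to its (start, stop) column range."""
    fields = properties.split(":")
    columns, start = {}, 0
    for name, count in zip(fields[0::3], fields[2::3]):
        columns[name] = (start, start + int(count))
        start += int(count)
    return columns


def read_local_property(lines, label):
    """Return the per-atom values of a scalar property from the first frame."""
    n_atoms = int(lines[0])
    properties = next(
        token.split("=", 1)[1]
        for token in lines[1].split()
        if token.startswith("Properties=")
    )
    first, _ = property_columns(properties)[label]
    return [float(line.split()[first]) for line in lines[2:2 + n_atoms]]


# Tiny hypothetical frame with the same Properties layout as the output above.
frame = [
    "2",
    'Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 energy=-13.4',
    "C 0.0 0.0 0.0 0.0 0.0 0.0 -8.3 283.8",
    "O 1.2 0.0 0.0 0.0 0.0 0.0 -5.1 285.2",
]
print(read_local_property(frame, "core_electron_be"))
</syntaxhighlight>

For a real file, pass <code>open("trajectory_out.xyz").read().splitlines()</code> as <code>lines</code>; the resulting list can then be broadened with a Gaussian of width <code>xps_sigma</code> to obtain the spectrum.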


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file saving current accepted config written to every <code>write_xyz = N</code> steps.
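
A quick way to monitor a run is to compute acceptance rates per move type from <code>mc.log</code>. The following sketch is a hypothetical helper based on the column layout shown above; it assumes that accepted moves are flagged with <code>T</code> (only <code>F</code> appears in the excerpt).

<syntaxhighlight lang="python">
from collections import defaultdict


def acceptance_rates(lines):
    """Return {move_type: accepted / attempted} from the lines of mc.log."""
    counts = defaultdict(lambda: [0, 0])  # move type -> [accepted, attempted]
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and the header
        move, accepted = tokens[1], tokens[2]
        counts[move][1] += 1
        if accepted == "T":  # assumption: 'T' marks an accepted move
            counts[move][0] += 1
    return {move: acc / total for move, (acc, total) in counts.items()}


# Hypothetical log excerpt in the format shown above.
sample = [
    "# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial",
    "1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0",
    "2 move T -4321.60017027 -4321.58260476 272.68730901 272.83692252 512 O 0",
    "3 insertion F -4319.14082561 -4321.60017027 269.49191203 272.68730901 513 O 1",
]
print(acceptance_rates(sample))
</syntaxhighlight>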

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]

We see that a large amount of oxygen is inserted, due to the large chemical potential used in this simulation. This shifts the XPS to larger core electron binding energies, and the final structure does not agree with the experimental XPS.



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
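
The resampling controlled by <code>exp_n_samples</code> can be pictured as linear interpolation of the experimental data onto a uniform grid. The pure-Python sketch below is a hypothetical illustration of that idea, assuming two-column data sorted by increasing energy; it is not the routine used inside ''TurboGAP''.

<syntaxhighlight lang="python">
def interpolate_uniform(x, y, n_samples):
    """Linearly interpolate (x, y) onto n_samples equally spaced points spanning x."""
    if len(x) < 2 or len(x) != len(y) or n_samples < 2:
        raise ValueError("need at least two data points and two samples")
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid, values, j = [], [], 0
    for k in range(n_samples):
        # Pin the last sample to the end point to avoid rounding past the data.
        x_k = x[-1] if k == n_samples - 1 else x[0] + k * step
        while j < len(x) - 2 and x[j + 1] < x_k:
            j += 1  # advance to the interval [x[j], x[j+1]] containing x_k
        t = (x_k - x[j]) / (x[j + 1] - x[j])
        grid.append(x_k)
        values.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, values


# Non-uniform toy data lying on the line y = 2x.
grid, values = interpolate_uniform([0.0, 1.0, 3.0], [0.0, 2.0, 6.0], 4)
print(grid, values)
</syntaxhighlight>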
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
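
Since successive predictions in <code>xps_prediction.dat</code> are separated by blank lines, they can be split into blocks for plotting. The sketch below is a hypothetical reader which assumes two numeric columns (energy, intensity) per line; check the actual file before relying on that layout.

<syntaxhighlight lang="python">
def read_blocks(lines):
    """Split blank-line-separated data into a list of [(energy, intensity), ...] blocks."""
    blocks, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
            continue
        if stripped.startswith("#"):
            continue  # ignore comment lines, if any
        energy, intensity = (float(value) for value in stripped.split()[:2])
        current.append((energy, intensity))
    if current:
        blocks.append(current)
    return blocks


# Hypothetical file content with two successive predictions.
example = ["280.0 0.10", "281.0 0.20", "", "280.0 0.30", "281.0 0.40"]
latest = read_blocks(example)[-1]
print(latest)
</syntaxhighlight>

For a real run, pass <code>open("xps_prediction.dat").read().splitlines()</code>; the last block corresponds to the most recently written prediction.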

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]

We can observe the significant effect of the energy scale here. The experimental energy decreases quickly, with a concurrent increase in the energy per atom. After spectral agreement, the system is allowed to relax.



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows multiple observables to be optimized simultaneously.
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The generation of a-COx here was done by physical vapor deposition of graphitic carbon in an oxygen environment. They see a reduction in the number of sp2 bonds. They also claim there is a complete transformation of sp2 to sp3 that is reversible with a short annealing at 500 °C.

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

However, from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimentalists see only sp3 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, with increasing oxygen content, sp2 motifs are shifted significantly, even past the sp3 reference. This oxygen-induced shift of the sp2 motifs is not accounted for by the experimental deconvolution references, as these are fixed. This is why the experimental deconvolution did not give what we expect: sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-19T09:01:37Z

Tigany Zarrouk: /* Experimental Observable Optimization */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors, which are faster and more accurate than standard SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually hope that the predicted structures agree with experiment (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
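
The many-body prediction can be illustrated with a short Python sketch of the dot-product kernel. It assumes unit-normalised descriptor vectors and uses hypothetical function names and toy two-component descriptors; real SOAP turbo descriptors are much longer and are computed by ''TurboGAP'' itself.

<syntaxhighlight lang="python">
import math


def normalise(vector):
    """Scale a descriptor to unit length so that k(xi, xi) = 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def soap_kernel(xi_i, xi_j, zeta):
    """Dot-product kernel k_ij = (xi_i . xi_j)^zeta."""
    return sum(a * b for a, b in zip(xi_i, xi_j)) ** zeta


def predict_local_energy(xi, sparse_descriptors, alphas, delta, zeta, e0=0.0):
    """E_i = E_0 + delta^2 * sum_j alpha_j k_ij over the sparse set."""
    return e0 + delta ** 2 * sum(
        alpha * soap_kernel(xi, q, zeta)
        for alpha, q in zip(alphas, sparse_descriptors)
    )


# Toy sparse set of two environments and one environment to predict.
sparse_set = [normalise([1.0, 0.0]), normalise([1.0, 1.0])]
environment = normalise([3.0, 4.0])
print(predict_local_energy(environment, sparse_set, [-2.0, 1.5], delta=0.1, zeta=4))
</syntaxhighlight>

Raising the dot product to the power <math display="inline">\zeta</math> sharpens the kernel, so that only environments very similar to a sparse-set entry contribute appreciably.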



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>This is a simple squared-difference penalty which increases with spectral dissimilarity; <math display="inline">\gamma</math> is an energy scale.</li>
<li>This works not just for XPS but also for X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>



= Usage =


== Installing ''TurboGAP'' ==

We can install ''TurboGAP'' by ''recursively cloning'' the main branch of the repository at https://github.com/mcaroba/turbogap

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* To convert <code>gap_fit</code> <code>.xml</code> output to <code>.gap</code>, use the script <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without experimental data specified, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
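As a minimal sketch of that last step, the Gaussian-broadened spectrum <math display="inline">g_{\rm xps}(\varepsilon)</math> can be built from a list of predicted core-electron binding energies. The function name <code>xps_spectrum</code>, the energy grid and the three CEBE values are illustrative assumptions, not part of ''TurboGAP''; reading the <code>core_electron_be</code> column from <code>trajectory_out.xyz</code> is left to the reader's xyz parser of choice.

```python
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """Return g(e) = (1/M) * sum_i exp(-(e - e_i)^2 / (2 sigma^2)) on a grid.

    cebes    : predicted core-electron binding energies [eV], one per atom
    energies : energy grid [eV] on which the spectrum is evaluated
    sigma    : Gaussian smearing width [eV]
    """
    m = len(cebes)
    return [
        sum(math.exp(-((e - e_i) ** 2) / (2.0 * sigma ** 2)) for e_i in cebes) / m
        for e in energies
    ]


# Hypothetical CEBE values [eV]; in practice these come from the
# core_electron_be column of trajectory_out.xyz.
cebes = [283.8, 284.5, 286.1]
energies = [280.0 + 0.02 * k for k in range(501)]  # 280 eV to 290 eV
spectrum = xps_spectrum(cebes, energies)
```

The default <code>sigma=0.4</code> mirrors the smearing quoted on this page; the normalization by the number of atoms follows the <math display="inline">g_{\rm xps}</math> expression given in the theory section.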


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'', for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log of every trial move, whether it was accepted, and the trial and current energies:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>
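A quick way to check how a run is going is to compute the acceptance ratio per move type from <code>mc.log</code>. The sketch below assumes the column layout shown above (step, move type, acceptance flag, ...) and assumes that accepted moves are flagged with <code>T</code>; only <code>F</code> appears in the excerpt, so that is a guess. The function name <code>acceptance_by_move</code> is hypothetical.

```python
from collections import defaultdict


def acceptance_by_move(lines):
    """Return {move_type: accepted_fraction} from mc.log-style lines."""
    counts = defaultdict(lambda: [0, 0])  # move type -> [accepted, total]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        move, flag = fields[1], fields[2]
        counts[move][1] += 1
        if flag == "T":  # assumption: "T" marks an accepted move
            counts[move][0] += 1
    return {move: acc / tot for move, (acc, tot) in counts.items()}


# Example with the three log lines shown above
sample = [
    "# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial",
    "1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0",
    "2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0",
    "3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1",
]
ratios = acceptance_by_move(sample)
```

For a real run, pass the lines of the file instead, e.g. <code>acceptance_by_move(open("mc.log"))</code>.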

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]

We see that a large amount of oxygen is inserted because of the large chemical potential used in this simulation. This shifts the XPS to higher core-electron binding energies, and the spectrum of the final structure does not agree with the experimental XPS.



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
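To make the role of <math display="inline">\gamma</math> concrete, the penalty <math display="inline">E_{\rm spectra}</math> can be evaluated by hand from the two output files. The sketch below assumes both spectra have already been read onto the same energy grid and uses a trapezoidal rule for the integral; how ''TurboGAP'' discretizes the integral internally is not stated here, so treat this as an illustration only. The name <code>spectra_energy</code> is hypothetical.

```python
def spectra_energy(g_pred, g_exp, energies, gamma):
    """E = 0.5 * gamma * integral of (g_pred - g_exp)^2 over the energy grid.

    The integral is approximated with the trapezoidal rule, so the grid
    does not need to be uniform.
    """
    sq = [(p - q) ** 2 for p, q in zip(g_pred, g_exp)]
    integral = sum(
        0.5 * (sq[k] + sq[k + 1]) * (energies[k + 1] - energies[k])
        for k in range(len(energies) - 1)
    )
    return 0.5 * gamma * integral


# Toy data: a constant mismatch of 1 over a 2 eV window, gamma = 100 eV
energies = [0.0, 1.0, 2.0]
e_spectra = spectra_energy([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], energies, gamma=100.0)
```

Doubling <code>gamma</code> doubles the penalty for the same mismatch, which is why large values drive the spectrum towards agreement quickly at the cost of a higher energy per atom.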

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]

We can observe the significant effect of the energy scale here. The experimental energy decreases quickly, with a concurrent increase in the energy per atom. After spectral agreement, the system is allowed to relax.



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows the optimization of multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The a-COx was generated by physical vapor deposition of graphitic carbon in an oxygen environment. The authors see a reduction in the number of sp2 bonds. They also claim a complete transformation of sp2 to sp3 that is reversible with a short anneal at 500 °C.

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

However, from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimentalists instead assign it to sp3 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, with increasing oxygen content, the sp2 motifs are shifted significantly, even past the sp3 reference. This oxygen-induced shift of the sp2 motifs is not accounted for by the experimental deconvolution references, as these are fixed. This is why the experimental deconvolution did not give what we expect: sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-19T08:59:34Z

Tigany Zarrouk: /* (Grand-Canonical) Monte-Carlo */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Squared-exponential (Gaussian) 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/2\sigma^2 )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'', for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
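The 2b prediction above is small enough to write out directly. The following is a sketch under stated assumptions: the function name <code>energy_2b</code>, the argument names and the toy numbers are hypothetical, and the fitting coefficients are taken as given rather than obtained from an actual Gaussian Process Regression.

```python
import math


def energy_2b(distances, sparse_distances, alphas, sigma, delta, e0=0.0):
    """E_2b = sum_i ( e0 + delta^2 * sum_j alpha_j * k(r_i, r_j) ).

    distances        : the M pairwise distances r_i in the local environment
    sparse_distances : the N_s training-set distances r_j
    alphas           : fitting coefficients alpha_j, one per sparse distance
    """
    total = 0.0
    for r_i in distances:
        kernel_sum = sum(
            a_j * math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))
            for r_j, a_j in zip(sparse_distances, alphas)
        )
        total += e0 + delta ** 2 * kernel_sum
    return total


# Toy example: one neighbour sitting exactly on the single sparse point
e = energy_2b([1.0], [1.0], [2.0], sigma=0.5, delta=1.0)
```

Because the kernel equals 1 when the two distances coincide, the toy example simply returns <math display="inline">\delta^2 \alpha_1</math>.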


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
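The many-body prediction has the same structure with a dot-product kernel. This sketch assumes the descriptor vectors are already computed and normalized; <code>soap_predict</code> and its arguments are illustrative names, not ''TurboGAP'' routines.

```python
def soap_predict(xi, sparse_set, alphas, zeta, delta, e0=0.0):
    """Return e0 + delta^2 * sum_j alpha_j * (xi . xi_j)^zeta.

    xi         : descriptor vector of the atom of interest
    sparse_set : list of sparse-set descriptor vectors xi_j
    alphas     : fitting coefficients alpha_j, one per sparse vector
    """
    total = 0.0
    for xi_j, alpha_j in zip(sparse_set, alphas):
        dot = sum(a * b for a, b in zip(xi, xi_j))
        total += alpha_j * dot ** zeta
    return e0 + delta ** 2 * total


# Toy 2-component descriptors: identical to the first sparse point,
# orthogonal to the second
value = soap_predict(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0], zeta=2, delta=0.5, e0=1.0
)
```

The same expression serves for any quantity fitted on these descriptors, since only the coefficients and the offset change.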



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta\mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta\mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: the penalty increases with spectral dissimilarity, and <math display="inline">\gamma</math> is an energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
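The three acceptance rules can be sketched as below, working with the logarithm of the acceptance ratio so that large favourable energy differences do not overflow. The function names are hypothetical, the thermal de Broglie wavelength <code>lam</code> is taken as an input rather than computed, and energies are assumed to be in eV so that the Boltzmann constant is in eV/K.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant [eV/K]


def _clip(log_p):
    """Return min(1, exp(log_p)) without overflowing for large log_p."""
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def acc_move(e_new, e_old, temperature):
    """Displacement move: min[1, exp(-(E_new - E_old) / kT)]."""
    return _clip(-(e_new - e_old) / (K_B * temperature))


def acc_insertion(e_new, e_old, mu, n, volume, lam, temperature):
    """N -> N+1: min[1, V / (lam^3 (N+1)) * exp(-(E_new - E_old - mu) / kT)]."""
    log_pref = math.log(volume / (lam ** 3 * (n + 1)))
    return _clip(log_pref - (e_new - e_old - mu) / (K_B * temperature))


def acc_removal(e_new, e_old, mu, n, volume, lam, temperature):
    """N -> N-1 (requires n > 0): min[1, lam^3 N / V * exp(-(E_new - E_old + mu) / kT)]."""
    log_pref = math.log(lam ** 3 * n / volume)
    return _clip(log_pref - (e_new - e_old + mu) / (K_B * temperature))


def accept(probability, random_number):
    """Metropolis decision given a uniform random number in [0, 1)."""
    return random_number < probability
```

Here <code>e_new</code> and <code>e_old</code> stand for the generalized energy <math display="inline">\tilde{E}</math>, i.e. the GAP energy plus the spectral penalty, so the experimental term enters the acceptance test through the energy difference alone.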



= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then add the executable to the path in our <code>.bashrc</code> file

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, along with the alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* A script to convert <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code> is provided in <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without experimental data specified, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'', for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log of every trial move, whether it was accepted, and the trial and current energies:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]

We see that a large amount of oxygen is inserted because of the large chemical potential used in this simulation. This shifts the XPS to higher core-electron binding energies, and the spectrum of the final structure does not agree with the experimental XPS.



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows the optimization of multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The a-COx was generated by physical vapor deposition of graphitic carbon in an oxygen environment. The authors see a reduction in the number of sp2 bonds. They also claim a complete transformation of sp2 to sp3 that is reversible with a short anneal at 500 °C.

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

However, from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimentalists instead assign it to sp3 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, with increasing oxygen content, the sp2 motifs are shifted significantly, even past the sp3 reference. This oxygen-induced shift of the sp2 motifs is not accounted for by the experimental deconvolution references, as these are fixed. This is why the experimental deconvolution did not give what we expect: sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-19T08:57:57Z

Tigany Zarrouk: /* Experimental Observable prediction/optimization */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach), but have downsides:
** “By-hand” constraints.
** Restricted to simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]]).

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

With a CO GAP and a core-electron binding energy SOAP model, we perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
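
The 2b prediction above can be sketched in a few lines of Python. This is only an illustration of the formula, not ''TurboGAP'''s Fortran implementation; the function names and the example numbers are hypothetical.

```python
import math


def gaussian_2b_kernel(r_i, r_j, sigma):
    """Squared-exponential similarity between two pair distances."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))


def energy_2b(distances, sparse_distances, alphas, delta, sigma, e0=0.0):
    """E^2b = sum_i ( e0 + delta^2 * sum_j alpha_j * k(r_i, r_j) ).

    distances        : the M pair distances in the atom's local environment
    sparse_distances : the N_s descriptors the model was trained with
    alphas           : the N_s fitting coefficients
    """
    total = 0.0
    for r_i in distances:
        kernel_sum = sum(
            alpha * gaussian_2b_kernel(r_i, r_j, sigma)
            for alpha, r_j in zip(alphas, sparse_distances)
        )
        total += e0 + delta ** 2 * kernel_sum
    return total
```

For example, <code>energy_2b([1.0], [1.0], [2.0], delta=0.5, sigma=1.0)</code> evaluates a single distance against a single training descriptor, so the kernel is 1 and the result is <math display="inline">\delta^2 \alpha</math>.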


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening are accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>.
* We can analyse the atomic environments which make up the spectra.



== Prediction of Local Properties ==

* Using ''TurboGAP'', we can predict an arbitrary number of ''local properties'' from SOAP turbo descriptors.
* ''One can get these for “free”'' if the local property models are trained using the same descriptors as the atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>This is not limited to XPS: it also applies to X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
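
As a rough illustration of <math display="inline">E_{\rm spectra}</math>, the integral can be approximated with the trapezoid rule on a shared energy grid. The function name and the choice of quadrature are assumptions made for this sketch; ''TurboGAP'' may discretize the integral differently.

```python
def spectra_energy(g_pred, g_exp, grid, gamma):
    """Approximate 0.5 * gamma * integral (g_pred - g_exp)^2 d(eps).

    g_pred, g_exp : spectra sampled on the same energy grid
    grid          : increasing energy values (need not be uniform)
    gamma         : energy scale of the penalty
    """
    if not (len(g_pred) == len(g_exp) == len(grid)):
        raise ValueError("g_pred, g_exp and grid must have the same length")
    squared_diff = [(p - e) ** 2 for p, e in zip(g_pred, g_exp)]
    # Trapezoid rule over each interval of the grid
    integral = sum(
        0.5 * (squared_diff[k] + squared_diff[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )
    return 0.5 * gamma * integral
```

Identical spectra give zero penalty, and the penalty grows linearly with <math display="inline">\gamma</math>, which is why <math display="inline">\gamma</math> sets how strongly the experimental data competes with <math display="inline">E_{\rm GAP}</math>.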

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
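
The three acceptance rules can be written directly as functions of the modified energies <math display="inline">\tilde{E}</math>. The sketch below is an illustration only: the function names are hypothetical, energies and <math display="inline">\mu</math> are assumed to be in eV, <math display="inline">V</math> and <math display="inline">\lambda^3</math> in consistent units, and <math display="inline">N</math> is taken to be the number of particles of the exchanged species. Working with logarithms avoids overflow when a trial move lowers the energy by a large amount.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def _capped_exp(log_value):
    """Return min(1, exp(log_value)) without overflowing for large arguments."""
    return 1.0 if log_value >= 0.0 else math.exp(log_value)


def acc_insertion(e_new, e_old, mu, n, volume, wavelength, temperature):
    """Acceptance probability for N -> N+1."""
    beta = 1.0 / (K_B * temperature)
    log_prefactor = math.log(volume / (wavelength ** 3 * (n + 1)))
    return _capped_exp(log_prefactor - beta * (e_new - e_old - mu))


def acc_removal(e_new, e_old, mu, n, volume, wavelength, temperature):
    """Acceptance probability for N -> N-1 (zero if there is nothing to remove)."""
    if n == 0:
        return 0.0
    beta = 1.0 / (K_B * temperature)
    log_prefactor = math.log(wavelength ** 3 * n / volume)
    return _capped_exp(log_prefactor - beta * (e_new - e_old + mu))


def acc_move(e_new, e_old, temperature):
    """Metropolis acceptance probability for a displacement move."""
    return _capped_exp(-(e_new - e_old) / (K_B * temperature))
```

A trial move is then accepted if a uniform random number in [0, 1) is smaller than the returned probability.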



= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, along with the alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* A conversion script from <code>gap_fit</code> <code>.xml</code> to <code>.gap</code> is available at <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
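
To get at the <code>core_electron_be</code> values, one can use the <code>Properties</code> string of the extended xyz header to locate the right column. The sketch below assumes each atom sits on a single whitespace-separated line (the backslashes and comments in the listing above are only for display); the function names are hypothetical. The extracted values can then be broadened with a Gaussian of width <math display="inline">\sigma</math> as in the <math display="inline">g_{\rm xps}</math> formula from “Prediction of Local Properties”.

```python
def property_columns(properties):
    """Map each extended-xyz property name to its (start column, width)."""
    fields = properties.split(":")
    if len(fields) % 3 != 0:
        raise ValueError("Properties must be name:type:width triplets")
    columns = {}
    start = 0
    for k in range(0, len(fields), 3):
        name, width = fields[k], int(fields[k + 2])
        columns[name] = (start, width)
        start += width
    return columns


def extract_scalar(atom_lines, properties, name):
    """Return the value of a width-1 property for every atom line."""
    start, width = property_columns(properties)[name]
    if width != 1:
        raise ValueError(f"{name} is not a scalar property")
    return [float(line.split()[start]) for line in atom_lines]
```

With the header shown above, <code>core_electron_be</code> is the ninth column (index 8), after the species, three position, three force and one local energy columns.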


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
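: Log file with one line per Monte-Carlo step, listing the move type, whether it was accepted, and the trial and current energies.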

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file appended to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.
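
A quick way to monitor a run is to compute the acceptance ratio per move type from <code>mc.log</code>. The sketch below relies only on the first three columns shown above and assumes accepted moves are flagged with <code>T</code> (the sample only shows <code>F</code>); the function name is hypothetical.

```python
from collections import Counter


def acceptance_by_move(log_lines):
    """Return {move type: accepted fraction} from mc.log lines."""
    attempts = Counter()
    accepted = Counter()
    for line in log_lines:
        fields = line.split()
        # Skip blank lines and the '#' header
        if not fields or fields[0].startswith("#"):
            continue
        move, flag = fields[1], fields[2]
        attempts[move] += 1
        if flag == "T":  # assumption: Fortran-style logical, T = accepted
            accepted[move] += 1
    return {move: accepted[move] / attempts[move] for move in attempts}
```

It can be applied to a file with <code>acceptance_by_move(open("mc.log"))</code>.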

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
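
The role of <code>exp_n_samples</code> can be illustrated by resampling experimental data onto a uniform grid with linear interpolation. ''TurboGAP'' does this internally; the sketch below is a stand-alone illustration with a hypothetical function name, and it assumes the experimental <math display="inline">x</math> values are strictly increasing.

```python
def resample_uniform(x, y, n_samples):
    """Linearly interpolate (x, y) onto n_samples evenly spaced points
    spanning [x[0], x[-1]]. Returns (grid, values)."""
    if n_samples < 2 or len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two data points and two samples")
    if any(b <= a for a, b in zip(x, x[1:])):
        raise ValueError("x must be strictly increasing")
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid = [x[0] + k * step for k in range(n_samples)]
    grid[-1] = x[-1]  # guard against floating-point drift at the end point
    values = []
    seg = 0  # index of the data interval [x[seg], x[seg + 1]] in use
    for g in grid:
        while seg < len(x) - 2 and g > x[seg + 1]:
            seg += 1
        t = (g - x[seg]) / (x[seg + 1] - x[seg])
        values.append(y[seg] + t * (y[seg + 1] - y[seg]))
    return grid, values
```

The predicted spectrum is evaluated over the same range as the experimental data, so the resampled grid is also the grid on which the two spectra are compared.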

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows us to optimize multiple observables at once:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>
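
With several observables it is easy to leave one of the per-observable lists too short. The following sketch checks an <code>input</code> file before a run. It is not part of ''TurboGAP'': the function names are hypothetical, and the rule that each <code>exp_*</code> list needs exactly <code>n_exp</code> entries is an assumption based on the example above.

```python
import shlex


def parse_input(lines):
    """Parse 'key = value ...' lines, ignoring '!' and '#' comments."""
    options = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped[0] in "!#" or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        # shlex handles quoted file names and drops trailing '#' comments
        options[key.strip()] = shlex.split(value, comments=True)
    return options


def check_exp_options(options):
    """Return a list of problems with the experimental-data options."""
    n_exp = int(options["n_exp"][0])
    problems = []
    for key in ("exp_labels", "exp_data_files",
                "exp_n_samples", "exp_energy_scales"):
        count = len(options.get(key, []))
        if count != n_exp:
            problems.append(f"{key}: expected {n_exp} values, found {count}")
    return problems
```

For the example above, <code>check_exp_options(parse_input(open("input")))</code> returns an empty list.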

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

We see in the experimental deconvolution analysis that sp3 motifs compose the primary XPS peak. The a-COx here was generated by physical vapor deposition of graphitic carbon in an oxygen environment. The authors see a reduction in the number of sp2 bonds. They also claim that there is a complete transformation of sp2 to sp3, which is reversible with a short anneal at 500 °C.

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

However, from our deconvolution analysis, we see that this primary peak is composed of sp2 motifs. We can explain why the experimental deconvolution assigns it to sp3 with a simple computational experiment.

We add in oxygen to the bottom layer of a graphene bilayer system to see the influence of oxygen on sp2 core electron binding energies.

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, as oxygen is added, the sp2 motifs are shifted significantly, even past the sp3 reference. This oxygen-induced shift of the sp2 motifs is not accounted for by the experimental deconvolution references, as these are fixed. This is why the experimental deconvolution did not give what we expect: sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* We can deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-19T08:22:52Z

Tigany Zarrouk: /* SOAP (many-body) descriptor */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach), but have downsides:
** “By-hand” constraints.
** Restricted to simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]]).

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

With a CO GAP and a core-electron binding energy SOAP model, we perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
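
The many-body prediction has the same structure as the 2b case, with the dot-product kernel in place of the Gaussian. The sketch below is only an illustration: the function names are hypothetical, and it assumes the descriptor vectors are normalized to unit length so that the kernel of an environment with itself is 1.

```python
import math


def normalize(vec):
    """Scale a descriptor vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def soap_kernel(xi_i, xi_j, zeta):
    """Dot-product kernel k_ij = (xi_i . xi_j)^zeta."""
    return sum(a * b for a, b in zip(xi_i, xi_j)) ** zeta


def predict_local(xi, sparse_descriptors, alphas, delta, zeta, e0=0.0):
    """E_i = e0 + delta^2 * sum_j alpha_j * k(xi, q_j) over the sparse set."""
    xi = normalize(xi)
    return e0 + delta ** 2 * sum(
        alpha * soap_kernel(xi, normalize(q), zeta)
        for alpha, q in zip(alphas, sparse_descriptors)
    )
```

Raising the dot product to the power <math display="inline">\zeta</math> sharpens the kernel: environments that are only partly similar contribute less as <math display="inline">\zeta</math> increases.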



== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening are accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>.
* We can analyse the atomic environments which make up the spectra.



== Prediction of Local Properties ==

* Using ''TurboGAP'', we can predict an arbitrary number of ''local properties'' from SOAP turbo descriptors.
* ''One can get these for “free”'' if the local property models are trained using the same descriptors as the atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>
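
The XPS formula above amounts to placing one Gaussian per atom at its predicted core-electron binding energy and averaging. A minimal sketch, with a hypothetical function name and <math display="inline">M</math> taken to be the number of contributing atoms:

```python
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """g_xps(eps) = (1/M) * sum_i exp(-(eps - eps_i)^2 / (2 sigma^2)).

    cebes    : predicted core-electron binding energies, one per atom [eV]
    energies : energy grid on which to evaluate the spectrum [eV]
    sigma    : Gaussian broadening [eV]
    """
    m = len(cebes)
    return [
        sum(math.exp(-((eps - c) ** 2) / (2.0 * sigma ** 2)) for c in cebes) / m
        for eps in energies
    ]
```

Because every atom contributes its own Gaussian, the spectrum can be split into contributions from any subset of atoms simply by restricting the sum.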


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>This is not limited to XPS: it also applies to X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>



= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from ''TurboGAP'' wiki https://turbogap.fi/wiki/index.php/Potentials
;* The conversion script from <code>gap_fit</code> <code>.xml</code> to <code>.gap</code> format is <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely so that they give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without experimental data specified, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
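As a rough sketch of that last step, the snippet below broadens a list of core-electron binding energies into a spectrum using the Gaussian form of <math display="inline">g_{\rm xps}(\varepsilon)</math> with <math display="inline">\sigma = 0.4\text{ eV}</math>. It assumes the <code>core_electron_be</code> column has already been read from <code>trajectory_out.xyz</code> into a plain list; the function name, the energy grid and all but the first CEBE value are illustrative assumptions.

```python
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """Gaussian-broadened spectrum, averaged over the M input CEBEs."""
    if not cebes:
        raise ValueError("need at least one core-electron binding energy")
    m = len(cebes)
    two_sigma_sq = 2.0 * sigma**2
    return [
        sum(math.exp(-((eps - c) ** 2) / two_sigma_sq) for c in cebes) / m
        for eps in energies
    ]


# First value taken from the example output above; the others are made up.
cebes = [283.83841418, 284.10, 286.50]
energies = [280.0 + 0.02 * i for i in range(501)]  # illustrative grid in eV
spectrum = xps_spectrum(cebes, energies)
```

Any normalization applied before comparing with experiment is not stated on this page, so none is applied here.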


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file which is written to every <code>N</code> steps, as set by <code>write_xyz = N</code>.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>N</code> steps, as set by <code>write_xyz = N</code>.
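A quick way to check how a run is going is to tally acceptance ratios per move type from <code>mc.log</code>. The sketch below relies only on the second and third columns (<code>mc_move</code> and <code>accepted</code>) as they appear in the excerpt above. That <code>T</code> marks an accepted move is an assumption, since only <code>F</code> appears in the excerpt, and the function name is hypothetical.

```python
from collections import Counter


def acceptance_by_move(lines):
    """Return {move_type: accepted_fraction} from mc.log-style lines."""
    tried = Counter()
    accepted = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        move, flag = fields[1], fields[2]
        tried[move] += 1
        if flag == "T":  # assumed marker for an accepted move
            accepted[move] += 1
    return {move: accepted[move] / tried[move] for move in tried}


# The three rows shown in the mc.log excerpt above.
log_lines = [
    "# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial",
    "1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0",
    "2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0",
    "3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1",
]
ratios = acceptance_by_move(log_lines)
```

For a real run, the same function can be given the open file object, e.g. <code>acceptance_by_move(open("mc.log"))</code>.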

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
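To illustrate what <code>exp_n_samples</code> controls, the sketch below linearly interpolates experimental data onto a uniform grid spanning the experimental range. It is an assumption-laden illustration of the idea only: the function name is hypothetical, the input is assumed to be sorted by increasing energy, and the resampling inside ''TurboGAP'' may differ in detail.

```python
def resample_uniform(x, y, n_samples):
    """Linearly interpolate (x, y), with x strictly ascending,
    onto n_samples uniformly spaced points between x[0] and x[-1]."""
    if n_samples < 2 or len(x) < 2 or len(x) != len(y):
        raise ValueError("need matching x/y with >= 2 points and n_samples >= 2")
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid = [x[0] + i * step for i in range(n_samples)]
    values = []
    j = 0  # index of the left end of the current data interval
    for g in grid:
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        values.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, values


# Non-uniform toy data resampled onto four uniform points.
grid, values = resample_uniform([0.0, 1.0, 3.0], [0.0, 2.0, 6.0], 4)
```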
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows us to optimize multiple observables simultaneously:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, as the oxygen content increases, the sp2 motifs are shifted significantly, even past the sp3 reference. This is why the experimental deconvolution did not give the expected result, namely that sp2 carbon composes the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:44:54Z

Tigany Zarrouk: /* Motivation for structural inference */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors, which are faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' ''and'' '''ab-initio''' data.

With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g'' Exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/2\sigma^2 )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_j\} = \{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
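The prediction step in the list above amounts to a weighted sum of kernel values over the sparse set. The following sketch spells that out with plain Python lists. It is not how ''TurboGAP'' evaluates the model internally: the function names are hypothetical, and the descriptors are assumed to be normalized vectors of equal length.

```python
def soap_kernel(xi_i, xi_j, zeta):
    """Dot-product kernel k_ij = (xi_i . xi_j)^zeta."""
    dot = sum(a * b for a, b in zip(xi_i, xi_j))
    return dot**zeta


def predict_local(xi, sparse_set, alphas, delta, zeta, e0=0.0):
    """E_i = E_0 + delta^2 * sum_j alpha_j * k_ij over the sparse set."""
    kernel_sum = sum(
        alpha * soap_kernel(xi, q, zeta) for alpha, q in zip(alphas, sparse_set)
    )
    return e0 + delta**2 * kernel_sum


# Toy two-component descriptors; the values are illustrative only.
sparse_set = [[1.0, 0.0], [0.0, 1.0]]
alphas = [2.0, 3.0]
energy = predict_local([1.0, 0.0], sparse_set, alphas, delta=0.1, zeta=4, e0=1.0)
```

The same expression, with its own <math display="inline">\alpha_j</math>, <math display="inline">\zeta</math> and <math display="inline">\delta</math>, can be reused for a local property such as the core-electron binding energy.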

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>Beyond XPS, we can also use X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
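As a minimal numerical sketch of the penalty <math display="inline">E_{\rm spectra}</math>, the function below evaluates the integral with the trapezoidal rule on a uniform energy grid. The function name, the choice of quadrature and the toy spectra are assumptions for illustration; the page does not state how ''TurboGAP'' discretizes the integral or normalizes the spectra.

```python
def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """0.5 * gamma * integral of (g_pred - g_exp)^2 on a uniform grid
    with spacing d_eps, using the trapezoidal rule."""
    if len(g_pred) != len(g_exp) or len(g_pred) < 2:
        raise ValueError("spectra must share a grid of at least two points")
    sq_diff = [(p - e) ** 2 for p, e in zip(g_pred, g_exp)]
    integral = d_eps * (sum(sq_diff) - 0.5 * (sq_diff[0] + sq_diff[-1]))
    return 0.5 * gamma * integral


# Toy spectra on a three-point grid with spacing 0.5 eV.
penalty = spectra_energy([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], d_eps=0.5, gamma=10.0)
```

The penalty vanishes when the two spectra coincide and scales linearly with <math display="inline">\gamma</math>, which is why a larger <math display="inline">\gamma</math> drives the structure harder towards the experimental data.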

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>



= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
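and then add the <code>bin</code> directory to our <code>PATH</code> in the <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"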
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from ''TurboGAP'' wiki https://turbogap.fi/wiki/index.php/Potentials
;* The conversion script from <code>gap_fit</code> <code>.xml</code> to <code>.gap</code> format is <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely so that they give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without experimental data specified, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file which is written to every <code>N</code> steps, as set by <code>write_xyz = N</code>.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>N</code> steps, as set by <code>write_xyz = N</code>.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
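Since <code>xps_prediction.dat</code> stores successive predictions separated by blank lines, it can be split into one block per written step before plotting. The sketch below does this for text already read from the file. The column layout (whitespace-separated numbers, e.g. energy and intensity) and the handling of <code>#</code> comment lines are assumptions, and the function name is hypothetical.

```python
def split_blocks(text):
    """Split blank-line-separated numeric data into a list of blocks,
    each block being a list of rows of floats."""
    blocks = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
            continue
        if stripped.startswith("#"):
            continue  # assumed comment marker
        current.append([float(value) for value in stripped.split()])
    if current:
        blocks.append(current)
    return blocks


# Made-up two-column data standing in for two successive predictions.
example = "280.0 0.1\n280.5 0.3\n\n280.0 0.2\n280.5 0.4\n"
predictions = split_blocks(example)
last_prediction = predictions[-1]
```

The final block is then the most recent prediction, which can be compared against the contents of <code>xps_exp.dat</code>.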

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows us to optimize multiple observables simultaneously:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, as the oxygen content increases, the sp2 motifs are shifted significantly, even past the sp3 reference. This is why the experimental deconvolution did not give the expected result, namely that sp2 carbon composes the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:43:11Z

Tigany Zarrouk: /* Deconvolution analysis */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors, which are faster and more accurate than standard SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic stopping and spatially correlated Langevin dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray that these structures agree with experiment (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on ''“Molecular Augmented Dynamics”'': generating forces from the deviation from experiment, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential (Gaussian) 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
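
The 2b prediction above can be sketched in a few lines of Python. This is only an illustration of the formula, not the ''TurboGAP'' implementation: the function names and all numerical values (sparse-set distances, coefficients, <math display="inline">\sigma</math>, <math display="inline">\delta</math>) are made up for the example.

```python
import math


def gaussian_kernel_2b(r_i, r_j, sigma):
    """Squared-exponential similarity between two pair distances."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma**2))


def energy_2b(distances, sparse_distances, alphas, sigma, delta, e0=0.0):
    """E^2b = sum_i ( E0 + delta^2 * sum_j alpha_j * k(r_i, r_j) )."""
    total = 0.0
    for r_i in distances:  # the M pair distances in the local environment
        kernel_sum = sum(
            alpha_j * gaussian_kernel_2b(r_i, r_j, sigma)
            for alpha_j, r_j in zip(alphas, sparse_distances)  # the N_s sparse points
        )
        total += e0 + delta**2 * kernel_sum
    return total


# Hypothetical numbers, for illustration only.
example_energy = energy_2b(
    distances=[1.4, 1.5],
    sparse_distances=[1.3, 1.5, 1.7],
    alphas=[0.2, -0.5, 0.1],
    sigma=0.5,
    delta=1.0,
)
print(example_energy)
```

Note that the cost of a prediction scales with the number of sparse points <math display="inline">N_s</math>, not with the size of the full training set.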


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])
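
The many-body prediction has the same structure as the 2b one, with the dot-product kernel in place of the Gaussian. A minimal Python sketch follows, assuming the descriptors are plain vectors normalized to unit length; the helper names and the toy two-component descriptors are hypothetical and far smaller than a real SOAP vector.

```python
import math


def normalize(vec):
    """Scale a descriptor vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def soap_kernel(xi_i, xi_j, zeta):
    """Dot-product kernel k_ij = (xi_i . xi_j)^zeta."""
    return sum(a * b for a, b in zip(xi_i, xi_j)) ** zeta


def predict_local(xi_i, sparse_descriptors, alphas, zeta, delta, e0=0.0):
    """E_i = E0 + delta^2 * sum_j alpha_j * k_ij over the sparse set."""
    kernel_sum = sum(
        alpha_j * soap_kernel(xi_i, xi_j, zeta)
        for alpha_j, xi_j in zip(alphas, sparse_descriptors)
    )
    return e0 + delta**2 * kernel_sum


# Toy descriptors, for illustration only.
sparse_set = [normalize([1.0, 0.0]), normalize([1.0, 1.0])]
environment = normalize([2.0, 1.0])
print(predict_local(environment, sparse_set, alphas=[0.3, -0.1], zeta=4, delta=0.1))
```

With unit-length descriptors the kernel of an environment with itself is 1, and a larger <math display="inline">\zeta</math> makes the kernel fall off more sharply as environments become dissimilar.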


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = ( \Delta KS^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved} )</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if the local property models are trained using the same descriptors as the atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>
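
The XPS expression above is simply a sum of Gaussians centred on the predicted binding energies. A short Python sketch, with made-up binding energies and a hypothetical energy range, could look like this:

```python
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """g(eps) = (1/M) * sum_i exp(-(eps - eps_i)^2 / (2 sigma^2))."""
    m = len(cebes)
    return [
        sum(math.exp(-((eps - e_i) ** 2) / (2.0 * sigma**2)) for e_i in cebes) / m
        for eps in energies
    ]


# Uniform grid from 280 eV to 295 eV with 0.05 eV spacing (hypothetical range).
grid = [280.0 + 0.05 * k for k in range(301)]
# Made-up core-electron binding energies for three atoms [eV].
spectrum = xps_spectrum([283.8, 284.1, 286.5], grid)
```

Here <code>sigma</code> plays the role of the broadening <math display="inline">\sigma</math>, and the <math display="inline">1/M</math> prefactor keeps the height of the spectrum independent of the number of atoms.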


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
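
As a rough numerical illustration of the generalized energy, the integral can be replaced by a sum over a uniform energy grid. The sketch below assumes that both spectra are sampled on the same grid and are already normalized consistently; the function names and numbers are hypothetical, and how ''TurboGAP'' normalizes and integrates internally is not shown here.

```python
def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """E_spectra = (gamma / 2) * integral of (g_pred - g_exp)^2, as a Riemann sum."""
    if len(g_pred) != len(g_exp):
        raise ValueError("both spectra must be sampled on the same energy grid")
    squared_diff = sum((p - e) ** 2 for p, e in zip(g_pred, g_exp))
    return 0.5 * gamma * d_eps * squared_diff


def generalized_energy(e_gap, g_pred, g_exp, d_eps, gamma):
    """E~ = E_GAP + E_spectra."""
    return e_gap + spectra_energy(g_pred, g_exp, d_eps, gamma)


# Hypothetical three-point spectra on a grid with 0.1 eV spacing.
g_exp = [0.0, 1.0, 0.0]
g_pred = [0.2, 0.7, 0.1]
print(generalized_energy(-100.0, g_pred, g_exp, d_eps=0.1, gamma=10.0))
```

A larger <math display="inline">\gamma</math> weights agreement with experiment more heavily relative to the GAP energy.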

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
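
The three acceptance rules translate directly into code. The following Python sketch is an illustration only: the function names are our own, energies are taken in eV, and the volume and the wavelength <math display="inline">\lambda</math> are assumed to be given in consistent length units. The exponent is evaluated in logarithmic form so that very favourable moves do not overflow.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant [eV/K]


def _accept(log_ratio):
    """min(1, exp(log_ratio)) without overflow for large positive arguments."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def acc_insertion(e_trial, e_current, n, volume, wavelength, mu, temperature):
    """Acceptance probability for N -> N+1."""
    beta = 1.0 / (K_B * temperature)
    prefactor = volume / (wavelength**3 * (n + 1))
    return _accept(math.log(prefactor) - beta * (e_trial - e_current - mu))


def acc_removal(e_trial, e_current, n, volume, wavelength, mu, temperature):
    """Acceptance probability for N -> N-1 (requires n >= 1)."""
    beta = 1.0 / (K_B * temperature)
    prefactor = wavelength**3 * n / volume
    return _accept(math.log(prefactor) - beta * (e_trial - e_current + mu))


def acc_move(e_trial, e_current, temperature):
    """Metropolis acceptance probability for a displacement."""
    return _accept(-(e_trial - e_current) / (K_B * temperature))


# A move that raises the generalized energy by 0.05 eV at 300 K.
print(acc_move(-4321.53, -4321.58, 300.0))
```

A trial move is then accepted if a uniform random number in [0, 1) is smaller than the returned probability.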



= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
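and then add the <code>bin</code> directory to our <code>PATH</code> in the <code>.bashrc</code> file (note that <code>~</code> is not expanded inside double quotes, so we use <code>$HOME</code>):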

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse-set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* A conversion script from <code>gap_fit</code> <code>.xml</code> to <code>.gap</code> is available at <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors, simply to give a result in a reasonable amount of time. Take care with real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
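
To build the spectrum ourselves, we first need the <code>core_electron_be</code> column. Below is a minimal Python sketch for reading it, assuming the frame layout shown above (atom count, then a comment line with a <code>Properties=</code> field, then one line per atom, without the line breaks used here for display). The function names are our own and only the first frame is read.

```python
import re


def property_columns(comment_line):
    """Map each name in an extended-xyz 'Properties=' field to (first column, width)."""
    match = re.search(r"Properties=(\S+)", comment_line)
    if match is None:
        raise ValueError("no Properties= field found in comment line")
    fields = match.group(1).split(":")
    columns, start = {}, 0
    # The field is a flat list of name:type:width triplets.
    for name, width in zip(fields[0::3], fields[2::3]):
        columns[name] = (start, int(width))
        start += int(width)
    return columns


def read_local_property(lines, label="core_electron_be"):
    """Return the per-atom values of a scalar property for the first frame."""
    n_atoms = int(lines[0].split()[0])
    start, width = property_columns(lines[1])[label]
    if width != 1:
        raise ValueError(f"{label} is not a scalar property")
    return [float(line.split()[start]) for line in lines[2 : 2 + n_atoms]]


# One-atom frame built from the output shown above (lattice values are placeholders).
frame = [
    "1",
    "Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 "
    'Lattice="10 0 0 0 10 0 0 0 10"',
    "C 4.39625950 7.79414272 11.91703476 -0.00176074 -0.00403026 0.00040104 "
    "-8.32055957 283.83841418",
]
print(read_local_property(frame))
```

For a real file, <code>lines</code> can be obtained with <code>open("trajectory_out.xyz").read().splitlines()</code>; the returned values are then broadened with a Gaussian of width <math display="inline">\sigma</math> to give the spectrum.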


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log file with one line per trial Monte-Carlo move.

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.
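
A quick way to check how a run is progressing is to compute the acceptance ratio per move type from <code>mc.log</code>. The Python sketch below relies only on the first three columns shown above. It assumes that accepted moves are flagged with <code>T</code> (only <code>F</code> appears in the excerpt), and the fourth example line is invented for illustration.

```python
from collections import Counter


def acceptance_ratios(log_lines):
    """Fraction of accepted trial moves per move type, from mc.log-style lines."""
    tried, accepted = Counter(), Counter()
    for line in log_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        move, flag = fields[1], fields[2]
        tried[move] += 1
        if flag == "T":  # assumption: Fortran-style logical, T = accepted
            accepted[move] += 1
    return {move: accepted[move] / tried[move] for move in tried}


example_log = [
    "# mc_istep mc_move accepted E_trial E_current",
    "1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0",
    "2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0",
    "3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1",
    "4 move T -4321.60000000 -4321.58260476 272.50000000 272.83692252 512 O 0",  # invented
]
print(acceptance_ratios(example_log))
```

Very low ratios for displacement moves usually suggest reducing <code>mc_move_max</code>.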

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
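
To make the role of <code>exp_n_samples</code> concrete, here is a small Python sketch of linear interpolation of experimental data onto a uniform grid spanning the experimental range. It illustrates the idea only and is not ''TurboGAP'''s own routine; the function name and the data points are hypothetical, and the input <code>x</code> values must be strictly increasing.

```python
from bisect import bisect_right


def resample_uniform(x, y, n_samples):
    """Linearly interpolate (x, y) onto n_samples evenly spaced points on [x[0], x[-1]]."""
    if n_samples < 2 or len(x) < 2:
        raise ValueError("need at least two data points and two samples")
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid = [x[0] + k * step for k in range(n_samples)]
    values = []
    for g in grid:
        # Right-hand neighbour of g, clamped so the end points stay inside the data.
        hi = min(max(bisect_right(x, g), 1), len(x) - 1)
        lo = hi - 1
        t = (g - x[lo]) / (x[hi] - x[lo])
        values.append(y[lo] + t * (y[hi] - y[lo]))
    return grid, values


# Hypothetical, non-uniformly spaced data points.
energies = [282.0, 283.0, 285.0]
intensities = [0.0, 10.0, 30.0]
grid, values = resample_uniform(energies, intensities, 4)
print(list(zip(grid, values)))
```

Choosing more samples than there are data points avoids losing detail from the original data.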

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows us to optimize multiple observables at once.
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png | 400px]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png| 400px]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png| 400px]]

[[File:xps_shifts_miguel_last.png| 400px]]

We see that, with oxygen content, the sp2 motifs are shifted significantly, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely that sp2 carbon composes the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:42:27Z

Tigany Zarrouk: /* What is TurboGAP ? */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors, which are faster and more accurate than standard SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic stopping and spatially correlated Langevin dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).



= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray that these structures agree with experiment (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on ''“Molecular Augmented Dynamics”'': generating forces from the deviation from experiment, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential (Gaussian) 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = ( \Delta KS^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved} )</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if the local property models are trained using the same descriptors as the atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>



= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
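and then add the <code>bin</code> directory to our <code>PATH</code> in the <code>.bashrc</code> file (note that <code>~</code> is not expanded inside double quotes, so we use <code>$HOME</code>):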

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse-set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* A conversion script from <code>gap_fit</code> <code>.xml</code> to <code>.gap</code> is available at <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
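A minimal sketch of that post-processing step, assuming (as in the <code>Properties</code> string shown above) that <code>core_electron_be</code> is the last column of each atom line and that each atom occupies one line in the real file. The function names <code>parse_cebe</code> and <code>xps_spectrum</code> are hypothetical helpers, not part of ''TurboGAP''; the broadening uses the Gaussian form of <math display="inline">g_{\rm xps}</math> with <math display="inline">\sigma = 0.4</math> eV.

```python
import math

def parse_cebe(xyz_text, species="C"):
    """Collect the last column (assumed to be core_electron_be) for one species."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    values = []
    for line in lines[2:2 + n_atoms]:  # skip atom count and comment line
        fields = line.split()
        if fields[0] == species:
            values.append(float(fields[-1]))
    return values

def xps_spectrum(cebes, e_min, e_max, n_samples=501, sigma=0.4):
    """Sum one Gaussian per atom on a uniform energy grid, normalized by atom count."""
    step = (e_max - e_min) / (n_samples - 1)
    energies = [e_min + k * step for k in range(n_samples)]
    intensities = [
        sum(math.exp(-(e - eb) ** 2 / (2.0 * sigma ** 2)) for eb in cebes) / len(cebes)
        for e in energies
    ]
    return energies, intensities

# Toy two-atom frame with made-up binding energies, for illustration only.
frame = """2
Properties=species:S:1:pos:R:3:core_electron_be:R:1
C 0.0 0.0 0.0 284.0
C 1.5 0.0 0.0 286.0
"""
cebes = parse_cebe(frame)
energies, intensities = xps_spectrum(cebes, 280.0, 290.0, n_samples=101)
```

The resulting <code>energies</code> and <code>intensities</code> lists can be plotted directly against an experimental spectrum.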


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file saving current accepted config written to every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
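To make the roles of <code>exp_n_samples</code> and <code>exp_energy_scales</code> concrete, here is a rough sketch of the two steps they control: linear interpolation of the experimental data onto a uniform grid, and evaluation of <math display="inline">E_{\rm spectra}</math>. The trapezoid rule and the helper names are assumptions made for illustration; the integration scheme and normalization used inside ''TurboGAP'' may differ.

```python
def interpolate_uniform(x, y, n_samples):
    """Linearly interpolate (x, y), with x increasing, onto n_samples even points."""
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid, values, j = [], [], 0
    for k in range(n_samples):
        xk = min(x[0] + k * step, x[-1])  # guard against round-off past the end
        while j < len(x) - 2 and x[j + 1] < xk:
            j += 1  # advance to the segment containing xk
        t = (xk - x[j]) / (x[j + 1] - x[j])
        grid.append(xk)
        values.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, values

def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """E_spectra = 0.5 * gamma * integral of (g_pred - g_exp)^2 (trapezoid rule)."""
    sq = [(p - e) ** 2 for p, e in zip(g_pred, g_exp)]
    integral = d_eps * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return 0.5 * gamma * integral

# Toy data on a non-uniform grid, resampled to four uniform points.
grid, g_exp = interpolate_uniform([0.0, 1.0, 3.0], [0.0, 2.0, 2.0], 4)
g_pred = [0.0, 0.0, 0.0, 0.0]
e_spectra = spectra_energy(g_pred, g_exp, d_eps=1.0, gamma=10.0)
```

A larger <code>gamma</code> scales the penalty linearly, so the same spectral mismatch costs more energy.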

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables at once:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]

With increasing oxygen content, the sp2 motifs are shifted significantly, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:42:02Z

Tigany Zarrouk: /* Experimental Observable Optimization */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on '''“Molecular Augmented Dynamics”''': generate forces from experimental deviation, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: The total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Squared-exponential (Gaussian) 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/2\sigma^2 )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
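The 2b prediction above can be sketched in a few lines of Python. This is an illustrative toy, assuming the Gaussian kernel and the energy expression exactly as written in the bullets; a production GAP also applies cutoff functions, which are omitted here, and the function names are hypothetical.

```python
import math

def gaussian_kernel(r_i, r_j, sigma):
    """Similarity between two pairwise distances."""
    return math.exp(-(r_i - r_j) ** 2 / (2.0 * sigma ** 2))

def energy_2b(distances, sparse_distances, alphas, delta, sigma, e0=0.0):
    """Sum over the M distances of E0 + delta^2 * sum_j alpha_j * k(xi_i, xi_j)."""
    total = 0.0
    for r_i in distances:
        total += e0 + delta ** 2 * sum(
            a * gaussian_kernel(r_i, r_j, sigma)
            for a, r_j in zip(alphas, sparse_distances)
        )
    return total

# One distance that coincides with the single sparse point: the kernel equals 1.
e_toy = energy_2b([1.0], [1.0], [2.0], delta=0.5, sigma=0.3)
```

Here <code>sparse_distances</code> plays the role of the <math display="inline">N_s</math> training descriptors and <code>alphas</code> the fitting coefficients.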


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
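The many-body prediction has the same structure, with the dot-product kernel raised to the power <math display="inline">\zeta</math>. The sketch below assumes plain Python lists as stand-ins for normalized SOAP vectors and a hypothetical function name; it only mirrors the formula in the bullets and says nothing about how ''TurboGAP'' evaluates it internally.

```python
def soap_prediction(xi, sparse_descriptors, alphas, zeta, delta, e0=0.0):
    """E0 + delta^2 * sum_j alpha_j * (xi . xi_j)^zeta for one atomic environment."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return e0 + delta ** 2 * sum(
        a * dot(xi, q) ** zeta for a, q in zip(alphas, sparse_descriptors)
    )

# Toy 2D "descriptors": identical to the first sparse point, orthogonal to the second.
e_mb = soap_prediction(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -1.0], zeta=4, delta=0.1
)
```

Raising the kernel to a power <math display="inline">\zeta > 1</math> sharpens the similarity measure, so only closely matching environments contribute.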

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta\mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta\mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: the penalty increases with spectral dissimilarity, and <math display="inline">\gamma</math> sets the energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
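As a worked sketch of how forces could follow from this penalty (a chain-rule consequence of the definitions above, under the assumptions that <math display="inline">g_{\rm exp}</math> does not depend on the structure and that <math display="inline">g_{\rm pred}</math> is the unnormalized sum of Gaussians; it is not a statement of how ''TurboGAP'' implements it), differentiating with respect to the position <math display="inline">\mathbf{r}_k</math> of atom <math display="inline">k</math> gives:

<math display="block"> \mathbf{F}^{\rm spectra}_k = -\frac{\partial E_{\rm spectra}}{\partial \mathbf{r}_k} = -\gamma \int \,d\varepsilon \left( g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon) \right) \frac{\partial g_{\rm pred}(\varepsilon)}{\partial \mathbf{r}_k} </math>

<math display="block"> \frac{\partial g_{\rm pred}(\varepsilon)}{\partial \mathbf{r}_k} = \sum_i \frac{\varepsilon - \varepsilon^{i}_{\rm pred}}{\sigma^2} \exp\left( -\frac{(\varepsilon - \varepsilon^{i}_{\rm pred})^2}{2\sigma^2} \right) \frac{\partial \varepsilon^{i}_{\rm pred}(\boldsymbol{\xi}_i)}{\partial \mathbf{r}_k} </math>

The last factor is the gradient of the local property model, which is available whenever the descriptor derivatives are.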

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
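The three acceptance rules can be sketched as follows. This is a minimal illustration, assuming energies in eV, volumes in cubic angstroms, and the standard thermal de Broglie wavelength for <math display="inline">\lambda</math>; the function names are hypothetical, and the logarithmic form is used only to avoid overflow in the exponential.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def thermal_wavelength(mass_amu, temperature):
    """Thermal de Broglie wavelength h / sqrt(2 pi m k_B T), in angstrom."""
    h = 6.62607015e-34        # J s
    k_b = 1.380649e-23        # J/K
    amu = 1.66053906660e-27   # kg
    lam_m = h / math.sqrt(2.0 * math.pi * mass_amu * amu * k_b * temperature)
    return lam_m * 1e10

def _acceptance(log_prefactor, delta_e, temperature):
    """min(1, prefactor * exp(-delta_e / kT)), evaluated in log space."""
    log_acc = log_prefactor - delta_e / (K_B * temperature)
    return 1.0 if log_acc >= 0.0 else math.exp(log_acc)

def acc_insertion(e_new, e_old, mu, n, volume, lam, temperature):
    """N -> N+1, with e_new and e_old the generalized energies E_GAP + E_spectra."""
    return _acceptance(math.log(volume / (lam ** 3 * (n + 1))),
                       e_new - e_old - mu, temperature)

def acc_removal(e_new, e_old, mu, n, volume, lam, temperature):
    """N -> N-1; requires n >= 1."""
    return _acceptance(math.log(lam ** 3 * n / volume),
                       e_new - e_old + mu, temperature)

def acc_move(e_new, e_old, temperature):
    """Plain Metropolis rule for a displacement."""
    return _acceptance(0.0, e_new - e_old, temperature)

lam_oxygen = thermal_wavelength(15.999, 300.0)
```

A trial is then accepted when a uniform random number in [0, 1) falls below the returned probability.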



= Usage =


== Installing ''TurboGAP'' ==

The ''TurboGAP'' code is hosted at https://github.com/mcaroba/turbogap. We can install it by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file (using <code>$HOME</code>, because <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from ''TurboGAP'' wiki https://turbogap.fi/wiki/index.php/Potentials
;* <code>gap_fit</code> <code>.xml</code> files can be converted to <code>.gap</code> format with the script <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file saving current accepted config written to every <code>write_xyz = N</code> steps.
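A quick way to monitor a run is to compute acceptance ratios per move type from <code>mc.log</code>. The sketch below assumes the column order shown in the log excerpt above and that <code>T</code> marks an accepted trial (only <code>F</code> appears in the excerpt, so this is an assumption); the fourth row of the sample is made up for illustration.

```python
from collections import defaultdict

def acceptance_by_move(log_text):
    """Return {move type: accepted / attempted} from mc.log-style text."""
    counts = defaultdict(lambda: [0, 0])  # move type -> [accepted, attempted]
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        move, accepted = fields[1], fields[2]
        counts[move][1] += 1
        if accepted == "T":  # assumption: 'T' marks an accepted trial
            counts[move][0] += 1
    return {move: acc / tot for move, (acc, tot) in counts.items()}

sample_log = """# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
4 move T -4321.90000000 -4321.58260476 272.10000000 272.83692252 512 O 0
"""
ratios = acceptance_by_move(sample_log)
```

Very low ratios for displacement moves usually suggest reducing <code>mc_move_max</code>.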

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png | 800px]]



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png | 800px]]



== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables at once:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>
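With several observables, each one carries its own <math display="inline">\gamma</math>. A plausible reading, sketched below, is that the total experimental penalty is the sum of the individual squared-difference terms. The data layout, the trapezoid rule and the function name are assumptions for illustration only.

```python
def total_experimental_energy(observables):
    """Sum 0.5 * gamma * integral (pred - exp)^2 over all observables."""
    total = 0.0
    for obs in observables:
        sq = [(p - e) ** 2 for p, e in zip(obs["pred"], obs["exp"])]
        # Trapezoid rule on a uniform grid with spacing obs["dx"].
        integral = obs["dx"] * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
        total += 0.5 * obs["gamma"] * integral
    return total

# Toy data: an XPS-like and an XRD-like observable, both with gamma = 10.
observables = [
    {"label": "xps", "pred": [0.0, 1.0, 0.0], "exp": [0.0, 0.0, 0.0], "dx": 0.5, "gamma": 10.0},
    {"label": "xrd", "pred": [1.0, 1.0], "exp": [0.0, 0.0], "dx": 2.0, "gamma": 10.0},
]
e_exp_total = total_experimental_energy(observables)
```

Because the observables have different units and ranges, the relative size of the <math display="inline">\gamma</math> values decides which one dominates the optimization.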

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]

With increasing oxygen content, the sp2 motifs are shifted significantly, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:41:43Z

Tigany Zarrouk: /* (Grand-Canonical) Monte-Carlo */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on ''“Molecular Augmented Dynamics”'': generating forces from the deviation from experiment, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math> ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
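
The 2b prediction above can be sketched in a few lines of Python. This is only an illustration of the formula, not ''TurboGAP'''s Fortran implementation; the function names and the toy numbers are hypothetical.

<syntaxhighlight lang="python">
import math


def gaussian_kernel(r_i, r_j, sigma):
    """Squared-exponential similarity between two pair distances."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))


def energy_2b(distances, train_distances, alphas, sigma, delta, e0=0.0):
    """E^2b = sum_i ( E0 + delta^2 * sum_j alpha_j * k(r_i, r_j) )."""
    total = 0.0
    for r_i in distances:  # the M pair distances around one atom
        k_sum = sum(
            alpha * gaussian_kernel(r_i, r_j, sigma)
            for alpha, r_j in zip(alphas, train_distances)
        )
        total += e0 + delta ** 2 * k_sum
    return total


# Toy numbers, purely illustrative (not taken from any real potential).
example = energy_2b([1.4, 1.5], [1.4, 1.6], [-1.0, 0.5], sigma=0.5, delta=1.0)
</syntaxhighlight>

The kernel only depends on the difference between a distance in the environment and a distance in the training set, so the cost scales as <math display="inline">M \times N_s</math>.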


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
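
As a rough sketch of the many-body prediction formula above: the descriptor of an atom is compared with every descriptor of the training (sparse) set through the dot-product kernel, and the similarities are weighted by the fitting coefficients. The function names and the two-component toy descriptors are hypothetical; real SOAP turbo descriptors are much longer vectors and are evaluated by ''TurboGAP'' itself.

<syntaxhighlight lang="python">
def soap_kernel(xi_i, xi_j, zeta):
    """k_ij = (xi_i . xi_j)^zeta for two descriptor vectors."""
    dot = sum(a * b for a, b in zip(xi_i, xi_j))
    return dot ** zeta


def predict_local(xi, sparse_set, alphas, zeta, delta, e0):
    """E_i = E0 + delta^2 * sum_j alpha_j * k_ij."""
    k_sum = sum(
        alpha * soap_kernel(xi, q, zeta) for alpha, q in zip(alphas, sparse_set)
    )
    return e0 + delta ** 2 * k_sum


# Toy example: the environment is identical to the first sparse descriptor
# and orthogonal to the second, so only the first coefficient contributes.
value = predict_local(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 2.0], zeta=2, delta=1.0, e0=290.0
)
</syntaxhighlight>

The same expression is reused for any local property: only the coefficients, <math display="inline">\zeta</math>, <math display="inline">\delta</math> and the offset change.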

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta\mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta\mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>
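
The XPS expression above amounts to placing one Gaussian on each predicted core-electron binding energy and averaging. A minimal Python sketch, assuming the binding energies are already available as a list (the function name and the example values are hypothetical; the default <math display="inline">\sigma = 0.4\text{ eV}</math> is the broadening quoted earlier):

<syntaxhighlight lang="python">
import math


def xps_spectrum(binding_energies, grid, sigma=0.4):
    """g(eps) = (1/M) * sum_i exp(-(eps - eps_i)^2 / (2 sigma^2)) on each grid point."""
    m = len(binding_energies)
    spectrum = []
    for eps in grid:
        total = sum(
            math.exp(-((eps - eps_i) ** 2) / (2.0 * sigma ** 2))
            for eps_i in binding_energies
        )
        spectrum.append(total / m)
    return spectrum


# Two illustrative binding energies (eV) on a coarse energy grid.
grid = [283.0 + 0.5 * k for k in range(9)]
spectrum = xps_spectrum([284.5, 286.0], grid)
</syntaxhighlight>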


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: the penalty increases with the dissimilarity between the spectra, and <math display="inline">\gamma</math> sets its energy scale.</li>
<li>This works not just for XPS but also for X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>
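
To make the penalty term concrete, the integral in <math display="inline">E_{\rm spectra}</math> can be approximated numerically when both spectra are sampled on the same grid. The sketch below uses the trapezoidal rule, which is an assumption made for illustration and not necessarily the quadrature used inside ''TurboGAP''; the function name is hypothetical.

<syntaxhighlight lang="python">
def spectra_energy(g_pred, g_exp, grid, gamma):
    """E_spectra = (gamma / 2) * integral of (g_pred - g_exp)^2 (trapezoidal rule)."""
    sq = [(p - x) ** 2 for p, x in zip(g_pred, g_exp)]
    integral = sum(
        0.5 * (sq[k] + sq[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )
    return 0.5 * gamma * integral


# A constant offset of 1 over an energy window of width 2 with gamma = 10:
# the integral is 2, so the penalty is 0.5 * 10 * 2 = 10.
penalty = spectra_energy([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0.0, 1.0, 2.0], gamma=10.0)
</syntaxhighlight>

The penalty vanishes only when the predicted and experimental spectra coincide on the whole grid, and it scales linearly with <math display="inline">\gamma</math>.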

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
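
The three acceptance rules can be written compactly as below. This is a sketch of the formulas only: the function names are hypothetical, the thermal wavelength <code>lam</code> is passed in by the caller, and the energies are the generalized energies <math display="inline">\tilde{E}</math> in eV. Working with the logarithm avoids overflow for large energy decreases.

<syntaxhighlight lang="python">
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def _accept(log_prefactor, delta, temperature):
    """min(1, prefactor * exp(-delta / kT)), evaluated in log space."""
    log_acc = log_prefactor - delta / (K_B * temperature)
    return 1.0 if log_acc >= 0.0 else math.exp(log_acc)


def acc_insertion(e_new, e_old, n, volume, lam, mu, temperature):
    """Acceptance probability for N -> N + 1."""
    log_pref = math.log(volume / (lam ** 3 * (n + 1)))
    return _accept(log_pref, e_new - e_old - mu, temperature)


def acc_removal(e_new, e_old, n, volume, lam, mu, temperature):
    """Acceptance probability for N -> N - 1 (requires n >= 1)."""
    log_pref = math.log(lam ** 3 * n / volume)
    return _accept(log_pref, e_new - e_old + mu, temperature)


def acc_move(e_new, e_old, temperature):
    """Acceptance probability for a displacement move."""
    return _accept(0.0, e_new - e_old, temperature)
</syntaxhighlight>

A trial move is then accepted if a uniform random number in <math display="inline">[0, 1)</math> is smaller than the returned probability.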



= Usage =


== Installing ''TurboGAP'' ==

The ''TurboGAP'' code is hosted at https://github.com/mcaroba/turbogap. We install it by ''recursively cloning'' the main branch

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
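and then add the binary directory to our <code>PATH</code> in the <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"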
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
:* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
:* The script <code>turbogap/tools/make_gap_files.py</code> converts <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code> format.
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without experimental data specified, ''TurboGAP'' does not predict the spectrum itself.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
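
As an illustration of the last point, the <code>core_electron_be</code> column can be pulled out of <code>trajectory_out.xyz</code> with a few lines of Python. The sketch assumes that each atom occupies a single line in the actual file (the backslashes in the listing above are only for display) and that the <code>Properties=</code> entry has the layout shown; the function names are hypothetical.

<syntaxhighlight lang="python">
def column_of(properties, name):
    """Return the first column index of `name` in an extended-xyz Properties string."""
    fields = properties.split(":")
    start = 0
    for k in range(0, len(fields), 3):  # triplets of name:type:width
        if fields[k] == name:
            return start
        start += int(fields[k + 2])
    raise KeyError(name)


def read_property(lines, name):
    """Collect the scalar property `name` for every atom of the first frame."""
    n_atoms = int(lines[0])
    properties = lines[1].split("Properties=")[1].split()[0].strip('"')
    col = column_of(properties, name)
    return [float(line.split()[col]) for line in lines[2:2 + n_atoms]]


# Usage on a real file (path as produced by the prediction run above):
# with open("trajectory_out.xyz") as handle:
#     cebe = read_property(handle.read().splitlines(), "core_electron_be")
</syntaxhighlight>

The resulting list of binding energies can then be broadened with Gaussians of width <code>xps_sigma</code> to obtain the spectrum.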


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log with one line per Monte-Carlo step, in the column layout shown below.

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.
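
A quick way to monitor a run is to compute the acceptance ratio per move type from <code>mc.log</code>. The sketch below relies on the column order shown in the listing above and assumes that accepted steps are flagged with <code>T</code> (only <code>F</code> appears in the excerpt); the function name is hypothetical.

<syntaxhighlight lang="python">
def acceptance_by_move(lines):
    """Return {move_type: accepted fraction} from mc.log lines."""
    counts = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        move, accepted = fields[1], fields[2] == "T"
        tried, ok = counts.get(move, (0, 0))
        counts[move] = (tried + 1, ok + int(accepted))
    return {move: ok / tried for move, (tried, ok) in counts.items()}


# Usage on a real log:
# with open("mc.log") as handle:
#     print(acceptance_by_move(handle))
</syntaxhighlight>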

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see how the predicted XPS changes with oxygen content

[[File:standard_gcmc.png | 800px]]



== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
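
The role of <code>exp_n_samples</code> can be pictured as resampling the experimental curve onto a uniform grid by linear interpolation. The following Python sketch illustrates that idea only; it is not ''TurboGAP'''s own routine, and the function name is hypothetical.

<syntaxhighlight lang="python">
def resample_uniform(x, y, n_samples):
    """Linearly interpolate (x, y) onto n_samples evenly spaced points over [x[0], x[-1]].

    Assumes x is strictly increasing and n_samples >= 2.
    """
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid = [x[0] + k * step for k in range(n_samples)]
    values = []
    j = 0  # index of the left end of the current data interval
    for g in grid:
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        values.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, values


# Non-uniform input data resampled onto four evenly spaced points.
grid, values = resample_uniform([0.0, 1.0, 3.0], [0.0, 2.0, 6.0], 4)
</syntaxhighlight>

Choosing more samples than there are data points, as recommended for <code>exp_n_samples</code>, keeps the resampled curve from losing features of the measured one.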

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png]]


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables at once.
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]

With increasing oxygen content, sp2 motifs are shifted significantly, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely sp2 carbon making up the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:41:19Z

Tigany Zarrouk: /* How do we marry experimental data and GAP? */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on ''“Molecular Augmented Dynamics”'': generating forces from the deviation from experiment, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/(2\sigma^2) )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math> ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta\mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta\mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: the penalty increases with the dissimilarity between the spectra, and <math display="inline">\gamma</math> sets its energy scale.</li>
<li>This works not just for XPS but also for X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png | 400px]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>



= Usage =


== Installing ''TurboGAP'' ==

The ''TurboGAP'' code is hosted at https://github.com/mcaroba/turbogap. We install it by ''recursively cloning'' the main branch

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
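and then add the binary directory to our <code>PATH</code> in the <code>.bashrc</code> file (using <code>$HOME</code>, since <code>~</code> is not expanded inside double quotes)

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"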
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
:* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
:* The script <code>turbogap/tools/make_gap_files.py</code> converts <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code> format.
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
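
As a minimal sketch of that last step, the spectrum can be built by placing a Gaussian of width <math display="inline">\sigma</math> on each predicted binding energy and averaging over atoms. The function name, the example CEBE values and the energy grid below are illustrative assumptions, not part of ''TurboGAP''; the normalization used internally by the code may differ.

<syntaxhighlight lang="python">
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """Gaussian-broadened spectrum from per-atom core-electron binding energies.

    g(e) = (1/M) * sum_i exp(-(e - e_i)^2 / (2 sigma^2)), with M the number of atoms.
    """
    m = len(cebes)
    return [
        sum(math.exp(-((e - e_i) ** 2) / (2.0 * sigma ** 2)) for e_i in cebes) / m
        for e in energies
    ]


# Hypothetical CEBE values in eV; in practice these would be read from the
# core_electron_be column of trajectory_out.xyz.
cebes = [283.8, 284.4, 286.1]
energies = [280.0 + 0.1 * k for k in range(101)]  # uniform grid, 280 eV to 290 eV
spectrum = xps_spectrum(cebes, energies)
</syntaxhighlight>

Plotting <code>spectrum</code> against <code>energies</code> gives a curve that can be compared with an experimental spectrum after matching the normalization.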


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log of the trial moves, one line per MC step.

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png]]


== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png]]


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]

We see that the sp2 motifs shift significantly with oxygen content, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:41:04Z

Tigany Zarrouk: /* Motivation for structural inference */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on '''“Molecular Augmented Dynamics”''': generating forces from the experimental deviation, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png | 400px]]



= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' squared-exponential (Gaussian) 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp\left( - (r_{i} - r_{j})^2/(2\sigma^2) \right)</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy for an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
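
The 2b prediction above can be sketched in a few lines of Python. This is an illustration of the formula only, assuming a Gaussian kernel on pair distances; the function names and the toy numbers are hypothetical and do not correspond to the ''TurboGAP'' Fortran routines.

<syntaxhighlight lang="python">
import math


def gaussian_kernel(r_i, r_j, sigma):
    """Similarity between two pair distances: exp(-(r_i - r_j)^2 / (2 sigma^2))."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))


def energy_2b(distances, sparse_distances, alphas, delta, sigma, e0=0.0):
    """E_2b = sum_i (e0 + delta^2 * sum_j alpha_j * k(r_i, r_j)).

    distances        : the M pair distances in the atom's local environment
    sparse_distances : the N_s training (sparse set) descriptors
    alphas           : the N_s fitting coefficients from Gaussian process regression
    """
    total = 0.0
    for r_i in distances:
        k_sum = sum(
            a * gaussian_kernel(r_i, r_j, sigma)
            for a, r_j in zip(alphas, sparse_distances)
        )
        total += e0 + delta ** 2 * k_sum
    return total


# Toy example: one pair distance that coincides with the single sparse point,
# so the kernel equals 1 and the energy is delta^2 * alpha.
example = energy_2b([1.0], [1.0], [2.0], delta=0.5, sigma=0.3)
</syntaxhighlight>

The many-body SOAP prediction has the same structure, with the Gaussian kernel replaced by the dot-product kernel raised to the power <math display="inline">\zeta</math>.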


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = ( \Delta KS^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved} )</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>This applies not just to XPS but also to X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png]]
</li></ul>
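
To make the penalty term concrete, here is a minimal sketch that evaluates <math display="inline">E_{\rm spectra}</math> on a uniform energy grid with the trapezoidal rule. The function name and the choice of quadrature are assumptions made for illustration; ''TurboGAP'' may discretize the integral differently.

<syntaxhighlight lang="python">
def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """0.5 * gamma * integral of (g_pred - g_exp)^2 over energy.

    Both spectra must be sampled on the same uniform grid with spacing d_eps.
    The integral is approximated with the trapezoidal rule.
    """
    if len(g_pred) != len(g_exp):
        raise ValueError("spectra must be sampled on the same grid")
    sq = [(p - q) ** 2 for p, q in zip(g_pred, g_exp)]
    integral = d_eps * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return 0.5 * gamma * integral


# Toy example: a constant offset of 1 over a grid of total width 1.0.
penalty = spectra_energy([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], d_eps=0.5, gamma=10.0)
</syntaxhighlight>

The penalty vanishes when the predicted and experimental spectra coincide and grows quadratically with their difference, scaled by <math display="inline">\gamma</math>.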

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
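
The three acceptance rules can be sketched as follows, working with the logarithm of the acceptance ratio to avoid overflow for strongly favourable moves. The function names are hypothetical and the energies passed in are understood to be the generalized energies <math display="inline">\tilde{E}</math>; this is an illustration of the formulas above, not the ''TurboGAP'' implementation.

<syntaxhighlight lang="python">
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def _capped_exp(log_p):
    """min(1, exp(log_p)) without overflowing for large positive log_p."""
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def acc_insertion(e_new, e_old, n, volume, lam, mu, temperature):
    """Acceptance probability for N -> N+1 (lam is the thermal wavelength)."""
    log_p = math.log(volume / (lam ** 3 * (n + 1)))
    log_p -= (e_new - e_old - mu) / (K_B * temperature)
    return _capped_exp(log_p)


def acc_removal(e_new, e_old, n, volume, lam, mu, temperature):
    """Acceptance probability for N -> N-1 (requires n >= 1)."""
    log_p = math.log(lam ** 3 * n / volume)
    log_p -= (e_new - e_old + mu) / (K_B * temperature)
    return _capped_exp(log_p)


def acc_move(e_new, e_old, temperature):
    """Metropolis acceptance probability for a displacement move."""
    return _capped_exp(-(e_new - e_old) / (K_B * temperature))
</syntaxhighlight>

A trial move is then accepted if a uniform random number in [0, 1) is smaller than the returned probability.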


= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found here https://github.com/mcaroba/turbogap through ''recursively cloning'' the main branch

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in <code>.bashrc</code> file

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from ''TurboGAP'' wiki https://turbogap.fi/wiki/index.php/Potentials
;* A conversion script from <code>gap_fit</code> <code>.xml</code> to <code>.gap</code> is available: <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log of the trial moves, one line per MC step.

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>
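
A quick way to monitor a run is to compute the fraction of accepted moves per move type from <code>mc.log</code>. The sketch below relies only on the first three columns shown above and assumes that accepted moves are flagged with <code>T</code> (the excerpt only shows <code>F</code>); the function name is hypothetical.

<syntaxhighlight lang="python">
from collections import defaultdict


def acceptance_by_move(lines):
    """Return {move_type: accepted_fraction} from mc.log-style lines."""
    counts = defaultdict(lambda: [0, 0])  # move type -> [accepted, total]
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        move, flag = fields[1], fields[2]
        counts[move][1] += 1
        if flag == "T":  # assumed flag for an accepted move
            counts[move][0] += 1
    return {move: acc / total for move, (acc, total) in counts.items()}


# The three log lines from the excerpt above.
sample = """# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
"""
fractions = acceptance_by_move(sample.splitlines())
</syntaxhighlight>

For a real run the same function can be applied to an open file handle, for example <code>acceptance_by_move(open("mc.log"))</code>.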

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png]]


== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png]]


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]

We see that the sp2 motifs shift significantly with oxygen content, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:39:08Z

Tigany Zarrouk: /* How do we model XPS? */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on '''“Molecular Augmented Dynamics”''': generate forces from experimental deviation, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png]]


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp\left( - (r_{i} - r_{j})^2/(2\sigma^2) \right)</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' For a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
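The 2b prediction formula above can be sketched in a few lines of Python. This is only an illustration of the equation, not ''TurboGAP'''s implementation: the function names <code>gaussian_kernel</code> and <code>energy_2b</code> and all numerical values are hypothetical.

```python
import math

def gaussian_kernel(r_i, r_j, sigma):
    """Squared-exponential similarity between two pair distances."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))

def energy_2b(distances, sparse_distances, alphas, delta, sigma, e0=0.0):
    """E^2b = sum_i ( E0 + delta^2 * sum_j alpha_j * k(r_i, r_j) ).

    distances        -- the M pair distances in the atom's local environment
    sparse_distances -- the N_s training descriptors (hypothetical values)
    alphas           -- the N_s fitting coefficients from GPR
    """
    total = 0.0
    for r_i in distances:
        k_sum = sum(a * gaussian_kernel(r_i, r_j, sigma)
                    for a, r_j in zip(alphas, sparse_distances))
        total += e0 + delta ** 2 * k_sum
    return total

# Toy numbers, chosen only to exercise the formula.
e = energy_2b(distances=[1.42, 1.50],
              sparse_distances=[1.40, 1.45, 1.55],
              alphas=[-0.3, 0.1, 0.2],
              delta=0.5, sigma=0.1)
print(e)
```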


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
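The many-body prediction above has the same structure as the 2b case, with a dot-product kernel raised to the power <math display="inline">\zeta</math>. A minimal sketch follows, assuming descriptors are plain lists of floats with non-negative overlaps. The names <code>soap_kernel</code> and <code>predict_local</code> and the toy vectors are hypothetical; the offset argument <code>v0</code> plays the role of <math display="inline">E^{\rm mb}_0</math>.

```python
def soap_kernel(xi_i, xi_j, zeta):
    """k_ij = (xi_i . xi_j)^zeta for two descriptor vectors."""
    dot = sum(a * b for a, b in zip(xi_i, xi_j))
    return dot ** zeta

def predict_local(xi, sparse_xis, alphas, delta, zeta, v0=0.0):
    """v0 + delta^2 * sum_j alpha_j * k(xi, xi_j) over the sparse set."""
    k_sum = sum(a * soap_kernel(xi, q, zeta)
                for a, q in zip(alphas, sparse_xis))
    return v0 + delta ** 2 * k_sum

# Toy two-component descriptors; real SOAP vectors are much longer.
value = predict_local(xi=[1.0, 0.0],
                      sparse_xis=[[1.0, 0.0], [0.0, 1.0]],
                      alphas=[2.0, 3.0],
                      delta=1.0, zeta=2, v0=290.0)
print(value)
```

Only the first sparse descriptor overlaps with the input here, so only its coefficient contributes to the sum.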

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.



== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: give vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: the penalty increases with spectral dissimilarity, and <math display="inline">\gamma</math> sets the energy scale.</li>
<li>This works not just for XPS but also for X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
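The three acceptance criteria above can be written as a short Python sketch. It is an illustration of the formulas, not the ''TurboGAP'' source: the function names are hypothetical, energies are assumed to be in eV, and the thermal wavelength <math display="inline">\lambda</math> (argument <code>lam</code>) is assumed to be supplied by the caller in units consistent with the volume. The probabilities are evaluated in log space so that very favourable moves do not overflow the exponential.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def _capped(log_p):
    """Return min(1, exp(log_p)) without overflowing for large log_p."""
    return 1.0 if log_p >= 0.0 else math.exp(log_p)

def acc_move(e_new, e_old, temperature):
    """Metropolis criterion for a displacement move."""
    return _capped(-(e_new - e_old) / (K_B * temperature))

def acc_insertion(e_new, e_old, n, volume, lam, mu, temperature):
    """N -> N+1, where n is the particle count before the insertion."""
    log_pref = math.log(volume / (lam ** 3 * (n + 1)))
    return _capped(log_pref - (e_new - e_old - mu) / (K_B * temperature))

def acc_removal(e_new, e_old, n, volume, lam, mu, temperature):
    """N -> N-1, where n >= 1 is the particle count before the removal."""
    log_pref = math.log(lam ** 3 * n / volume)
    return _capped(log_pref - (e_new - e_old + mu) / (K_B * temperature))

# A downhill move is always accepted; an uphill one only sometimes.
print(acc_move(e_new=-1.0, e_old=0.0, temperature=300.0))
print(acc_move(e_new=0.05, e_old=0.0, temperature=300.0))
```

Here <code>e_new</code> and <code>e_old</code> stand for the generalized energy <math display="inline">\tilde{E}</math>, so the spectral penalty enters the acceptance in the same way as the GAP energy.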


= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file. Note that <code>~</code> is not expanded inside double quotes, so we use <code>$HOME</code>:

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* The script <code>turbogap/tools/make_gap_files.py</code> converts <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code> format.
; <code>input</code> file
: The file which tells the code what to do.
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is one of:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
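As a sketch of that last step, the snippet below builds a Gaussian-broadened spectrum from a list of core-electron binding energies, following the <math display="inline">g_{\rm xps}(\varepsilon)</math> formula given earlier with <math display="inline">\sigma = 0.4</math> eV. The function name <code>xps_spectrum</code>, the energy grid and the example binding energies are assumptions for illustration; reading the <code>core_electron_be</code> column out of <code>trajectory_out.xyz</code> is left to whichever xyz reader you prefer.

```python
import math

def xps_spectrum(cebes, energies, sigma=0.4):
    """g(e) = (1/M) * sum_i exp(-(e - e_i)^2 / (2 sigma^2)).

    cebes    -- predicted core-electron binding energies, one per atom [eV]
    energies -- grid on which to evaluate the spectrum [eV]
    """
    m = len(cebes)
    return [
        sum(math.exp(-((e - c) ** 2) / (2.0 * sigma ** 2)) for c in cebes) / m
        for e in energies
    ]

# Hypothetical CEBEs for three atoms and a 0.1 eV grid from 280 to 292 eV.
cebes = [283.8, 284.5, 286.1]
grid = [280.0 + 0.1 * k for k in range(121)]
spectrum = xps_spectrum(cebes, grid)
peak_energy = grid[spectrum.index(max(spectrum))]
print(peak_energy)
```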


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' For a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment

<pre class="conf">! Monte-Carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>
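A small Python sketch for post-processing <code>mc.log</code> is given below. It assumes the column order shown in the header line above and that accepted moves are flagged with <code>T</code> (only <code>F</code> appears in the excerpt, so this is an assumption). The function names are hypothetical.

```python
def parse_mc_log(lines):
    """Parse mc.log lines into a list of dicts, skipping the header."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tok = line.split()
        records.append({
            "step": int(tok[0]),
            "move": tok[1],
            "accepted": tok[2] == "T",  # assumption: 'T' marks acceptance
            "e_trial": float(tok[3]),
            "e_current": float(tok[4]),
            "e_exp_trial": float(tok[5]),
            "e_exp_current": float(tok[6]),
        })
    return records

def acceptance_by_move(records):
    """Fraction of accepted trials for each move type."""
    counts = {}
    for rec in records:
        tried, accepted = counts.get(rec["move"], (0, 0))
        counts[rec["move"]] = (tried + 1, accepted + int(rec["accepted"]))
    return {move: acc / tried for move, (tried, acc) in counts.items()}

example = [
    "# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial",
    "1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0",
    "3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1",
]
print(acceptance_by_move(parse_mc_log(example)))
```

For a real run, pass the lines of the file instead, for example <code>parse_mc_log(open("mc.log"))</code>.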

; <code>mc_all.xyz</code>
: xyz file, written every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png]]


== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png]]
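To make the penalty term concrete, the following sketch evaluates <math display="inline">E_{\rm spectra}</math> for two spectra sampled on the same energy grid, using the trapezoidal rule for the integral. The function name <code>spectra_energy</code> and the toy data are hypothetical, and how ''TurboGAP'' normalizes the spectra or discretizes the integral internally is not specified here.

```python
def spectra_energy(g_pred, g_exp, energies, gamma):
    """E_spectra = 0.5 * gamma * integral (g_pred - g_exp)^2 d(energy).

    All three sequences must share the same grid; the integral is
    approximated with the trapezoidal rule.
    """
    sq = [(p - q) ** 2 for p, q in zip(g_pred, g_exp)]
    integral = sum(
        0.5 * (sq[k] + sq[k + 1]) * (energies[k + 1] - energies[k])
        for k in range(len(sq) - 1)
    )
    return 0.5 * gamma * integral

# Toy spectra on a three-point grid: a constant offset of 1 over a width of 2.
grid = [0.0, 1.0, 2.0]
penalty = spectra_energy([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], grid, gamma=100.0)
print(penalty)
```

Identical spectra give zero penalty, and a larger <math display="inline">\gamma</math> weights experimental agreement more heavily relative to <math display="inline">E_{\rm GAP}</math>.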


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables at once:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using Monte-Carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]

We see that the presence of oxygen shifts sp2 motifs significantly, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:38:33Z

Tigany Zarrouk: /* Experimental Observable prediction/optimization */

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors, which are faster and more accurate than the original SOAP (Smooth Overlap of Atomic Positions) descriptors.
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with both '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on '''“Molecular Augmented Dynamics”''': generate forces from experimental deviation, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png]]


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: the total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g.'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp\left( - (r_{i} - r_{j})^2/(2\sigma^2) \right)</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' For a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.

[[File:miguel_xps_diagram.png]]

↑ Diagram of XPS (x-axis is core electron binding energy) ([[#citeproc_bib_item_6|Golze et al. 2022]])


== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: give vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: the penalty increases with spectral dissimilarity, and <math display="inline">\gamma</math> sets the energy scale.</li>
<li>This works not just for XPS but also for X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>


= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch:

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in the <code>.bashrc</code> file. Note that <code>~</code> is not expanded inside double quotes, so we use <code>$HOME</code>:

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki: https://turbogap.fi/wiki/index.php/Potentials
;* The script <code>turbogap/tools/make_gap_files.py</code> converts <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code> format.
; <code>input</code> file
: The file which tells the code what to do.
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is one of:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Take care when choosing these values for real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
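
As a rough illustration of that post-processing step, the Python sketch below pulls the <code>core_electron_be</code> column out of an extended-xyz frame by reading the <code>Properties=</code> string. It assumes that each atom occupies a single line in the real file (treating the backslashes in the listing above as display line breaks) and that the <code>Properties</code> value is unquoted. The function names are hypothetical and are not part of ''TurboGAP''.

<syntaxhighlight lang="python">
def property_columns(comment_line):
    """Map each property name to (first column, width) from the Properties= string."""
    for token in comment_line.split():
        if token.startswith("Properties="):
            fields = token[len("Properties="):].split(":")
            break
    else:
        raise ValueError("no Properties= entry in comment line")
    columns, start = {}, 0
    # Fields come in triplets: name, type, number of columns
    for name, width in zip(fields[0::3], fields[2::3]):
        columns[name] = (start, int(width))
        start += int(width)
    return columns


def read_scalar_property(lines, name):
    """Return the one-column property `name` for every atom of the first frame."""
    n_atoms = int(lines[0])
    start, width = property_columns(lines[1])[name]
    if width != 1:
        raise ValueError(f"{name} is not a scalar property")
    return [float(line.split()[start]) for line in lines[2:2 + n_atoms]]


# Hypothetical two-atom frame, laid out like trajectory_out.xyz
frame = [
    "2",
    'Lattice="10 0 0 0 10 0 0 0 10" '
    "Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1",
    "C 0.0 0.0 0.0 0.0 0.0 0.0 -8.32 283.84",
    "O 1.2 0.0 0.0 0.0 0.0 0.0 -5.10 290.10",
]
cebes = read_scalar_property(frame, "core_electron_be")
print(cebes)
</syntaxhighlight>

For a real run, the lines would come from reading <code>trajectory_out.xyz</code>. The resulting list of binding energies can then be broadened with a Gaussian to give a spectrum.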


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.
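
The <code>mc.log</code> columns shown above lend themselves to simple bookkeeping. The following Python sketch tallies the accepted fraction per move type. It assumes that accepted moves are flagged <code>T</code> (only <code>F</code> appears in the excerpt), and the second log line in the example is made up for illustration. The function name is hypothetical.

<syntaxhighlight lang="python">
from collections import Counter


def acceptance_by_move(log_lines):
    """Return {move type: accepted fraction} from mc.log-style lines."""
    tried, accepted = Counter(), Counter()
    for line in log_lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header
        move, flag = fields[1], fields[2]
        tried[move] += 1
        if flag == "T":  # assumption: accepted moves are flagged "T"
            accepted[move] += 1
    return {move: accepted[move] / tried[move] for move in tried}


log = [
    "# mc_istep mc_move accepted E_trial E_current ...",
    "1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0",
    # hypothetical accepted move, not taken from a real run
    "2 move T -4321.60000000 -4321.58260476 272.50000000 272.83692252 512 O 0",
    "3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1",
]
print(acceptance_by_move(log))
</syntaxhighlight>

A very low accepted fraction for <code>move</code> steps would suggest trying a smaller <code>mc_move_max</code>.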

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png]]


== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
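
Since successive predictions in <code>xps_prediction.dat</code> are separated by blank lines, the file can be split into one block per write. The Python sketch below does this, assuming whitespace-separated numeric columns and <code>#</code> comment lines (neither is confirmed by the text above). The function name is hypothetical.

<syntaxhighlight lang="python">
def split_blocks(text):
    """Split blank-line-separated numeric data into a list of blocks of rows."""
    blocks, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
            continue
        if stripped.startswith("#"):
            continue  # assumption: comment lines start with '#'
        current.append(tuple(float(x) for x in stripped.split()))
    if current:
        blocks.append(current)
    return blocks


# Hypothetical file content with two predictions on a three-point grid
example = "280.0 0.1\n285.0 0.9\n290.0 0.2\n\n280.0 0.2\n285.0 0.7\n290.0 0.3\n"
predictions = split_blocks(example)
print(len(predictions), predictions[-1])
</syntaxhighlight>

The last block is the most recent prediction, which is usually the one to compare against <code>xps_exp.dat</code>.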

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png]]


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]

We see that, with oxygen content, sp2 motifs are shifted significantly, even past the sp3 reference. This is why the experimental deconvolution did not give what we expect, namely sp2 carbon composing the primary XPS peak.


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

File:Xps shifts miguel last.png

2024-06-18T19:36:07Z

Tigany Zarrouk:

File:Xps shifts miguel first.png

2024-06-18T19:35:52Z

Tigany Zarrouk:

File:Exp xps with background high.png

2024-06-18T19:35:23Z

Tigany Zarrouk:

File:Xps gcmc scheme.png

2024-06-18T19:34:58Z

Tigany Zarrouk:

File:Miguel xps diagram.png

2024-06-18T19:34:30Z

Tigany Zarrouk:

File:Deconvolution comparison fill -3 new.png

2024-06-18T19:34:04Z

Tigany Zarrouk:

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:33:39Z

Tigany Zarrouk:

= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on '''“Molecular Augmented Dynamics”''': generate forces from experimental deviation, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png]]


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g'' Exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp( - (r_{i} - r_{j})^2/2\sigma^2 )</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
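
To make the 2b expression concrete, here is a minimal Python sketch that evaluates it directly. The kernel follows the Gaussian form given above. The function names and all numbers are hypothetical and chosen only for illustration; this is not how ''TurboGAP'' implements the sum internally.

<syntaxhighlight lang="python">
import math


def gaussian_kernel(r_i, r_j, sigma):
    """Similarity between two pair distances: exp(-(r_i - r_j)^2 / (2 sigma^2))."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))


def energy_2b(distances, train_distances, alphas, delta, sigma, e0=0.0):
    """E^2b = sum_i (E0 + delta^2 * sum_j alpha_j * k(r_i, r_j))."""
    total = 0.0
    for r_i in distances:  # the M pairwise distances around one atom
        k_sum = sum(
            alpha * gaussian_kernel(r_i, r_j, sigma)
            for alpha, r_j in zip(alphas, train_distances)
        )
        total += e0 + delta ** 2 * k_sum
    return total


# Hypothetical model with two training distances and two coefficients
print(energy_2b([1.40, 1.50], [1.40, 1.60], [-0.5, 0.2], delta=1.0, sigma=0.1))
</syntaxhighlight>

Each distance in the local environment contributes most strongly through the training descriptors it resembles, which is the sense in which the kernel measures similarity.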


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
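
The kernel and prediction formulas in the list above can be sketched in a few lines of Python. The sketch assumes descriptors normalised to unit length, so that the kernel of an environment with itself is 1. The short vectors stand in for real SOAP descriptors, and all names and numbers are hypothetical.

<syntaxhighlight lang="python">
import math


def normalise(vec):
    """Scale a descriptor to unit length (assumed convention in this sketch)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def soap_kernel(xi_i, xi_j, zeta):
    """k_ij = (xi_i . xi_j)^zeta."""
    return sum(a * b for a, b in zip(xi_i, xi_j)) ** zeta


def predict_local(xi, sparse_set, alphas, delta, zeta, e0=0.0):
    """E_i = E0 + delta^2 * sum_j alpha_j * k_ij over the sparse set."""
    k_sum = sum(
        alpha * soap_kernel(xi, q, zeta) for alpha, q in zip(alphas, sparse_set)
    )
    return e0 + delta ** 2 * k_sum


# Toy three-component "descriptors", for illustration only
sparse_set = [normalise([1.0, 0.0, 0.0]), normalise([1.0, 1.0, 0.0])]
xi = normalise([2.0, 1.0, 0.0])
print(predict_local(xi, sparse_set, [0.3, -0.1], delta=0.1, zeta=4))
</syntaxhighlight>

Raising the dot product to the power <math display="inline">\zeta</math> sharpens the kernel, so dissimilar environments contribute less to the prediction.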

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta \mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta \mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.

[[File:miguel_xps_diagram.png]]

↑ Diagram of XPS (x-axis is core electron binding energy) ([[#citeproc_bib_item_6|Golze et al. 2022]])


== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>
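
The <math display="inline">g_{\rm xps}</math> expression above is a sum of Gaussians centred on the predicted binding energies. A minimal Python sketch follows, assuming the 0.4 eV broadening quoted earlier. The function name and the example binding energies are hypothetical.

<syntaxhighlight lang="python">
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """g(eps) = (1/M) * sum_i exp(-(eps - eps_i)^2 / (2 sigma^2))."""
    m = len(cebes)
    return [
        sum(math.exp(-((eps - c) ** 2) / (2.0 * sigma ** 2)) for c in cebes) / m
        for eps in energies
    ]


# Hypothetical binding energies (eV) and a coarse energy grid
cebes = [283.8, 284.1, 286.5]
grid = [282.0 + 0.5 * k for k in range(13)]  # 282.0 ... 288.0 eV
for eps, g in zip(grid, xps_spectrum(cebes, grid)):
    print(f"{eps:6.1f} {g:.4f}")
</syntaxhighlight>

Because every atom contributes its own Gaussian, the same sum can be restricted to a subset of atoms to see which environments make up a given peak.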


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: the penalty increases with spectral dissimilarity, and <math display="inline">\gamma</math> is an energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png]]
</li></ul>
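
As a numerical illustration of the <math display="inline">E_{\rm spectra}</math> penalty, the Python sketch below evaluates the integral with the trapezoidal rule on a uniform energy grid. The choice of quadrature is an assumption made for this sketch, the function name is hypothetical, and the spectra are made-up numbers.

<syntaxhighlight lang="python">
def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """0.5 * gamma * integral of (g_pred - g_exp)^2 on a uniform grid (trapezoidal rule)."""
    if len(g_pred) != len(g_exp) or len(g_pred) < 2:
        raise ValueError("spectra must share a grid with at least two points")
    sq = [(p - e) ** 2 for p, e in zip(g_pred, g_exp)]
    # Trapezoidal rule: end points carry half weight
    integral = d_eps * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return 0.5 * gamma * integral


# Made-up spectra sampled every 0.5 eV
g_exp = [0.0, 0.2, 1.0, 0.3, 0.0]
g_pred = [0.0, 0.4, 0.8, 0.3, 0.0]
print(spectra_energy(g_pred, g_exp, d_eps=0.5, gamma=10.0))
</syntaxhighlight>

The penalty vanishes only when the two spectra coincide, and a larger <math display="inline">\gamma</math> makes spectral disagreement more costly relative to <math display="inline">E_{\rm GAP}</math>.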

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
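
The three acceptance rules translate directly into code. In the Python sketch below, <code>lam</code> stands for <math display="inline">\lambda</math>, taken here to be the thermal de Broglie wavelength (the text does not define it), energies are in eV and temperatures in K. The function names are hypothetical and the exponent cap is only a guard against floating-point overflow.

<syntaxhighlight lang="python">
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def _boltzmann(delta, temperature):
    """exp(-delta / k_B T), with the exponent capped to avoid overflow."""
    return math.exp(min(-delta / (K_B * temperature), 700.0))


def acc_insert(e_new, e_old, mu, n, volume, lam, temperature):
    """Acceptance probability for N -> N+1."""
    prefactor = volume / (lam ** 3 * (n + 1))
    return min(1.0, prefactor * _boltzmann(e_new - e_old - mu, temperature))


def acc_remove(e_new, e_old, mu, n, volume, lam, temperature):
    """Acceptance probability for N -> N-1."""
    prefactor = lam ** 3 * n / volume
    return min(1.0, prefactor * _boltzmann(e_new - e_old + mu, temperature))


def acc_move(e_new, e_old, temperature):
    """Acceptance probability for a displacement at fixed N."""
    return min(1.0, _boltzmann(e_new - e_old, temperature))


# Hypothetical trial: a displacement that raises the total energy by 0.1 eV at 300 K
print(acc_move(-10.0, -10.1, 300.0))
</syntaxhighlight>

In each call the energies are the generalized <math display="inline">\tilde{E}</math>, so a trial move that improves spectral agreement can be accepted even if it raises <math display="inline">E_{\rm GAP}</math> slightly.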


= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in <code>.bashrc</code> file

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code>, with alphas (fitting coefficients) and sparse set descriptor files.
;* Potentials are available from ''TurboGAP'' wiki https://turbogap.fi/wiki/index.php/Potentials
;* A conversion script from <code>gap_fit .xml</code> to <code>.gap</code> is available at <code>turbogap/tools/make_gap_files.py</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Choose these values with care in real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file holding the current accepted configuration, written every <code>write_xyz = N</code> steps.

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png]]


== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
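
The <code>exp_n_samples</code> option resamples the experimental data by linear interpolation onto a uniform grid over the experimental range. The Python sketch below shows what such a resampling could look like. It is not taken from the ''TurboGAP'' source; the function name is hypothetical and the input abscissae are assumed to be strictly increasing.

<syntaxhighlight lang="python">
def resample_uniform(x, y, n_samples):
    """Linearly interpolate (x, y) onto n_samples evenly spaced points in [x[0], x[-1]]."""
    if n_samples < 2 or len(x) < 2 or len(x) != len(y):
        raise ValueError("need matching x/y with at least two points and n_samples >= 2")
    step = (x[-1] - x[0]) / (n_samples - 1)
    grid, values, j = [], [], 0
    for k in range(n_samples):
        g = min(x[0] + k * step, x[-1])  # clamp against round-off at the upper end
        # Advance to the input interval [x[j], x[j+1]] that contains g
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        grid.append(g)
        values.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, values


# Hypothetical non-uniform experimental points
energies = [282.0, 283.0, 285.0, 288.0]
intensity = [0.0, 0.2, 1.0, 0.1]
grid, values = resample_uniform(energies, intensity, 7)
print(list(zip(grid, values)))
</syntaxhighlight>

Using more samples than there are experimental points, as the option description recommends, keeps the features of the measured curve from being lost in the resampling.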

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png]]


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]


=== Reduced graphene oxide - Exp. prediction ===

([[#citeproc_bib_item_5|El-Machachi et al. 2024]])


=== Grand-Canonical Monte-Carlo - Exp. optimization ===


=== Molecular Augmented Dynamics ('''MAD''') - MD with ''Experimental Forces'' ===


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:32:24Z

Tigany Zarrouk:

= Table of Contents =
# Introduction
# Theory/Methods
# How to use ''TurboGAP''
## Generalities
## Predicting core electron binding energies
## Optimizing an XPS spectrum
## Optimizing Multiple Observables
# Applications
## GCMC optimization of C1s XPS spectrum
## Molecular Augmented Dynamics optimization


= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on ''“Molecular Augmented Dynamics”'': generating forces from the deviation from experiment, to match experimental spectra via MD.

[[File:deconvolution_comparison_fill_-3_new.png]]


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp\left( - (r_{i} - r_{j})^2/(2\sigma^2) \right)</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>
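The 2b prediction above can be sketched in a few lines of Python. This is only an illustration of the formula, not ''TurboGAP'''s implementation: the function names, argument names and the example numbers are all assumptions made here.

```python
import math


def sq_exp_kernel(r_i, r_j, sigma):
    """Squared-exponential similarity between two pair distances."""
    return math.exp(-((r_i - r_j) ** 2) / (2.0 * sigma ** 2))


def energy_2b(distances, sparse_distances, alphas, delta, sigma, e0=0.0):
    """Sum over the M pair distances of e0 + delta^2 * sum_j alpha_j * k_ij."""
    total = 0.0
    for r_i in distances:
        k_sum = sum(
            alpha * sq_exp_kernel(r_i, r_j, sigma)
            for alpha, r_j in zip(alphas, sparse_distances)
        )
        total += e0 + delta ** 2 * k_sum
    return total


# Hypothetical numbers: one pair distance that coincides with the only sparse point.
example_energy = energy_2b([1.0], [1.0], [2.0], delta=0.5, sigma=0.3)
```

Here <code>sparse_distances</code> and <code>alphas</code> stand in for the trained descriptor set and fitting coefficients; in a real model they come from the fit.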


=== SOAP (many-body) descriptor ===

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])
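As a minimal sketch of the many-body prediction formula above, assuming the descriptors are given as plain, already-normalized vectors (the vectors, coefficients and function names below are made up for illustration and are not ''TurboGAP'''s API):

```python
def soap_kernel(xi_i, xi_j, zeta):
    """Dot-product kernel raised to the power zeta."""
    dot = sum(a * b for a, b in zip(xi_i, xi_j))
    return dot ** zeta


def predict_local(xi, sparse_descriptors, alphas, delta, zeta, e0=0.0):
    """e0 + delta^2 * sum_j alpha_j * k(xi, xi_j) for one atomic environment."""
    k_sum = sum(
        alpha * soap_kernel(xi, q, zeta)
        for alpha, q in zip(alphas, sparse_descriptors)
    )
    return e0 + delta ** 2 * k_sum


# Hypothetical two-component descriptors: the environment matches the first sparse point only.
example_prediction = predict_local(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0], delta=1.0, zeta=2, e0=290.0
)
```

The same expression works for any local quantity fitted on these descriptors; only the coefficients and the offset change.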

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta\mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta\mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.

[[File:miguel_xps_diagram.png]]

↑ Diagram of XPS (x-axis is core electron binding energy) ([[#citeproc_bib_item_6|Golze et al. 2022]])


== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>
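The expression for <math display="inline">g_{\rm xps}</math> can be evaluated directly from a list of predicted core-electron binding energies. The sketch below is an illustration only; the function name and the sample values are assumptions, and <math display="inline">\sigma = 0.4\text{ eV}</math> is taken from the broadening quoted earlier.

```python
import math


def xps_spectrum(cebes, energies, sigma=0.4):
    """Average of Gaussians centred on each binding energy, evaluated on a grid."""
    m = len(cebes)
    return [
        sum(math.exp(-((e - c) ** 2) / (2.0 * sigma ** 2)) for c in cebes) / m
        for e in energies
    ]


# Hypothetical binding energies (eV) and a coarse energy grid.
grid = [283.0 + 0.5 * i for i in range(9)]  # 283.0 ... 287.0 eV
spectrum = xps_spectrum([284.0, 286.5], grid)
```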


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:xps_gcmc_scheme.png]]
</li></ul>
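A minimal sketch of the <math display="inline">E_{\rm spectra}</math> penalty, assuming both spectra are sampled on the same uniform grid and using the trapezoidal rule for the integral. The integration scheme and all names are choices made here for illustration, not necessarily what ''TurboGAP'' does internally.

```python
def spectra_energy(g_pred, g_exp, d_eps, gamma):
    """0.5 * gamma * integral of (g_pred - g_exp)^2 over a uniform grid (trapezoidal rule)."""
    sq = [(p - e) ** 2 for p, e in zip(g_pred, g_exp)]
    integral = d_eps * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return 0.5 * gamma * integral


# Hypothetical spectra on a three-point grid with spacing 0.5 eV.
penalty = spectra_energy([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], d_eps=0.5, gamma=10.0)
```

The penalty is zero when the two spectra coincide and grows linearly with <math display="inline">\gamma</math>, which is why <math display="inline">\gamma</math> sets how strongly the experimental data competes with <math display="inline">E_{\rm GAP}</math>.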

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>
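The three acceptance criteria can be written as small functions. This is a sketch under stated assumptions: energies and <math display="inline">\mu</math> are in eV, <math display="inline">\lambda</math> is taken to be the thermal de Broglie wavelength in the same length unit as <math display="inline">V^{1/3}</math>, and the function names are invented here. Working with the logarithm of the ratio avoids overflow for strongly favourable moves.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def _accept(log_ratio):
    """min(1, exp(log_ratio)) without overflowing for large positive arguments."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def acc_insertion(e_new, e_old, n, volume, lam, mu, temperature):
    """Acceptance probability for N -> N+1."""
    beta = 1.0 / (K_B * temperature)
    prefactor = volume / (lam ** 3 * (n + 1))
    return _accept(math.log(prefactor) - beta * (e_new - e_old - mu))


def acc_removal(e_new, e_old, n, volume, lam, mu, temperature):
    """Acceptance probability for N -> N-1."""
    beta = 1.0 / (K_B * temperature)
    prefactor = lam ** 3 * n / volume
    return _accept(math.log(prefactor) - beta * (e_new - e_old + mu))


def acc_move(e_new, e_old, temperature):
    """Metropolis acceptance probability for a displacement move."""
    beta = 1.0 / (K_B * temperature)
    return _accept(-beta * (e_new - e_old))
```

In all three cases the energies passed in would be the generalized energy <math display="inline">\tilde{E}</math>, so the experimental penalty enters the acceptance test on the same footing as the GAP energy.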


= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in <code>.bashrc</code> file

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse-set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki https://turbogap.fi/wiki/index.php/Potentials
;* The conversion script <code>turbogap/tools/make_gap_files.py</code> can be used to convert <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimization</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors, purely to give a result in a reasonable amount of time. Take care with real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
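A minimal sketch of pulling the <code>core_electron_be</code> column out of the first frame of an extended xyz file, assuming the layout shown in the output snippet above (an unquoted <code>Properties=</code> entry on a single comment line). The function name is hypothetical and the parser is deliberately simple; it is not part of ''TurboGAP''.

```python
import re


def read_cebes(lines):
    """Return the core_electron_be values of the first frame of an extended xyz file."""
    n_atoms = int(lines[0].split()[0])
    match = re.search(r"Properties=(\S+)", lines[1])
    if match is None:
        raise ValueError("no Properties entry found in the comment line")
    fields = match.group(1).split(":")
    column = 0
    # Properties is a flat list of (name, type, number of columns) triples.
    for name, _kind, n_cols in zip(fields[0::3], fields[1::3], fields[2::3]):
        if name == "core_electron_be":
            break
        column += int(n_cols)
    else:
        raise ValueError("core_electron_be is not listed in Properties")
    return [float(line.split()[column]) for line in lines[2:2 + n_atoms]]


# A one-atom frame in the same layout as the snippet above (values shortened).
frame = [
    "1",
    'Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 Lattice="..."',
    "C 4.396 7.794 11.917 -0.0018 -0.0040 0.0004 -8.3206 283.8384",
]
cebes = read_cebes(frame)
```

With a real file one would pass <code>open("trajectory_out.xyz").read().splitlines()</code>; the returned values can then be broadened with a Gaussian of width <math display="inline">\sigma</math> to obtain the spectrum.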


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g'' For a <math display="inline">\mu V T</math> simulation of system in oxygen environment

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file saving current accepted config written to every <code>write_xyz = N</code> steps.
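To get a quick feel for how a run is going, the acceptance ratio per move type can be read off <code>mc.log</code>. The sketch below assumes the column order of the header shown above and assumes that accepted moves are flagged with <code>T</code> (the excerpt only shows <code>F</code>); the function name is hypothetical.

```python
from collections import Counter


def acceptance_by_move(lines):
    """Fraction of accepted trials for each MC move type found in mc.log lines."""
    tried = Counter()
    accepted = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and the header
        parts = stripped.split()
        move, flag = parts[1], parts[2]
        tried[move] += 1
        if flag == "T":  # assumption: accepted moves are written as T
            accepted[move] += 1
    return {move: accepted[move] / tried[move] for move in tried}


# The three log lines from the excerpt above, truncated after the energies.
log_lines = [
    "# mc_istep mc_move accepted E_trial E_current",
    "1 move F -4321.10376531 -4321.58260476",
    "2 move F -4321.10017027 -4321.58260476",
    "3 insertion F -4319.14082561 -4321.58260476",
]
ratios = acceptance_by_move(log_lines)
```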

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:standard_gcmc.png]]


== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Separate predictions separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.
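The role of <code>exp_n_samples</code> can be illustrated with a small linear-interpolation routine that resamples (energy, intensity) pairs onto a uniform grid spanning the experimental range. This is a sketch of the idea only, with made-up data and a hypothetical function name; it is not ''TurboGAP'''s own routine.

```python
def interpolate_uniform(x, y, n_samples):
    """Linearly interpolate sorted (x, y) data onto n_samples evenly spaced points."""
    x_min, x_max = x[0], x[-1]
    step = (x_max - x_min) / (n_samples - 1)
    out_x, out_y = [], []
    j = 0  # index of the left end of the current data interval
    for i in range(n_samples):
        xi = x_max if i == n_samples - 1 else x_min + i * step
        while j < len(x) - 2 and x[j + 1] < xi:
            j += 1
        t = (xi - x[j]) / (x[j + 1] - x[j])
        out_x.append(xi)
        out_y.append(y[j] + t * (y[j + 1] - y[j]))
    return out_x, out_y


# Hypothetical non-uniform data lying on the line y = 2x.
grid_x, grid_y = interpolate_uniform([0.0, 1.0, 3.0], [0.0, 2.0, 6.0], 4)
```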

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:xps_optimization_gcmc.png]]


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' allows the optimization of multiple observables:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:xps_shifts_miguel_first.png]]

[[File:xps_shifts_miguel_last.png]]


=== Reduced graphene oxide - Exp. prediction ===

([[#citeproc_bib_item_5|El-Machachi et al. 2024]])


=== Grand-Canonical Monte-Carlo - Exp. optimization ===


=== Molecular Augmented Dynamics ('''MAD''') - MD with ''Experimental Forces'' ===


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab-initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

File:Xps optimization gcmc.png

2024-06-18T19:28:02Z

Tigany Zarrouk:

File:Standard gcmc.png

2024-06-18T19:27:07Z

Tigany Zarrouk:

XPS-optimized oxygenated amorphous carbon

2024-06-18T19:25:55Z

Tigany Zarrouk: Created page with " = Table of Contents = Find this tutorial on https://turbogap.fi/ * (Tutorials → XPS optmization with Augmented GCMC) * ''TurboGAP'' g..."

= Table of Contents =

Find this tutorial on https://turbogap.fi/

* (Tutorials → XPS optimization with Augmented GCMC)
* ''TurboGAP'' git repo https://github.com/mcaroba/turbogap, with files in <code>turbogap/tutorials/xps_optimization/</code>

# Introduction
# Theory/Methods
# How to use ''TurboGAP''
## Generalities
## Predicting core electron binding energies
## Optimizing an XPS spectrum
## Optimizing Multiple Observables
# Applications
## GCMC optimization of C1s XPS spectrum
## Molecular Augmented Dynamics optimization
## Carbon materials
## Catalysis


= What is ''TurboGAP'' ? =

* ''TurboGAP'' is a <code>Fortran</code> atomistic code which uses Gaussian Approximation Potentials (GAPs).
* Its specialty is ''SOAP turbo'' descriptors: faster and more accurate than SOAP (Smooth Overlap of Atomic Positions).
* It's (probably!) the fastest SOAP prediction code, which will be made even faster with ''GPU acceleration'' (ongoing with CSC).

[[File:images/Turbogap_logo.png]]

* Through GAP/SOAP predictions, one can perform simulations with ''ab-initio'' accuracy '''at scale'''
** Molecular Dynamics
** Grand-Canonical Monte-Carlo simulations
** Electronic-stopping and spatially-correlated Langevin Dynamics (in-development).
** '''Prediction/optimization of experimental observables''' (''e.g.'' X-Ray photoelectron spectra ([[#citeproc_bib_item_6|Golze et al. 2022]]; [[#citeproc_bib_item_11|Zarrouk et al. 2024]]), X-ray/Neutron Diffraction and more to come!).


= Motivation for structural inference =

* We typically want to predict structures, to make sense of experimental data.
* We usually pray structures have experimental agreement (a “bottom-up” approach).
* What if we can promote agreement with experimental data?
* ''Reverse Monte-Carlo'' techniques match experimental data to atomic structure (“top-down” approach).
** Downsides: “By-hand” constraints
** Simple experimental observables.
** Unphysical structures/artifacts ([[#citeproc_bib_item_8|Opletal et al. 2017]])

We need an efficient structure search to find '''agreement''' with '''experimental''' data ''and'' '''ab-initio''' data.

# With CO GAP + Core electron binding energy SOAP model we will perform '''modified (Grand-canonical) Monte-Carlo''' to produce ''experimentally consistent, low energy structures'' of oxygenated amorphous carbon.
# I will further show preliminary work on ''“Molecular Augmented Dynamics”'': generating forces from the deviation from experiment, to match experimental spectra via MD.

[[File:images/deconvolution_comparison_fill_-3_new.png]]


= Theory and Methods =


== Gaussian Approximation Potentials (GAP) ==

([[#citeproc_bib_item_2|Bartók et al. 2010]]; [[#citeproc_bib_item_1|Bartók, Kondor, and Csányi 2013]]; [[#citeproc_bib_item_4|Deringer et al. 2021]])

* Assumption: Total energy is a sum of '''local energies'''. <math display="inline">E_{\rm total}(\{\mathbf{r}\}) = \sum_i^{N_{\text{atoms}}} \varepsilon^i(\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i)</math>

<math display="block">\begin{align}
\varepsilon^i ( &\{r\}, \{\xi^{\text{2b}}_i\}, \{\boldsymbol{\xi}^{\text{3b}}_i\}, \boldsymbol{\xi}^{\text{mb}}_i) =
E^{\text{2b}}(\{\xi^{\text{2b}}_i\}) + E^{\text{3b}}(\{\boldsymbol{\xi}^{\text{3b}}_i\}) \nonumber\\
& + E^{\text{mb}}(\boldsymbol{\xi}^{\text{mb}}_i)
+ \sum_{r < r_{\rm cut}^{\rm core}} E^{\text{core}}(r) \nonumber
\end{align}</math>

; Local energy
: Sum of '''2b, 3b and many-body''' terms.
; Descriptor <math display="inline">\xi / \boldsymbol{\xi}</math>.
: ''e.g'' <math display="inline">\{\xi^{\text{2b}}\} = \{r_{ij}\}</math>, <math display="inline">\{\boldsymbol{\xi}^{\text{3b}}\} = \{ [ r_{ij}, r_{ik}, r_{kj} ] \}</math>
; Local environment
: A set of descriptors.

[[File:images/descriptors_deringer.png]]


=== Using descriptors to predict quantities ===

* ''Kernel/basis functions'' <math display="inline">k_{ij}(\boldsymbol{\xi}_i, \boldsymbol{\xi}_j)</math> measure the ''similarity between descriptors''.
* ''e.g.'' Squared-exponential 2b kernel: <math display="inline">k^{2b}_{ij}(\xi_i, \xi_j) \equiv k^{2b}_{ij}(r_{i}, r_{j}) = \exp\left( - (r_{i} - r_{j})^2/(2\sigma^2) \right)</math>
* ''Gaussian Process Regression'' (kernel matrix algebra on training set) → ''fitting coefficients'' <math display="inline">\{\alpha\}</math>, ([[#citeproc_bib_item_4|Deringer et al. 2021]])
* ''e.g.'' for a 2b model trained with <math display="inline">N_s</math> descriptors <math display="inline">\{\xi_1, \ldots, \xi_{N_s}\}</math>, the energy of an atom with <math display="inline">M</math> pairwise distances in its local environment is: <math display="block">E^{2b} = \sum_i^M \left(E^{2b}_0 + \delta^2\sum_j^{N_s} \alpha_j k^{2b}_{ij}(\xi_i, \xi_j) \right) </math>


=== SOAP (many-body) descriptor ===

[[File:images/gap_expansion_diagram.png]]

* SOAP descriptor: <math display="inline">\boldsymbol{\xi}_i(\{\mathbf{r}\}_{ < r_{\rm cut}} )</math>, represents environment of an atom ([[#citeproc_bib_item_2|Bartók et al. 2010]]).
* Kernel: <math display="inline">k_{ij} = (\boldsymbol{\xi}_i \cdot \boldsymbol{\xi}_j)^{\zeta}</math> measures similarity of environments.
* Fitting coefficients: <math display="inline">\alpha_j</math> result from Gaussian Process Regression.
* Prediction: <math display="inline">E^{\rm mb}_i(\boldsymbol{\xi}_i) = E^{\rm mb}_0 + \delta^2\sum_j \alpha_j k_{ij}</math>
* For specifics of SOAP turbo, refer to Miguel's paper ([[#citeproc_bib_item_3|Caro 2019]])

[[File:images/gpr_regression_deringer.png]]

← SOAP expansion.

← Interpolation of DFT PES ([[#citeproc_bib_item_4|Deringer et al. 2021]])


== How do we model XPS? ==

* XPS (X-ray photoelectron spectroscopy) spectra measure '''core electron binding energies'''.
* Core electron binding energies ''depend on atomic environment''.
* SOAP turbo model trained on DFT (<math display="inline">\Delta\text{KS}</math>) data with a <math display="inline">GW</math> correction on top ([[#citeproc_bib_item_6|Golze et al. 2022]]).
* <math display="inline">\text{CEBE} = \Delta\mathrm{KS}^0_{\rm extended} + GW_{\rm carved} - \Delta\mathrm{KS}^+_{\rm carved}</math>
* Thermal and instrumental broadening accounted for by a Gaussian with <math display="inline">\sigma = 0.4\text{ eV}</math>
* Can analyse environments which make up spectra.

[[File:images/miguel_xps_diagram.png]]

↑ Diagram of XPS (x-axis is core electron binding energy) ([[#citeproc_bib_item_6|Golze et al. 2022]])


== Prediction of Local Properties ==

* We can predict an arbitrary number of ''local properties'' which use SOAP turbo descriptors using ''TurboGAP''.
* ''One can get these for “free”'' if local property models are trained using the same descriptors as atomic energies/forces.
*; ''Hirshfeld volumes''
*: gives vdW dispersion via Tkatchenko-Scheffler. <math display="block"> E_{\rm TS} = \sum_i \sum_{j\neq i} C_{6, ij}(v_{i}(\boldsymbol{\xi}_i), v_{j}(\boldsymbol{\xi}_j))\frac{f_{\rm damp}(r_{ij}, \ldots)}{r_{ij}^6} </math>
*; ''Core-electron binding energies''
*: give '''X-ray Photoelectron Spectroscopy''' prediction. <math display="block"> g_{\rm xps}(\varepsilon) = \frac{1}{M} \sum_i \exp\left( -(\varepsilon - \varepsilon_{i}(\boldsymbol{\xi}_i))^2/(2\sigma^2) \right) </math>


== How do we marry experimental data and GAP? ==

* '''Generalized Hamiltonian within Grand-Canonical Monte-Carlo''' scheme.
* Vary oxygen content with chemical potential, to see motif differences.

<math display="block"> \textcolor{violet}{ \mathbf{\tilde{g}}_{\rm {pred}}}(\varepsilon) = \sum_{i} \exp\left(
-\frac{(\varepsilon - \textcolor{blue}{\varepsilon^{i}_{\rm
{pred}}(\boldsymbol{\xi}_i)})^2}{2\sigma^2}\right)</math>

<math display="block"> \tilde{E} = E_{\rm GAP} + E_{\rm spectra} </math>

<math display="block"> E_{\rm spectra} = \frac{1}{2}\gamma \int \,d\varepsilon \left( \textcolor{violet}{ g_{\rm pred}(\varepsilon)} - g_{\rm exp}(\varepsilon)\right)^2</math>

<ul>
<li>A simple squared difference: a penalty which increases with spectral dissimilarity, where <math display="inline">\gamma</math> is an energy scale.</li>
<li>We can do not just XPS but also X-ray/neutron diffraction and pair distribution functions.

[[File:images/xps_gcmc_scheme.png]]
</li></ul>

* We use the standard acceptance criteria for adding/removing/moving a particle

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\left[1,
\frac{V}{\lambda^3 (N+1)} \exp\left\{ - \frac{\tilde{E}(N+1) -
\tilde{E}(N) - \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\left[1,
\frac{\lambda^3 N}{V} \exp\left\{ -\frac{\tilde{E}(N-1) -
\tilde{E}(N) + \mu}{k_BT} \right\} \right] </math>

<math display="block"> \mathrm{acc}_{\rm move} = \mathrm{min }\left[ 1, \exp\left\{ -
\frac{\tilde{E}(\mathrm{new}) - \tilde{E}(\mathrm{old})}{k_BT} \right\} \right]</math>


= Usage =


== Installing ''TurboGAP'' ==

We can install the ''TurboGAP'' code, found at https://github.com/mcaroba/turbogap, by ''recursively cloning'' the main branch

<syntaxhighlight lang="bash">git clone --recursive https://github.com/mcaroba/turbogap.git
</syntaxhighlight>
Then we edit this line in the <code>Makefile</code> to match our system architecture.

<pre>include makefiles/Makefile.Ubuntu_gfortran_mpi
</pre>
Then, we make

<syntaxhighlight lang="bash">make
</syntaxhighlight>
and then export the path in <code>.bashrc</code> file

<syntaxhighlight lang="bash">export PATH="$HOME/turbogap/bin:$PATH"
</syntaxhighlight>

== What you need for a calculation ==

; <code>gap_files/</code> directory
: The directory which contains your <code>*.gap</code> file, together with the alphas (fitting coefficients) and sparse-set descriptor files.
;* Potentials are available from the ''TurboGAP'' wiki https://turbogap.fi/wiki/index.php/Potentials
;* The conversion script <code>turbogap/tools/make_gap_files.py</code> can be used to convert <code>gap_fit</code> <code>.xml</code> files to <code>.gap</code>
; <code>input</code> file
: The file which tells the code what to do
; <code>*.xyz</code>
: An extended xyz format file which contains positions/lattice/velocity information.


=== Adding a local property model for CEBE prediction ===

<syntaxhighlight lang="bash"> gap_beg soap_turbo
n_species = 2
species = C O
central_species = 1
... some params ...
zeta = 4
delta = 0.1
desc_sparse = "gap_files/CO.xml.sparseX.GAP_2022_5_19_180_7_17_31_41410"
alphas_sparse = "gap_files/alphas_soap_turbo_1.dat"
compress_mode = "trivial"
has_local_properties = .true.
n_local_properties = 1
local_property_qs = 'gap_files/core_electron_be.xml.sparseX.GAP_2024_1_26_120_18_36_33_6091'
local_property_alphas = 'gap_files/alphas_core_electron_be_1.dat'
local_property_labels = 'core_electron_be'
local_property_zetas = 2
local_property_deltas = 1.0
local_property_v0s = 290.0
gap_end
</syntaxhighlight>

=== Conversion of <code>gap_fit</code> <code>*.xml</code> files to <code>*.gap</code> format ===

* Then we convert the <code>*.xml</code> files which result from <code>gap_fit</code> into the required format for ''TurboGAP''.
* This is done using the convenience script in <code>turbogap/tools/make_gap_files.py</code>

<syntaxhighlight lang="bash">python3 make_gap_files.py {gap_file.xml} {gap_file.gap} {N_local_prop.} \
{local_prop1.xml} {local_prop_label1} \
{local_prop2.xml} {local_prop_label2}...
# e.g. let's build a carbon oxygen GAP
# with core electron binding energy prediction capability.
python3 make_gap_files.py CO.xml CO.gap 1 \
core_electron_be_co.xml core_electron_be
</syntaxhighlight>
* This creates a directory <code>gap_files/</code> which has
*# <code>alphas_{descriptor}.dat</code> which are fitting coefficients for a particular descriptor
*# <code>qs_{descriptor}.dat</code> which are the '''sparse set''' descriptors, used for prediction.
*# <code>alphas_{local_property}.dat</code> and <code>qs_{local_property}.dat</code> for local property prediction.


== Running the Tutorial ==

* Navigate to <code>turbogap/tutorials/xps_optimizaton</code>
* Run <code>bash run.sh {calc_type}</code> where <code>{calc_type}</code> is either:
** <code>prediction</code>
** <code>standard_gcmc</code>
** <code>xps_optimization_gcmc</code>
* Note that these simulations use a high chemical potential and high <math display="inline">\gamma</math> factors purely to give a result in a reasonable amount of time. Choose these values with care in real simulations.


== Core electron binding energy prediction ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap predict
</syntaxhighlight>
* We have in our <code>input</code> file

<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99
e0 = -.16138053 0.
</pre>
Output:

; <code>trajectory_out.xyz</code>
: which contains atomic positions, local energy and local property arrays for each structure.

<syntaxhighlight lang="bash"> 512
Properties=species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1 \
Lattice="..." energy=-4321.582604 energy_soap=-4114.77018429 energy_2b=-124.73820332 energy_3b=0.55261463 \
energy_core_pot=0.00000000 energy_vdw=0.00000000 energy_exp=0.00000000 energy_xps=0.00000000 \
virial="..." stress="..." volume=5372.640346
C 4.39625950 7.79414272 11.91703476 \
-0.00176074 -0.00403026 0.00040104 \
-8.32055957 \ # Local energy prediction
283.83841418 # CEBE Prediction
</syntaxhighlight>
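
The column holding each property can be worked out from the <code>Properties</code> string. The following is a minimal Python sketch of that bookkeeping, not part of ''TurboGAP''; the helper name <code>property_columns</code> is our own, and the atom line is the example above joined onto a single line.

<syntaxhighlight lang="python">
def property_columns(properties):
    """Map each property name to its (first column, number of columns).

    `properties` is the value of the Properties= key of an extended xyz
    comment line, e.g. "species:S:1:pos:R:3".
    """
    fields = properties.split(":")
    columns = {}
    start = 0
    # Entries come in triplets: name, type (S/R/I/L), number of columns.
    for i in range(0, len(fields), 3):
        name, width = fields[i], int(fields[i + 2])
        columns[name] = (start, width)
        start += width
    return columns


props = "species:S:1:pos:R:3:forces:R:3:local_energy:R:1:core_electron_be:R:1"
cols = property_columns(props)

# Atom line from the example output, written on one line.
atom_line = ("C 4.39625950 7.79414272 11.91703476 "
             "-0.00176074 -0.00403026 0.00040104 -8.32055957 283.83841418")
tokens = atom_line.split()
start, width = cols["core_electron_be"]
cebe = float(tokens[start])
print(cols["core_electron_be"], cebe)
</syntaxhighlight>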

* Without specifying experimental data, ''TurboGAP'' does not predict the spectra.
* XPS spectra can simply be calculated from <code>core_electron_be</code> in <code>trajectory_out.xyz</code>.
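
As an illustration of the last point, the sketch below broadens a list of core-electron binding energies with Gaussians of width <code>sigma</code> and averages them. This is an assumption about how one might post-process the data, not the routine used inside ''TurboGAP''; the normalisation, the energy grid and two of the three binding energies are hypothetical.

<syntaxhighlight lang="python">
import math


def xps_spectrum(cebes, energies, sigma):
    """Average of unit-area Gaussians centred on each binding energy."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    spectrum = []
    for energy in energies:
        total = 0.0
        for cebe in cebes:
            total += norm * math.exp(-0.5 * ((energy - cebe) / sigma) ** 2)
        spectrum.append(total / len(cebes))
    return spectrum


# The first value is from the example output above; the other two are made up.
cebes = [283.83841418, 284.50, 286.10]
# Hypothetical uniform grid: 501 points between 280 and 290 eV.
energies = [280.0 + 0.02 * i for i in range(501)]
spectrum = xps_spectrum(cebes, energies, sigma=0.4)

# Trapezoidal area under the curve; close to 1 when all peaks lie inside the grid.
area = 0.02 * (sum(spectrum) - 0.5 * (spectrum[0] + spectrum[-1]))
peak = energies[spectrum.index(max(spectrum))]
print(f"area = {area:.6f}, peak at {peak:.2f} eV")
</syntaxhighlight>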


== (Grand-Canonical) Monte-Carlo ==

* We use <code>mc</code> mode

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc
</syntaxhighlight>
* ''e.g.'' for a <math display="inline">\mu V T</math> simulation of a system in an oxygen environment:

<pre class="conf">! Monte-carlo options
mc_nsteps = 1000 # Number of Monte-Carlo Steps
n_mc_types = 3 # Number of MC move types
mc_types = 'move' 'insertion' 'removal' # MC move types (move/insertion/removal/volume/swap/md)
mc_move_max = 0.5 # Maximum distance for MC displacement ("move") type moves

! - GCMC Options
n_mc_mu = 1 # Number of chemical potentials to add
mc_mu = 0 # Chemical potential(s) [eV]
mc_species = 'O' # GCMC species types
mc_min_dist = 0.1 # GCMC minimum insertion distance
</pre>
Output:

; <code>mc.log</code>
: Log of the trial moves, one line per move, with the columns named in the header line:

<syntaxhighlight lang="bash"># mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial
1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0
2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0
3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1
</syntaxhighlight>

; <code>mc_all.xyz</code>
: xyz file written to every <code>write_xyz = N</code> steps.
; <code>mc_current.xyz</code>
: xyz file saving current accepted config written to every <code>write_xyz = N</code> steps.
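
A quick way to monitor a run is to compute the acceptance rate per move type from <code>mc.log</code>. The Python sketch below is a hypothetical post-processing helper, not part of ''TurboGAP''; it assumes that accepted moves are flagged with <code>T</code> in the third column, by analogy with the <code>F</code> entries shown above.

<syntaxhighlight lang="python">
from collections import defaultdict


def acceptance_by_move(lines):
    """Return {move type: fraction of accepted trials} from mc.log lines."""
    counts = defaultdict(lambda: [0, 0])  # move -> [accepted, attempted]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.split()
        move, accepted = fields[1], fields[2]
        counts[move][1] += 1
        if accepted == "T":  # assumption: "T" marks an accepted move
            counts[move][0] += 1
    return {move: acc / tot for move, (acc, tot) in counts.items()}


# The three log lines shown above.
log_lines = [
    "# mc_istep mc_move accepted E_trial E_current E_exp_trial E_exp_current N_tot_trial N_mc_species_trial",
    "1 move F -4321.10376531 -4321.58260476 272.97693282 272.83692252 512 O 0",
    "2 move F -4321.10017027 -4321.58260476 272.68730901 272.83692252 512 O 0",
    "3 insertion F -4319.14082561 -4321.58260476 269.49191203 272.83692252 513 O 1",
]
rates = acceptance_by_move(log_lines)
print(rates)
</syntaxhighlight>

For a real run, pass the open file instead of the list, e.g. <code>acceptance_by_move(open("mc.log"))</code>.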

* If we run <code>bash run.sh standard_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:images/standard_gcmc.png]]


== Experimental Observable Optimization ==

<syntaxhighlight lang="bash">[mpirun -np N] turbogap mc/md
</syntaxhighlight>
* We can specify experimental observables to predict and optimize them using MC or MD. <math display="block"> E_{\rm spectra} = \frac{1}{2} \gamma \int \,d\varepsilon (g_{\rm pred}(\varepsilon) - g_{\rm exp}(\varepsilon))^2 </math>

<pre class="conf">! Experimental data options
n_exp = 1 # Number of experimental observables
exp_labels = 'xps' # Experimental observable types (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_interp.dat' # Experimental data files (Note: the range of resulting XPS prediction will be the same as the experimental range.)
exp_n_samples = 501 # Number of interpolation samples over the experimental range.
exp_energy_scales = 100.0 # Energy scale (gamma) [eV]

# XPS smearing
xps_sigma = 0.4
</pre>
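
To make the role of <math display="inline">\gamma</math> concrete, the expression for <math display="inline">E_{\rm spectra}</math> can be evaluated numerically on a uniform energy grid. The sketch below uses the trapezoidal rule in plain Python; it illustrates the formula only, and the quadrature used inside ''TurboGAP'' may differ. The function name and the toy spectra are our own.

<syntaxhighlight lang="python">
def spectral_energy(g_pred, g_exp, d_eps, gamma):
    """0.5 * gamma * integral of (g_pred - g_exp)^2 over a uniform grid."""
    squared = [(p - e) ** 2 for p, e in zip(g_pred, g_exp)]
    # Trapezoidal rule: end points carry half weight.
    integral = d_eps * (sum(squared) - 0.5 * (squared[0] + squared[-1]))
    return 0.5 * gamma * integral


# Toy spectra on three grid points spaced 0.5 eV apart (made-up numbers).
g_pred = [0.0, 0.0, 0.0]
g_exp = [1.0, 1.0, 1.0]
energy = spectral_energy(g_pred, g_exp, d_eps=0.5, gamma=100.0)
print(energy)  # identical spectra would give zero
</syntaxhighlight>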
Output:

; <code>xps_prediction.dat</code>
: Data file written every <code>write_xyz = N</code> steps. Successive predictions are separated by blank lines.
; <code>xps_exp.dat</code>
: The experimental data fed into ''TurboGAP''.

* If we run <code>bash run.sh xps_optimization_gcmc</code> (may take a few minutes), we can see a prediction of the XPS with oxygen content

[[File:images/xps_optimization_gcmc.png]]


== Multiple Observables ==

* This is not in the tutorial files, but ''TurboGAP'' can optimize multiple observables simultaneously:
** X-ray diffraction
** Neutron diffraction
** Pair distribution function
** XPS
* Here we show a fictitious example of XPS and XRD optimization.

<pre class="conf"># Partial PDFs needed for XRD calculation
pair_distribution_n_samples = 301
r_range_min = 0.1
r_range_max = 13.5
pair_distribution_rcut = 14.1

do_xrd = .true.

# Experimental Data Specification
n_exp = 2 # Number of Exp. observables
exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)
exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'
exp_n_samples = 301 301 # Number of samples for the experimental data
exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable

# If using monte-carlo, the experimental energies are added directly to the local energy
</pre>
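
In the example above, each list-valued keyword carries one entry per observable, i.e. <code>n_exp</code> entries. Before launching a long run it can be worth checking this by script. The sketch below is a hypothetical pre-flight check written in Python; we do not claim that ''TurboGAP'' performs (or omits) the same validation, and the helper names are our own.

<syntaxhighlight lang="python">
import shlex


def parse_option(line):
    """Split 'key = v1 v2 ... # comment' into (key, [v1, v2, ...])."""
    for marker in ("#", "!"):
        line = line.split(marker, 1)[0]  # drop trailing comments
    key, _, values = line.partition("=")
    return key.strip(), shlex.split(values)  # shlex removes the quotes


def mismatched_exp_options(lines):
    """Return the per-observable keywords whose length differs from n_exp."""
    options = dict(parse_option(line) for line in lines if "=" in line)
    n_exp = int(options["n_exp"][0])
    per_observable = ("exp_labels", "exp_data_files",
                      "exp_n_samples", "exp_energy_scales")
    return [key for key in per_observable
            if len(options.get(key, [])) != n_exp]


input_lines = [
    "n_exp = 2 # Number of Exp. observables",
    "exp_labels = 'xps' 'xrd' # Exp. observable type (xps/xrd/nd/pdf)",
    "exp_data_files = 'xps_spectra_experiment.dat' 'xrd_CO_experiment.dat'",
    "exp_n_samples = 301 301 # Number of samples for the experimental data",
    'exp_energy_scales = 10.0 10.0 # The "gamma" term for each observable',
]
problems = mismatched_exp_options(input_lines)
print(problems)  # an empty list means every keyword has n_exp entries
</syntaxhighlight>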

= Applications =


== Experimental Observable prediction/optimization ==


=== Deconvolution analysis ===

* Use subgraph isomorphisms of bonding networks to discern motifs present.

[[File:images/exp_xps_with_background_high.png]]

↑ Experimental XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:images/deconvolution_comparison_fill_-3_new.png]]

↑ Our XPS deconvolution ([[#citeproc_bib_item_9|Santini et al. 2015]])

[[File:images/xps_shifts_miguel_first.png]]

[[File:images/xps_shifts_miguel_last.png]]


=== Reduced graphene oxide - Exp. prediction ===

([[#citeproc_bib_item_5|El-Machachi et al. 2024]])


=== Grand-Canonical Monte-Carlo - Exp. optimization ===


=== Molecular Augmented Dynamics ('''MAD''') - MD with ''Experimental Forces'' ===


= Summary =

* Machine-learned potentials can be combined with experimental models to give structures which are consistent with both ''ab initio'' and experimental data.
* We can generate oxygenated amorphous carbon structures using a CO GAP with XPS prediction using modified GCMC and a “Molecular Augmented Dynamics” method.
* We can use ''multiple experimental observables''.
* '''Finding structures which match experimental data allows us to understand specific experimental results.'''
* Deconvolve experimental XPS exactly.
* All of this is implemented in the ''TurboGAP'' code.

[[File:images/Turbogap_logo.png]]


= Acknowledgements =

* Miguel Caro for letting me hijack his code.
* Albert P. Bartók for helping with local property prediction.
* The DAS group for stimulating discussion and support.


= References =

Bartók, Albert P., Risi Kondor, and Gábor Csányi. 2013. “On Representing Chemical Environments.” ''Physical Review B'' 87 (18): 184115. https://doi.org/10.1103/PhysRevB.87.184115.

Bartók, Albert P., Mike C. Payne, Risi Kondor, and Gábor Csányi. 2010. “Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons.” ''Physical Review Letters'' 104 (13): 136403. https://doi.org/10.1103/PhysRevLett.104.136403.

Caro, Miguel A. 2019. “Optimizing Many-Body Atomic Descriptors for Enhanced Computational Performance of Machine Learning Based Interatomic Potentials.” ''Physical Review B'' 100 (2): 024112. https://doi.org/10.1103/PhysRevB.100.024112.

Deringer, Volker L., Albert P. Bartók, Noam Bernstein, David M. Wilkins, Michele Ceriotti, and Gábor Csányi. 2021. “Gaussian Process Regression for Materials and Molecules.” ''Chemical Reviews'' 121 (16): 10073–141. https://doi.org/10.1021/acs.chemrev.1c00022.

El-Machachi, Zakariya, Damyan Frantzov, A. Nijamudheen, Tigany Zarrouk, Miguel A. Caro, and Volker L. Deringer. 2024. “Accelerated First-Principles Exploration of Structure and Reactivity in Graphene Oxide.” arXiv. https://doi.org/10.48550/arXiv.2405.14814.

Golze, Dorothea, Markus Hirvensalo, Patricia Hernández-León, Anja Aarva, Jarkko Etula, Toma Susi, Patrick Rinke, Tomi Laurila, and Miguel A. Caro. 2022. “Accurate Computational Prediction of Core-Electron Binding Energies in Carbon-Based Materials: A Machine-Learning Model Combining Density-Functional Theory and GW.” ''Chemistry of Materials'' 34 (14): 6240–54. https://doi.org/10.1021/acs.chemmater.1c04279.

Muhli, Heikki, Xi Chen, Albert P. Bartók, Patricia Hernández-León, Gábor Csányi, Tapio Ala-Nissila, and Miguel A. Caro. 2021. “Machine Learning Force Fields Based on Local Parametrization of Dispersion Interactions: Application to the Phase Diagram of <math display="inline">\mathrm{C}_{60}</math>.” ''Physical Review B'' 104 (5): 054106. https://doi.org/10.1103/PhysRevB.104.054106.

Opletal, George, Timothy C. Petersen, Amanda S. Barnard, and Salvy P. Russo. 2017. “On Reverse Monte Carlo Constraints and Model Reproduction.” ''Journal of Computational Chemistry'' 38 (17): 1547–51. https://doi.org/10.1002/jcc.24799.

Santini, Claudia A., Abu Sebastian, Chiara Marchiori, Vara Prasad Jonnalagadda, Laurent Dellmann, Wabe W. Koelmans, Marta D. Rossell, Christophe P. Rossel, and Evangelos Eleftheriou. 2015. “Oxygenated Amorphous Carbon for Resistive Memory Applications.” ''Nature Communications'' 6 (1): 8600. https://doi.org/10.1038/ncomms9600.

Wang, Yanzhou, Zheyong Fan, Ping Qian, Tapio Ala-Nissila, and Miguel A. Caro. 2022. “Structure and Pore Size Distribution in Nanoporous Carbon.” ''Chemistry of Materials'' 34 (2): 617–28. https://doi.org/10.1021/acs.chemmater.1c03279.

Zarrouk, Tigany, Rina Ibragimova, Albert P. Bartók, and Miguel A. Caro. 2024. “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon.” ''Journal of the American Chemical Society'' 146 (21): 14645–59. https://doi.org/10.1021/jacs.4c01897.

Tutorials

2024-06-09T17:11:52Z

Tigany Zarrouk:

=== Molecular dynamics ===

# Running a [[simple molecular dynamics]] simulation with '''TurboGAP''': graphitization of carbon
# [[Graphitization simulation with van der Waals corrections]]
# [[Generating amorphous silicon from quenching simulations]]


=== Geometry optimization ===

# [[Simulating icosahedral gold clusters]]

=== Analysis and visualization ===

# [[Energetic and structural analysis of a database of PtAu nanoclusters]]

=== Grand-Canonical Monte-Carlo ===

# [[Creating oxygenated amorphous carbon]]
# [[XPS-optimized oxygenated amorphous carbon]]

Monte-Carlo

2023-11-14T16:16:51Z

Tigany Zarrouk: /* Monte-Carlo options */

= Running Monte-Carlo =
To perform a (Grand-Canonical) Monte-Carlo simulation we must run '''TurboGAP''' in <code>mc</code> mode:

<syntaxhighlight lang="bash">turbogap mc
</syntaxhighlight>
The following files are written to every <code>write_xyz = N</code> steps

* <code>mc.log</code>: self-explanatory,
** format: <code>mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)</code>
* <code>mc_all.xyz</code>: an appended file which contains all accepted moves
* <code>mc_trial.xyz</code>: a single configuration which contains the trial move
* <code>mc_current.xyz</code>: a single configuration which contains the current accepted configuration

= Monte-Carlo Move Types =
We can specify a number of Monte-Carlo trial move types, including doing molecular dynamics (hybrid Monte-Carlo) as trial moves. These include
* Displacement
* Swap (to swap atoms in the simulation box)
* Volume (for NPT)
* Insertion (for <math display="inline">\mu VT</math>)
* Removal (for <math display="inline">\mu VT</math>)
* MD (for hybrid Monte-Carlo)

= Minimal Input File =
<pre class="conf">! Species-specific info
atoms_file = 'atoms.xyz'
pot_file = 'gap_files/CO.gap'
n_species = 2
species = C O
masses = 12.01 15.99

mc_nsteps = 5000 ! Number of mc trial steps to be performed

n_mc_types = 3 ! Number of mc trial types
mc_types = 'move' 'insertion' 'removal' ! MC types can be: 'insertion' 'removal' 'md' 'swap' 'move'
! 'volume'
mc_acceptance = 1 1 1 ! Ratios for choosing the respective trial moves (all equally likely here)

mc_move_max = 0.2 ! Maximum distance for the move of a particular particle

n_mc_mu = 1
mc_mu = 0.0 ! gcmc: Chemical potential [eV], using a large one here
mc_species = 'O' ! gcmc: species to insert / remove
mc_min_dist = 0.1 ! gcmc: minimum distance between particles for insertion

! Note: The following files are written to every write_xyz steps
! mc.log: self-explanatory,
! format: mc_step mc_move accepted E_trial E_current N_sites(trial)
! N_gcmc_species(trial)
! mc_all.xyz: an appended file which contains all accepted moves
! mc_trial.xyz: a single configuration which contains the trial move
! mc_current.xyz: a single configuration which contains the current accepted configuration

write_xyz = 200
</pre>
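
The <code>mc_acceptance</code> ratios set how often each trial move type is attempted relative to the others. A minimal Python sketch of such a weighted choice is given below; it illustrates the idea only and is not taken from the ''TurboGAP'' source. The function names are our own, and the second example reuses the <code>'volume' 'move'</code> with <code>2 1</code> values from the options table.

<syntaxhighlight lang="python">
import random


def move_probabilities(mc_types, mc_acceptance):
    """Normalise the ratios into selection probabilities."""
    total = sum(mc_acceptance)
    return {move: weight / total
            for move, weight in zip(mc_types, mc_acceptance)}


def choose_move(mc_types, mc_acceptance, rng):
    """Pick one trial move type with probability proportional to its ratio."""
    return rng.choices(mc_types, weights=mc_acceptance, k=1)[0]


# Values from the minimal input file: all three moves equally likely.
probabilities = move_probabilities(["move", "insertion", "removal"], [1, 1, 1])
print(probabilities)

# With ratios 2:1, 'volume' is attempted about twice as often as 'move'.
rng = random.Random(0)
picks = [choose_move(["volume", "move"], [2, 1], rng) for _ in range(30000)]
volume_fraction = picks.count("volume") / len(picks)
print(f"fraction of volume moves: {volume_fraction:.3f}")
</syntaxhighlight>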


= Monte-Carlo options =

{| class="wikitable"
|-
! Keyword
! Definition
! Optional
! Type
! Default
! Used when
! Example
|-
| <code>mc_nsteps</code>
| Number of MC steps
| N
| Int
| 0
| turbogap mc
| <code>mc_nsteps = 1000</code>
|-
| <code>n_mc_types</code>
| Number of MC types
| N
| Int
| 0
| turbogap mc
| <code>n_mc_types = 2</code>
|-
| <code>mc_types</code>
| Types of MC trials
| N
| Str(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_types = 'volume' 'move'</code>
|-
| <code>mc_acceptance</code>
| Ratios of MC trial moves
| N
| Int(s)
| None
| <code>n_mc_types</code> > 0
| <code>mc_acceptance = 2 1</code>
|-
| <code>n_mc_swaps</code>
| Number of swaps
| Y
| Int
| 0
| 'swap' in <code>mc_types</code>
| <code>n_mc_swaps = 2</code>
|-
| <code>mc_swaps</code>
| Swap species
| Y
| Str(s)
| None
| 'swap' in <code>mc_types</code>
| <code>mc_swaps = 'C' 'O' 'N' 'C'</code>
|-
| <code>mc_move_max</code>
| Maximum displacement [A]
| Y
| Float
| 1.0
| 'move' in <code>mc_types</code>
| <code>mc_move_max = 0.5</code>
|-
| <code>mc_lnvol_max</code>
| Log volume max for volume moves.
| Y
| Float
| 0.01
| 'volume' in <code>mc_types</code>
| <code>mc_lnvol_max = 0.02</code>
|-
| <code>mc_min_dist</code>
| Minimum distance for insertion [A]
| Y
| Float
| 0.2
| 'insertion' in <code>mc_types</code>
| <code>mc_min_dist = 0.1</code>
|-
| <code>n_mc_mu</code>
| Number of chemical potentials/gcmc species
| Y
| Int
| 1
| 'insertion'/'removal' in <code>mc_types</code>
| <code>n_mc_mu = 2</code>
|-
| <code>mc_mu</code>
| Chemical potential(s) [eV]
| Y
| Float
| 0.0
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_mu = -5.16 -2.25</code>
|-
| <code>mc_species</code>
| GCMC species
| Y
| Str
| None
| 'insertion'/'removal' in <code>mc_types</code>
| <code>mc_species = 'O' 'H'</code>
|-
| <code>mc_relax</code>
| Relax MC trials prior to acc. evaluation
| Y
| Bool
| .false.
| turbogap mc
| <code>mc_relax = .true.</code>
|-
| <code>n_mc_relax_after</code>
| Number of specific trials to relax
| Y
| Int(s)
| None
| <code>mc_relax = .true.</code>
| <code>n_mc_relax_after = 1</code>
|-
| <code>mc_relax_after</code>
| Relax specific MC trial types
| Y
| Str(s)
| None
| <code>n_mc_relax_after > 0</code>
| <code>mc_relax_after = 'volume'</code>
|-
| <code>mc_nrelax</code>
| Number of relaxation steps after trial
| Y
| Int
| 0
| <code>mc_relax = .true.</code>
| <code>mc_nrelax = 50</code>
|-
| <code>mc_relax_opt</code>
| Optimisation for relaxing after steps
| Y
| Str
| 'gd'
| <code>mc_relax = .true.</code>
| <code>mc_relax_opt = 'gd-box-ortho'</code>
|-
| <code>mc_hamiltonian</code>
| Use NVE for 'md' trials
| Y
| Bool
| .false.
| 'md' in <code>mc_types</code>
| <code>mc_hamiltonian = .true.</code>
|-
| <code>mc_hybrid_opt</code>
| Optimisation for 'md' steps
| Y
| Str
| 'vv'
| 'md' in <code>mc_types</code>
| <code>mc_hybrid_opt = 'vv'</code>
|-
| <code>mc_write_xyz</code>
| Debug: read and write to files every step
| Y
| Bool
| .false.
| <code>turbogap mc</code>
| <code>mc_write_xyz = .true.</code>
|}

Creating oxygenated amorphous carbon

2023-11-06T14:06:05Z

Tigany Zarrouk:

This tutorial will focus on using molecular dynamics and Grand-Canonical Monte-Carlo (GCMC) simulations to determine equilibrium structures of oxygenated amorphous carbon using '''TurboGAP'''.

This tutorial is found on the '''TurboGAP''' wiki: turbogap.fi (-> tutorials -> Creating oxygenated amorphous carbon) https://turbogap.fi/wiki/index.php/Creating_oxygenated_amorphous_carbon.

The structure of this tutorial is as follows:

# Create an amorphous carbon structure using molecular dynamics, via a melt-quench procedure.
# Perform a standard GCMC calculation to populate the structure with oxygen.
# Perform hybrid Monte-Carlo/MD simulations to relax the system.


= Introduction =


== What is '''TurboGAP'''? ==

TurboGAP is a code for running simulations with machine-learned potentials, specifically Gaussian Approximation Potentials.

It has numerous selling points:

# It is ''fast''.
#* It uses '''soap turbo''' descriptors, which are both faster and more accurate than your typical SOAP expansion (also found in QUIP). See the original paper by Miguel Caro for more details: [https://doi.org/10.1103/PhysRevB.100.024112 Optimizing many-body atomic descriptors for enhanced computational performance of machine learning based interatomic potentials]
#* MPI parallelised, (overlapping domain decomposition currently being developed with help of CSC).
#* GPU implementation in progress (with CSC support too).
# It can perform not only typical molecular statics (with/without box relaxation) and dynamics (<math display="inline">NVT</math> / <math display="inline">NPT</math>), but also ''Grand-Canonical Monte-Carlo'' simulations.
#* Grand-Canonical Monte Carlo (<math display="inline">\mu VT</math>), with (<math display="inline">NVT</math>) / (<math display="inline">NPT</math>) move types available.
#* Adaptive time-scale MD (by Uttiyoarnab Saha).
# Prediction of an arbitrary number of local properties.
#* ML van der Waals (by prediction of local Hirshfeld volumes) using Tkatchenko-Scheffler [https://doi.org/10.1103/PhysRevB.104.054106 Machine learning force fields based on local parametrization of dispersion interactions]
#* Heikki Muhli has developed ''Many-Body Dispersion'' capability, with multiple optimisations.
#* Max Veit is developing ''electrostatics''.
# Sneak Peek!: we have added the capability to predict/simulate numerous types of experimental data (ML XPS/XRD) and can allow them to influence simulation. (Talk to Tigany Zarrouk/look out for the papers when they come out on arXiv)!


== Installing TurboGAP ==


=== For the MLIP workshop ===

If you have a CSC account and can ssh into Mahti/Puhti there is no need to install anything. TurboGAP is installed in the path (on Mahti / Puhti)

<syntaxhighlight lang="bash">/projappl/project_2008666/turbogap
</syntaxhighlight>
The tutorial is in

<syntaxhighlight lang="bash">cd /projappl/project_2008666/turbogap/tutorials/creating_a-COx
</syntaxhighlight>
Copy the directory <code>creating_a-COx</code> to wherever you want to do the simulations and then check the project in <code>creating_a-COx/sample_submit_script.sh</code>, change it from

<syntaxhighlight lang="bash">#SBATCH --account=project_
</syntaxhighlight>
to

<syntaxhighlight lang="bash">#SBATCH --account=project_2008666
</syntaxhighlight>
This tutorial depends on <code>ase</code>, so install it as follows (after loading the python module)

<syntaxhighlight lang="bash">module load python-data/3.9-22.04
pip install ase --user
</syntaxhighlight>
Each of the simulations should take ~5-8 minutes.


=== Installation ===

To install TurboGAP please run

<syntaxhighlight lang="bash">git clone --recursive http://github.com/mcaroba/turbogap.git /your/turbogap/source/directory
</syntaxhighlight>
Where /your/turbogap/source/directory is the directory where you're putting the TurboGAP source code. To build the TurboGAP binary and library, you need to select the options that best match your architecture, by editing this line in the Makefile with one of the names of the corresponding makefiles in turbogap/makefiles:

<syntaxhighlight lang="bash">include makefiles/Makefile.Ubuntu_gfortran_mpi
</syntaxhighlight>
Then just run <code>make</code>

<syntaxhighlight lang="bash">make
</syntaxhighlight>
Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to wherever you want to do the simulations.


=== Running this tutorial on a cluster ===

* Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to where you want to run.
* Change <code>sample_submit_script.sh</code> to reflect the type of job scheduler you use (here, it's slurm), the modules you've loaded for <code>turbogap</code> and python, and change the <code>PATH</code> environment variable to where you've installed <code>turbogap/bin</code>.
* Make sure the project account is correct!
* Load the python module you will use, and make sure <code>ase</code> is installed by running <code>pip install ase {--user}</code>.
* Change the <code>srun turbogap</code> commands in the <code>script_*.sh</code> files to the standard for your cluster (e.g. <code>mpirun -np $N turbogap</code>).
* In each of the directories numbered "1.", "2.", etc., run the scripts with bash in numerical order, e.g. <code>bash 1.run_randomise.sh</code>, starting each script only after the preceding job has finished.


=== Running this tutorial locally ===

* Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to where you want to run.
* Make sure the python package <code>ase</code> is installed by running <code>pip install ase {--user}</code>.
* Edit the convenience script <code>change_to_local.sh</code> and run with bash.
* Change the path environment variable in <code>sample_submit_script.sh</code>.


== Note: how to make a potential work in '''TurboGAP''' ==

You must convert potentials trained with [https://github.com/libatoms/QUIP libAtoms] (the xml files) to <code>*.gap</code> files. The conversion script is run as follows:

<syntaxhighlight lang="bash">python3 /path/turbogap/tools/quip_to_xml/make_gap_files.py your_potential.xml your_potential.gap {your_hirshfeld.xml}
</syntaxhighlight>

= Tutorial =


== Create Amorphous Carbon ==

Here, we perform molecular dynamics simulations to form amorphous carbon from diamond. To do this, we use a simple melt-quench procedure, which is modified from the paper of Wang ''et al.'' to create amorphous carbon [https://doi.org/10.1021/acs.chemmater.1c03279 Structure and Pore Size Distribution in Nanoporous Carbon].

This is also similar to what is done in other tutorials [https://turbogap.fi/wiki/index.php/Graphitization_simulation_with_van_der_Waals_corrections Graphitization simulation with van der Waals corrections] [https://turbogap.fi/wiki/index.php/Generating_amorphous_silicon_from_quenching_simulations Generating amorphous silicon from quenching simulations].

The procedure we will follow is:

# We heat up the diamond to 9000K, thereby randomizing the structure.
# We quench to 1000K (the temperature used in the paper is 3500K).
# We anneal (partially graphitize) the structure at 1000K, to allow the carbon bonds a chance to reform.
# Relax the structure, allowing both atomic positions and cell vectors to relax.

To run these calculations, do

<syntaxhighlight lang="bash">cd 1.make_amorphous_carbon
bash 1.run_randomise.sh
# Wait for it to finish

bash 2.run_quench.sh
# Wait for it to finish

bash 3.run_anneal.sh
# Wait for it to finish

bash 4.run_relax.sh
# Wait for it to finish
</syntaxhighlight>

=== 1. Randomise ===

First, we create a diamond structure using ASE, changing the volume to achieve a given density.

<syntaxhighlight lang="bash">sim_name="md_diamond_randomise"

input_atoms="atoms.xyz"
output_atoms="atoms_randomise.xyz"

ln -sf ../gap_files ./

# 1. Create diamond structure (1000 atoms in atoms.xyz file)
echo "> Running: python create_diamond.py"
python3 create_diamond.py

cp diamond.xyz $input_atoms
</syntaxhighlight>
Note: In <code>create_diamond.py</code> we make 1000 atoms. You can change this to a smaller number, if you want things to run faster.

<syntaxhighlight lang="python">atoms *= (3,3,3)
</syntaxhighlight>
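
To get a feel for the numbers, the sketch below counts the atoms produced by a given repeat and computes the lattice parameter that gives a target density. It assumes that <code>create_diamond.py</code> repeats the 8-atom conventional cubic diamond cell (consistent with 1000 atoms for a 5x5x5 repeat, but not checked against the script); the density value and the function names are illustrative.

<syntaxhighlight lang="python">
AMU_IN_GRAMS = 1.66053906660e-24  # atomic mass unit [g]
CARBON_MASS = 12.01               # [amu], as in the masses keyword


def n_atoms(repeats):
    """Atoms in an nx x ny x nz repeat of the 8-atom cubic diamond cell."""
    nx, ny, nz = repeats
    return 8 * nx * ny * nz


def cubic_lattice_parameter(density):
    """Lattice parameter [Angstrom] of the cubic cell at `density` [g/cm^3]."""
    cell_mass = 8 * CARBON_MASS * AMU_IN_GRAMS  # [g]
    volume_cm3 = cell_mass / density
    return (volume_cm3 * 1.0e24) ** (1.0 / 3.0)  # 1 cm^3 = 1e24 Angstrom^3


print(n_atoms((5, 5, 5)), n_atoms((3, 3, 3)))
# Illustrative density close to that of crystalline diamond.
print(f"a = {cubic_lattice_parameter(3.51):.3f} Angstrom")
</syntaxhighlight>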
Then we create the input file for '''TurboGAP''' in <code>input</code>.

<pre class="conf">! Species-specific info
atoms_file = '${input_atoms}' ! Input file
pot_file = 'gap_files/CO.gap' ! path to gap_files
n_species = 2 ! > Actually the number of species in atoms.xyz is 1 (C),
! but we will add oxygen in future simulations
species = C O
masses = 12.01 15.99

! MD options
md_nsteps = 5000 ! Number of MD steps (5ps randomise - actual time in paper is 20ps)
md_step = 1 ! MD timestep [fs]
thermostat = bussi ! Either bussi / berendsen

t_beg = 9000 ! Initial temperature [K]
t_end = 9000 ! Final temperature [K]
tau_t = 100. ! Time constant [fs]

! Output
write_thermo = 1 ! Write thermodynamic information every step
! (Step, Time, Temp, Kin_E, Pot_E, Pres)

write_xyz = 200 ! Write extended xyz trajectory (trajectory_out.xyz) every 200 steps

! > Predicted local properties are in the xyz, such as the local energy
! and if specified, hirshfeld volumes, core electron binding energies etc
</pre>
We run MD/relaxation simulations using the <code>md</code> option

<syntaxhighlight lang="bash">srun turbogap md
</syntaxhighlight>
This simulation outputs a few files:

# <code>trajectory_out.xyz</code> is the extended xyz of the trajectory, written every <code>write_xyz</code> steps.
# <code>thermo.log</code> logs the MD simulation


=== 2. Quench ===

Here, we cool the system.

The only things that need to change in the input file (if you rebel against using the provided scripts) are the temperatures and the number of MD steps.

<pre class="conf">! MD options
md_nsteps = 5500 ! 5.5ps quench
md_step = 1.
thermostat = bussi
t_beg = 9000 ! Quenching from 9000K
t_end = 1000 ! to 1000K
</pre>
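
The quench settings above fix the average cooling rate. The short Python sketch below works it out and also shows the target temperature at an intermediate step, under the assumption that the thermostat target is interpolated linearly between <code>t_beg</code> and <code>t_end</code> over the run (we have not checked this against the ''TurboGAP'' source). The function names are our own.

<syntaxhighlight lang="python">
def cooling_rate(md_nsteps, md_step_fs, t_beg, t_end):
    """Average cooling rate [K/ps] over the whole run."""
    duration_ps = md_nsteps * md_step_fs / 1000.0
    return (t_beg - t_end) / duration_ps


def target_temperature(step, md_nsteps, t_beg, t_end):
    """Thermostat target [K] at `step`, assuming a linear ramp."""
    return t_beg + (t_end - t_beg) * step / md_nsteps


rate = cooling_rate(md_nsteps=5500, md_step_fs=1.0, t_beg=9000.0, t_end=1000.0)
halfway = target_temperature(2750, 5500, 9000.0, 1000.0)
print(f"cooling rate = {rate:.1f} K/ps, target at half-way = {halfway:.0f} K")
</syntaxhighlight>

Doubling <code>md_nsteps</code> at fixed <code>md_step</code> halves the cooling rate.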

=== 3. Anneal ===

We can anneal the structure to graphitize it. The method demonstrated here differs from that of the paper (we hardly graphitize at 1000K in 10ps), but it is here to illustrate the use of a barostat.

The necessary changes to the input file are

<pre class="conf">! MD options
md_nsteps = 10000 ! Graphitization (actual time in the paper is 200ps)
md_step = 1.
barostat = berendsen ! Using barostat
t_beg = 1000
t_end = 1000
p_beg = 1.0 ! Initial pressure [bar]
p_end = 1.0 ! Final pressure [bar]
tau_t = 100.
</pre>

=== 4. Relax ===

Now we relax the structure. We can choose multiple options for this, but here we opt for relaxing the box and the positions.

<pre class="conf">! MD options
optimize = gd ! optimize option allows us to specify the type of relaxation
! > Can use "gd" (gradient descent)
! "gd-box-ortho" (gradient descent relaxing diagonal cell vector components)
! "gd-box" (gradient descent relaxing all cell vector components)

! e_tol = 1.d-6 ! Default energy/force tolerances used
! f_tol = 0.010
md_nsteps = 2000
</pre>
We can now look at the simulation results by running

<syntaxhighlight lang="bash">cat traj_* > all_traj.xyz
</syntaxhighlight>
and using your atom viewer of choice.

At the end, you should have a structure which is similar to this.

[[File:amorphous_carbon.png]]

You can look at the local energies predicted by the GAP (here I use <code>ovito</code>).

* Open structure in <code>ovito</code>
* Use modifier "Create Bonds"
* Use modifier "Color Coding"
** Change "Input Property" to "Potential Energy" (it might also be called "local energy").

[[File:amorphous_carbon_local_energy.png]]


== Perform Standard Grand-Canonical Monte-Carlo (GCMC) ==


=== Theory of GCMC ===

In a GCMC simulation, a system of interest is at fixed volume, allowed to thermalize by contact with a heat bath, and it can exchange particles with an infinite reservoir, forming a constant (<math display="inline">\mu,V,T</math>) ensemble.

We perform GCMC using a Markov Chain: starting from an initial pure a-C<math display="inline">_x</math> structure, we generate trial configurations by either randomly displacing a particular atom, or inserting/removing oxygen into/from a random position respectively. These trial configurations are either accepted or rejected using the standard acceptance criteria (see Frenkel 2002) for particle displacement/insertion/removal

<math display="block"> \mathrm{acc}( \mathrm{move}) = \mathrm{min}\biggl[1, \exp\left\{ -\beta ( E(\mathrm{trial}) - E(\mathrm{current}) ) \right\} \biggr] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\biggl[ 1, \frac{V}{\lambda^3 (N+1)} \exp\left\{ - \beta ( E(N+1) - E(N) - \mu ) \right\} \biggr] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\biggl[1, \frac{\lambda^3 N}{V} \exp\left\{ -\beta ( E(N-1) - E(N) + \mu ) \right\} \biggr] </math>

where <math display="inline">\beta = 1/k_BT</math> and <math display="inline">\lambda</math> is the thermal de Broglie wavelength, given by <math display="inline">\lambda = \sqrt{\frac{2\pi \hslash^2}{mk_BT}}</math>. We then repeat the procedure with the last accepted configuration until the maximum number of iterations has been reached.


=== GCMC in '''TurboGAP''' ===

Now we perform Grand-Canonical Monte-Carlo to obtain an oxygenated amorphous carbon structure. This is a <math display="inline">\mu VT</math> ensemble, hence we must specify the chemical potential, <math display="inline">\mu</math>, and the species which we want to insert: here, just oxygen.

The format is similar to above, but there are more options:

<pre class="conf">mc_nsteps = 10000 ! Number of mc trial steps to be performed

n_mc_types = 3 ! Number of mc trial types
mc_types = 'move' 'insertion' 'removal' ! MC types can be: 'insertion' 'removal' 'md' 'swap' 'move'
! 'volume'
mc_acceptance = 1 1 1 ! Ratios for choosing the respective trial moves (all equally likely here)

mc_move_max = 0.2 ! Maximum distance for the move of a particular particle

mc_mu = 0.0 ! gcmc: Chemical potential [eV], using a large one here
mc_species = 'O' ! gcmc: species to insert / remove
mc_min_dist = 0.1 ! gcmc: minimum distance between particles for insertion
</pre>
To see all the options for Monte-Carlo, visit [https://turbogap.fi/wiki/index.php/Monte-Carlo Monte-Carlo].

To run it, we run '''TurboGAP''' in <code>mc</code> mode:

<syntaxhighlight lang="bash">srun turbogap mc
</syntaxhighlight>
To run these calculations, do

<syntaxhighlight lang="bash">cd ../2.standard_gcmc
bash 1.run_gcmc.sh
</syntaxhighlight>
The following files are written every <code>write_xyz</code> steps:

* <code>mc.log</code>: a log of the MC trial steps,
** format: <code>mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)</code>
* <code>mc_all.xyz</code>: an appended file which contains all accepted moves
* <code>mc_trial.xyz</code>: a single configuration which contains the trial move
* <code>mc_current.xyz</code>: a single configuration which contains the current accepted configuration

You should find a structure which looks like this (the last configuration in <code>mc_all.xyz</code>).

<div class="figure">

[[File:oxygenated_amorphous_carbon_gcmc.png]]

</div>
Here, we use a chemical potential of <math display="inline">\mu = 0.0</math> eV. You can experiment with different chemical potentials: for example, <math display="inline">\mu = -5.16</math> eV is related to half the binding energy of O<sub>2</sub> at 300 K and 1 atm. Try it for yourself (in your own time)!

For this short simulation and this size of box, we will not reach convergence of the oxygen content (converged MC simulations need on the order of 10<sup>5</sup> to 10<sup>6</sup> steps). We can run

<syntaxhighlight lang="bash">module load python-data/3.9
python analyse_O_content.py
</syntaxhighlight>
to see the oxygen content.

<div class="figure">

[[File:simple_O_content_monitor.png]]

</div>
Here, we can see the local energy for the GCMC configuration.

<div class="figure">

[[File:oxygenated_amorphous_carbon_gcmc_local_energy.png]]

</div>

== Hamiltonian Monte-Carlo ==

Observing the local energies above, we notice that there are some rather high values. To relax them using MC, we can use Hamiltonian MC, which uses the results of NVE molecular dynamics as trial configurations for MC displacements. This gives very high acceptance rates compared with other move types, because the symplectic (velocity Verlet) integrator approximately conserves the total energy (hence the acceptance probability is approximately 1).

<pre class="conf">md_nsteps = 20 ! specifying the number of steps for velocity verlet
md_step = 0.1 ! 0.1fs timestep

mc_nsteps = 50 ! Number of mc trial steps to be performed
n_mc_types = 1
mc_types = 'md'
mc_acceptance = 1 ! Relative rate of choosing the respective trial moves
mc_hamiltonian = .true. ! For Hamiltonian Monte-Carlo
! (NVE ensemble used to increase trial acceptance) for md
! move type
</pre>

<syntaxhighlight lang="bash">cd ../3.hamiltonian_mc
bash 1.run_hamiltonian_mc.sh
</syntaxhighlight>

== <math display="inline">NPT</math> using volume moves ==

We can do constant pressure Monte-Carlo by doing volume moves. In fact, we can mix volume moves and MD <math display="inline">NPT</math> moves to create trial configurations, if we so desire.

Looking at the <code>input</code> file, we see that we've specified a volume-type MC move, for which the acceptance criterion is given by <math display="block"> \mathrm{acc}(V \rightarrow V') = \mathrm{min}\biggl[1, \exp\left\{ -\beta ( E(V') - E(V) + P(V'-V) - (N+1)\ln (V'/V)/\beta ) \right\} \biggr] </math>

<pre class="conf">mc_nsteps = 200

n_mc_types = 3
mc_types = 'move' 'volume' 'md' ! We now specify an MD move type for doing NPT

mc_acceptance = 1 1 1
mc_move_max = 0.2

mc_lnvol_max = 0.01 ! Maximum lnvol to modify the volume

! Specify MD configuration
t_beg = 300
t_end = 300
p_beg = 1.0
p_end = 1.0

md_nsteps = 30
md_step = 0.1 ! Reducing the timestep
barostat = berendsen
tau_t = 100.
</pre>
To run it we do

<syntaxhighlight lang="bash">cd ../4.volume_mc
bash 1.run_volume.sh
</syntaxhighlight>
This gives an expansion of the volume.

Creating oxygenated amorphous carbon

2023-11-06T10:15:55Z

Tigany Zarrouk:

This tutorial will focus on using molecular dynamics and Grand-Canonical Monte-Carlo (GCMC) simulations to determine equilibrium structures of oxygenated amorphous carbon using '''TurboGAP'''.

This tutorial is found on the '''TurboGAP''' wiki: turbogap.fi (-> tutorials -> Creating oxygenated amorphous carbon) https://turbogap.fi/wiki/index.php/Creating_oxygenated_amorphous_carbon.

The structure of this tutorial is as follows:

# Create an amorphous carbon structure using molecular dynamics, via a melt-quench procedure.
# Perform a standard GCMC calculation to populate the structure with oxygen.
# Perform hybrid Monte-Carlo/MD simulations, to relax the system.


= Introduction =


== What is '''TurboGAP'''? ==

TurboGAP is a code for running simulations with machine-learned potentials, specifically Gaussian Approximation Potentials.

It has numerous selling points:

# It is ''fast''.
#* It uses '''soap turbo''' descriptors, which are both faster and more accurate than your typical SOAP expansion (also found in QUIP). See the original paper by Miguel Caro for more details: [https://doi.org/10.1103/PhysRevB.100.024112 Optimizing many-body atomic descriptors for enhanced computational performance of machine learning based interatomic potentials]
#* MPI parallelised, (overlapping domain decomposition currently being developed with help of CSC).
#* GPU implementation in progress (with CSC support too).
# It can perform not only typical molecular statics (with/without box relaxation) and dynamics (<math display="inline">NVT</math> / <math display="inline">NPT</math>), but also ''Grand-Canonical Monte-Carlo'' simulations.
#* Grand-Canonical Monte Carlo (<math display="inline">\mu VT</math>), with (<math display="inline">NVT</math>) / (<math display="inline">NPT</math>) move types available.
#* Adaptive time-scale MD (by Uttiyoarnab Saha).
# Prediction of an arbitrary number of local properties.
#* ML van der Waals (by prediction of local Hirshfeld volumes) using Tkatchenko-Scheffler: [https://doi.org/10.1103/PhysRevB.104.054106 Machine learning force fields based on local parametrization of dispersion interactions]
#* Heikki Muhli has developed ''Many-Body Dispersion'' capability, with multiple optimisations.
#* Max Veit is developing ''electrostatics''.
# Sneak Peek!: we have added the capability to predict/simulate numerous types of experimental data (ML XPS/XRD) and can allow them to influence simulation. (Talk to Tigany Zarrouk/look out for the papers when they come out on arXiv)!


== Installing TurboGAP ==


=== For the MLIP workshop ===

If you have a CSC account and can ssh into Mahti/Puhti there is no need to install anything. TurboGAP is installed in the path (on Mahti / Puhti)

<syntaxhighlight lang="bash">/projappl/project_2008666/turbogap
</syntaxhighlight>
The tutorial is in

<syntaxhighlight lang="bash">cd /projappl/project_2008666/turbogap/tutorials/creating_a-COx
</syntaxhighlight>
Copy the directory <code>creating_a-COx</code> to wherever you want to run the simulations, then check the project in <code>creating_a-COx/sample_submit_script.sh</code> and change it from

<syntaxhighlight lang="bash">#SBATCH --account=project_
</syntaxhighlight>
to

<syntaxhighlight lang="bash">#SBATCH --account=project_2008666
</syntaxhighlight>
This tutorial depends on <code>ase</code>, so install it as follows (after loading the python module)

<syntaxhighlight lang="bash">module load python-data/3.9-22.04
pip install ase --user
</syntaxhighlight>
Each of the simulations should take ~5-8 minutes.


=== Installation ===

To install TurboGAP please run

<syntaxhighlight lang="bash">git clone --recursive http://github.com/mcaroba/turbogap.git /your/turbogap/source/directory
</syntaxhighlight>
Here, <code>/your/turbogap/source/directory</code> is the directory where you're putting the TurboGAP source code. To build the TurboGAP binary and library, you need to select the options that best match your architecture, by editing this line in the Makefile with one of the names of the corresponding makefiles in turbogap/makefiles:

<syntaxhighlight lang="bash">include makefiles/Makefile.Ubuntu_gfortran_mpi
</syntaxhighlight>
Then just run <code>make</code>

<syntaxhighlight lang="bash">make
</syntaxhighlight>
Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to wherever you want to do the simulations.


=== Running this tutorial on a cluster ===

* Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to where you want to run.
* Change <code>sample_submit_script.sh</code> to reflect the type of job scheduler you use (here, it's slurm), the modules you've loaded for <code>turbogap</code> and python, and change the <code>PATH</code> environment variable to where you've installed <code>turbogap/bin</code>.
* Make sure the project account is correct!
* Load the python module you will use, and make sure <code>ase</code> is installed by running <code>pip install ase {--user}</code>.
* Change the <code>srun turbogap</code> commands in the <code>script_*.sh</code> files to the standard for your cluster (e.g. <code>mpirun -np $N turbogap</code>).
* In each of the directories enumerated with "1.", "2.", etc., run the scripts in order with bash (e.g. <code>bash 1.run_randomise.sh</code>), starting each one only after the preceding job has finished.


=== Running this tutorial locally ===

* Copy the directory <code>turbogap/tutorials/creating_a-COx</code> to where you want to run.
* Make sure the python package <code>ase</code> is installed by running <code>pip install ase {--user}</code>.
* Edit the convenience script <code>change_to_local.sh</code> and run it.
* Change the path environment variable in <code>sample_submit_script.sh</code>.


== Note: how to make a potential work in '''TurboGAP''' ==

You must convert potentials trained with [https://github.com/libatoms/QUIP libAtoms] (the xml files) to <code>*.gap</code> files. The conversion script can be run with

<syntaxhighlight lang="bash">python3 /path/turbogap/tools/quip_to_xml/make_gap_files.py your_potential.xml your_potential.gap {your_hirshfeld.xml}
</syntaxhighlight>

= Tutorial =


== Create Amorphous Carbon ==

Here, we perform molecular dynamics simulations to form amorphous carbon from diamond. To do this, we use a simple melt-quench procedure, which is modified from the paper of Wang ''et al.'' to create amorphous carbon [https://doi.org/10.1021/acs.chemmater.1c03279 Structure and Pore Size Distribution in Nanoporous Carbon].

This is also similar to what is done in other tutorials [https://turbogap.fi/wiki/index.php/Graphitization_simulation_with_van_der_Waals_corrections Graphitization simulation with van der Waals corrections] [https://turbogap.fi/wiki/index.php/Generating_amorphous_silicon_from_quenching_simulations Generating amorphous silicon from quenching simulations].

The procedure we will follow is:

# We heat up the diamond to 9000K, thereby randomizing the structure.
# We quench to 1000K (the temperature used in the paper is 3500K).
# We anneal (partially graphitize) the structure at 1000K, to allow the carbon bonds a chance to reform.
# Relax the structure, allowing both atomic positions and cell vectors to relax.

To run these calculations, do

<syntaxhighlight lang="bash">cd 1.make_amorphous_carbon
bash 1.run_randomise.sh
# Wait for it to finish

bash 2.run_quench.sh
# Wait for it to finish

bash 3.run_anneal.sh
# Wait for it to finish

bash 4.run_relax.sh
# Wait for it to finish
</syntaxhighlight>

=== 1. Randomise ===

First, we create a diamond structure using ASE, changing the volume to achieve a given density.

<syntaxhighlight lang="bash">sim_name="md_diamond_randomise"

input_atoms="atoms.xyz"
output_atoms="atoms_randomise.xyz"

ln -sf ../gap_files ./

# 1. Create diamond structure (1000 atoms in atoms.xyz file)
echo "> Running: python create_diamond.py"
python3 create_diamond.py

cp diamond.xyz $input_atoms
</syntaxhighlight>
Note: In <code>create_diamond.py</code> we make 1000 atoms. You can change this to a smaller number, if you want things to run faster.

<syntaxhighlight lang="python">atoms *= (3,3,3)
</syntaxhighlight>
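The contents of <code>create_diamond.py</code> are not reproduced here, so the following is only a minimal sketch of the idea, written in plain Python rather than with ASE. It assumes the conventional 8-atom cubic diamond cell; the function names and the target density of 2.0 g/cm<sup>3</sup> are hypothetical choices for illustration, not values taken from the tutorial scripts.

```python
AMU_TO_G = 1.66053906660e-24    # grams per atomic mass unit
A3_TO_CM3 = 1.0e-24             # cm^3 per cubic angstrom

# Fractional coordinates of the 8 atoms in the conventional cubic diamond cell
DIAMOND_BASIS = [
    (0.00, 0.00, 0.00), (0.00, 0.50, 0.50), (0.50, 0.00, 0.50), (0.50, 0.50, 0.00),
    (0.25, 0.25, 0.25), (0.25, 0.75, 0.75), (0.75, 0.25, 0.75), (0.75, 0.75, 0.25),
]


def lattice_constant_for_density(density_g_cm3, mass_amu=12.01):
    """Cubic lattice constant [angstrom] that yields the requested mass density."""
    cell_mass_g = len(DIAMOND_BASIS) * mass_amu * AMU_TO_G
    cell_volume_a3 = cell_mass_g / density_g_cm3 / A3_TO_CM3
    return cell_volume_a3 ** (1.0 / 3.0)


def diamond_supercell(n_rep, a):
    """Cartesian positions [angstrom] of an n_rep x n_rep x n_rep supercell."""
    positions = []
    for i in range(n_rep):
        for j in range(n_rep):
            for k in range(n_rep):
                for fx, fy, fz in DIAMOND_BASIS:
                    positions.append(((i + fx) * a, (j + fy) * a, (k + fz) * a))
    return positions


a = lattice_constant_for_density(2.0)   # hypothetical target density [g/cm^3]
atoms = diamond_supercell(5, a)         # 5 x 5 x 5 cells of 8 atoms
print(len(atoms), "atoms, lattice constant", round(a, 3), "angstrom")
```

Under this assumption, 5 x 5 x 5 repetitions give 1000 atoms and 3 x 3 x 3 repetitions give 216, which is why reducing the repetition count makes the simulations run faster.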
Then we create the input file for '''TurboGAP''' in <code>input</code>.

<pre class="conf">! Species-specific info
atoms_file = '${input_atoms}' ! Input file
pot_file = 'gap_files/CO.gap' ! path to gap_files
n_species = 2 ! > Actually the number of species in atoms.xyz is 1 (C),
! but we will add oxygen in future simulations
species = C O
masses = 12.01 15.99

! MD options
md_nsteps = 5000 ! Number of MD steps (5ps randomise - actual time in paper is 20ps)
md_step = 1 ! MD timestep [fs]
thermostat = bussi ! Either bussi / berendsen

t_beg = 9000 ! Initial temperature [K]
t_end = 9000 ! Final temperature [K]
tau_t = 100. ! Time constant [fs]

! Output
write_thermo = 1 ! Write thermodynamic information every step
! (Step, Time, Temp, Kin_E, Pot_E, Pres)

write_xyz = 200 ! Write extended xyz trajectory (trajectory_out.xyz) every 200 steps

! > Predicted local properties are in the xyz, such as the local energy
! and if specified, hirshfeld volumes, core electron binding energies etc
</pre>
We run MD/relaxation simulations using the <code>md</code> option

<syntaxhighlight lang="bash">srun turbogap md
</syntaxhighlight>
This simulation outputs a few files:

# <code>trajectory_out.xyz</code> is the extended xyz of the trajectory, written every <code>write_xyz</code> steps.
# <code>thermo.log</code> logs the thermodynamic output of the MD simulation, written every <code>write_thermo</code> steps.
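To check that the thermostat is holding the target temperature, the thermodynamic log can be read with a few lines of Python. This is a sketch that assumes <code>thermo.log</code> is whitespace-separated with the column order given in the input comment (Step, Time, Temp, Kin_E, Pot_E, Pres); the sample lines are made-up values for illustration, not real TurboGAP output.

```python
COLUMNS = ("step", "time", "temp", "kin_e", "pot_e", "pres")


def read_thermo(lines):
    """Parse whitespace-separated thermo lines into a list of dicts."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank, short, or comment lines
        if len(fields) < len(COLUMNS) or fields[0].startswith(("#", "!")):
            continue
        try:
            # Accept Fortran-style exponents such as 1.0D+03
            values = [float(f.replace("D", "E").replace("d", "e"))
                      for f in fields[:len(COLUMNS)]]
        except ValueError:
            continue
        rows.append(dict(zip(COLUMNS, values)))
    return rows


# Hypothetical sample lines; in practice use open("thermo.log").readlines()
sample = [
    "# Step Time Temp Kin_E Pot_E Pres",
    "1 1.0 8990.0 1162.1 -7001.4 12034.1",
    "2 2.0 9010.0 1164.7 -7003.9 11987.5",
    "3 3.0 9000.0 1163.4 -7002.6 12010.8",
]
rows = read_thermo(sample)
mean_temp = sum(r["temp"] for r in rows) / len(rows)
print("mean temperature [K]:", mean_temp)
```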


=== 2. Quench ===

Here, we cool the system.

The only things that need to change in the input file (if you rebel against using the provided scripts) are the temperatures and the number of MD steps.

<pre class="conf">! MD options
md_nsteps = 5500 ! 5.5ps quench
md_step = 1.
thermostat = bussi
t_beg = 9000 ! Quenching from 9000K
t_end = 1000 ! to 1000K
</pre>
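To get a feel for how fast this quench is, we can work out the target temperature at each step and the resulting cooling rate. The sketch below assumes the thermostat target is interpolated linearly between <code>t_beg</code> and <code>t_end</code> over <code>md_nsteps</code>; the function name is hypothetical.

```python
def ramp_temperature(step, n_steps, t_beg, t_end):
    """Target temperature [K] at a given step, assuming a linear ramp."""
    return t_beg + (step / n_steps) * (t_end - t_beg)


md_nsteps = 5500      # number of MD steps in the quench
md_step_fs = 1.0      # timestep [fs]
t_beg, t_end = 9000.0, 1000.0

total_time_ps = md_nsteps * md_step_fs / 1000.0
cooling_rate = (t_beg - t_end) / total_time_ps   # [K/ps]

print("target T halfway through [K]:", ramp_temperature(md_nsteps // 2, md_nsteps, t_beg, t_end))
print("cooling rate [K/ps]:", round(cooling_rate, 1))
```

With these settings the system loses 8000 K in 5.5 ps, i.e. roughly 1450 K/ps.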

=== 3. Anneal ===

We can anneal the structure to graphitize it. The method demonstrated here differs from that of the paper (we are hardly graphitizing at 1000 K and 10 ps), but it is included here to illustrate the use of a barostat.

The necessary changes to the input file are

<pre class="conf">! MD options
md_nsteps = 10000 ! Graphitization (actual time in the paper is 200ps)
md_step = 1.
barostat = berendsen ! Using barostat
t_beg = 1000
t_end = 1000
p_beg = 1.0 ! Initial pressure [bar]
p_end = 1.0 ! Final pressure [bar]
tau_t = 100.
</pre>

=== 4. Relax ===

Now we relax the structure. We can choose multiple options for this, but here we opt for relaxing the box and the positions.

<pre class="conf">! MD options
optimize = gd ! optimize option allows us to specify the type of relaxation
! > Can use "gd" (gradient descent)
! "gd-box-ortho" (gradient descent relaxing diagonal cell vector components)
! "gd-box" (gradient descent relaxing all cell vector components)

! e_tol = 1.d-6 ! Default energy/force tolerances used
! f_tol = 0.010
md_nsteps = 2000
</pre>
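The role of <code>e_tol</code>, <code>f_tol</code> and <code>md_nsteps</code> in a gradient-descent relaxation can be illustrated on a toy problem. This is a generic sketch, not TurboGAP's implementation: the fixed step size, the harmonic toy potential, and the choice to stop only when both tolerances are met are assumptions made for illustration.

```python
def gradient_descent(x, energy_and_forces, step=0.05,
                     e_tol=1.0e-6, f_tol=0.010, max_steps=2000):
    """Move positions along the forces until energy and forces are converged."""
    e_old, forces = energy_and_forces(x)
    for n in range(1, max_steps + 1):
        # Forces are minus the gradient, so stepping along them lowers the energy
        x = [xi + step * fi for xi, fi in zip(x, forces)]
        e_new, forces = energy_and_forces(x)
        f_max = max(abs(f) for f in forces)
        if abs(e_new - e_old) < e_tol and f_max < f_tol:
            return x, e_new, n
        e_old = e_new
    return x, e_old, max_steps


def harmonic(x, x0=(0.0, 1.5), k=1.0):
    """Toy potential: independent harmonic wells centred at x0."""
    energy = 0.5 * k * sum((xi - ci) ** 2 for xi, ci in zip(x, x0))
    forces = [-k * (xi - ci) for xi, ci in zip(x, x0)]
    return energy, forces


x_relaxed, e_relaxed, n_used = gradient_descent([0.3, 1.0], harmonic)
print("converged in", n_used, "steps at", [round(v, 3) for v in x_relaxed])
```

If the tolerances are not reached, the loop simply stops at <code>max_steps</code>, which plays the role of <code>md_nsteps</code> here.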
We can now look at the simulation results by running

<syntaxhighlight lang="bash">cat traj_* > all_traj.xyz
</syntaxhighlight>
and using your atom viewer of choice.

At the end, you should have a structure which is similar to this.

[[File:amorphous_carbon.png]]

You can look at the local energies predicted by the GAP (here I use <code>ovito</code>).

* Open structure in <code>ovito</code>
* Use modifier "Create Bonds"
* Use modifier "Color Coding"
** Change "Input Property" to "Potential Energy" (it might also be called "local energy").

[[File:amorphous_carbon_local_energy.png]]


== Perform Standard Grand-Canonical Monte-Carlo (GCMC) ==


=== Theory of GCMC ===

In a GCMC simulation, a system of interest is at fixed volume, allowed to thermalize by contact with a heat bath, and it can exchange particles with an infinite reservoir, forming a constant (<math display="inline">\mu,V,T</math>) ensemble.

We perform GCMC using a Markov Chain: starting from an initial pure a-C<math display="inline">_x</math> structure, we generate trial configurations by either randomly displacing a particular atom, or inserting/removing oxygen into/from a random position respectively. These trial configurations are either accepted or rejected using the standard acceptance criteria (see Frenkel 2002) for particle displacement/insertion/removal

<math display="block"> \mathrm{acc}( \mathrm{move}) = \mathrm{min}\biggl[1, \exp\left\{ -\beta ( E(\mathrm{trial}) - E(\mathrm{current}) ) \right\} \biggr] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N+1) = \mathrm{min}\biggl[ 1, \frac{V}{\lambda^3 (N+1)} \exp\left\{ - \beta ( E(N+1) - E(N) - \mu ) \right\} \biggr] </math>

<math display="block"> \mathrm{acc}(N \rightarrow N-1) = \mathrm{min}\biggl[1, \frac{\lambda^3 N}{V} \exp\left\{ -\beta ( E(N-1) - E(N) + \mu ) \right\} \biggr] </math>

where <math display="inline">\beta = 1/k_BT</math> and <math display="inline">\lambda</math> is the thermal de Broglie wavelength, given by <math display="inline">\lambda = \sqrt{\frac{2\pi \hslash^2}{mk_BT}}</math>. We then repeat the procedure with the last accepted configuration until the maximum number of iterations has been reached.
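The three acceptance criteria above translate directly into code. The following is a minimal sketch (not TurboGAP's source) that assumes energies and the chemical potential in eV, volumes in Å<sup>3</sup>, and masses in atomic mass units; all function names are hypothetical. The probabilities are evaluated in logarithmic form to avoid overflow of the exponential.

```python
import math

K_B_EV = 8.617333262e-5      # Boltzmann constant [eV/K]
K_B_J = 1.380649e-23         # Boltzmann constant [J/K]
HBAR = 1.054571817e-34       # reduced Planck constant [J s]
AMU = 1.66053906660e-27      # atomic mass unit [kg]


def thermal_wavelength(mass_amu, temperature):
    """Thermal de Broglie wavelength [angstrom]."""
    lam_m = math.sqrt(2.0 * math.pi * HBAR ** 2
                      / (mass_amu * AMU * K_B_J * temperature))
    return lam_m * 1.0e10


def _min_one_exp(log_ratio):
    """min(1, exp(log_ratio)) without overflowing for large arguments."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def acc_move(e_trial, e_current, temperature):
    beta = 1.0 / (K_B_EV * temperature)
    return _min_one_exp(-beta * (e_trial - e_current))


def acc_insertion(e_new, e_old, n, volume, mu, mass_amu, temperature):
    """Acceptance for N -> N+1, with n the current number of GCMC particles."""
    beta = 1.0 / (K_B_EV * temperature)
    lam3 = thermal_wavelength(mass_amu, temperature) ** 3
    return _min_one_exp(math.log(volume / (lam3 * (n + 1)))
                        - beta * (e_new - e_old - mu))


def acc_removal(e_new, e_old, n, volume, mu, mass_amu, temperature):
    """Acceptance for N -> N-1, with n the current number of GCMC particles."""
    if n == 0:
        return 0.0   # nothing to remove
    beta = 1.0 / (K_B_EV * temperature)
    lam3 = thermal_wavelength(mass_amu, temperature) ** 3
    return _min_one_exp(math.log(lam3 * n / volume)
                        - beta * (e_new - e_old + mu))


print("lambda(O, 300 K) [angstrom]:", round(thermal_wavelength(15.999, 300.0), 3))
```

Because <math display="inline">\lambda</math> for oxygen at 300 K is only about a quarter of an ångström, the prefactor <math display="inline">V/\lambda^3(N+1)</math> is large for typical simulation boxes, so insertions are mainly limited by the energy term.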


=== GCMC in '''TurboGAP''' ===

Now we perform Grand-Canonical Monte-Carlo to obtain an oxygenated amorphous carbon structure. This is a <math display="inline">\mu VT</math> ensemble, hence we must specify the chemical potential, <math display="inline">\mu</math>, and the species which we want to insert: here, just oxygen.

The format is similar to above, but there are more options:

<pre class="conf">mc_nsteps = 10000 ! Number of mc trial steps to be performed

n_mc_types = 3 ! Number of mc trial types
mc_types = 'move' 'insertion' 'removal' ! MC types can be: 'insertion' 'removal' 'md' 'swap' 'move'
! 'volume'
mc_acceptance = 1 1 1 ! Ratios for choosing the respective trial moves (all equally likely here)

mc_move_max = 0.2 ! Maximum distance for the move of a particular particle

mc_mu = 0.0 ! gcmc: Chemical potential [eV], using a large one here
mc_species = 'O' ! gcmc: species to insert / remove
mc_min_dist = 0.1 ! gcmc: minimum distance between particles for insertion
</pre>
To see all the options for Monte-Carlo, visit [https://turbogap.fi/wiki/index.php/Monte-Carlo Monte-Carlo].
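The <code>mc_acceptance</code> ratios can be thought of as weights for picking the next trial move. A minimal sketch of such a weighted selection is given below; it assumes the ratios are simply normalised into probabilities, and the function name is hypothetical.

```python
import random
from collections import Counter


def choose_move(mc_types, mc_acceptance, rng):
    """Pick one trial move type with probability proportional to its ratio."""
    return rng.choices(mc_types, weights=mc_acceptance, k=1)[0]


rng = random.Random(1)
mc_types = ["move", "insertion", "removal"]
mc_acceptance = [1, 1, 1]    # all move types equally likely

counts = Counter(choose_move(mc_types, mc_acceptance, rng) for _ in range(3000))
print(counts)
```

With ratios of <code>1 1 1</code>, each move type is attempted in roughly a third of the trial steps; setting them to, say, <code>2 1 1</code> would make displacements twice as likely as insertions or removals.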

To run it, we run '''TurboGAP''' in <code>mc</code> mode:

<syntaxhighlight lang="bash">srun turbogap mc
</syntaxhighlight>
To run these calculations, do

<syntaxhighlight lang="bash">cd ../2.standard_gcmc
bash 1.run_gcmc.sh
</syntaxhighlight>
The following files are written every <code>write_xyz</code> steps:

* <code>mc.log</code>: a log of the MC trial steps,
** format: <code>mc_step mc_move accepted E_trial E_current N_sites(trial) N_gcmc_species(trial)</code>
* <code>mc_all.xyz</code>: an appended file which contains all accepted moves
* <code>mc_trial.xyz</code>: a single configuration which contains the trial move
* <code>mc_current.xyz</code>: a single configuration which contains the current accepted configuration
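Given the column format of <code>mc.log</code> listed above, the acceptance rate of each move type can be tallied as sketched below. The encoding of the <code>accepted</code> column is an assumption here (a logical written as <code>T</code>/<code>F</code> or similar), and the sample lines are invented for illustration; check a real log before relying on this.

```python
from collections import defaultdict


def acceptance_by_move(lines, true_tokens=("T", "TRUE", ".TRUE.", "1")):
    """Fraction of accepted trials for each move type found in mc.log lines."""
    tried = defaultdict(int)
    accepted = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Expect: mc_step mc_move accepted E_trial E_current N_sites N_gcmc_species
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        move = fields[1]
        tried[move] += 1
        if fields[2].upper() in true_tokens:
            accepted[move] += 1
    return {move: accepted[move] / tried[move] for move in tried}


# Hypothetical sample lines; in practice use open("mc.log").readlines()
sample = [
    "1 move T -7001.20 -7001.20 1000 0",
    "2 insertion F -6990.10 -7001.20 1001 1",
    "3 insertion T -7003.50 -7001.20 1001 1",
    "4 removal F -7001.20 -7003.50 1000 0",
]
rates = acceptance_by_move(sample)
print(rates)
```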

You should find a structure which looks like this (the last configuration in <code>mc_all.xyz</code>).

<div class="figure">

[[File:oxygenated_amorphous_carbon_gcmc.png]]

</div>
Here, we use a chemical potential of <math display="inline">\mu = 0.0</math> eV. You can experiment with different chemical potentials: for example, <math display="inline">\mu = -5.16</math> eV is related to half the binding energy of O<sub>2</sub> at 300 K and 1 atm. Try it for yourself (in your own time)!

For this short simulation and this size of box, we will not reach convergence of the oxygen content (converged MC simulations need on the order of 10<sup>5</sup> to 10<sup>6</sup> steps). We can run

<syntaxhighlight lang="bash">module load python-data/3.9
python analyse_O_content.py
</syntaxhighlight>
to see the oxygen content.
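The script <code>analyse_O_content.py</code> is not shown here, so the following is only a sketch of how the oxygen count per frame could be extracted from a multi-frame xyz file such as <code>mc_all.xyz</code>. It assumes the standard xyz layout (atom count, comment line, then one line per atom beginning with the species symbol); the function name and the two sample frames are hypothetical.

```python
def count_species_per_frame(lines, species="O"):
    """Number of atoms of one species in each frame of a multi-frame xyz file."""
    counts = []
    i = 0
    while i < len(lines):
        header = lines[i].strip()
        if not header:
            i += 1          # tolerate blank lines between frames
            continue
        n_atoms = int(header)
        atom_lines = lines[i + 2:i + 2 + n_atoms]   # skip the comment line
        counts.append(sum(1 for line in atom_lines
                          if line.split() and line.split()[0] == species))
        i += 2 + n_atoms
    return counts


# Hypothetical two-frame sample; in practice use open("mc_all.xyz").readlines()
sample = [
    "2",
    "Lattice=\"10 0 0 0 10 0 0 0 10\"",
    "C 0.0 0.0 0.0",
    "C 1.5 0.0 0.0",
    "3",
    "Lattice=\"10 0 0 0 10 0 0 0 10\"",
    "C 0.0 0.0 0.0",
    "C 1.5 0.0 0.0",
    "O 0.7 1.2 0.0",
]
o_counts = count_species_per_frame(sample)
print(o_counts)
```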

<div class="figure">

[[File:simple_O_content_monitor.png]]

</div>
Here, we can see the local energy for the GCMC configuration.

<div class="figure">

[[File:oxygenated_amorphous_carbon_gcmc_local_energy.png]]

</div>

== Hamiltonian Monte-Carlo ==

Observing the local energies above, we notice that there are some rather high values. To relax them using MC, we can use Hamiltonian MC, which uses the results of NVE molecular dynamics as trial configurations for MC displacements. This gives very high acceptance rates compared with other move types, because the symplectic (velocity Verlet) integrator approximately conserves the total energy (hence the acceptance probability is approximately 1).

<pre class="conf">md_nsteps = 20 ! specifying the number of steps for velocity verlet
md_step = 0.1 ! 0.1fs timestep

mc_nsteps = 50 ! Number of mc trial steps to be performed
n_mc_types = 1
mc_types = 'md'
mc_acceptance = 1 ! Relative rate of choosing the respective trial moves
mc_hamiltonian = .true. ! For Hamiltonian Monte-Carlo
! (NVE ensemble used to increase trial acceptance) for md
! move type
</pre>

<syntaxhighlight lang="bash">cd ../3.hamiltonian_mc
bash 1.run_hamiltonian_mc.sh
</syntaxhighlight>
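The reason Hamiltonian MC accepts nearly every trial can be seen on a one-dimensional toy system. The sketch below is a generic textbook version in reduced units (mass, spring constant and <math display="inline">k_BT</math> all set to 1), not TurboGAP's implementation; it reuses the values 20 and 0.1 from the input above only as illustrative step count and step size.

```python
import math
import random


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations (NVE) with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass
        x += dt * v
        f = force(x)
        v += 0.5 * dt * f / mass
    return x, v


def hmc_step(x, energy, force, mass, kT, dt, n_steps, rng):
    """One Hamiltonian MC trial: fresh velocity, short NVE run, Metropolis test."""
    v = rng.gauss(0.0, math.sqrt(kT / mass))       # Maxwell-Boltzmann velocity
    h_old = energy(x) + 0.5 * mass * v * v
    x_new, v_new = velocity_verlet(x, v, force, mass, dt, n_steps)
    h_new = energy(x_new) + 0.5 * mass * v_new * v_new
    delta = (h_new - h_old) / kT                   # change in total energy
    accepted = delta <= 0.0 or rng.random() < math.exp(-delta)
    return (x_new if accepted else x), accepted


def energy(x):
    return 0.5 * x * x     # harmonic well


def force(x):
    return -x


rng = random.Random(0)
x, n_accepted, n_trials = 1.0, 0, 200
for _ in range(n_trials):
    x, ok = hmc_step(x, energy, force, mass=1.0, kT=1.0, dt=0.1, n_steps=20, rng=rng)
    n_accepted += ok
acceptance_rate = n_accepted / n_trials
print("acceptance rate:", acceptance_rate)
```

The Metropolis test uses the change in total (kinetic plus potential) energy, which the integrator keeps small, rather than the change in potential energy alone.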

== <math display="inline">NPT</math> using volume moves ==

We can do constant pressure Monte-Carlo by doing volume moves. In fact, we can mix volume moves and MD <math display="inline">NPT</math> moves to create trial configurations, if we so desire.

Looking at the <code>input</code> file, we see that we've specified a volume-type MC move, for which the acceptance criterion is given by <math display="block"> \mathrm{acc}(V \rightarrow V') = \mathrm{min}\biggl[1, \exp\left\{ -\beta ( E(V') - E(V) + P(V'-V) - (N+1)\ln (V'/V)/\beta ) \right\} \biggr] </math>

<pre class="conf">mc_nsteps = 200

n_mc_types = 3
mc_types = 'move' 'volume' 'md' ! We now specify an MD move type for doing NPT

mc_acceptance = 1 1 1
mc_move_max = 0.2

mc_lnvol_max = 0.01 ! Maximum lnvol to modify the volume

! Specify MD configuration
t_beg = 300
t_end = 300
p_beg = 1.0
p_end = 1.0

md_nsteps = 30
md_step = 0.1 ! Reducing the timestep
barostat = berendsen
tau_t = 100.
</pre>
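A volume move of this kind can be sketched as a random step in <math display="inline">\ln V</math> followed by a Metropolis test. The code below is a minimal illustration, not TurboGAP's source: it follows the sign convention of Frenkel and Smit for a random walk in <math display="inline">\ln V</math>, assumes energies in eV, volumes in Å<sup>3</sup> and pressure in bar, and uses hypothetical function names.

```python
import math
import random

K_B_EV = 8.617333262e-5            # Boltzmann constant [eV/K]
BAR_TO_EV_PER_A3 = 6.241509074e-7  # 1 bar expressed in eV per cubic angstrom


def propose_volume(volume, lnvol_max, rng):
    """Trial volume from a uniform random step in ln(V)."""
    return volume * math.exp(rng.uniform(-lnvol_max, lnvol_max))


def acc_volume(e_new, e_old, v_new, v_old, n_atoms, pressure_bar, temperature):
    """Acceptance probability for V -> V' with a random walk in ln(V)."""
    beta = 1.0 / (K_B_EV * temperature)
    p = pressure_bar * BAR_TO_EV_PER_A3
    log_ratio = (-beta * (e_new - e_old + p * (v_new - v_old))
                 + (n_atoms + 1) * math.log(v_new / v_old))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


rng = random.Random(0)
v_old = 10000.0
v_trial = propose_volume(v_old, lnvol_max=0.01, rng=rng)

# With no change in energy, expansion is favoured and compression is penalised
p_expand = acc_volume(0.0, 0.0, 10100.0, v_old, 1000, 1.0, 300.0)
p_shrink = acc_volume(0.0, 0.0, 9900.0, v_old, 1000, 1.0, 300.0)
print(p_expand, p_shrink)
```

In a real move the atomic positions are rescaled with the box and the energy is recomputed, so the energy difference usually dominates; at 1 bar the <math display="inline">P\Delta V</math> term is negligible on this scale.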
To run it we do

<syntaxhighlight lang="bash">cd ../4.volume_mc
bash 1.run_volume.sh
</syntaxhighlight>
This gives an expansion of the volume.