Difference between revisions of "Gradient descent"

Latest revision as of 07:17, 31 August 2022

Gradient descent is a method for finding extrema of a function by following the direction of fastest change in said function, i.e., its gradient. For energy minimization, we follow the direction corresponding to minus the gradient, that is, we update the atomic positions along the direction given by the forces. Iteratively, we propagate the atomic positions as follows:

[math]\displaystyle{ \textbf{x}_{n+1} = \textbf{x}_n + \gamma_n \textbf{F}_n, }[/math]

until we reach the criterion for convergence (set with the e_tol keyword), [math]\displaystyle{ |E_{n+1} - E_n| \lt E_\text{tol} }[/math]. The [math]\displaystyle{ \gamma }[/math] parameter determines by how much we update the atomic positions along the search direction, and is allowed to change between iterations. The procedure is detailed below.

Gradient descent with Armijo-Goldstein backtracking line search and Barzilai-Borwein step optimization

In TurboGAP, the initial [math]\displaystyle{ \gamma }[/math] is chosen by specifying the maximum allowed atomic displacement in the first optimization step, max_opt_step, followed by an automated backtracking line search. During the backtracking line search, we start by (over)estimating the position update according to the user provided max_opt_step (default 0.1 Angstrom), [math]\displaystyle{ \delta x_\text{max} }[/math]:

[math]\displaystyle{ \gamma_0 = \frac{\delta x_\text{max}}{F_\text{max}}, }[/math]

where [math]\displaystyle{ F_\text{max} }[/math] is the maximum force acting on any given atom in the system. Then, we enforce the Armijo-Goldstein condition for accepting this step size:

[math]\displaystyle{ E_{n+1} \le E_n - \gamma_n c \textbf{F} \cdot \textbf{F}, }[/math]

where [math]\displaystyle{ c \in (0,1) }[/math] (hardcoded in TurboGAP to [math]\displaystyle{ c = 0.5 }[/math]). If the condition is not met, then we reduce [math]\displaystyle{ \gamma }[/math] at each time step:

[math]\displaystyle{ \gamma_{n+1} = \tau \gamma_n, }[/math]

with [math]\displaystyle{ \tau \in (0,1) }[/math] (hardcoded in TurboGAP to [math]\displaystyle{ \tau = 0.5 }[/math]), until the Armijo-Goldstein condition is met ("backtracking").

When the line search is over, we switch to using the Barzilai–Borwein method for finding [math]\displaystyle{ \gamma }[/math]:

[math]\displaystyle{ \gamma_{n+1} = \frac{|(\textbf{x}_{n+1} - \textbf{x}_n) \cdot (\textbf{F}_{n+1} - \textbf{F}_n)|}{(\textbf{F}_{n+1} - \textbf{F}_n)^2}. }[/math]

Example input file

Note: energy minimization is invoked with the turbogap md command.

optimize = "gd"    ! Do gradient descent minimization
e_tol = 1.e-6      ! Energy threshold for convergence (per atom)
f_tol = 0.01       ! Maximum force on any given atom for convergence
md_nsteps = 1000   ! Maximum number of iterations (it will break earlier if convergence to e_tol is reached)
max_opt_step = 0.1 ! Angstrom

@@ Line 1: / Line 1: @@
 Gradient descent is a method for finding extrema of a function by following the direction of fastest change in said function, i.e., its gradient. For [[energy minimization]], we follow the direction corresponding to ''minus'' the gradient, that is, we update the atomic positions along the direction given by the forces. Iteratively, we propagate the atomic positions as follows:
 <math>
 \textbf{x}_{n+1} = \textbf{x}_n + \gamma_n \textbf{F}_n,
+</math>
+until we reach the criterion for convergence (set with the <code>[[e_tol]]</code> keyword), <math>|E_{n+1} - E_n| < E_\text{tol}</math>. The <math>\gamma</math> parameter determines by how much we update the atomic positions along the search direction, and is allowed to change between iterations. The procedure is detailed below.
+=== Gradient descent with Armijo-Goldstein backtracking line search and Barzilai-Borwein step optimization ===
+In '''TurboGAP''', the initial <math>\gamma</math> is chosen by specifying the maximum allowed atomic displacement in the first optimization step, [[max_opt_step]], followed by an automated backtracking line search. During the backtracking line search, we start by (over)estimating the position update according to the user provided [[max_opt_step]] (default 0.1 Angstrom), <math>\delta x_\text{max}</math>:
 <math>
-until we reach the criterion for convergence (set with the <code>[[e_tol]]</code> keyword), <math>|E_{n+1} - E_n| < E_\text{tol}</math>. The <math>\gamma</math> parameter determines by how much we update the atomic positions along the search direction, and is allowed to change between iterations. In '''TurboGAP''', the initial $\gamma$ is chosen by specifying the maximum allowed atomic displacement in the first optimization step, [[max_opt_step]], followed by an automated backtracking line search.
+\gamma_0 = \frac{\delta x_\text{max}}{F_\text{max}},
+</math>
+where <math>F_\text{max}</math> is the maximum force acting on any given atom in the system. Then, we enforce the Armijo-Goldstein condition for accepting this step size:
+<math>
+E_{n+1} \le E_n - \gamma_n c \textbf{F} \cdot \textbf{F},
+</math>
+where <math>c \in (0,1)</math> (hardcoded in '''TurboGAP''' to <math>c = 0.5</math>). If the condition is not met, then we reduce <math>\gamma</math> at each time step:
+<math>
+\gamma_{n+1} = \tau \gamma_n,
+</math>
+with <math>\tau \in (0,1)</math> (hardcoded in '''TurboGAP''' to <math>\tau = 0.5</math>), until the Armijo-Goldstein condition is met ("''backtracking''").
+When the line search is over, we switch to using the Barzilai–Borwein method for finding <math>\gamma</math>:
+<math>
+\gamma_{n+1} = \frac{|(\textbf{x}_{n+1} - \textbf{x}_n) \cdot (\textbf{F}_{n+1} - \textbf{F}_n)|}{(\textbf{F}_{n+1} - \textbf{F}_n)^2}.
+</math>
+=== Example input file ===
+Note: [[energy minimization]] is invoked with the <code>[[Documentation#turbogap md|turbogap md]]</code> command.
-'''Under construction'''
+ [[optimize]] = "gd"    ! Do gradient descent minimization
+ [[e_tol]] = 1.e-6      ! Energy threshold for convergence (per atom)
+ [[f_tol]] = 0.01       ! Maximum force on any given atom for convergence
+ [[md_nsteps]] = 1000   ! Maximum number of iterations (it will break earlier if convergence to e_tol is reached)
+ [[max_opt_step]] = 0.1 ! Angstrom

Difference between revisions of "Gradient descent"

Latest revision as of 07:17, 31 August 2022

Gradient descent with Armijo-Goldstein backtracking line search and Barzilai-Borwein step optimization

Example input file

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools