Difference between revisions of "Gradient descent"
Miguel Caro (talk | contribs) (Created page with "Gradient descent is a method for finding extrema of a function by following the direction of fastest change in said function, i.e., its gradient. For energy minimization,...") |
Miguel Caro (talk | contribs) |
||
| (11 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
Gradient descent is a method for finding extrema of a function by following the direction of fastest change in said function, i.e., its gradient. For [[energy minimization]], we follow the direction corresponding to ''minus'' the gradient, that is, we update the atomic positions along the direction given by the forces. Iteratively, we propagate the atomic positions as follows: | Gradient descent is a method for finding extrema of a function by following the direction of fastest change in said function, i.e., its gradient. For [[energy minimization]], we follow the direction corresponding to ''minus'' the gradient, that is, we update the atomic positions along the direction given by the forces. Iteratively, we propagate the atomic positions as follows: | ||
| + | |||
<math> | <math> | ||
\textbf{x}_{n+1} = \textbf{x}_n + \gamma_n \textbf{F}_n, | \textbf{x}_{n+1} = \textbf{x}_n + \gamma_n \textbf{F}_n, | ||
| + | </math> | ||
| + | |||
| + | until we reach the criterion for convergence (set with the <code>[[e_tol]]</code> keyword), <math>|E_{n+1} - E_n| < E_\text{tol}</math>. The <math>\gamma</math> parameter determines by how much we update the atomic positions along the search direction, and is allowed to change between iterations. The procedure is detailed below. | ||
| + | |||
| + | === Gradient descent with Armijo-Goldstein backtracking line search and Barzilai-Borwein step optimization === | ||
| + | |||
| + | In '''TurboGAP''', the initial <math>\gamma</math> is chosen by specifying the maximum allowed atomic displacement in the first optimization step, [[max_opt_step]], followed by an automated backtracking line search. During the backtracking line search, we start by (over)estimating the position update according to the user provided [[max_opt_step]] (default 0.1 Angstrom), <math>\delta x_\text{max}</math>: | ||
| + | |||
<math> | <math> | ||
| − | + | \gamma_0 = \frac{\delta x_\text{max}}{F_\text{max}}, | |
| + | </math> | ||
| + | |||
| + | where <math>F_\text{max}</math> is the maximum force acting on any given atom in the system. Then, we enforce the Armijo-Goldstein condition for accepting this step size: | ||
| + | |||
| + | <math> | ||
| + | E_{n+1} \le E_n - \gamma_n c \textbf{F} \cdot \textbf{F}, | ||
| + | </math> | ||
| + | |||
| + | where <math>c \in (0,1)</math> (hardcoded in '''TurboGAP''' to <math>c = 0.5</math>). If the condition is not met, then we reduce <math>\gamma</math> at each time step: | ||
| + | |||
| + | <math> | ||
| + | \gamma_{n+1} = \tau \gamma_n, | ||
| + | </math> | ||
| + | |||
| + | with <math>\tau \in (0,1)</math> (hardcoded in '''TurboGAP''' to <math>\tau = 0.5</math>), until the Armijo-Goldstein condition is met ("''backtracking''"). | ||
| + | |||
| + | When the line search is over, we switch to using the Barzilai–Borwein method for finding <math>\gamma</math>: | ||
| + | |||
| + | <math> | ||
| + | \gamma_{n+1} = \frac{|(\textbf{x}_{n+1} - \textbf{x}_n) \cdot (\textbf{F}_{n+1} - \textbf{F}_n)|}{(\textbf{F}_{n+1} - \textbf{F}_n)^2}. | ||
| + | </math> | ||
| + | |||
| + | === Example input file === | ||
| + | |||
| + | Note: [[energy minimization]] is invoked with the <code>[[Documentation#turbogap md|turbogap md]]</code> command. | ||
| − | + | [[optimize]] = "gd" ! Do gradient descent minimization | |
| + | [[e_tol]] = 1.e-6 ! Energy threshold for convergence (per atom) | ||
| + | [[f_tol]] = 0.01 ! Maximum force on any given atom for convergence | ||
| + | [[md_nsteps]] = 1000 ! Maximum number of iterations (it will break earlier if convergence to e_tol is reached) | ||
| + | [[max_opt_step]] = 0.1 ! Angstrom | ||
Latest revision as of 07:17, 31 August 2022
Gradient descent is a method for finding extrema of a function by following the direction of fastest change in said function, i.e., its gradient. For energy minimization, we follow the direction corresponding to minus the gradient, that is, we update the atomic positions along the direction given by the forces. Iteratively, we propagate the atomic positions as follows:
[math]\displaystyle{ \textbf{x}_{n+1} = \textbf{x}_n + \gamma_n \textbf{F}_n, }[/math]
until we reach the criterion for convergence (set with the e_tol keyword), [math]\displaystyle{ |E_{n+1} - E_n| \lt E_\text{tol} }[/math]. The [math]\displaystyle{ \gamma }[/math] parameter determines by how much we update the atomic positions along the search direction, and is allowed to change between iterations. The procedure is detailed below.
Gradient descent with Armijo-Goldstein backtracking line search and Barzilai-Borwein step optimization
In TurboGAP, the initial [math]\displaystyle{ \gamma }[/math] is chosen by specifying the maximum allowed atomic displacement in the first optimization step, max_opt_step, followed by an automated backtracking line search. During the backtracking line search, we start by (over)estimating the position update according to the user provided max_opt_step (default 0.1 Angstrom), [math]\displaystyle{ \delta x_\text{max} }[/math]:
[math]\displaystyle{ \gamma_0 = \frac{\delta x_\text{max}}{F_\text{max}}, }[/math]
where [math]\displaystyle{ F_\text{max} }[/math] is the maximum force acting on any given atom in the system. Then, we enforce the Armijo-Goldstein condition for accepting this step size:
[math]\displaystyle{ E_{n+1} \le E_n - \gamma_n c \textbf{F} \cdot \textbf{F}, }[/math]
where [math]\displaystyle{ c \in (0,1) }[/math] (hardcoded in TurboGAP to [math]\displaystyle{ c = 0.5 }[/math]). If the condition is not met, then we reduce [math]\displaystyle{ \gamma }[/math] at each time step:
[math]\displaystyle{ \gamma_{n+1} = \tau \gamma_n, }[/math]
with [math]\displaystyle{ \tau \in (0,1) }[/math] (hardcoded in TurboGAP to [math]\displaystyle{ \tau = 0.5 }[/math]), until the Armijo-Goldstein condition is met ("backtracking").
When the line search is over, we switch to using the Barzilai–Borwein method for finding [math]\displaystyle{ \gamma }[/math]:
[math]\displaystyle{ \gamma_{n+1} = \frac{|(\textbf{x}_{n+1} - \textbf{x}_n) \cdot (\textbf{F}_{n+1} - \textbf{F}_n)|}{(\textbf{F}_{n+1} - \textbf{F}_n)^2}. }[/math]
Example input file
Note: energy minimization is invoked with the turbogap md command.
optimize = "gd" ! Do gradient descent minimization e_tol = 1.e-6 ! Energy threshold for convergence (per atom) f_tol = 0.01 ! Maximum force on any given atom for convergence md_nsteps = 1000 ! Maximum number of iterations (it will break earlier if convergence to e_tol is reached) max_opt_step = 0.1 ! Angstrom