Gradient descent

From TurboGAP
Jump to navigation Jump to search

Gradient descent is a method for finding extrema of a function by following the direction of fastest change in said function, i.e., its gradient. For energy minimization, we follow the direction corresponding to minus the gradient, that is, we update the atomic positions along the direction given by the forces. Iteratively, we propagate the atomic positions as follows:

[math]\displaystyle{ \textbf{x}_{n+1} = \textbf{x}_n + \gamma_n \textbf{F}_n, }[/math]

until we reach the criterion for convergence (set with the e_tol keyword), [math]\displaystyle{ |E_{n+1} - E_n| \lt E_\text{tol} }[/math]. The [math]\displaystyle{ \gamma }[/math] parameter determines by how much we update the atomic positions along the search direction, and is allowed to change between iterations. The procedure is detailed below.

Gradient descent with Armijo-Goldstein backtracking line search and Barzilai-Borwein step optimization

In TurboGAP, the initial [math]\displaystyle{ \gamma }[/math] is chosen by specifying the maximum allowed atomic displacement in the first optimization step, max_opt_step, followed by an automated backtracking line search. During the backtracking line search, we start by (over)estimating the position update according to the user provided max_opt_step (default 0.1 Angstrom), [math]\displaystyle{ \delta x_\text{max} }[/math]:

[math]\displaystyle{ \gamma_0 = \frac{\delta x_\text{max}}{F_\text{max}}, }[/math]

where [math]\displaystyle{ F_\text{max} }[/math] is the maximum force acting on any given atom in the system. Then, we enforce the Armijo-Goldstein condition for accepting this step size:

[math]\displaystyle{ E_{n+1} \le E_n - \gamma_n c \textbf{F} \cdot \textbf{F}, }[/math]

where [math]\displaystyle{ c \in (0,1) }[/math] (hardcoded in TurboGAP to [math]\displaystyle{ c = 0.5 }[/math]). If the condition is not met, then we reduce [math]\displaystyle{ \gamma }[/math] at each time step:

[math]\displaystyle{ \gamma_{n+1} = \tau \gamma_n, }[/math]

with [math]\displaystyle{ \tau \in (0,1) }[/math] (hardcoded in TurboGAP to [math]\displaystyle{ \tau = 0.5 }[/math]), until the Armijo-Goldstein condition is met ("backtracking").

When the line search is over, we switch to using the Barzilai–Borwein method for finding [math]\displaystyle{ \gamma }[/math]:

[math]\displaystyle{ \gamma_{n+1} = \frac{|(\textbf{x}_{n+1} - \textbf{x}_n) \cdot (\textbf{F}_{n+1} - \textbf{F}_n)|}{(\textbf{F}_{n+1} - \textbf{F}_n)^2}. }[/math]

Example input file

Note: energy minimization is invoked with the turbogap md command.

optimize = "gd"    ! Do gradient descent minimization
e_tol = 1.e-6      ! Energy threshold for convergence (per atom)
f_tol = 0.01       ! Maximum force on any given atom for convergence
md_nsteps = 1000   ! Maximum number of iterations (it will break earlier if convergence to e_tol is reached)
max_opt_step = 0.1 ! Angstrom