Introduction

Gradient descent is a popular optimization algorithm for finding a minimum of a function. It iteratively updates the parameters in the direction of the negative gradient. However, choosing the right step size is crucial for convergence and efficiency. One way to determine the step size is with a line search that enforces acceptance criteria such as the Wolfe conditions.

Mathematical Background

Consider a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ that we want to minimize. The gradient descent algorithm updates the parameters $\mathbf{x} \in \mathbb{R}^n$ iteratively using the following rule:

$$ \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k) $$

where $\alpha_k > 0$ is the step size at iteration $k$, and $\nabla f(\mathbf{x}_k)$ is the gradient of $f$ at $\mathbf{x}_k$.
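As a concrete illustration, a single update step in NumPy might look as follows; the quadratic test function and the fixed step size are assumptions for the example, not part of the method:

```python
import numpy as np

def f(x):
    return np.dot(x, x)  # example objective: f(x) = ||x||^2

def grad_f(x):
    return 2.0 * x       # its gradient: grad f(x) = 2x

x_k = np.array([3.0, -1.5])
alpha_k = 0.1                        # fixed step size, for illustration only
x_k1 = x_k - alpha_k * grad_f(x_k)   # the update rule above
```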

The Wolfe conditions help determine a suitable step size $\alpha_k$ at each iteration. The conditions are as follows:

  1. Sufficient decrease condition (the Armijo rule):

$$ f(\mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)) \leq f(\mathbf{x}_k) - c_1 \alpha_k \|\nabla f(\mathbf{x}_k)\|^2 $$

  2. Curvature condition:

$$ \nabla f(\mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k))^T \nabla f(\mathbf{x}_k) \leq c_2 \|\nabla f(\mathbf{x}_k)\|^2 $$

Here, $c_1$ and $c_2$ are constants satisfying $0 < c_1 < c_2 < 1$; typical choices are $c_1 = 10^{-4}$ and $c_2 = 0.9$.
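As a sketch, both conditions can be checked directly for the steepest descent direction $\mathbf{p}_k = -\nabla f(\mathbf{x}_k)$; the function name `wolfe_satisfied` and its default constants are illustrative choices, not from any particular library:

```python
import numpy as np

def wolfe_satisfied(f, grad_f, x, alpha, c1=1e-4, c2=0.9):
    """Check both Wolfe conditions for the trial point x - alpha * grad_f(x)."""
    g = grad_f(x)
    g_sq = np.dot(g, g)                # ||grad f(x)||^2
    x_trial = x - alpha * g
    # 1. Sufficient decrease (Armijo): enough reduction relative to alpha * ||g||^2.
    armijo = f(x_trial) <= f(x) - c1 * alpha * g_sq
    # 2. Curvature: the slope along -g at the trial point has flattened enough.
    curvature = np.dot(grad_f(x_trial), g) <= c2 * g_sq
    return armijo and curvature
```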

Algorithm

The gradient descent algorithm with a line search based on the Wolfe conditions can be summarized as follows:

  1. Choose an initial point $\mathbf{x}_0$ and set $k = 0$.
  2. Compute the gradient $\nabla f(\mathbf{x}_k)$.
  3. Choose an initial step size $\alpha_k > 0$.
  4. While the Wolfe conditions are not satisfied, shrink the step size by backtracking ($\alpha_k \leftarrow \rho \alpha_k$ for some $\rho \in (0, 1)$), as in the sketch after this list.
  5. Update the parameters: $\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)$.
  6. Set $k = k + 1$ and return to step 2; stop when a convergence criterion is met, e.g. when $\|\nabla f(\mathbf{x}_k)\|$ is sufficiently small.
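Putting the steps together, here is a minimal sketch of the full loop, reusing `f`, `grad_f`, and `wolfe_satisfied` from the snippets above. The initial step size, shrink factor $\rho$, and tolerance are illustrative assumptions; note that pure backtracking guarantees the Armijo condition but may not always be able to satisfy the curvature condition by shrinking alone, so the loop also stops once $\alpha_k$ becomes negligibly small:

```python
import numpy as np

def gradient_descent_wolfe(f, grad_f, x0, alpha0=1.0, rho=0.5,
                           c1=1e-4, c2=0.9, tol=1e-8, max_iter=1000):
    """Gradient descent with a backtracking line search guided by the Wolfe conditions."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:    # convergence test: gradient (nearly) zero
            break
        alpha = alpha0
        # Backtrack: shrink alpha until both conditions hold (or alpha is negligible).
        while alpha > 1e-12 and not wolfe_satisfied(f, grad_f, x, alpha, c1, c2):
            alpha *= rho
        x = x - alpha * g              # the gradient descent update
    return x

# Example usage with the quadratic from the first snippet:
x_min = gradient_descent_wolfe(f, grad_f, np.array([3.0, -1.5]))
print(x_min)  # close to the minimizer [0, 0]
```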