Gradient descent is a popular optimization algorithm for finding a (local) minimum of a function. It iteratively updates the parameters in the direction of the negative gradient of the function. However, choosing the right step size is crucial for the algorithm's convergence and efficiency. One way to determine the step size is with a line search method, such as one based on the Wolfe conditions.
Consider a differentiable function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ that we want to minimize. The gradient descent algorithm updates the parameters $\mathbf{x} \in \mathbb{R}^n$ iteratively using the following rule:
$$ \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k) $$
where $\alpha_k > 0$ is the step size at iteration $k$, and $\nabla f(\mathbf{x}_k)$ is the gradient of $f$ at $\mathbf{x}_k$.
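For instance, with a fixed step size the update is a one-line computation (a minimal NumPy sketch; the quadratic objective and the step size $\alpha = 0.1$ are illustrative assumptions):

```python
import numpy as np

def gradient_descent_step(x, grad_f, alpha):
    """One update: x_{k+1} = x_k - alpha * grad f(x_k)."""
    return x - alpha * grad_f(x)

# Illustrative example: f(x) = ||x||^2 / 2, whose gradient is x itself.
grad_f = lambda x: x
x = np.array([3.0, -4.0])
x_next = gradient_descent_step(x, grad_f, alpha=0.1)  # moves x toward the origin
```

With a fixed $\alpha$, the iteration can diverge if the step is too large or crawl if it is too small, which is precisely the problem a line search addresses.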
The Wolfe conditions help determine a suitable step size $\alpha_k$ at each iteration. They consist of a sufficient decrease (Armijo) condition and a curvature condition, which for the steepest descent direction $-\nabla f(\mathbf{x}_k)$ read:
$$
f(\mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)) \leq f(\mathbf{x}_k) - c_1 \alpha_k \|\nabla f(\mathbf{x}_k)\|^2
$$
$$
-\nabla f(\mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k))^\top \nabla f(\mathbf{x}_k) \geq -c_2 \|\nabla f(\mathbf{x}_k)\|^2
$$
Here, $c_1$ and $c_2$ are constants satisfying $0 < c_1 < c_2 < 1$.
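Translated into code, checking a candidate step size against both inequalities is direct (a sketch assuming `f` and `grad_f` are caller-supplied callables; the defaults $c_1 = 10^{-4}$ and $c_2 = 0.9$ are common choices, not values fixed by the text):

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x, alpha, c1=1e-4, c2=0.9):
    """Check both Wolfe conditions along the steepest descent direction -grad f(x)."""
    g = grad_f(x)
    g_norm_sq = g @ g
    x_new = x - alpha * g
    # Sufficient decrease (Armijo): f must drop at least proportionally to alpha.
    armijo = f(x_new) <= f(x) - c1 * alpha * g_norm_sq
    # Curvature: the directional derivative along -g at the new point must be
    # no more negative than c2 times its value at x.
    curvature = -(grad_f(x_new) @ g) >= -c2 * g_norm_sq
    return armijo and curvature
```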
The gradient descent algorithm with line search using the Wolfe conditions can be summarized as follows: at each iteration, compute the gradient, find a step size $\alpha_k$ satisfying both conditions, and update $\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)$.
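A minimal Python sketch of this loop is given below. The two condition checks are re-derived inline so the search knows which one failed; the bracketing-and-bisection strategy, the initial trial step $\alpha_0 = 1$, and the stopping tolerance are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

def wolfe_line_search(f, grad_f, x, c1=1e-4, c2=0.9, alpha0=1.0, max_trials=50):
    """Find a step size satisfying both Wolfe conditions by bracketing and bisection."""
    lo, hi = 0.0, np.inf
    alpha = alpha0
    g = grad_f(x)
    g_norm_sq = g @ g
    for _ in range(max_trials):
        x_new = x - alpha * g
        if f(x_new) > f(x) - c1 * alpha * g_norm_sq:
            hi = alpha                                # Armijo failed: step too long, shrink.
        elif -(grad_f(x_new) @ g) < -c2 * g_norm_sq:
            lo = alpha                                # Curvature failed: step too short, grow.
        else:
            return alpha                              # Both conditions hold.
        alpha = (lo + hi) / 2 if np.isfinite(hi) else 2.0 * alpha
    return alpha                                      # Fall back to the last trial step.

def gradient_descent(f, grad_f, x0, tol=1e-6, max_iters=1000):
    """Gradient descent with a Wolfe-conditions line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:                   # Stop once the gradient is small.
            return x
        alpha = wolfe_line_search(f, grad_f, x)
        x = x - alpha * g
    return x
```

On the quadratic $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|^2$, for example, `gradient_descent(lambda x: 0.5 * (x @ x), lambda x: x, [3.0, -4.0])` accepts $\alpha = 1$ immediately and reaches the minimizer $\mathbf{0}$ in a single step.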