This tutorial will cover the following sections:

  1. Introduction to Backpropagation
  2. The Chain Rule
  3. Derivation of Backpropagation
  4. Example of Backpropagation in Action
  5. Diagrams for Visualization

1. Introduction to Backpropagation

Backpropagation is a fundamental algorithm for training artificial neural networks. It optimizes the weights of the network so as to minimize the error between the predicted output and the target output. The algorithm works by propagating the error backward through the network, layer by layer, computing the gradient of the error with respect to each weight; each weight is then adjusted in the direction that reduces the error.
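
As a minimal sketch of the weight-update step this describes (the learning rate, current weight, and gradient below are placeholder values chosen for illustration, not quantities derived later in this tutorial):

```python
# One gradient-descent update for a single weight.
# All three numbers are illustrative placeholders.
learning_rate = 0.1            # assumed step size
weight = 0.5                   # assumed current weight value
grad_error_wrt_weight = 0.2    # assumed dE/dw from backpropagation

# Move the weight against the gradient to reduce the error.
weight -= learning_rate * grad_error_wrt_weight
print(weight)  # 0.48
```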

2. The Chain Rule

The chain rule is the key piece of calculus behind backpropagation: it lets us compute the derivative of a composite function. For differentiable functions \(f\) and \(g\), the chain rule states:

\[
\frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x)
\]
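
To make the rule concrete, here is a small Python check; the choices \(f = \sin\) and \(g(x) = x^2\) are arbitrary examples, not functions used elsewhere in this tutorial. We differentiate \(h(x) = \sin(x^2)\) by the chain rule and compare against a finite-difference approximation:

```python
import math

# Composite function h(x) = f(g(x)) with f = sin and g(x) = x^2.
def h(x):
    return math.sin(x ** 2)

# Chain rule: h'(x) = f'(g(x)) * g'(x) = cos(x^2) * 2x.
def h_prime(x):
    return math.cos(x ** 2) * 2 * x

# Central finite-difference check at an arbitrary point.
x, eps = 1.3, 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
print(h_prime(x), numeric)  # the two values agree to ~6 decimal places
```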

3. Derivation of Backpropagation

Let's consider a simple neural network with one hidden layer.

Figure: the simple neural network under consideration.
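
Before introducing the notation, here is a minimal NumPy sketch of the forward pass of such a network. The layer sizes (2 inputs, 3 hidden units, 1 output) and the sigmoid activation are assumptions made for illustration, not values fixed by the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 2 inputs, 3 hidden units, 1 output.
W1 = rng.normal(size=(3, 2))  # input -> hidden weights
b1 = np.zeros(3)              # hidden-layer biases
W2 = rng.normal(size=(1, 3))  # hidden -> output weights
b2 = np.zeros(1)              # output bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    hidden = sigmoid(W1 @ x + b1)     # hidden-layer activations
    output = sigmoid(W2 @ hidden + b2)  # network prediction
    return output

x = np.array([0.5, -1.0])  # an arbitrary input vector
print(forward(x))
```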

We denote: