Backpropagation Demystified: How Neural Networks Learn
Learn the fundamentals of neural network training: forward propagation, loss functions, gradient descent, the chain rule, and activation functions like ReLU.
What is Backpropagation?
NEURAL NETWORKS / BACKPROPAGATION
Forward Propagation
The Network Makes a Prediction
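The forward pass can be sketched for the simplest case, a single linear neuron. This is a minimal illustration (the function name and the single-input setup are assumptions, not from the deck); the prediction is ŷ = w·x + b.

```python
def forward(x, w, b):
    """Forward pass for a single linear neuron: prediction y_hat = w*x + b."""
    return w * x + b

# With the deck's later numbers (x = 2.1, w = 1, b = 0):
print(forward(2.1, 1.0, 0.0))  # 2.1
```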
Measuring Mistakes
The Loss Function
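A common choice, and the one consistent with the deck's later numbers, is the squared-error loss: it measures how far the prediction ŷ landed from the target y. A minimal sketch (function name assumed):

```python
def squared_error(y, y_hat):
    """Squared-error loss: the squared gap between target and prediction."""
    return (y - y_hat) ** 2

# With y = 4 and y_hat = 2.1, the loss is (4 - 2.1)^2 = 3.61,
# matching the value used later in the deck.
print(squared_error(4.0, 2.1))  # approximately 3.61
```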
Which Way is Down?
Gradient Descent
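The gradient descent update rule moves each parameter a small step against its gradient, scaled by the learning rate. A minimal sketch (function name assumed):

```python
def gd_step(param, grad, lr=0.01):
    """One gradient-descent update: step against the gradient, scaled by lr."""
    return param - lr * grad

# A negative gradient means the loss falls as the parameter grows,
# so the update increases the parameter (as in the deck's worked example):
print(gd_step(1.0, -7.98))  # approximately 1.0798
```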
The Chain Rule
Connecting the Dots Backward
The network is a nested function: the loss depends on the neuron's output, which depends on the weights.
Chain rule: to find ∂loss/∂w, multiply the upstream gradient by the local gradient.
∂loss/∂w = (∂loss/∂ŷ) × (∂ŷ/∂w)
∂loss/∂b = (∂loss/∂ŷ) × (∂ŷ/∂b)
Upstream gradient: how the loss reacts to the layer's output.
Local gradient: how the layer's output reacts to its own parameter.
Key benefit: intermediate values from the forward pass are reused, so nothing is computed twice.
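The upstream-times-local factoring can be sketched in code for a single neuron with squared-error loss, where ŷ = w·x + b and loss = (y − ŷ)². The function name is an assumption; the factoring mirrors the equations above:

```python
def gradients(x, y, w, b):
    """Chain-rule gradients for a single linear neuron with squared-error loss."""
    y_hat = w * x + b             # forward-pass value, reused below
    upstream = -2 * (y - y_hat)   # d loss / d y_hat
    dw = upstream * x             # local gradient d y_hat / d w = x
    db = upstream * 1.0           # local gradient d y_hat / d b = 1
    return dw, db

# With the deck's numbers (x = 2.1, y = 4, w = 1, b = 0):
dw, db = gradients(2.1, 4.0, 1.0, 0.0)  # approximately -7.98 and -3.8
```
Note that `y_hat` is computed once in the forward direction and reused for both gradients, which is the "no redundant computation" benefit named above.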
Backward Propagation
Computing the Gradient
x = 2.1, w = 1, b = 0 → ŷ = 2.1, y = 4, loss = 3.61
∂loss/∂w = −7.98
∂loss/∂b = −3.8
lr = 0.01
w := 1 − (0.01)(−7.98) = 1.0798
b := 0 − (0.01)(−3.8) = 0.038
New loss = 2.87 ↓ (was 3.61) — saved 0.74 in one step!
A negative gradient means the loss drops as the parameter increases, so the update increases that parameter.
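The whole step above can be reproduced in a few lines. This sketch assumes a single linear neuron (ŷ = w·x + b) with squared-error loss, which matches every number on the slide:

```python
# One gradient-descent step with the slide's numbers.
x, y = 2.1, 4.0    # input and target
w, b = 1.0, 0.0    # initial parameters
lr = 0.01          # learning rate

y_hat = w * x + b                  # forward pass: 2.1
loss = (y - y_hat) ** 2            # 3.61
dw = -2 * (y - y_hat) * x          # -7.98
db = -2 * (y - y_hat)              # -3.8

w -= lr * dw                       # 1.0798
b -= lr * db                       # 0.038
new_loss = (y - (w * x + b)) ** 2  # down from 3.61
print(round(new_loss, 2))          # prints 2.87
```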
Why We Need More Than a Line
Activation Functions
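Without a nonlinearity, stacked linear layers collapse into one line; an activation function like ReLU breaks that. A minimal sketch of ReLU (function name assumed):

```python
def relu(z):
    """ReLU activation: passes positive inputs through, zeroes out negatives."""
    return max(0.0, z)

print(relu(-1.5), relu(2.1))  # prints: 0.0 2.1
```
ReLU's gradient is also simple, 1 for positive inputs and 0 for negative ones, which keeps the backward pass cheap.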
Why Backpropagation Makes Learning Possible
The Full Picture
- neural-networks
- backpropagation
- machine-learning
- artificial-intelligence
- deep-learning
- gradient-descent
- data-science