Gradient Problems

FFNNs can cope with these problems because they have only a few hidden layers, but RNNs struggle because back-propagation through time (BPTT) multiplies gradients across many time steps.

| | Vanishing (Converging) | Exploding (Diverging) |
|---|---|---|
| Cause: weights multiplied during BPTT are | too small | too large |
| Gradients ___ exponentially during back-propagation | shrink | grow |
| Resultant problem: effect of past input on current output | too little | too high |
| Solution | Scaling (careful initialization) | Clipping |
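To see why this happens, the sketch below (my own illustration, not from the notes) propagates a gradient backwards through \(T\) time steps by repeatedly multiplying it with a recurrent weight matrix: with small weights the norm collapses towards zero, with large weights it blows up.

```python
import numpy as np

rng = np.random.default_rng(0)
T, hidden = 50, 64
grad = rng.normal(size=hidden)

for scale, label in [(0.5, "small weights (vanishing)"),
                     (1.5, "large weights (exploding)")]:
    # Random recurrent weights; the 1/sqrt(hidden) factor puts the spectral
    # radius near `scale`, so each BPTT step multiplies the norm by ~scale.
    W = scale * rng.normal(size=(hidden, hidden)) / np.sqrt(hidden)
    g = grad.copy()
    for _ in range(T):   # back-propagation through T time steps
        g = W.T @ g      # the nonlinearity's Jacobian is ignored for simplicity
    print(f"{label}: gradient norm after {T} steps = {np.linalg.norm(g):.2e}")
```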

Initial Weights

We can mitigate these problems by initializing the weights carefully, so that repeated multiplication during BPTT neither shrinks nor amplifies the gradients at the start of training.
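One common choice (an example of my own, not prescribed by the notes) is to start the recurrent weight matrix close to orthogonal, so that all of its singular values are 1 and the gradient norm is initially preserved across time steps.

```python
import numpy as np

def orthogonal_init(hidden: int, seed: int = 0) -> np.ndarray:
    """Return an orthogonal matrix to use as the initial recurrent weights."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(hidden, hidden))
    Q, _ = np.linalg.qr(A)   # Q is orthogonal: every singular value equals 1
    return Q

W = orthogonal_init(64)
print(np.linalg.svd(W, compute_uv=False)[:3])  # ~[1. 1. 1.]
```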

Clipping

Gradient clipping rescales the gradient so that its norm is at most \(\theta\):

\[ g \leftarrow \min \left( 1, \frac{\theta}{\Vert g \Vert} \right) g \]

If the weights are large, the gradients grow exponentially during back-propagation; clipping caps the size of the update without changing the gradient's direction.
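A minimal NumPy sketch of the rule above (the function name `clip_gradient` is my own):

```python
import numpy as np

def clip_gradient(g: np.ndarray, theta: float) -> np.ndarray:
    """Rescale g so that its norm is at most theta: g <- min(1, theta/||g||) g."""
    norm = np.linalg.norm(g)
    if norm > theta:              # only rescale when the threshold is exceeded
        g = (theta / norm) * g    # resulting norm is exactly theta
    return g

g = np.array([3.0, 4.0])                 # ||g|| = 5
print(clip_gradient(g, theta=1.0))       # -> [0.6 0.8], norm 1
print(clip_gradient(g, theta=10.0))      # unchanged, norm already below theta
```

Deep-learning frameworks ship this as a built-in, e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch.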
