Gradient Problems
FFNNs can cope with these problems because they only have a few hidden layers, but RNNs struggle: back-propagation through time (BPTT) multiplies gradients across many time steps (see the numerical sketch after the table).
| | Vanishing (Converging) | Exploding (Diverging) |
|---|---|---|
| Cause: weights multiplied during BPTT are | too small | too large |
| Gradients ___ exponentially during back-propagation | shrink | grow |
| Resultant problem: effect of past input on the current output is | too little | too high |
| Solutions | Scaling | Clipping |
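To see why the effect is exponential, here is a toy sketch (not from the original notes) of what happens when the same recurrent weight multiplies the gradient at every one of 100 time steps of BPTT:

```python
# Toy illustration: a gradient repeatedly scaled by the same recurrent
# weight over 100 time steps of BPTT.
steps = 100
print(0.9 ** steps)  # ~2.7e-05  -> gradient vanishes
print(1.1 ** steps)  # ~1.4e+04  -> gradient explodes
```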
Initial Weights
We can mitigate these problems by initializing the weights very carefully.
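As a minimal sketch of careful initialization, assuming PyTorch (the library and layer sizes here are illustrative, not prescribed by the notes), one common choice is to make the recurrent weight matrix orthogonal, so that repeated multiplication during BPTT neither shrinks nor amplifies gradients at the start of training:

```python
import torch.nn as nn

# Hypothetical example: a small RNN whose hidden-to-hidden weights are
# initialized to be orthogonal. An orthogonal matrix has singular values
# of 1, so repeated multiplication during BPTT initially preserves the
# gradient's size.
rnn = nn.RNN(input_size=16, hidden_size=32, num_layers=1)

for name, param in rnn.named_parameters():
    if "weight_hh" in name:        # recurrent (hidden-to-hidden) weights
        nn.init.orthogonal_(param)
    elif "weight_ih" in name:      # input-to-hidden weights
        nn.init.xavier_uniform_(param)
    elif "bias" in name:
        nn.init.zeros_(param)
```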
Clipping
Clipping rescales the gradient \(g\) so that its size \(\vert g \vert\) is at most \(\theta\):
\[ g \leftarrow \min \left( 1, \frac{\theta}{\vert g \vert} \right) g \]
If the weights are large, the gradients grow exponentially during back-propagation, so clipping caps their size before the weight update.
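A minimal sketch of this rule, assuming PyTorch and the hypothetical helper name `clip_gradient_norm` (neither is prescribed by the notes):

```python
import torch

def clip_gradient_norm(parameters, theta):
    """Rescale gradients in place so their global L2 norm is at most theta.

    Implements g <- min(1, theta / |g|) * g from the formula above.
    """
    grads = [p.grad for p in parameters if p.grad is not None]
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
    scale = min(1.0, theta / (total_norm + 1e-12))  # guard against |g| = 0
    for g in grads:
        g.mul_(scale)
    return total_norm
```

It would be called after `loss.backward()` and before `optimizer.step()`. PyTorch also ships `torch.nn.utils.clip_grad_norm_`, which applies the same rescaling.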