
Gradients

Gradient Issues

Gradient issues are especially pronounced at FP32 or lower precision.

| Issue | Gradients ___ exponentially during back-propagation | Weight gradients | Cause | Solution |
|---|---|---|---|---|
| Vanishing (Converging) | shrink | Too small | Deep networks, small weights | Weight initialization, scaling |
| Exploding (Diverging) | grow | Too large | Deep networks, large loss due to a target with a large range\* | Weight initialization, clipping, target normalization |
\* A target variable with a large spread of values may produce large error gradients, causing the weights to change dramatically and making training unstable.
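
For target normalization, a minimal sketch assuming a regression setup (the sample targets and the `mu`/`sigma` names are illustrative, not from these notes):

```python
import numpy as np

# Hypothetical regression targets with a large spread of values
y = np.array([10.0, 5_000.0, 120.0, 80_000.0])

# Standardize to zero mean and unit variance; keep mu and sigma
# so predictions can be mapped back to the original scale
mu, sigma = y.mean(), y.std()
y_norm = (y - mu) / sigma

# ... train the model on y_norm instead of y ...

# Map predictions back to the original target scale:
# y_pred = y_pred_norm * sigma + mu
```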

Gradient Clipping

Rescales the gradient \(g\) so that its norm is at most \(\theta\):

\[ g \leftarrow \min \left( 1, \frac{\theta}{\lVert g \rVert} \right) g \]

If the weights are large, the gradients grow exponentially during back-propagation; clipping caps the gradient's magnitude while preserving its direction.
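
A minimal NumPy sketch of the rule above (the function name `clip_gradient` and the example values are illustrative):

```python
import numpy as np

def clip_gradient(g: np.ndarray, theta: float) -> np.ndarray:
    """Rescale g so its norm is at most theta: g <- min(1, theta / ||g||) * g."""
    norm = np.linalg.norm(g)
    if norm > theta:
        g = (theta / norm) * g  # shrink the gradient, keeping its direction
    return g

g = np.array([3.0, 4.0])            # norm = 5
print(clip_gradient(g, theta=1.0))  # [0.6 0.8], norm = 1
```

In practice, deep-learning frameworks provide this directly; for example, PyTorch's `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=theta)` applies the same rescaling over all parameter gradients jointly.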

