# Long Short-Term Memory
RNNs only have short-term memory, which does not work well for long sentences; hence, for use-cases such as Grammar Checking, we prefer LSTMs
Solves the vanishing-gradient problem of RNNs
Intuition: LSTM is to RNN what ResNet is to PlainNet
## Gates
Gate | Function | Notes
---|---|---
Forget | Control forgetting/retaining of info currently in memory | Initialize with a slight +ve bias to allow gradient flow at the beginning of training, then slowly start forgetting
Input | Control whether to add new info to memory |
Output | Control the effect of the hidden state on the output |
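The forget-bias tip from the table is a one-liner in practice. A minimal PyTorch sketch, assuming `nn.LSTM` and a bias value of 1.0 for the "slight +ve bias" (PyTorch packs gate parameters in input/forget/cell/output order, so the forget-gate bias is the second quarter of each bias vector):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64)

# Gate parameters are packed as (input, forget, cell, output),
# so indices [hidden : 2*hidden] select the forget gate.
hidden = lstm.hidden_size
for name, param in lstm.named_parameters():
    if "bias" in name:
        with torch.no_grad():
            param[hidden:2 * hidden].fill_(1.0)  # slight positive forget bias
```

This starts \(F_t \approx \sigma(1) \approx 0.73\), so the cell retains most of its memory early in training and only learns what to forget later.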
\[ \begin{aligned} I_t &= \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i) \\ F_t &= \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f) \\ O_t &= \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o) \\ \tilde C_t &= \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c) \\ \\ C_t &= F_t \odot C_{t-1} + I_t \odot \tilde C_t \\ H_t &= O_t \odot \tanh(C_t) \end{aligned} \]
- \(\tilde C_t\) is the candidate memory
- \(C_t\) is the long-term memory (cell state)
- \(H_t\) is the hidden state (short-term memory)
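The update equations translate almost line-for-line into code. A minimal single-step sketch in PyTorch; the `params` dict and its key names mirror the subscripts above and are assumptions for illustration, not an existing API:

```python
import torch

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step, following the equations above.

    params maps names mirroring the math (W_xi, W_hi, b_i, ...) to tensors
    of shape (input_dim, hidden_dim), (hidden_dim, hidden_dim), (hidden_dim,).
    """
    i_t = torch.sigmoid(x_t @ params["W_xi"] + h_prev @ params["W_hi"] + params["b_i"])
    f_t = torch.sigmoid(x_t @ params["W_xf"] + h_prev @ params["W_hf"] + params["b_f"])
    o_t = torch.sigmoid(x_t @ params["W_xo"] + h_prev @ params["W_ho"] + params["b_o"])
    c_tilde = torch.tanh(x_t @ params["W_xc"] + h_prev @ params["W_hc"] + params["b_c"])

    c_t = f_t * c_prev + i_t * c_tilde  # long-term memory: gated additive update
    h_t = o_t * torch.tanh(c_t)         # hidden state exposed to the output
    return h_t, c_t
```

Note the additive update of \(C_t\): gradients can flow through \(F_t \odot C_{t-1}\) without repeated matrix multiplications, which is the ResNet-like shortcut behind the intuition above and why LSTMs mitigate vanishing gradients.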