Long Short-Term Memory
RNNs only have short-term memory, which does not work well for long sentences; hence, for use cases such as grammar checking, we prefer LSTMs
Solves the vanishing-gradient problem of RNNs
Intuition: LSTM is to RNN what ResNet is to PlainNet
Gates
Gate | Function | Notes
---|---|---
Forget | Controls forgetting/retaining info currently in memory | Initialize with a slight positive bias to allow gradient flow at the beginning of training, then slowly start forgetting
Input | Controls whether to add new info to memory |
Output | Controls the effect of the hidden state on the output |
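The gate behavior in the table above can be written as the standard LSTM update equations (notation assumed here: $x_t$ is the input, $h_t$ the hidden state, $c_t$ the long-term/cell memory, $\tilde{c}_t$ the candidate memory, $\odot$ element-wise product):

$$
f_t = \sigma(W_f [x_t, h_{t-1}] + b_f), \qquad
i_t = \sigma(W_i [x_t, h_{t-1}] + b_i), \qquad
o_t = \sigma(W_o [x_t, h_{t-1}] + b_o)
$$

$$
\tilde{c}_t = \tanh(W_c [x_t, h_{t-1}] + b_c), \qquad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh(c_t)
$$

The additive update $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ is what gives the ResNet-like skip path for gradients.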
- $\tilde{c}_t$ is the candidate memory
- $c_t$ is the long-term memory
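The gate updates above can be sketched as a single LSTM time step. This is a minimal NumPy illustration, not a library implementation; the weight shapes and the `lstm_step`/`params` names are made up for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps each gate (f=forget, i=input, o=output, g=candidate memory)
    to a weight matrix W_* of shape (hidden, input+hidden) and a bias b_*.
    """
    z = np.concatenate([x, h_prev])                   # stacked [x_t, h_{t-1}]
    f = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])    # candidate memory
    c = f * c_prev + i * g                            # long-term memory update
    h = o * np.tanh(c)                                # hidden state / output
    return h, c

# Tiny hypothetical setup: input dim 3, hidden dim 2
rng = np.random.default_rng(0)
n_in, n_h = 3, 2
params = {}
for gate in "fiog":
    params[f"W_{gate}"] = rng.standard_normal((n_h, n_in + n_h)) * 0.1
    params[f"b_{gate}"] = np.zeros(n_h)
params["b_f"] += 1.0  # slight positive forget-gate bias, as the notes suggest

h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), params)
print(h.shape, c.shape)
```

Note the forget-gate bias is initialized positive so `f` starts near 1 and the cell retains information early in training, matching the advice in the table.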