Skip to content

Advanced RNN

Non-Linear Units

Replace \(\phi\) of updates with MLP

✅ Keeps structure of latent space

❌ Costly (due to more complex gradients)

Deep RNN

Rather than using just 1 hidden layer, we use more hidden layers, ie, each time stamp of RNN has multiple cells

image-20230527183513296

\[ \begin{aligned} H_t^j &= f_j(H_{t-1}^j, H_t^{j−1}) \\ O_t &= g(H_t^L) \end{aligned} \]

Bidirectional RNN

Context

  • I am happy
  • I am very hungry
  • I am so hungry, I could eat 2 plates of rice

Very different words to fill in, depending on past and future context of a word

Traditional RNNs only look at the past. In interpolation (fill in) we also use the future.

Implementation

  • One RNN forward
  • One RNN backward
  • Combine both hidden states for output generation

image-20230527184227397

idk

Bi-RNN does not work for sequence generation

Training Testing
image-20230527184327354 image-20230527184332458

However, we can still use it to encode the sequence

Residual RNN

Input of every second layer is also added to its output (residual connection)

image-20230527184555086

\[ \bar H_t^{(2i)} = H_t^{(2i)} + H_t^{(2i)−1} \]

Adding Layers

Adding a layer to a model changes function 
class.

image-20230527184430838

We want to add to the function class, using Taylor expansion
 style parametrization

\[ f(x) = x + g(x) \]

image-20230527184524255

Variants

  • Simple addition
  • Nonlinearity before addition
  • Could also concatenate

DenseNet RNN

Concatenate outputs of previous layers as input to next layer, with occasional transition layers to reduce dimensionality

image-20230527195815669

\[ \bar H_t^{(t)} = [H_t^{(t)}, \bar H_t^{t−1}] \]
Last Updated: 2024-12-26 ; Contributors: AhmedThahir, web-flow

Comments