Recurrent Neural Networks




CS256

Chris Pollett

Nov 20, 2017

Outline

Introduction

Unfolding Computation Graphs

Example of unfolding a computation graph

Advantages of RNNs

  1. Regardless of the sequence length, the model learned always has the same input size. This is because it is specified in terms of the transition from one state to another, rather than in terms of a variable-length history of states.
  2. The same transition function `f` is used with the same parameters at every step. Hence, we are learning a single model `f` that operates on all time steps and all sequence lengths, rather than needing to learn a separate model `g^{(t)}` for each possible number of time steps (see the sketch below).
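
Both advantages come down to parameter sharing across time. A minimal numpy sketch of the idea, where the transition `f` is realized as a tanh layer (the function names and sizes here are ours, chosen for illustration):

```python
import numpy as np

def f(h_prev, x_t, W, U, b):
    # One state transition h^{(t)} = f(h^{(t-1)}, x^{(t)}; theta),
    # realized here as tanh(W h^{(t-1)} + U x^{(t)} + b).
    return np.tanh(W @ h_prev + U @ x_t + b)

def run_rnn(xs, W, U, b):
    # The same parameters (W, U, b) are reused at every time step,
    # so one model handles sequences of any length.
    h = np.zeros(W.shape[0])
    for x_t in xs:
        h = f(h, x_t, W, U, b)
    return h

# Illustrative sizes: hidden dimension 4, input dimension 3.
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
h5  = run_rnn(rng.normal(size=(5, 3)),  W, U, b)  # length-5 sequence
h50 = run_rnn(rng.normal(size=(50, 3)), W, U, b)  # length-50, same model
```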

Quiz

Which of the following is true?

  1. When training on convex functions, the excess error of gradient descent and SGD falls at the same rate as a function of the number of iterations.
  2. Weight parameters should always be initialized to 0 to ensure proper training of the net.
  3. The initial description of a CNN layer we gave assumed a stride of 1.

Designing Recurrent Neural Nets

Converting Graphs to Forward Propagation Equations
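
For the common case of an RNN with hidden-to-hidden recurrence and an output at each time step, unfolding the graph yields the standard forward propagation equations (here `U`, `W`, and `V` are the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices, and `b`, `c` are bias vectors):

  `a^{(t)} = b + W h^{(t-1)} + U x^{(t)}`
  `h^{(t)} = \tanh(a^{(t)})`
  `o^{(t)} = c + V h^{(t)}`
  `\hat{y}^{(t)} = \mathrm{softmax}(o^{(t)})`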

Backpropagation Through Time
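
Backpropagation through time is ordinary backpropagation applied to the unfolded graph: run the sequence forward, storing every hidden state, then walk backward from `t = T` to `t = 1`, accumulating gradients for the shared parameters at each step. A minimal numpy sketch, assuming the tanh transition above and an illustrative squared-error loss on the final state (the function name and loss choice are ours):

```python
import numpy as np

def bptt(xs, target, W, U, b):
    # Forward pass: store every hidden state for the backward pass.
    hs = [np.zeros(W.shape[0])]
    for x_t in xs:
        hs.append(np.tanh(W @ hs[-1] + U @ x_t + b))

    # Illustrative loss: L = 0.5 * ||h^{(T)} - target||^2.
    dh = hs[-1] - target                  # dL/dh^{(T)}
    dW, dU, db = np.zeros_like(W), np.zeros_like(U), np.zeros_like(b)

    # Backward pass over the unfolded graph, t = T down to t = 1.
    # Gradients for the *shared* parameters accumulate across steps.
    for t in range(len(xs), 0, -1):
        da = (1.0 - hs[t] ** 2) * dh      # through h^{(t)} = tanh(a^{(t)})
        dW += np.outer(da, hs[t - 1])
        dU += np.outer(da, xs[t - 1])
        db += da
        dh = W.T @ da                     # gradient flowing to h^{(t-1)}
    return dW, dU, db
```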

Teacher Forcing, Networks with Output Recurrence

[Image: RNN with output-to-hidden recurrence]
[Image: Train time versus test time when teacher forcing is used]
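
In a network with output recurrence, the previous output is fed back into the hidden layer. Teacher forcing exploits this: at train time the ground-truth `y^{(t-1)}` is fed back in place of the model's own prediction, while at test time the true output is unavailable, so the model's previous prediction is fed back instead. A hedged sketch of both modes (parameter names and shapes are ours):

```python
import numpy as np

def forward(xs, params, targets=None):
    # params: W (hidden-to-hidden), U (input-to-hidden),
    #         R (output-to-hidden feedback), V (hidden-to-output),
    #         biases b and c.
    W, U, R, V, b, c = params
    h = np.zeros(W.shape[0])
    y_prev = np.zeros(V.shape[0])
    outputs = []
    for t, x_t in enumerate(xs):
        h = np.tanh(W @ h + U @ x_t + R @ y_prev + b)
        y_t = V @ h + c
        outputs.append(y_t)
        # Train time (teacher forcing): feed back the ground truth.
        # Test time: feed back the model's own previous prediction.
        y_prev = targets[t] if targets is not None else y_t
    return outputs
```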

Gradient Descent for RNNs
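
As a small illustration of the update itself, one plain gradient-descent step on the BPTT gradients might look like the following (the norm clip is a common safeguard against exploding RNN gradients, included here as an assumption rather than something from the slides):

```python
import numpy as np

def sgd_step(params, grads, lr=0.1, clip=5.0):
    # params and grads are parallel lists of arrays, e.g. [W, U, b].
    updated = []
    for p, g in zip(params, grads):
        norm = np.linalg.norm(g)
        if norm > clip:                 # rescale overly large gradients
            g = g * (clip / norm)
        updated.append(p - lr * g)      # vanilla gradient descent update
    return updated
```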

Bidirectional RNNs

[Image: Bidirectional RNN]
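
A bidirectional RNN runs two independent chains over the sequence: one left-to-right (states `h^{(t)}`) and one right-to-left (states `g^{(t)}`), so the representation at time `t` sees both past and future context. A minimal sketch with hypothetical parameter names (`Wf`/`Uf`/`bf` for the forward chain, `Wb`/`Ub`/`bb` for the backward chain):

```python
import numpy as np

def birnn(xs, Wf, Uf, bf, Wb, Ub, bb):
    H = Wf.shape[0]
    h, g = np.zeros(H), np.zeros(H)
    hs, gs = [], []
    for x_t in xs:                        # left-to-right pass
        h = np.tanh(Wf @ h + Uf @ x_t + bf)
        hs.append(h)
    for x_t in reversed(xs):              # right-to-left pass
        g = np.tanh(Wb @ g + Ub @ x_t + bb)
        gs.append(g)
    gs.reverse()                          # align g^{(t)} with position t
    # Per-step representation: the concatenation [h^{(t)}; g^{(t)}].
    return [np.concatenate([h_t, g_t]) for h_t, g_t in zip(hs, gs)]
```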

Encoder-Decoder Sequence-to-Sequence Architectures

[Image: Encoder-decoder sequence-to-sequence RNN]
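
The encoder RNN consumes the input sequence and compresses it into a fixed-length context vector `c` (here, its final hidden state); the decoder RNN is initialized from that context and emits the output sequence. A hedged sketch (names and shapes are ours; a real decoder would typically also feed back its previous output, as in the teacher-forcing example above):

```python
import numpy as np

def encode(xs, We, Ue, be):
    # Read the whole input sequence; the final state is the context c.
    h = np.zeros(We.shape[0])
    for x_t in xs:
        h = np.tanh(We @ h + Ue @ x_t + be)
    return h

def decode(c, n_steps, Wd, bd, Vd, cd):
    # Generate n_steps outputs from a decoder initialized at the context.
    h = c.copy()
    ys = []
    for _ in range(n_steps):
        h = np.tanh(Wd @ h + bd)
        ys.append(Vd @ h + cd)
    return ys
```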