Calculation of Back Propagation

The purpose of this deliverable is to understand the mathematical basis of neural networks. This matters because a neural network model will be used to build the Question Answering System. To that end, I worked through back propagation on a small feed-forward neural network example. In the example, `x` is a word feature vector, `y` is a one-hot vector, and `W_1`, `b_1`, `W_2`, and `b_2` are the parameters.

The architecture of the neural network is
`\hat{y}=\mbox{softmax}(z_2)`
`z_2=h\cdot W_2 + b_2`
`h=\mbox{sigmoid}(z_1)`
`z_1=x\cdot W_1+b_1`
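
As a sanity check on these equations, here is a minimal NumPy sketch of the forward pass. The variable names (`W1`, `b1`, `W2`, `b2`) and the row-vector convention for `x` are my own choices for the sketch; the deliverable itself works the example out by hand.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Mirrors the architecture above: z_1 = x.W_1 + b_1, h = sigmoid(z_1),
    # z_2 = h.W_2 + b_2, y_hat = softmax(z_2).
    z1 = x.dot(W1) + b1
    h = sigmoid(z1)
    z2 = h.dot(W2) + b2
    y_hat = softmax(z2)
    return z1, h, z2, y_hat
```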

The loss function is
`J(W_1, b_1, W_2, b_2, x, y)=\mbox{cross entropy}(y, \hat{y})=-\sum_{i=1}^{D_y} y_i \log \hat{y}_i`
where `D_y` is the dimension of `y`.
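
Continuing the sketch above, the loss for a one-hot `y` can be computed directly from this definition:

```python
def cross_entropy(y, y_hat):
    # J = -sum_i y_i * log(y_hat_i); with a one-hot y this is just the
    # negative log-probability assigned to the correct class.
    return -np.sum(y * np.log(y_hat))
```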

After applying the chain rule several times, I obtained
`\frac{dJ}{dz_2}=\hat{y} - y`
`\frac{dJ}{db_2}=\frac{dJ}{dz_2}`
`\frac{dJ}{dh}=\frac{dJ}{dz_2}\cdot W_2^T`
`\frac{dJ}{dW_2}=h^T \cdot \frac{dJ}{dz_2}`
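
The same gradients can be written down in a few lines of NumPy (again a sketch under the conventions above, not the hand calculation itself). The first-layer gradients, which the list above stops short of, follow from one more application of the chain rule using `\mbox{sigmoid}'(z_1)=h(1-h)`:

```python
def backward(x, y, h, y_hat, W2):
    dz2 = y_hat - y            # dJ/dz_2 = y_hat - y
    db2 = dz2                  # dJ/db_2 = dJ/dz_2
    dW2 = h.T.dot(dz2)         # dJ/dW_2 = h^T . dJ/dz_2
    dh  = dz2.dot(W2.T)        # dJ/dh   = dJ/dz_2 . W_2^T
    # First-layer gradients (not listed above, but needed to update W_1, b_1):
    dz1 = dh * h * (1.0 - h)   # sigmoid'(z_1) = h * (1 - h)
    db1 = dz1
    dW1 = x.T.dot(dz1)
    return dW1, db1, dW2, db2
```

A quick numerical gradient check on random data agrees with the analytic expressions:

```python
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4))
y = np.eye(3)[[1]]                              # one-hot target
W1, b1 = rng.standard_normal((4, 5)), np.zeros((1, 5))
W2, b2 = rng.standard_normal((5, 3)), np.zeros((1, 3))

z1, h, z2, y_hat = forward(x, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(x, y, h, y_hat, W2)

eps = 1e-6                                      # centered finite difference on W_2[0,0]
Wp, Wm = W2.copy(), W2.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
num = (cross_entropy(y, forward(x, W1, b1, Wp, b2)[3]) -
       cross_entropy(y, forward(x, W1, b1, Wm, b2)[3])) / (2 * eps)
assert np.isclose(dW2[0, 0], num)
```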

The concrete hand calculation is worked out here: [PDF]

By working through this dummy example, I gained a solid foundation in back propagation and became better prepared to understand more complex neural network models.