# Calculation of Back Propagation

The purpose of this deliverable is to understand the mathematical basis of neural networks. This is important because a neural network model will be used to build the Question Answering System. To fulfill the purpose, I worked through back propagation by hand on a dummy feed-forward neural network. In the example, `x` is a word feature vector, `y` is a one-hot label vector, and `W_1`, `b_1`, `W_2`, and `b_2` are the parameters.

The architecture of the neural network is

`\hat{y}=softmax(z_2)`

`z_2=h\cdot W_2 + b_2`

`h=sigmoid(z_1)`

`z_1=x\cdot W_1+b_1`
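The four equations above can be sketched directly in NumPy. The dimensions below (a 4-dim input, 5 hidden units, 3 classes) are hypothetical choices for illustration, not values from the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # shift by the row max for numerical stability
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, W1, b1, W2, b2):
    z1 = x @ W1 + b1       # z_1 = x . W_1 + b_1
    h = sigmoid(z1)        # h = sigmoid(z_1)
    z2 = h @ W2 + b2       # z_2 = h . W_2 + b_2
    y_hat = softmax(z2)    # hat{y} = softmax(z_2)
    return z1, h, z2, y_hat

# hypothetical sizes: 4-dim input, 5 hidden units, 3 classes
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)
z1, h, z2, y_hat = forward(x, W1, b1, W2, b2)
```

Because the last layer is a softmax, each row of `y_hat` is a probability distribution over the classes.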

The loss function is

`J(W_1, b_1, W_2, b_2, x, y)=\mbox{cross\_entropy}(y, \hat{y})=-\sum_{i=1}^{D_y}y_i \log{\hat{y}_i}`
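Since `y` is one-hot, only the log-probability of the true class contributes to the sum. A minimal sketch:

```python
import numpy as np

def cross_entropy(y, y_hat):
    # y is one-hot, so only the true class's log-probability survives the sum
    return -np.sum(y * np.log(y_hat))

# e.g. true class is the second one, model assigns it probability 0.7
J = cross_entropy(np.array([0.0, 1.0, 0.0]),
                  np.array([0.2, 0.7, 0.1]))   # = -log(0.7)
```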

After applying the chain rule multiple times, I obtained

`\frac{dJ}{dz_2}=\hat{y} - y`

`\frac{dJ}{db_2}=\frac{dJ}{dz_2}`

`\frac{dJ}{dh}=\frac{dJ}{dz_2}\cdot W_2^T`

`\frac{dJ}{dW_2}=h^T \cdot \frac{dJ}{dz_2}`
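The gradients listed above can be checked numerically. The sketch below implements them with NumPy (again with hypothetical dimensions), continues the same chain-rule pattern into the first layer using `sigmoid'(z_1) = h * (1 - h)`, and compares the analytic `dJ/dW_2` against a central finite difference:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# hypothetical sizes: 4-dim input, 5 hidden units, 3 classes
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
y = np.eye(3)[[1]]                      # one-hot target
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)

def loss(W2_):
    # J as a function of W_2 only, for the finite-difference check
    h = sigmoid(x @ W1 + b1)
    y_hat = softmax(h @ W2_ + b2)
    return -np.sum(y * np.log(y_hat))

# forward pass
z1 = x @ W1 + b1
h = sigmoid(z1)
y_hat = softmax(h @ W2 + b2)

# analytic gradients from the equations above
dz2 = y_hat - y                         # dJ/dz_2
db2 = dz2.sum(axis=0)                   # dJ/db_2
dh = dz2 @ W2.T                         # dJ/dh
dW2 = h.T @ dz2                         # dJ/dW_2
# continuing the chain rule into the first layer
dz1 = dh * h * (1 - h)                  # dJ/dz_1
db1 = dz1.sum(axis=0)                   # dJ/db_1
dW1 = x.T @ dz1                         # dJ/dW_1

# central-difference check on one entry of W_2
eps = 1e-6
Wp, Wm = W2.copy(), W2.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (loss(Wp) - loss(Wm)) / (2 * eps)
assert abs(numeric - dW2[0, 0]) < 1e-6
```

Agreement between the analytic and numerical gradients is strong evidence that the derivation is correct.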

By working out this dummy example, I gained a solid foundation in back propagation and am better prepared to understand more complex neural network models.