Which of the following is true?
- LSTMs have forget components/gates.
- Clipping the gradient by norm is the easiest to compute form of gradient clipping under all performance measures.
- Increasing the learning rate of your neural net always increases its effective capacity.