Which of the following is true?
- When training on convex functions, the excess error of gradient descent and that of SGD fall at the same rate as a function of the number of iterations.
- Weight parameters should always be initialized to 0 to ensure proper training of the net.
- The initial description of a CNN layer that we gave assumed a stride of 1.