CS256
Chris Pollett
Nov 29, 2017
Hyperparameter | Increases Capacity when... | Reason | Caveat |
---|---|---|---|
Number of Hidden Units | increased | Increasing the number of hidden units increases the representational capacity of the model. | Increasing the number of hidden units increases both the time and memory cost of essentially every operation in the model. |
Learning Rate | tuned optimally | An improper learning rate, whether too high or too low, results in a model with low effective capacity due to optimization failure. | |
Convolutional Kernel Width | increased | Increasing the kernel width increases the number of parameters in the model. | A wider kernel results in a narrower output dimension, reducing the model capacity unless you use implicit zero padding to reduce this effect. Wider kernels usually require more memory and runtime, but a narrower output can sometimes reduce memory cost. |
Implicit Zero Padding | increased | Adding implicit zeros before convolution keeps the representation size large. | Increases the time and memory costs of most operations. |
Weight Decay Coefficient | decreased | Decreasing the weight decay coefficient (for `L_2`, `L_1`, or other regularizers) frees the model parameters to become larger. | |
Dropout Rate | decreased | Dropping units less often gives the units more opportunities to "conspire" with each other to fit the training set. | |
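
The kernel width and zero padding rows can be checked with a little arithmetic. The following is a minimal sketch, assuming a 1-D convolution with stride 1, the same number of zeros padded on each side, and one bias per output channel. The function names `conv_output_width` and `conv_param_count`, and the example sizes (input width 32, 16 input and 16 output channels), are made up for illustration and do not come from the table.

```python
def conv_output_width(input_width: int, kernel_width: int, padding: int = 0) -> int:
    """Output width of a stride-1 1-D convolution.

    `padding` is the number of implicit zeros added on EACH side of the input
    (an assumption of this sketch).
    """
    return input_width + 2 * padding - kernel_width + 1


def conv_param_count(kernel_width: int, in_channels: int, out_channels: int) -> int:
    """Number of weights plus one bias per output channel."""
    return kernel_width * in_channels * out_channels + out_channels


if __name__ == "__main__":
    input_width, channels = 32, 16  # illustrative sizes only
    for k in (3, 5, 7):
        params = conv_param_count(k, channels, channels)
        unpadded = conv_output_width(input_width, k)
        # Padding (k - 1) // 2 zeros per side keeps the width fixed for odd k.
        padded = conv_output_width(input_width, k, padding=(k - 1) // 2)
        print(f"kernel={k} params={params} width(no pad)={unpadded} width(padded)={padded}")
```

Under these assumptions, widening the kernel from 3 to 7 raises the parameter count but shrinks the unpadded output from 30 to 26 positions, while padding 3 zeros per side keeps the output at 32. This is the trade-off described in the Caveat column for kernel width and the Reason column for zero padding.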