NN Design Methodology




CS256

Chris Pollett

Nov 29, 2017

Outline

Introduction

Determining Whether To Gather More Data -- Training Data

Determining Whether To Gather More Data -- Test Data

How Much New Data to Gather?

Selecting Hyperparameters

Manual Hyperparameter Tuning

Common Hyperparameter Tuning Rules of Thumb

Hyperparameter: Number of Hidden Units (capacity increases when increased)
Reason: Increasing the number of hidden units increases the representational capacity of the model.
Caveat: Increasing the number of hidden units also increases both the time and memory cost of every operation in the model.

Hyperparameter: Learning Rate (capacity increases when tuned optimally)
Reason: An improper learning rate, whether too high or too low, results in a model with low effective capacity due to optimization failure.

Hyperparameter: Convolutional Kernel Width (capacity increases when increased)
Reason: Increasing the kernel width increases the number of parameters in the model.
Caveat: A wider kernel results in a narrower output dimension, reducing model capacity unless implicit zero padding is used to offset this effect. Wider kernels usually require more memory and runtime, though the narrower output can sometimes reduce memory cost.

Hyperparameter: Implicit Zero Padding (capacity increases when increased)
Reason: Adding implicit zeros before convolution keeps the representation size large.
Caveat: Increases the time and memory cost of most operations.

Hyperparameter: Weight Decay Coefficient (capacity increases when decreased)
Reason: Decreasing the weight decay coefficient (for `L_2`, `L_1`, and similar regularizers) frees the model parameters to become larger.

Hyperparameter: Dropout Rate (capacity increases when decreased)
Reason: Dropping units less often gives the units more opportunities to "conspire" with each other to fit the training set.
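
To make these knobs concrete, here is a minimal sketch (assuming PyTorch; the constant values are illustrative, not recommendations) of where each hyperparameter from the table enters a model and its optimizer.

```python
import torch
import torch.nn as nn

HIDDEN_UNITS = 256    # more units -> more representational capacity
DROPOUT_RATE = 0.5    # a lower rate lets units "conspire" to fit the training set
LEARNING_RATE = 1e-2  # too high or too low gives low effective capacity
WEIGHT_DECAY = 1e-4   # a smaller L_2 coefficient frees weights to grow larger

model = nn.Sequential(
    # kernel_size is the convolutional kernel width; padding=2 is the
    # implicit zero padding that preserves a 32x32 representation size.
    nn.Conv2d(3, 16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, HIDDEN_UNITS),
    nn.ReLU(),
    nn.Dropout(p=DROPOUT_RATE),
    nn.Linear(HIDDEN_UNITS, 10),
)

# In PyTorch, the weight decay coefficient is passed to the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE,
                            weight_decay=WEIGHT_DECAY)
```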

Automatic Hyperparameter Algorithms

Grid Search Hyperparameter Selection
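
Grid search trains one model for every combination of a small, hand-picked set of values per hyperparameter and keeps the configuration with the lowest validation error. A minimal sketch follows; the search space and the train_and_evaluate callable (mapping a config dict to a validation error) are hypothetical placeholders, not from the slides.

```python
import itertools

GRID = {
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "hidden_units": [64, 256, 1024],
    "weight_decay": [0.0, 1e-4, 1e-2],
}

def grid_search(train_and_evaluate, grid=GRID):
    """Train one model per combination of grid values and return the
    configuration with the lowest validation error."""
    best_config, best_error = None, float("inf")
    for values in itertools.product(*grid.values()):
        config = dict(zip(grid.keys(), values))
        error = train_and_evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error
```

Note that the cost is exponential in the number of hyperparameters: this small grid already requires 3 x 3 x 3 = 27 training runs.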

In-Class Exercise

Random Search Hyperparameter Selection
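
Random search instead samples each hyperparameter independently from a distribution, typically log-uniform for rates and coefficients whose useful values span orders of magnitude. A minimal sketch under the same assumptions as the grid search example (train_and_evaluate is again a hypothetical callable):

```python
import random

def sample_config(rng=random):
    # Rates and coefficients are sampled log-uniformly, the usual
    # practice, since their useful values span orders of magnitude.
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "hidden_units": rng.choice([64, 128, 256, 512, 1024]),
        "weight_decay": 10 ** rng.uniform(-6, -2),
    }

def random_search(train_and_evaluate, num_trials=27):
    """Evaluate num_trials independently sampled configurations and
    keep the best one by validation error."""
    best_config, best_error = None, float("inf")
    for _ in range(num_trials):
        config = sample_config()
        error = train_and_evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error
```

Unlike grid search, every trial tries a fresh value of every hyperparameter, so the same budget of runs covers the important dimensions more densely.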