Finish Optimization

CS256

Chris Pollett

Nov 3, 2021

Outline

Introduction

Optimization and SGD

Excess Error

Momentum

Nesterov Momentum

Parameter Initialization

In-Class Exercise

Initializing Biases

Adaptive Learning Rate Algorithms

Adaptive Learning Rate Algorithms that work with SGD