Finish Cross-Entropy, Softmax Layers, Minimization Methods

CS256

Chris Pollett

Oct 18, 2021

Outline

Introduction

Cross-Entropy Cost Function - Logistic Function Case

Output Units

Motivating the Sigmoid Function, Softmax

Training a Softmax Layer

Minimizing Functions

Newton Methods for Finding Minima

The BFGS Algorithm

Quiz

Which of the following is true?

  1. All mathematical estimators have some bias.
  2. To obtain an MLE for labeled data, we used conditional probabilities.
  3. The cross-entropy function developed for the linear Gaussian case is strictly larger than the mean squared error.

Gradient Descent

Hidden Units

Types of Activation

Remarks on Initialization