Probability, PAC Learning, Linear Algebra




CS256

Chris Pollett

Aug 30, 2017

Outline

Introduction

Generality versus Learning

How good is a learning algorithm?

Probability (Take 1) - Distributions

Conditional Probability and Independence

Discrete Random Variables

Expectation and Variance

In-Class Exercise

Markov Inequality

Error Reduction -- Chernoff Bounds

Proof of Chernoff Bounds

Let `X = sum_(i=1)^n X_i`, where the `X_i` are independent 0-1 random variables with `Pr[X_i = 1] = p`. If `t` is a positive real number, then
`Pr[X ge (1+c)pn]= Pr[e^(tX) ge e^(t(1+c)pn)]` (*)
By Markov's Inequality,
`Pr[e^(tX) ge k cdot E(e^(tX))] le 1/k` for any real `k > 0`.
Taking `k=e^(t(1+c)pn)/[E(e^(tX))]` and using (*) gives
`Pr[X ge (1+c)pn] le [E(e^(tX))]cdot e^(-t(1+c)pn)`. (**)
Since `X = sum_(i=1)^n X_i` and the `X_i` are independent, `E(e^(tX)) = [E(e^(tX_1))]^n`, and `E(e^(tX_1)) = p e^t + (1-p) = 1 + p(e^t-1)`. Substituting this into (**) gives:
`Pr[X ge (1+c)pn] le (1 + p(e^t-1))^n cdot e^(-t(1+c)pn)`
`le e^(-t(1+c)pn) cdot e^(pn(e^t-1))`, since `(1+a)^n le e^(an)`. Take `t = ln(1+c) > 0` to get `Pr[X ge (1+c)pn] le e^(pn(c-(1+c)ln(1+c)))`. Taylor expanding `ln(1+c) = c - c^2/2 + c^3/3 - ...` and multiplying out, the exponent becomes `pn(-c^2/2 + c^3/6 - c^4/12 + ...)`, an alternating series whose terms decrease in magnitude when `0 < c le 1`. Hence
`e^(pn(c-(1+c)ln(1+c))) le e^(pn(-c^2/2 + c^3/6)) le e^(-(c^2 pn)/3)` for `0 < c le 1`.
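
As a sanity check on the derivation, here is a small self-contained Python sketch (not from the slides; the parameter values `n`, `p`, `c` are illustrative assumptions) comparing the exact binomial tail `Pr[X ge (1+c)pn]` to the bound `e^(-(c^2 pn)/3)`:

```python
import math

def binom_tail(n, p, k):
    """Exact Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Illustrative parameters -- assumptions, not values from the slides.
n, p, c = 200, 0.3, 0.5
k = math.ceil((1 + c) * p * n)       # smallest integer >= (1+c)pn
exact = binom_tail(n, p, k)
bound = math.exp(-c**2 * p * n / 3)  # Chernoff upper-tail bound

print(f"exact tail Pr[X >= (1+c)pn]  = {exact:.3e}")
print(f"Chernoff bound e^(-c^2 pn/3) = {bound:.3e}")  # dominates the exact tail
```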

Corollary. If `p = 1/2 + epsilon` for some `epsilon > 0`, then the probability that `sum_(i=1)^n X_i le n/2` is at most `e^(-epsilon^2 n/4)`.

Proof. Running the same argument with `t < 0` gives the lower-tail bound `Pr[X le (1-c)pn] le e^(-(c^2 pn)/2)`; here the expansion `-c - (1-c)ln(1-c) = -c^2/2 - c^3/6 - ...` has all negative terms, so the constant `1/2` needs no adjustment. Take `c = epsilon/(1/2+epsilon)`, so that `(1-c)pn = n/2` and `(c^2 pn)/2 = epsilon^2 n/(1+2 epsilon) ge epsilon^2 n/4`, since `epsilon le 1/2`. Q.E.D.
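
This corollary is the error-reduction fact behind the slide title above: repeat a procedure that is correct with probability `1/2 + epsilon` independently `n` times and take a majority vote, and the failure probability drops exponentially in `n`. A minimal simulation sketch (the values of `eps`, `n`, and `trials` are illustrative assumptions):

```python
import math
import random

random.seed(0)

# Illustrative parameters -- assumptions, not values from the slides.
eps, n, trials = 0.1, 101, 20000   # n odd, so a majority vote is never tied
p = 0.5 + eps                      # each independent run is correct w.p. p

failures = 0
for _ in range(trials):
    correct_runs = sum(random.random() < p for _ in range(n))
    if correct_runs <= n / 2:      # majority vote comes out wrong
        failures += 1

print(f"empirical Pr[majority wrong]   = {failures / trials:.4f}")
print(f"corollary bound e^(-eps^2 n/4) = {math.exp(-eps**2 * n / 4):.4f}")
```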

PAC Learning

Is anything PAC-learnable?

Linear Algebra (Take 1)

Matrix Operations

More Matrix Operations

Norms