Finish SVMs, Numpy




CS256

Chris Pollett

Sep 27, 2021

Outline

Introduction

Some Facts About the Scaled-Convex Hull

S-K Algorithm - Non-Kernel Version

Suppose our training data is `X = {vec{x}_1, ..., vec{x}_k}`. Let `I = {1, ..., k}`. Let `X^+` be the positive examples, and `I^+` be the indices of the positive examples. Let `X^-` be the negative examples, and `I^-` be the indices of the negative examples. Define `X'`, `X^(+)'`, and `X^(-)'` as per the last slide.

  1. Initialization: Set the vector `vec{w}^{+}` to any point `vec{x} in X^(+)'` and `vec{w}^{-}` to any point `vec{x} in X^(-)'`. At each step of the algorithm, our separator will be given by `vec{w} = vec{w}^{+} - vec{w}^{-}` and `theta = (||vec{w}^{+}||^2 - ||vec{w}^{-}||^2)/2`. To understand what these weights mean, consider the two parallel hyperplanes given by `vec{n}\cdot (vec{x} - vec{w}^{-}) = 0` and `-vec{n}\cdot (vec{x} - vec{w}^{+}) = 0` where `vec{n} = (vec{w}^{+} - vec{w}^{-})/||vec{w}^{+} - vec{w}^{-}||`. These two hyperplanes are `||vec{w}|| = ||vec{w}^{+} - vec{w}^{-}||` apart. A point half-way between them is `(vec{w}^{+} + vec{w}^{-})/2`. Projecting this point along the vector `vec{w} = vec{w}^{+} - vec{w}^{-}`, which is perpendicular to both hyperplanes, gives `((vec{w}^{+} - vec{w}^{-}) \cdot (vec{w}^{+} + vec{w}^{-}))/2 = (||vec{w}^{+}||^2 - ||vec{w}^{-}||^2)/2 = theta`.
  2. Stop Condition: Find the vector `vec{x}'_t in X'` closest to either of our current hyperplanes. To do this, we choose `t = mbox(argmin)_{i in I} m(vec{x}'_i)`, where `m(vec{x}'_i)` is `vec{n}\cdot(vec{x}'_i - vec{w}^{-})` for `i in I^+` and is `-vec{n}\cdot (vec{x}'_i - vec{w}^{+})` for `i in I^-`. The sign of `m(vec{x}'_t)` indicates which side of the hyperplane the point is on. For the hyperplane to classify the corresponding data correctly, we want the sign to be positive; however, we'll be satisfied if it is only slightly negative. More precisely, if `||vec{w}^{+} - vec{w}^{-}|| - m(vec{x}'_t) < epsilon`, stop and output the `vec{w}` and `theta` given by the formulas above.
  3. Adaptation: If `vec{x}'_t in X^(+)'`, leave `vec{w}^{-}` unchanged and set `vec{w}^{+} := (1-q) vec{w}^{+} + q vec{x}'_t` where `q = min(1, ((vec{x}'_t - vec{w}^{-})\cdot(vec{w}^{+} - vec{w}^{-}))/(||vec{w}^{+} - vec{w}^{-}||^2))`; otherwise, leave `vec{w}^{+}` unchanged and set `vec{w}^{-} := (1-q) vec{w}^{-} + q vec{x}'_t` where `q = min(1, ((vec{x}'_t - vec{w}^{+})\cdot(vec{w}^{-} - vec{w}^{+}))/(||vec{w}^{+} - vec{w}^{-}||^2))`. A Python sketch of these three steps appears after this list.
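
Below is a minimal NumPy sketch of the non-kernel version, following the update rules above. The function name `sk_train`, the tolerance `epsilon`, and the `max_iter` safeguard are my own; `X_plus` and `X_minus` are assumed to hold the points of `X^(+)'` and `X^(-)'` as rows.

```python
import numpy as np

def sk_train(X_plus, X_minus, epsilon=1e-3, max_iter=10000):
    """Non-kernel S-K training: returns (w, theta) with separator w . x = theta.

    X_plus, X_minus: rows are the points of X^(+)' and X^(-)'.
    Assumes the two classes are separable, so ||w+ - w-|| stays positive.
    """
    # Initialization: w+ and w- start at arbitrary points of each class.
    w_plus = X_plus[0].astype(float)
    w_minus = X_minus[0].astype(float)

    for _ in range(max_iter):
        w = w_plus - w_minus
        norm_w = np.linalg.norm(w)
        n_hat = w / norm_w

        # Margins: positive points measured from the w- hyperplane,
        # negative points from the w+ hyperplane.
        m_plus = (X_plus - w_minus) @ n_hat
        m_minus = -(X_minus - w_plus) @ n_hat
        i_p, i_m = np.argmin(m_plus), np.argmin(m_minus)

        # Stop condition: the closest point's margin is within epsilon of ||w||.
        if norm_w - min(m_plus[i_p], m_minus[i_m]) < epsilon:
            break

        # Adaptation: move the endpoint on x'_t's side toward x'_t.
        if m_plus[i_p] <= m_minus[i_m]:
            x_t = X_plus[i_p]
            q = min(1.0, ((x_t - w_minus) @ w) / norm_w ** 2)
            w_plus = (1 - q) * w_plus + q * x_t
        else:
            x_t = X_minus[i_m]
            q = min(1.0, ((x_t - w_plus) @ (-w)) / norm_w ** 2)
            w_minus = (1 - q) * w_minus + q * x_t

    w = w_plus - w_minus
    theta = (w_plus @ w_plus - w_minus @ w_minus) / 2
    return w, theta
```

On separable data the loop terminates once the closest point's margin is within `epsilon` of `||vec{w}||`.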

S-K Algorithm - Intuitions

[Figure: relevant quantities updated during a step of the S-K Algorithm]

S-K Algorithm - Kernel Version - Preliminaries
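
In the kernel version, `vec{w}^{+}` and `vec{w}^{-}` are maintained implicitly as convex combinations `vec{w}^{+} = sum_{i in I^+} alpha_i Phi(vec{x}'_i)` and `vec{w}^{-} = sum_{i in I^-} alpha_i Phi(vec{x}'_i)`, where `Phi` is the feature map of the kernel `K`. The algorithm tracks only the `alpha_i`'s and the inner products `A = vec{w}^{+}\cdot vec{w}^{+}`, `B = vec{w}^{-}\cdot vec{w}^{-}`, `C = vec{w}^{+}\cdot vec{w}^{-}`, `D_i = Phi(vec{x}'_i)\cdot vec{w}^{+}`, and `E_i = Phi(vec{x}'_i)\cdot vec{w}^{-}`, all of which can be computed using `K` alone. Paralleling the non-kernel version, the output classifier is `f(vec{x}) = sum_{i in I^+} alpha_i K(vec{x}'_i, vec{x}) - sum_{i in I^-} alpha_i K(vec{x}'_i, vec{x}) - (A - B)/2`.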

S-K Algorithm - Kernel Version

Algorithm (a Python sketch is given after the list):

  1. Initialization: Set `alpha_{i_1} = 1` for some `i_1 in I^+` and `alpha_{j_1} = 1` for some `j_1 in I^-`; set all the remaining `alpha_i = 0`. Set `A = K(vec{x}'_{i_1}, vec{x}'_{i_1})`, `B = K(vec{x}'_{j_1}, vec{x}'_{j_1})`, `C = K(vec{x}'_{i_1}, vec{x}'_{j_1})`. For `i in I`, define `D_i = K(vec{x}'_{i}, vec{x}'_{i_1})` and `E_i = K(vec{x}'_{i}, vec{x}'_{j_1})`.
  2. Stop Condition: Find the vector `vec{x}'_t in X'` closest to our current separating hypersurfaces. To do this, we choose `t = mbox(argmin)_{i in I} m_i`, where `m_i` is `(D_i - E_i + B - C)/sqrt(A + B - 2C)` for `i in I^+` and is `(E_i - D_i + A - C)/sqrt(A + B - 2C)` for `i in I^-`. If `sqrt(A + B - 2C) - m_t < epsilon`, stop and define our output function `f` using the current settings of `A`, `B`, and the `alpha_i`'s.
  3. Adaptation:
    1. If `t in I^+`, then set `alpha_i := (1-q)alpha_i + q delta_{i,t}` for `i in I^+`, where `q = min(1, (A - D_t + E_t - C)/(A + K(vec{x}'_t, vec{x}'_t) - 2D_t))` (this `q` is the value in `(0, 1]` minimizing the distance from the updated `vec{w}^{+}` to `vec{w}^{-}`).
      Set `A := (1-q)^2 A + 2(1-q)q D_t + q^2 K(vec{x}'_t, vec{x}'_t)`, `C := (1-q) C + q E_t`, and for `i in I` set `D_i := (1-q)D_i + q K(vec{x}'_i, vec{x}'_t)`.
    2. If `t in I^-`, then set `alpha_i := (1-q)alpha_i + q delta_{i,t}` for `i in I^-`, where `q = min(1, (B - E_t + D_t - C)/(B + K(vec{x}'_t, vec{x}'_t) - 2E_t))` (with `q` chosen symmetrically).
      Set `B := (1-q)^2 B + 2(1-q)q E_t + q^2 K(vec{x}'_t, vec{x}'_t)`, `C := (1-q) C + q D_t`, and for `i in I` set `E_i := (1-q)E_i + q K(vec{x}'_i, vec{x}'_t)`.
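
The following is a minimal NumPy sketch of the kernel version, mirroring the three steps above. The names (`sk_train_kernel`, `sk_classify`, `is_pos`, `epsilon`, `max_iter`) are my own, and the Gram matrix is assumed to be precomputed, with `K[i, j] = K(vec{x}'_i, vec{x}'_j)`.

```python
import numpy as np

def sk_train_kernel(K, is_pos, epsilon=1e-3, max_iter=10000):
    """Kernel S-K training on a precomputed Gram matrix K[i, j] = K(x'_i, x'_j).

    is_pos: length-k boolean array, True exactly on the indices in I^+.
    Returns (alpha, A, B), which determine the classifier f below.
    """
    K = np.asarray(K, dtype=float)
    is_pos = np.asarray(is_pos, dtype=bool)
    I_plus, I_minus = np.flatnonzero(is_pos), np.flatnonzero(~is_pos)

    # Initialization: one point from each class.
    i1, j1 = I_plus[0], I_minus[0]
    alpha = np.zeros(len(is_pos))
    alpha[i1] = alpha[j1] = 1.0
    A, B, C = K[i1, i1], K[j1, j1], K[i1, j1]
    D = K[:, i1].copy()  # D_i = Phi(x'_i) . w+
    E = K[:, j1].copy()  # E_i = Phi(x'_i) . w-

    for _ in range(max_iter):
        norm_w = np.sqrt(A + B - 2 * C)
        m = np.where(is_pos, D - E + B - C, E - D + A - C) / norm_w
        t = int(np.argmin(m))

        # Stop condition.
        if norm_w - m[t] < epsilon:
            break

        # Adaptation: rescale the alphas of x'_t's class toward x'_t,
        # then refresh the cached inner products A or B, C, and D or E.
        if is_pos[t]:
            q = min(1.0, (A - D[t] + E[t] - C) / (A + K[t, t] - 2 * D[t]))
            alpha[I_plus] *= 1 - q
            alpha[t] += q
            A = (1 - q) ** 2 * A + 2 * (1 - q) * q * D[t] + q ** 2 * K[t, t]
            C = (1 - q) * C + q * E[t]
            D = (1 - q) * D + q * K[:, t]
        else:
            q = min(1.0, (B - E[t] + D[t] - C) / (B + K[t, t] - 2 * E[t]))
            alpha[I_minus] *= 1 - q
            alpha[t] += q
            B = (1 - q) ** 2 * B + 2 * (1 - q) * q * E[t] + q ** 2 * K[t, t]
            C = (1 - q) * C + q * D[t]
            E = (1 - q) * E + q * K[:, t]

    return alpha, A, B

def sk_classify(K_x, alpha, is_pos, A, B):
    """f(x) for a new point, given K_x[i] = K(x'_i, x); positive means class +."""
    return np.where(is_pos, alpha, -alpha) @ K_x - (A - B) / 2
```

Caching `A`, `B`, `C`, `D_i`, and `E_i` is what makes each iteration cost `O(k)` kernel evaluations rather than recomputing the inner products from the `alpha_i`'s.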

Quiz

Which of the following is true?

  1. We showed `MAJ_n` cannot be computed by a two layer perceptron network.
  2. Our result about three layer threshold circuits being able to compute arbitrary boolean functions relied on padding some gates with 0 and 1 as inputs.
  3. The shape of a `mu`-reduced convex hull of a set of points is always geometrically similar (in the formal sense) to the shape of the convex hull of those points.

Numpy

Importing and using Numpy
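
As a quick illustration of the standard import convention and namespace (the variable names are my own):

```python
import numpy as np  # "np" is the conventional alias

# NumPy's core object is the n-dimensional array (ndarray).
a = np.array([1.0, 2.0, 3.0])
print(a.dtype)   # float64
print(np.pi)     # constants and math functions live in the np namespace
```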

Array Creation and Assignment
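
A few representative creation and assignment calls (all standard NumPy; the variable names are illustrative):

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array of 0.0
ones = np.ones(4)                 # length-4 array of 1.0
ramp = np.arange(0, 10, 2)        # [0 2 4 6 8]
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points in [0, 1]

ramp[0] = 7                       # assignment mutates the array in place
b = ramp                          # b refers to the same data, not a copy
c = ramp.copy()                   # an independent copy
```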

Shape and Content of Numpy Arrays
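
For instance, a sketch using the standard ndarray attributes:

```python
import numpy as np

m = np.arange(12)        # [0 1 ... 11]
m = m.reshape(3, 4)      # reinterpret as a 3x4 matrix
print(m.shape)           # (3, 4)
print(m.ndim)            # 2
print(m.size)            # 12
print(m.dtype)           # e.g. int64 (platform dependent)
m.shape = (4, 3)         # shape can also be assigned directly
```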

Element Access
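
A few representative indexing idioms (standard NumPy; the examples are my own):

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
print(m[1, 2])        # single element: row 1, column 2
print(m[0])           # entire first row
print(m[:, 1])        # entire second column
print(m[0:2, 1:3])    # 2x2 sub-block (slices give views, not copies)
print(m[m % 2 == 0])  # boolean masking selects the even entries
```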

Operations on Arrays
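
Some standard elementwise operations, reductions, and broadcasting, as a sketch:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a + b)          # elementwise addition
print(a * b)          # elementwise (not matrix!) multiplication
print(a @ b)          # dot product: 32.0
print(np.sqrt(a))     # universal functions apply elementwise
print(a.sum(), a.mean(), a.max())  # reductions
print(a + 10)         # broadcasting a scalar across the array
```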

Linear Algebra and Polynomials
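
A short sketch of the standard `np.linalg` and polynomial helpers (the sample data is made up for illustration):

```python
import numpy as np

# Linear algebra lives in np.linalg.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
v = np.array([1.0, 2.0])
print(M @ v)                     # matrix-vector product
print(np.linalg.solve(M, v))     # solve M x = v
print(np.linalg.det(M))          # determinant
print(np.linalg.eig(M)[0])       # eigenvalues

# Polynomials: np.polyfit / np.polyval use coefficient arrays,
# highest degree first.
coeffs = np.polyfit([0, 1, 2, 3], [1, 3, 7, 13], deg=2)  # fit a quadratic
print(np.polyval(coeffs, 4.0))   # evaluate the fit at x = 4 (about 21.0)
```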