SVM Training




CS256

Chris Pollett

Sep 27, 2017

Outline

Introduction

Maximal Margin Separator

Iterative Algorithms for Maximal Separators

Convex Hulls and Their Variants

Examples of Convex and Non-Convex Sets

Remarks on Convex Hulls and Their Variants

Reduced and Scaled Convex Hulls

In-Class Exercise

Some Facts About the Scaled-Convex Hull

S-K Algorithm - Non-Kernel Version

Suppose our training data is `X = {vec{x}_1, ..., vec{x}_k}`. Let `I = {1, ..., k}`. Let `X^+` be the positive examples, and `I^+` be the indices of the positive examples. Let `X^-` be the negative examples, and `I^-` be the indices of the negative examples. Define `X'`, `X^(+)'`, and `X^(-)'` as per the last slide.

  1. Initialization: Set the vector `vec{w}^{+}` to any point `vec{x} in X^(+)'` and `vec{w}^{-}` to any point `vec{x} in X^(-)'`. At each step of the algorithm our separator is given by `vec{w} = vec{w}^{+} - vec{w}^{-}` and `theta = (||vec{w}^{+}||^2 - ||vec{w}^{-}||^2)/2`. To understand what these values mean, consider the two parallel hyperplanes given by `vec{n}\cdot (vec{x} - vec{w}^{-}) = 0` and `-vec{n}\cdot (vec{x} - vec{w}^{+}) = 0`, where `vec{n} = (vec{w}^{+} - vec{w}^{-})/(||vec{w}^{+} - vec{w}^{-}||)`. These two hyperplanes are `||vec{w}|| = ||vec{w}^{+} - vec{w}^{-}||` apart. A point halfway between them is `(vec{w}^{+} + vec{w}^{-})/2`. Taking its dot product with the vector `vec{w} = vec{w}^{+} - vec{w}^{-}`, which points from one hyperplane to the other, gives `((vec{w}^{+} - vec{w}^{-}) \cdot (vec{w}^{+} + vec{w}^{-}))/2 = (||vec{w}^{+}||^2 - ||vec{w}^{-}||^2)/2 = theta`, so the hyperplane `vec{w}\cdot vec{x} = theta` lies midway between the two.
  2. Stop Condition: Find the vector `vec{x}'_t in X'` closest to either of our current hyperplanes. To do this we choose `t = mbox(argmin)_{i in I} m(vec{x}'_i)`, where `m(vec{x}'_i)` is `vec{n}\cdot(vec{x}'_i -vec{w}^{-})` for `i in I^+` and `-vec{n}\cdot (vec{x}'_i -vec{w}^{+})` for `i in I^-`. The sign of `m(vec{x}'_t)` indicates which side of the hyperplane `vec{x}'_t` is on. For the hyperplane to classify the corresponding data correctly we want the sign to be positive; however, we'll be satisfied if the sign is only slightly negative. More precisely, if `||vec{w}^{+} - vec{w}^{-}|| - m(vec{x}'_t) < epsilon`, stop and output `vec{w}` and `theta` given by the formulas above.
  3. Adaptation: If `vec{x}'_t in X^(+)'`, leave `vec{w}^{-}` unchanged and set `vec{w}^{+} := (1-q) vec{w}^{+} + q vec{x}'_t` where `q = min(1, ((vec{w}^{+} - vec{w}^{-})\cdot(vec{w}^{+} - vec{x}'_t))/(||vec{w}^{+} - vec{x}'_t||^2))`; otherwise, leave `vec{w}^{+}` unchanged and set `vec{w}^{-} := (1-q) vec{w}^{-} + q vec{x}'_t` where `q = min(1, ((vec{w}^{-} - vec{w}^{+})\cdot(vec{w}^{-} - vec{x}'_t))/(||vec{w}^{-} - vec{x}'_t||^2))`. In both cases `q` is chosen so that the updated point is the point on the segment from the old point to `vec{x}'_t` that is closest to the other class's point.
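
The three steps above can be sketched in plain Python. This is a minimal sketch, not a reference implementation. The names `sk_train`, `eps`, and `max_iter` are ours. The inputs are assumed to be the already-scaled point sets `X^(+)'` and `X^(-)'`, and they are assumed to be linearly separable. The step size `q` is assumed to be the value that moves the updated point to the spot on the segment toward `vec{x}'_t` nearest the other class's point, capped at 1.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def sk_train(pos, neg, eps=1e-3, max_iter=10000):
    """Sketch of the non-kernel S-K iteration; returns (w, theta)."""
    # Step 1: start from an arbitrary point of each class.
    w_pos, w_neg = list(pos[0]), list(neg[0])
    for _ in range(max_iter):
        w = sub(w_pos, w_neg)
        norm_w = dot(w, w) ** 0.5  # assumes the classes do not overlap
        # Step 2: m(x) for every point, tagged with its class.
        cands = [(dot(w, sub(x, w_neg)) / norm_w, x, True) for x in pos]
        cands += [(-dot(w, sub(x, w_pos)) / norm_w, x, False) for x in neg]
        m_t, x_t, is_pos = min(cands, key=lambda c: c[0])
        if norm_w - m_t < eps:
            break
        # Step 3: move w_pos or w_neg toward x_t by the fraction q.
        if is_pos:
            d = sub(w_pos, x_t)
            q = min(1.0, dot(w, d) / dot(d, d))
            w_pos = [(1 - q) * a + q * b for a, b in zip(w_pos, x_t)]
        else:
            d = sub(w_neg, x_t)
            q = min(1.0, -dot(w, d) / dot(d, d))
            w_neg = [(1 - q) * a + q * b for a, b in zip(w_neg, x_t)]
    w = sub(w_pos, w_neg)
    theta = (dot(w_pos, w_pos) - dot(w_neg, w_neg)) / 2
    return w, theta

# Toy data (hypothetical): two triangles on either side of the line x_1 = 1.
pos = [[3.0, 1.0], [2.0, 0.0], [2.0, 2.0]]
neg = [[-1.0, 1.0], [0.0, 0.0], [0.0, 2.0]]
w, theta = sk_train(pos, neg)
print(w, theta)  # w == [2.0, 0.0], theta == 2.0 on this data
```

A point `vec{x}` is then classified as positive when `vec{w}\cdot vec{x} - theta > 0`. On the toy data the loop stops after two updates.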

S-K Algorithm - Intuitions

Image showing relevant quantities updated during a step of the S-K Algorithm
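
One way to read a single adaptation step for a positive `vec{x}'_t` is as a one-dimensional minimization. This is our reading, under the assumption that the step is meant to move `vec{w}^{+}` to the point on the segment from `vec{w}^{+}` to `vec{x}'_t` that is nearest to `vec{w}^{-}`. The negative case is symmetric, with the roles of `vec{w}^{+}` and `vec{w}^{-}` swapped.

```latex
\begin{aligned}
g(q) &= \lVert (1-q)\,\vec{w}^{+} + q\,\vec{x}'_t - \vec{w}^{-} \rVert^2
      = \lVert \vec{w} - q\,(\vec{w}^{+} - \vec{x}'_t) \rVert^2,
      \qquad \vec{w} = \vec{w}^{+} - \vec{w}^{-}, \\
g'(q) &= -2\,\vec{w}\cdot(\vec{w}^{+} - \vec{x}'_t)
         + 2q\,\lVert \vec{w}^{+} - \vec{x}'_t \rVert^2 = 0
\;\Longrightarrow\;
q^{*} = \frac{\vec{w}\cdot(\vec{w}^{+} - \vec{x}'_t)}
             {\lVert \vec{w}^{+} - \vec{x}'_t \rVert^2}, \\
\vec{w}\cdot(\vec{w}^{+} - \vec{x}'_t)
     &= \lVert \vec{w} \rVert \,\bigl(\lVert \vec{w} \rVert - m(\vec{x}'_t)\bigr).
\end{aligned}
```

The last line shows that `q^{*}` is positive whenever the stop condition has not yet been met. Taking `q = min(1, q^{*})` keeps the new `vec{w}^{+}` on the segment, and hence inside the hull.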

S-K Algorithm - Kernel Version - Preliminaries

S-K Algorithm - Kernel Version

Algorithm:

  1. Initialization: Pick any `i_1 in I^+` and any `j_1 in I^-`. Set `alpha_{i_1} = 1`, `alpha_{j_1} = 1`, and all the remaining `alpha_i = 0`. Set `A = K(vec{x}'_{i_1}, vec{x}'_{i_1})`, `B = K(vec{x}'_{j_1}, vec{x}'_{j_1})`, `C = K(vec{x}'_{i_1}, vec{x}'_{j_1})`. For `i in I` define `D_i = K(vec{x}'_{i}, vec{x}'_{i_1})` and `E_i = K(vec{x}'_{i}, vec{x}'_{j_1})`.
  2. Stop Condition: Find the vector `vec{x}'_t in X'` closest to our current separating hyper-surfaces. To do this we choose `t = mbox(argmin)_{i in I} m_i` where `m_i` is `(D_i - E_i + B - C)/sqrt(A + B - 2C)` for `i in I^+` and is `(E_i - D_i + A - C)/sqrt(A + B - 2C)` for `i in I^-`. If `sqrt(A + B - 2C) - m_t < epsilon`, stop and define our output function as `f` with the current settings for `A`, `B`, and the `alpha_i`'s.
  3. Adaptation:
    1. If `t in I^+`, then set `alpha_i := (1-q)alpha_i + q delta_{i,t}` for `i in I^+`, where `q = min(1, (A - D_t + E_t - C)/(A + K(vec{x}'_t, vec{x}'_t) - 2D_t))`. The denominator is the kernel form of `||vec{w}^{+} - vec{x}'_t||^2`.
      Set `A := A (1-q)^2 + 2(1 - q)q D_t + q^2K(vec{x}'_t, vec{x}'_t)`, `C := (1 - q) C + q E_t`, and then for `i in I` set `D_i := (1-q)D_i + q K(vec{x}'_i, vec{x}'_t)`.
    2. If `t in I^-`, then set `alpha_i := (1-q)alpha_i + q delta_{i,t}` for `i in I^-`, where `q = min(1, (B - E_t + D_t - C)/(B + K(vec{x}'_t, vec{x}'_t) - 2E_t))`. The denominator is the kernel form of `||vec{w}^{-} - vec{x}'_t||^2`.
      Set `B := B (1-q)^2 + 2(1 - q)q E_t + q^2K(vec{x}'_t, vec{x}'_t)`, `C := (1 - q) C + q D_t`, and then for `i in I` set `E_i := (1-q)E_i + q K(vec{x}'_i, vec{x}'_t)`.
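
The kernel version can be sketched the same way in plain Python. This is a hedged sketch under several assumptions of ours:

- The names `sk_kernel_train`, `eps`, and `max_iter` are ours, and the points are assumed to be already scaled and separable in feature space.
- The step size uses the squared feature-space distance from the current point to `vec{x}'_t` as its denominator, that is `A + K(vec{x}'_t, vec{x}'_t) - 2D_t` in the positive case and `B + K(vec{x}'_t, vec{x}'_t) - 2E_t` in the negative case.
- The output function is assumed to be `f(vec{x}) = sum_{i in I^+} alpha_i K(vec{x}'_i, vec{x}) - sum_{i in I^-} alpha_i K(vec{x}'_i, vec{x}) - (A - B)/2`, which matches `theta` in the non-kernel version.

```python
def sk_kernel_train(xs, ys, kernel, eps=1e-3, max_iter=10000):
    """Sketch of the kernel S-K iteration; ys holds +1 / -1 labels."""
    n = len(xs)
    K = [[kernel(a, b) for b in xs] for a in xs]  # Gram matrix
    # Step 1: one starting index per class.
    i1, j1 = ys.index(1), ys.index(-1)
    alpha = [0.0] * n
    alpha[i1] = alpha[j1] = 1.0
    A, B, C = K[i1][i1], K[j1][j1], K[i1][j1]
    D = [K[i][i1] for i in range(n)]
    E = [K[i][j1] for i in range(n)]
    for _ in range(max_iter):
        norm_w = (A + B - 2 * C) ** 0.5  # assumes the classes do not overlap
        # Step 2: m_i from the cached inner products.
        m = [(D[i] - E[i] + B - C) / norm_w if ys[i] > 0
             else (E[i] - D[i] + A - C) / norm_w for i in range(n)]
        t = min(range(n), key=m.__getitem__)
        if norm_w - m[t] < eps:
            break
        # Step 3: update one class's coefficients and the caches.
        if ys[t] > 0:
            q = min(1.0, (A - D[t] + E[t] - C) / (A + K[t][t] - 2 * D[t]))
            alpha = [(1 - q) * a if y > 0 else a for a, y in zip(alpha, ys)]
            alpha[t] += q
            A = A * (1 - q) ** 2 + 2 * (1 - q) * q * D[t] + q * q * K[t][t]
            C = (1 - q) * C + q * E[t]
            D = [(1 - q) * D[i] + q * K[i][t] for i in range(n)]
        else:
            q = min(1.0, (B - E[t] + D[t] - C) / (B + K[t][t] - 2 * E[t]))
            alpha = [(1 - q) * a if y < 0 else a for a, y in zip(alpha, ys)]
            alpha[t] += q
            B = B * (1 - q) ** 2 + 2 * (1 - q) * q * E[t] + q * q * K[t][t]
            C = (1 - q) * C + q * D[t]
            E = [(1 - q) * E[i] + q * K[i][t] for i in range(n)]
    theta = (A - B) / 2

    def f(x):
        return sum(y * a * kernel(p, x) for p, y, a in zip(xs, ys, alpha)) - theta

    return alpha, theta, f

def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy data (hypothetical), with a linear kernel so the result is easy to check.
xs = [[3.0, 1.0], [2.0, 0.0], [2.0, 2.0], [-1.0, 1.0], [0.0, 0.0], [0.0, 2.0]]
ys = [1, 1, 1, -1, -1, -1]
alpha, theta, f = sk_kernel_train(xs, ys, linear_kernel)
print(alpha, theta)
```

The Gram matrix is computed once up front, so each iteration costs time linear in the number of points. With a linear kernel on this toy data, `f` reduces to `2 x_1 - 2`.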