S-K Algorithm - Non-Kernel Version
Suppose our training data is `X = {vec{x}_1, ..., vec{x}_k}`. Let `I = {1, ..., k}`. Let `X^+` be the positive examples, and `I^+` be the indices of the positive examples. Let `X^-` be the negative examples, and `I^-` be the indices of the negative examples. Define `X'`, `X^(+)'`, and `X^(-)'` as per the last slide.
- Initialization: Set the vector `vec{w}^{+}` to any point `vec{x} in X^(+)'` and `vec{w}^{-}` to any point `vec{x} in X^(-)'`. At each step of our algorithm, our separator is given by `vec{w} = vec{w}^{+} - vec{w}^{-}` and `theta = (||vec{w}^{+}||^2 - ||vec{w}^{-}||^2)/2`. To understand what these weights mean, consider the two parallel hyperplanes given by
`vec{n}\cdot (vec{x} - vec{w}^{-}) = 0` and `-vec{n}\cdot (vec{x} - vec{w}^{+}) = 0` where `vec{n} = (vec{w}^{+} - vec{w}^{-})/(||vec{w}^{+} - vec{w}^{-}||)`. The first passes through `vec{w}^{-}` and the second through `vec{w}^{+}`, so these two hyperplanes are `||vec{w}|| = ||vec{w}^{+} - vec{w}^{-}||` apart. A point half-way between these hyperplanes is given by `(vec{w}^{+} + vec{w}^{-})/2`. Taking the dot product of this point with the vector `vec{w} = vec{w}^{+} - vec{w}^{-}`, which points from one hyperplane to the other, gives us `((vec{w}^{+} - vec{w}^{-}) \cdot (vec{w}^{+} + vec{w}^{-}))/2 = (||vec{w}^{+}||^2 - ||vec{w}^{-}||^2)/2 = theta`.
- Stop Condition: Find the vector `vec{x}'_t in X'` closest to either of our current hyperplanes. To do this we choose `t = mbox(argmin)_{i in I} m(vec{x}'_i)` where `m(vec{x}'_i)` is
`vec{n}\cdot(vec{x}'_i -vec{w}^{-})` for `i in I^+` and is `-vec{n}\cdot (vec{x}'_i -vec{w}^{+})` for `i in I^-`. The sign of `m(vec{x}'_t)` indicates which side of the corresponding hyperplane `vec{x}'_t` lies on. For the hyperplane to classify the corresponding data correctly, we want the sign to be positive; however, we'll be satisfied if the sign is only slightly negative. More precisely, if `||vec{w}^{+} - vec{w}^{-}|| - m(vec{x}'_t) < epsilon`, stop and output `vec{w}` and `theta` given by the formulas above.
- Adaptation: If `vec{x}'_t in X^(+)'`, set `vec{w}^{-} := vec{w}^{-}` and set `vec{w}^{+} := (1-q) vec{w}^{+} + q vec{x}'_t` where `q = min(1, ((vec{w}^{+} - vec{w}^{-})\cdot(vec{w}^{+} - vec{x}'_t))/(||vec{w}^{+} - vec{x}'_t||^2))`; otherwise, set `vec{w}^{+} := vec{w}^{+}` and set `vec{w}^{-} := (1-q) vec{w}^{-} + q vec{x}'_t` where `q = min(1, ((vec{w}^{-} - vec{w}^{+})\cdot(vec{w}^{-} - vec{x}'_t))/(||vec{w}^{-} - vec{x}'_t||^2))`. In both cases, `q` picks the point on the segment between the old vector and `vec{x}'_t` that is closest to the vector left unchanged.
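The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation. It assumes the inputs are already the primed vectors `X^(+)'` and `X^(-)'` and that the two classes are linearly separable. The names `sk_train`, `dot`, `sub`, `eps`, and `max_iter` are made up for this example. For the adaptation step it assumes `q` is the usual line-search value, the minimizer of the distance from the unchanged vector to the segment between the old vector and `vec{x}'_t`, clipped to 1.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def sk_train(pos, neg, eps=1e-3, max_iter=10000):
    """Hypothetical helper: returns (w, theta) for separable pos/neg lists."""
    # Initialization: any positive point and any negative point.
    w_pos = [float(v) for v in pos[0]]
    w_neg = [float(v) for v in neg[0]]
    for _ in range(max_iter):
        w = sub(w_pos, w_neg)
        norm_w = math.sqrt(dot(w, w))  # nonzero when the classes are separable
        # m(x) = n.(x - w_neg) for positives, -n.(x - w_pos) for negatives.
        margins = [(dot(sub(x, w_neg), w) / norm_w, x, True) for x in pos]
        margins += [(dot(sub(w_pos, x), w) / norm_w, x, False) for x in neg]
        m_t, x_t, is_pos = min(margins, key=lambda item: item[0])
        # Stop condition: ||w_pos - w_neg|| - m(x_t) < eps.
        if norm_w - m_t < eps:
            break
        # Adaptation: move only the vector on the same side as x_t.
        if is_pos:
            d = sub(w_pos, x_t)
            q = min(1.0, dot(w, d) / dot(d, d))
            w_pos = [(1 - q) * wi + q * xi for wi, xi in zip(w_pos, x_t)]
        else:
            d = sub(w_neg, x_t)
            q = min(1.0, -dot(w, d) / dot(d, d))
            w_neg = [(1 - q) * wi + q * xi for wi, xi in zip(w_neg, x_t)]
    w = sub(w_pos, w_neg)
    theta = (dot(w_pos, w_pos) - dot(w_neg, w_neg)) / 2
    return w, theta

# Usage on a tiny, made-up data set; classify x as positive when w.x > theta.
w, theta = sk_train([[3.0, 1.0], [2.0, 0.0]], [[-1.0, 1.0], [0.0, 0.0]])
print(w, theta)
```

Plain lists are used so the sketch needs only the standard library. The `max_iter` cap is an added safeguard for this example and is not part of the algorithm as stated above.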