CS256
Chris Pollett
Sep 13, 2021
Theorem. If `C` is a gradual class of boolean threshold functions, then the perceptron rule is a PAC Learning algorithm for `C` under the uniform distribution on `{0, 1}^n`.
Proof. Let `vec{w} \cdot vec{x} \ge theta` be an `n`-bit linear threshold function from the class `C`. Without loss of generality, we can take `vec{w}, theta` to be normalized, so that `|vec{w}\cdot vec{x} - theta|` is the distance of the point `vec{x}` to the hyperplane. By the definition of gradual, there is some constant `k` such that for all `tau > 0`, the probability that a uniformly chosen element of `{0, 1}^n` lies within distance `tau` of the hyperplane `vec{w}\cdot vec{x} = theta` is at most `tau/(2k)`. If we set `tau = k epsilon`, then with probability at most `epsilon/2`, a random example drawn from `{0,1}^n` is within `k epsilon` of the hyperplane. From this, if we let `B \subseteq {0,1}^n` be the set of examples `x` which lie within distance `k epsilon` of the hyperplane, then `Pr[x \in B] \le epsilon/2`.
Let `(w_t, theta_t)` be the perceptron algorithm's hypothesis after `t` updates have been made. If `epsilon \ge 1`, the definition of PAC-learnability is trivially satisfied, so assume `epsilon < 1`. Also, from the definition of gradual, if a collection of hyperplanes is gradual with constant `c`, then it is also gradual for any constant `c' > c`, so we can assume the `k` above is at least `1`. Suppose `(w_t, theta_t)` is not yet `epsilon`-accurate. Then misclassified examples have probability at least `epsilon`, while `Pr[x \in B] \le epsilon/2`, so the next example which causes an update lies in `B` with probability at most `(epsilon/2)/epsilon = 1/2`. Define the potential function
`N_t(alpha) = ||alpha vec{w} - vec{w}_t||^2 + (alpha theta - theta_t)^2`.
The perceptron update rule tells us `vec{w}_{t+1} = vec{w}_t \pm vec{x}` and `theta_{t+1} = theta_t \mp 1`, so `N_{t+1}(alpha) - N_t(alpha)` is
\begin{eqnarray*}
\Delta N(\alpha) &=& ||\alpha \vec{w} - \vec{w_{t+1}}||^2 + (\alpha \theta - \theta_{t+1})^2\\
&&\quad - ||\alpha \vec{w} - \vec{w_t}||^2 - (\alpha \theta - \theta_t)^2\\
&=&\mp 2\alpha \vec{w} \cdot \vec{x} \pm 2\alpha\theta \pm 2\vec{w_t}\cdot \vec{x} \mp 2 \theta_t + ||x||^2 +1\\
&\leq& 2 \alpha A \pm 2 (\vec{w}_t \cdot \vec{x} - \theta_t) + n+1.
\end{eqnarray*}
where `A = \mp(vec{w}\cdot vec{x} - theta)`. We are again using above that `||x||^2 \le n`.
Since we are assuming `vec{x}` was misclassified, we know `\pm(vec{w_t} \cdot vec{x} - theta_t) < 0`, so `\Delta N(\alpha) < 2alpha A + n +1`.
If `x \in B` then `A \le 0`; if `x \notin B`, then `A \le -k epsilon`. So `\Delta N(\alpha) < n + 1` for `x \in B` and `\Delta N(\alpha) < n + 1 - 2k \epsilon\alpha`
for `x !in B`. Suppose the perceptron algorithm has made `r` updates with examples in `B`, and `s` updates for examples outside `B`. Since `(vec{w}, theta)` was normalized,
`|theta| \leq sqrt(n)`. Recall at the start of the perceptron algorithm the initial weights are all `0`. Hence, `N_0(\alpha) \leq alpha^2(n+1)`. Since for all `t`,
`N_t(\alpha) ge 0`, it follows that
`0 le r(n+1) + s(n+1 - 2k\epsilon\alpha) + alpha^2(n+1)`.
Setting `alpha = (12(n+1))/(5 k epsilon)` makes `n + 1 - 2k epsilon alpha = -(19/5)(n+1)`; dividing the resulting inequality through by `n+1`, the above simplifies to
`0 \leq r - 19/5s + (144(n+1)^2)/(25(k epsilon)^2)`.
If `m_1 = r + s = (144(n+1)^2)/(25(k epsilon)^2)` updates have been made, then `r = m_1 - s`, and if we substitute this into the above inequality, we get
\begin{eqnarray*}
0 &\leq& m_1 - s - \frac{19}{5}s + m_1\\
0 &\leq& 2m_1 - \frac{24}{5} s\\
\frac{24}{5}s &\leq & 2m_1\\
s &\leq& \frac{10}{24} m_1\\
s &\leq & \frac{5}{12}m_1.
\end{eqnarray*}
So at least a `7/12` fraction of the updates must have been made on examples in `B`.
If the perceptron's hypothesis has never been `\epsilon`-accurate, then from our discussion at the start of the proof, at each update, the probability of that update occurring on a point in `B` is at most `1/2`. So by a Chernoff bound, the probability that more than `7/12`ths of `m = max(-144 ln(\delta/2), m_1)` updates occur in `B` is at most `delta/2`. I.e., the Chernoff bound says the probability we see `(7/12)m` trials in `B`, when we expect only `(1/2)m` trials in `B`, is governed by `p = 1/2`, `c`, and `m`, where `(1+c)(1/2)m = (7/12)m`. Solving for `c` gives `c = 1/6`. The bound given by Chernoff's inequality is then at most `e^{-c^2 p m/2} = e^{-(1/6)^2(1/2)m/2} = e^{-m/144} \le e^{ln(delta/2)} = delta/2`, since `m \ge -144 ln(delta/2)`. So this means that with probability at least `1 - \delta/2`, the perceptron algorithm will have found an `epsilon`-accurate hypothesis after `m` updates. Also, with probability at least `1 - \delta/2`, using `(2m)/epsilon` examples will ensure that `m` updates occur. Thus, with probability at least `(1 - \delta/2)^2 \ge 1 - delta`, using `(2m)/epsilon` examples will ensure `m` updates occur and that the result of these updates is `epsilon`-accurate. Q.E.D.
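The update rule analyzed in the proof can be sketched in code. The following is a minimal illustration (the function names `predict`, `perceptron_update`, and `train` are ours, not from the lecture): on each misclassified example the weights move by `\pm x` and the threshold by `\mp 1`, exactly as in the computation of `\Delta N(\alpha)` above.

```python
def predict(w, theta, x):
    # hypothesis: output 1 iff w . x >= theta
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def perceptron_update(w, theta, x, label):
    # misclassified positive example: w <- w + x, theta <- theta - 1
    # misclassified negative example: w <- w - x, theta <- theta + 1
    sign = 1 if label == 1 else -1
    w = [wi + sign * xi for wi, xi in zip(w, x)]
    return w, theta - sign

def train(examples, n, passes=10):
    # examples: list of (x, label) pairs with x in {0,1}^n
    w, theta = [0] * n, 0
    for _ in range(passes):
        for x, label in examples:
            if predict(w, theta, x) != label:
                w, theta = perceptron_update(w, theta, x, label)
    return w, theta
```

For instance, training on the four labeled points of the 2-bit OR function (`x_1 + x_2 \ge 1`) converges to a separating hypothesis within a few passes.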
Type `python` at the command prompt. You should see something like:

```
Python 3.9.6 (default, Jun 29 2021, 05:25:02)
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

At the `>>>` prompt, entering

```python
>>> print("hello world")
```

would print `hello world` to the terminal.
```python
a = "Hello"
b = 'Good "bye"'
c = """ Triple quotes
one can go over
multiple lines """
d = ''' this one
works as well '''
```
```python
a = "Hello World"
b = a[4]    # b is 'o'
c = a[:5]   # c is 'Hello'
d = a[6:]   # d is 'World'
e = a[3:8]  # e is "lo Wo"
```
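Strings also support negative indices, which count from the end of the string. A small addition to the slicing example above:

```python
a = "Hello World"
b = a[-1]   # b is 'd', the last character
c = a[-5:]  # c is 'World', the last five characters
d = a[:-6]  # d is 'Hello', everything except the last six characters
```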
```python
my_list = ['YO', 1.2, 7, "HI", ["scary nested list", "watch out!"]]
my_empty_list = []  # or list()
```
```python
b = my_list[0]     # b is now: 'YO'
c = my_list[4][1]  # c is now: 'watch out!'
d = my_list[1:3]   # d is [1.2, 7]
e = my_list[2:]    # e is [7, 'HI', ['scary nested list', 'watch out!']]
```
```python
my_list.append("an end")
# my_list is now
# ['YO', 1.2, 7, 'HI', ['scary nested list', 'watch out!'], 'an end']
my_list.insert(2, 3)
# my_list is now
# ['YO', 1.2, 3, 7, 'HI', ['scary nested list', 'watch out!'], 'an end']
a = [1, 2, 3] + [4, 5]  # a is [1, 2, 3, 4, 5]
```
```python
import sys

if len(sys.argv) != 2:  # sys.argv is the list of command-line args
    print("Please supply a filename")
    raise SystemExit(1)  # signal an error and exit

f = open(sys.argv[1])  # the program name is argv[0]
lines = f.readlines()  # reads all lines into a list in one go
f.close()

# convert inputs to a list of ints
ivalues = [int(line) for line in lines]

# print min and max
print("The min is", min(ivalues))
print("The max is", max(ivalues))
```
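The file-reading part of this program is often written with a `with` block instead, which closes the file automatically when the block ends, even if an error occurs partway through. A sketch under the same assumption (one integer per line), using a helper function name `min_max` of our own choosing:

```python
def min_max(filename):
    # 'with' closes the file automatically when the block ends,
    # even if int() raises an exception on a bad line
    with open(filename) as f:
        ivalues = [int(line) for line in f]
    return min(ivalues), max(ivalues)
```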
```python
import glob

path = './*'
files = glob.glob(path)
for name in files:
    print(name)
```
```python
a = (1, "hello", 3)
b = (some, where)  # assumes some and where are already defined
c = "6 scared of 7", "as 7 8 9"  # notice we can omit parens
d = ()       # 0-tuple
e = 'yo',    # a one-tuple
f = ('yo',)  # the same one-tuple
g = (d,)     # g is ((),)
```
```python
c = (4, 5)
a, b = c  # unpacking: a is 4, b is 5
```
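Packing and unpacking together give a common Python idiom for swapping two variables without a temporary, and unpacking also works in `for` loops over sequences of tuples:

```python
a, b = 3, 7
a, b = b, a  # a is now 7, b is now 3

# unpacking in a for loop over a list of tuples
pairs = [(1, "one"), (2, "two")]
for number, name in pairs:
    print(number, name)
```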
```python
my_set = set([3, 9, 2, 6])
another_set = set("goodness")  # set of unique chars
print(another_set)  # e.g. {'e', 'd', 'g', 'o', 'n', 's'} (order may vary)
if 'e' in another_set:
    print("it's in there")
```
```python
a = my_set
b = another_set
c = a | b  # union of sets
c = a & b  # intersection of sets
c = a - b  # difference of sets
c = a ^ b  # symmetric difference of sets
another_set.add('y')  # adds a single element to the set
another_set.update([6, 7, 8])  # adds multiple elements
my_set.remove(3)  # removes the number 3 from my_set
```
```python
person = {"name": "bob", "age": 27, "sex": "Male"}
empty_dict = {}  # an empty dictionary; or use dict()
```
```python
name = person["name"]
person["age"] = 28
person["address"] = "somewhere"  # this adds a key-value pair
del person["age"]  # removes the key 'age' and its value
```
```python
if "name" in person:
    name = person["name"]
else:
    name = "no one"

# the above conditional can be shortened to:
name = person.get("name", "no one")
```
```python
keys = list(person)  # the dictionary's keys, in insertion order
# could also do person.keys()
# person.values() gives the values
# len(person) gives the number of keys in the dictionary
```
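Besides the keys and values separately, the `items()` method yields key-value pairs, which combines nicely with tuple unpacking:

```python
person = {"name": "bob", "age": 27, "sex": "Male"}
for key, value in person.items():
    print(key, "->", value)

pairs = list(person.items())
# pairs is [('name', 'bob'), ('age', 27), ('sex', 'Male')]
```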
```
while condition:
    statement1
    statement2
    ...
```
```python
for n in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
    print("2 to the %d is %d" % (n, 2**n))
```
```python
for n in range(1, 10):
    print("2 to the %d is %d" % (n, 2**n))  # same as before
```
```python
a = range(5)         # can omit start; a gives 0, 1, 2, 3, 4
b = range(1, 8)      # b gives 1, 2, 3, 4, 5, 6, 7
c = range(0, 13, 2)  # c gives 0, 2, 4, 6, 8, 10, 12
d = range(7, 2, -1)  # d gives 7, 6, 5, 4, 3
```
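One caveat: in Python 3, `range` produces a lazy range object rather than a list of the values. Wrapping it in `list()` materializes the values:

```python
a = range(5)
print(a)  # prints range(0, 5), not the values themselves
values = list(a)  # [0, 1, 2, 3, 4]
countdown = list(range(7, 2, -1))  # [7, 6, 5, 4, 3]
```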
```python
a = "Get rich quick"
for b in a:  # iterate over the characters of a string
    print(b)

c = ["now", "I", "know"]
for d in c:  # iterate over the elements of a list
    print(d)

person = {"name": "bob", "age": 27, "sex": "Male"}
for key in person:  # iterate over the keys of a dictionary
    print(key, person[key])

f = open("my_file.txt")
for line in f:  # iterate over the lines of a file
    print(line)
```