PAC-Learning Gradual Thresholds - Python




CS256

Chris Pollett

Sep 13, 2021

Outline

Introduction

PAC-Learning of Gradual Threshold Functions

Theorem. If `C` is a gradual class of boolean threshold functions, then the perceptron rule is a PAC-learning algorithm for `C` under the uniform distribution on `{0, 1}^n`.

Proof. Let `vec{w} \cdot vec{x} \ge theta` be an `n`-bit linear threshold function from the class `C`. Without loss of generality, we can take `vec{w}, theta` to be normalized (so `||vec{w}|| = 1`), so that `|vec{w}\cdot vec{x} - theta|` is the distance from the point `vec{x}` to the hyperplane. By the definition of gradual, there is some constant `k` such that for all `tau>0`, the probability that a uniformly chosen element of `{0,1}^n` is within distance `tau` of the hyperplane `vec{w}\cdot vec{x} = theta` is at most `tau/(2k)`. If we set `tau = k epsilon`, then with probability at most `epsilon/2`, a random example drawn from `{0,1}^n` is within distance `k epsilon` of the hyperplane. From this, if we let `B \subseteq {0,1}^n` be the set of examples `x` which lie within distance `k epsilon` of the hyperplane, then `Pr[x in B] le epsilon/2`.
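Not part of the proof, but a quick sanity check of what the gradualness condition measures: the sketch below (the function name and the 9-bit example are hypothetical) estimates by sampling the probability that a uniform `x in {0,1}^n` lands within distance `tau` of a given normalized hyperplane, i.e., in the bad set `B` when `tau = k epsilon`.

import random

def prob_near_hyperplane(w, theta, tau, trials=100000):
    # Estimate Pr[ |w . x - theta| < tau ] for x drawn uniformly from {0,1}^n,
    # assuming (w, theta) is normalized so this quantity is the distance to the hyperplane.
    hits = 0
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in w]
        if abs(sum(wi * xi for wi, xi in zip(w, x)) - theta) < tau:
            hits += 1
    return hits / trials

# Hypothetical example: a majority-style threshold on n = 9 bits with ||w|| = 1
n = 9
w = [1 / n ** 0.5] * n
theta = (n / 2) / n ** 0.5
print(prob_near_hyperplane(w, theta, tau=0.2))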

Let `(vec{w}_t, theta_t)` be the perceptron algorithm's hypothesis after `t` updates have been made. If `epsilon ge 1`, the definition of PAC-learnability is trivially satisfied, so assume `epsilon < 1`. Also, from the definition of gradual, if a collection of hyperplanes is gradual with constant `c` then it will be gradual for constant `c' > c`, so we can assume the `k` above is at least 1. Suppose `(vec{w}_t, theta_t)` is not yet `epsilon`-accurate. Then it misclassifies a uniformly random example with probability at least `epsilon`, while `Pr[x in B] le epsilon/2`, so the next example which causes an update lies in `B` with probability at most `(epsilon/2)/epsilon = 1/2`. Define the potential function
`N_t(alpha) = ||alpha vec{w} - vec{w}_t||^2 + (alpha theta - theta_t)^2`.

The perceptron update rule tells us `vec{w}_{t+1} = vec{w}_t \pm vec{x}` and `theta_{t+1} = theta_t \mp 1`, so `N_{t+1}(alpha) - N_t(alpha)` is
\begin{eqnarray*} \Delta N(\alpha) &=& ||\alpha \vec{w} - \vec{w}_{t+1}||^2 + (\alpha \theta - \theta_{t+1})^2\\ &&\quad - ||\alpha \vec{w} - \vec{w}_t||^2 - (\alpha \theta - \theta_t)^2\\ &=&\mp 2\alpha \vec{w} \cdot \vec{x} \pm 2\alpha\theta \pm 2\vec{w}_t\cdot \vec{x} \mp 2 \theta_t + ||\vec{x}||^2 +1\\ &\leq& 2 \alpha A \pm 2 (\vec{w}_t \cdot \vec{x} - \theta_t) + n+1. \end{eqnarray*} with `A = \mp(vec{w}\cdot vec{x} - theta)`. We are again using that `||vec{x}||^2 le n`, since `vec{x} in {0,1}^n`.
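As a concrete reference for the rule being analyzed, here is a minimal sketch (function names hypothetical, using a `+1/-1` label convention) of the perceptron update and of the potential `N_t(alpha)`:

def perceptron_update(w_t, theta_t, x, label):
    # label is +1 if the target function says w.x >= theta, -1 otherwise
    predicted = 1 if sum(wi * xi for wi, xi in zip(w_t, x)) >= theta_t else -1
    if predicted == label:
        return w_t, theta_t                # correctly classified: no update
    if label == 1:                         # false negative: w_{t+1} = w_t + x, theta_{t+1} = theta_t - 1
        return [wi + xi for wi, xi in zip(w_t, x)], theta_t - 1
    else:                                  # false positive: w_{t+1} = w_t - x, theta_{t+1} = theta_t + 1
        return [wi - xi for wi, xi in zip(w_t, x)], theta_t + 1

def potential(alpha, w, theta, w_t, theta_t):
    # N_t(alpha) = ||alpha*w - w_t||^2 + (alpha*theta - theta_t)^2
    return sum((alpha * wi - wti) ** 2 for wi, wti in zip(w, w_t)) + (alpha * theta - theta_t) ** 2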

Proof of PAC-Learning cont'd

Since we are assuming `vec{x}` was misclassified, we know `\pm(vec{w}_t \cdot vec{x} - theta_t) < 0`, so `\Delta N(\alpha) < 2alpha A + n +1`. If `x in B` then `A \le 0`; if `x notin B`, then `A \le -k epsilon`. So `\Delta N(\alpha) < n + 1` for `x in B` and `\Delta N(\alpha) < n + 1 - 2k \epsilon\alpha` for `x notin B`. Suppose the perceptron algorithm has made `r` updates on examples in `B` and `s` updates on examples outside `B`. Since `(vec{w}, theta)` was normalized, `|theta| \le sqrt(n)`. Recall that at the start of the perceptron algorithm the weights and the threshold are all `0`. Hence, `N_0(\alpha) = alpha^2(||vec{w}||^2 + theta^2) \le alpha^2(n+1)`. Since for all `t`, `N_t(\alpha) ge 0`, it follows that
`0 le r(n+1) + s(n+1 - 2k\epsilon\alpha) + alpha^2(n+1)`.
Setting `alpha = (12(n+1))/(5 k epsilon)` and dividing through by `n+1`, the above simplifies to
`0 \le r - (19/5)s + (144(n+1)^2)/(25(k epsilon)^2)`.
Suppose `m_1 = (144(n+1)^2)/(25(k epsilon)^2)` updates have been made in total, so `r + s = m_1`. Then `r = m_1 - s`, and substituting this into the above inequality gives
\begin{eqnarray*} 0 &\leq& m_1 - s - \frac{19}{5}s + m_1\\ 0 &\leq& 2m_1 - \frac{24}{5} s\\ \frac{24}{5}s &\leq & 2m_1\\ s &\leq& \frac{10}{24} m_1\\ s &\leq & \frac{5}{12}m_1. \end{eqnarray*} So at least a `7/12` fraction of the updates must have been made on examples in `B`.

Proof of PAC-Learning Conclusion

If the perceptron's hypothesis has never been `\epsilon`-accurate, then from our discussion at the start of the proof, at each update the probability of that update occurring on a point in `B` is at most `1/2`. So by Chernoff bounds, the probability that more than `7/12`ths of `m = max(-144ln(\delta/2), m_1)` updates occur in `B` is at most `delta/2`. I.e., the Chernoff bound says the probability that we see `(7/12)m` trials in `B` when we expect only `(1/2)m` trials in `B` is governed by `p=1/2`, `c`, and `m`, where `(1+c)(1/2)m = (7/12)m`. Solving for `c` gives `c=1/6`. Then the bound given by Chernoff's inequality is at most `e^{(-c^2 p m)/2} = e^{-((1/6)^2(1/2)m)/2} = e^(-m/144) le e^(ln(delta/2)) = delta/2`. So this means that with probability at least `1 - \delta/2` the perceptron algorithm will have found an `epsilon`-accurate hypothesis within `m` updates. With probability at least `1-\delta/2`, using `(2m)/epsilon` examples will ensure that `m` updates occur. Thus, with probability at least `(1-delta/2)^2 = 1 - delta + delta^2/4 > 1 - delta`, using `(2m)/epsilon` examples will ensure `m` updates occur and that the result of these updates is `epsilon`-accurate. Q.E.D.
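A rough sketch of the bookkeeping in this conclusion, with the constants taken directly from the proof (the function name is hypothetical): it computes the update bound `m_1`, the number of updates `m` needed for the Chernoff step, and the `2m/epsilon` example budget.

import math

def pac_perceptron_budget(n, epsilon, delta, k):
    m1 = 144 * (n + 1) ** 2 / (25 * (k * epsilon) ** 2)  # update bound from the counting argument
    m = max(-144 * math.log(delta / 2), m1)              # enough updates for the Chernoff step
    examples = 2 * m / epsilon                           # examples so that m updates occur w.h.p.
    return math.ceil(m), math.ceil(examples)

print(pac_perceptron_budget(n=20, epsilon=0.1, delta=0.05, k=1))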

Getting Started with Python

Running Python
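A minimal sketch of the workflow presumably demoed here (file name hypothetical): save a one-line script and run it with the interpreter.

# hello.py
print("Hello, CS256!")

# From a shell:  python3 hello.py
# Or start an interactive interpreter by running python3 with no arguments.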

Quiz

Which of the following is true?

  1. Perceptrons with logistic activation functions are tensors.
  2. The perceptron convergence theorem shows perceptrons can learn an arbitrary function.
  3. The learning update rule used for the perceptron convergence theorem and our PAC learning of nested boolean functions was the same.

Strings
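A minimal sketch of basic string operations of the sort presumably covered on this slide (values are only illustrations):

s = "machine learning"
print(len(s))                    # 16: length of the string
print(s.upper())                 # MACHINE LEARNING
print(s.split(" "))              # ['machine', 'learning']
print(s[0:7])                    # machine  -- slicing
print("PAC" + "-" + "learning")  # concatenation with +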

Lists
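A minimal sketch of basic list operations (values are only illustrations); a longer example using lists follows on the next slide:

nums = [3, 1, 4, 1, 5]
nums.append(9)                   # [3, 1, 4, 1, 5, 9]
print(nums[0], nums[-1])         # first and last elements
print(sorted(nums))              # [1, 1, 3, 4, 5, 9]
squares = [x * x for x in nums]  # list comprehension
print(squares)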

Example Using Lists and Command Line

import sys
if len(sys.argv) != 2:
    # notice sys.argv is a list of the command-line arguments; here we check its length
    print("Please supply a filename")
    raise SystemExit(1)  # exit with a nonzero status code
f = open(sys.argv[1])  # the program name is argv[0]; the filename is argv[1]
lines = f.readlines()  # reads all lines into a list in one go
f.close()

# convert the input lines to a list of ints
ivalues = [int(line) for line in lines]

# print the min and max
print("The min is", min(ivalues))
print("The max is", max(ivalues))

Example Files in a Folder as a List

import glob
path = './*'
files = glob.glob(path)  # glob.glob returns a list of path names matching the shell-style pattern
for name in files:
    print(name)

Tuples
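A minimal sketch of tuple basics (values are only illustrations):

point = (3, 4)
x, y = point                # tuple unpacking
print(x, y)
pairs = [(1, "a"), (2, "b")]
for num, letter in pairs:   # unpacking inside a for loop
    print(num, letter)
# point[0] = 5  would raise a TypeError: tuples are immutable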

Sets
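A minimal sketch of set basics (values are only illustrations):

a = {1, 2, 3, 3}
print(a)                    # {1, 2, 3} -- duplicates are removed
b = set([2, 3, 4])
print(a | b)                # union: {1, 2, 3, 4}
print(a & b)                # intersection: {2, 3}
print(1 in a)               # fast membership test: True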

Dictionaries
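A minimal sketch of dictionary basics (values are only illustrations):

grades = {"alice": 93, "bob": 87}
grades["carol"] = 78          # add a new key/value pair
print(grades["alice"])        # look up a value by key
print(grades.get("dan", 0))   # default value when the key is missing
for name, score in grades.items():
    print(name, score)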

Iteration and Looping
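A minimal sketch of while and for loops (values are only illustrations):

i = 0
while i < 3:
    print("while iteration", i)
    i += 1

for i in range(3):           # 0, 1, 2
    print("for iteration", i)

for i in range(10, 0, -2):   # counts down 10, 8, 6, 4, 2
    print(i)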

Examples of things we can iterate over with for
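A minimal sketch of some of the iterables for can loop over (values and the commented-out file name are only illustrations):

for ch in "abc":                       # the characters of a string
    print(ch)
for item in [10, 20, 30]:              # the elements of a list
    print(item)
for key in {"x": 1, "y": 2}:           # the keys of a dictionary
    print(key)
for i, val in enumerate(["a", "b"]):   # (index, value) pairs via enumerate
    print(i, val)
# A file object yields its lines:
# for line in open("data.txt"):        # file name hypothetical
#     print(line.strip())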