Due date: Oct 11

Files to be submitted:
Hw2.zip

Purpose: To implement the S-K Algorithm for training SVMs as well as gain familiarity with Numpy and Pillow.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

CLO3 -- Be able to explain how different neural network training algorithms work.

CLO4 -- Be able to select neural network layer types to build a network suitable for various learning tasks such as object classification, object detection, language processing, planning, policy selection, etc.

CLO7 -- Be able to measure the performance of a model, determine if more data is needed, as well as how to tune the model.

Description:

For this homework, you can use the Python libraries Numpy and Pillow, but you are not allowed to use any neural net or SVM libraries other than those you code yourself. In your submitted Hw2.zip file, I am going to use the files suits_generator.py, sk_train.py, svm_model_tester.py, and Hw2.pdf to determine your grade. This homework consists of the following parts, which describe what these files should contain:

1. A dataset generator which can be used to generate variations of Playing Card Suit Symbols (not an actual playing card, just the symbol of a suit). Your submitted generator will be tested from the command line with a line of the format:
python suits_generator.py folder_name num_examples

The following is an example concrete invocation of your program:
python suits_generator.py data 10000

When run, your program should output num_examples many 25x25 black-and-white png image files in the folder folder_name. The files should have file names of the form Num_SuitLetter.png. For example, 1_H.png. The number 1 means it was the first example image generated. The second image would use the number 2, and so on. The letter H means the image in the file was supposed to be an H (Heart). Other single letter codes for the remaining suits are:
C - for clubs
D - for diamonds
S - for spades
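
The naming scheme above can be parsed mechanically, which may be handy when the trainer and tester need to recover a label from a file name. The following is only a minimal sketch: the helper name parse_example_name is hypothetical, and it assumes that any name not matching the pattern is simply skipped.

```python
import re

# Matches names such as 1_H.png: an example number, an underscore, and one
# suit letter. The letter set C, D, H, S is the one listed for sk_train.py.
NAME_PATTERN = re.compile(r"^(\d+)_([CDHS])\.png$")


def parse_example_name(file_name):
    """Return (number, suit_letter), or None if the name does not match."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(parse_example_name("1_H.png"))    # (1, 'H')
print(parse_example_name("notes.txt"))  # None
```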

When generating the ith training example, your program should randomly choose the symbol to write to the image file. Before writing the symbol, it should apply various transformations to it, so that every time a symbol is drawn, it is somewhat different from the last. The kinds of transformations you should randomly apply to varying degrees are: variations in the position of the symbol, variations in the orientation of the symbol, variations in the size of the symbol, variations in the thickness of the strokes in the symbol, and variations in the size and number of stray marks such as ellipsoids drawn in the image. The control of the amount of these kinds of variations should be pulled out into tweakable constants at the top of the suits_generator.py file.
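
As a rough illustration of the transformations just listed, here is a sketch that draws a single perturbed diamond outline with Pillow. Everything named here (the constants, their values, and draw_diamond) is an assumption made for illustration, and only one suit is drawn; it is not a complete generator.

```python
import math
import random

from PIL import Image, ImageDraw

# Tweakable constants (hypothetical names and values).
IMAGE_SIZE = 25
MAX_SHIFT = 3                        # max offset of the symbol center, in pixels
MIN_HALF_SIZE, MAX_HALF_SIZE = 5, 8  # half-height range of the symbol
MAX_ROTATION_DEG = 20
MAX_STROKE = 2
MAX_STRAY_MARKS = 3


def draw_diamond(rng=random):
    """Draw one randomly perturbed diamond outline on a white 25x25 image."""
    img = Image.new("L", (IMAGE_SIZE, IMAGE_SIZE), 255)
    draw = ImageDraw.Draw(img)

    # Position, size, and orientation are all chosen at random.
    cx = IMAGE_SIZE // 2 + rng.randint(-MAX_SHIFT, MAX_SHIFT)
    cy = IMAGE_SIZE // 2 + rng.randint(-MAX_SHIFT, MAX_SHIFT)
    half = rng.randint(MIN_HALF_SIZE, MAX_HALF_SIZE)
    theta = math.radians(rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG))

    # Corners of an upright diamond, rotated by theta about the center.
    corners = [(0, -half), (half * 0.7, 0), (0, half), (-half * 0.7, 0)]
    points = [
        (cx + x * math.cos(theta) - y * math.sin(theta),
         cy + x * math.sin(theta) + y * math.cos(theta))
        for x, y in corners
    ]
    # A closed polyline with a random stroke thickness.
    draw.line(points + [points[0]], fill=0, width=rng.randint(1, MAX_STROKE))

    # Stray marks: a few small ellipses at random positions.
    for _ in range(rng.randint(0, MAX_STRAY_MARKS)):
        x0 = rng.randint(0, IMAGE_SIZE - 3)
        y0 = rng.randint(0, IMAGE_SIZE - 3)
        box = [x0, y0, x0 + rng.randint(1, 2), y0 + rng.randint(1, 2)]
        draw.ellipse(box, fill=0)

    return img.convert("1")  # 1-bit black-and-white image
```

Calling draw_diamond().save("data/1_D.png") would then write one example, provided the data folder already exists.
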
2. An SVM model training program. Your submitted trainer will be tested from the command line with a line using the format:
python sk_train.py epsilon max_updates class_letter model_file_name train_folder_name

The following is an example concrete invocation of your program:
python sk_train.py .01 30000 C clubs_model.txt data
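
One possible way to read the five positional arguments in the order shown above is Python's standard argparse module. This is only a sketch; the function name parse_train_args is hypothetical, and plain sys.argv indexing would work just as well.

```python
import argparse


def parse_train_args(argv):
    """Parse the five positional arguments of sk_train.py, in order."""
    parser = argparse.ArgumentParser(prog="sk_train.py")
    parser.add_argument("epsilon", type=float)
    parser.add_argument("max_updates", type=int)
    parser.add_argument("class_letter", choices=["C", "D", "H", "S"])
    parser.add_argument("model_file_name")
    parser.add_argument("train_folder_name")
    return parser.parse_args(argv)


args = parse_train_args([".01", "30000", "C", "clubs_model.txt", "data"])
print(args.epsilon, args.max_updates, args.class_letter)  # 0.01 30000 C
```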

If there are no files in train_folder_name folder with suits_generator.py output format file_names, then your program should stop and output NO DATA. Otherwise, your program should use the files in train_folder_name to train an SVM using the kernel variant of the S-K algorithm taught in class. The training should stop either after the model has converged to within the value of epsilon or after more than max_updates adaptation steps have been done. The SVM should be trained to determine if an input image is of the class given by the class_letter. These class letters, C, D, H, S, are the same as used by suits_generator.py. The kernel used in training should be a degree 4 polynomial kernel. This is what Cortes and Vapnik used. The choice of lambda for the scaling in the scaled convex hull should satisfy the inequality from the slides so that the classes are separable. The output of your program should be a file consisting of a serialization of the class trained for (one of C, D, H, S), the centroids of the positive and negative training data, the value of lambda computed, together with the final weights found. Each weight should be an ordered pair (i, alpha), where i is the index of a support vector (an image in the training set) and alpha is a float. You don't have to output any weights where alpha is 0. Notice that from the file name of image i one can determine whether the example was in the trained-for class or not. Then from the centroids and lambda one can compute \vec{x}'_i.
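
To make the training loop more concrete, here is a sketch of one common formulation of the kernel S-K update, working on a precomputed Gram matrix. It is not necessarily the variant from the slides: it leaves out the scaled convex hull step and the choice of lambda, it assumes the two classes are separable in feature space, and the function names and the kernel form (x . y + 1)^4 are assumptions made for illustration.

```python
import numpy as np


def poly_kernel_matrix(X, degree=4):
    """Gram matrix of the polynomial kernel (x . y + 1) ** degree."""
    return (X @ X.T + 1.0) ** degree


def sk_train(K, y, epsilon, max_updates):
    """Kernel S-K on Gram matrix K with labels y (True = in class).

    Returns (alpha, A, B, updates); alpha sums to 1 over each class.
    """
    y = np.asarray(y, dtype=bool)
    i1, j1 = np.flatnonzero(y)[0], np.flatnonzero(~y)[0]
    alpha = np.zeros(len(y))
    alpha[i1] = alpha[j1] = 1.0
    # A = <w+, w+>, B = <w-, w->, C = <w+, w->,
    # D[i] = <x_i, w+>, E[i] = <x_i, w->, all in feature space.
    A, B, C = K[i1, i1], K[j1, j1], K[i1, j1]
    D, E = K[:, i1].astype(float), K[:, j1].astype(float)

    updates = 0
    while updates < max_updates:
        norm_w = np.sqrt(A + B - 2 * C)
        # Projection of each example onto the direction w+ - w-.
        m = np.where(y, D - E + B - C, E - D + A - C) / norm_w
        t = int(np.argmin(m))
        if norm_w - m[t] < epsilon:
            break  # converged to within epsilon
        k_tt = K[t, t]
        if y[t]:   # move w+ toward x_t
            q = min(1.0, (A - D[t] + E[t] - C) / (A + k_tt - 2 * D[t]))
            alpha[y] *= 1 - q
            A = A * (1 - q) ** 2 + 2 * (1 - q) * q * D[t] + q * q * k_tt
            C = (1 - q) * C + q * E[t]
            D = (1 - q) * D + q * K[:, t]
        else:      # move w- toward x_t
            q = min(1.0, (B - E[t] + D[t] - C) / (B + k_tt - 2 * E[t]))
            alpha[~y] *= 1 - q
            B = B * (1 - q) ** 2 + 2 * (1 - q) * q * E[t] + q * q * k_tt
            C = (1 - q) * C + q * D[t]
            E = (1 - q) * E + q * K[:, t]
        alpha[t] += q
        updates += 1
    return alpha, A, B, updates
```

Under this formulation, a new point x would be scored as the sum of alpha_i K(x_i, x) over in-class support vectors, minus the same sum over out-of-class support vectors, plus (B - A) / 2, with a positive value meaning "in class". Passing poly_kernel_matrix(X) as K gives the degree 4 case.
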
3. An SVM model testing program. Your submitted testing program will be graded by running from the command line a line using the format:
python svm_model_tester.py model_file_name train_folder_name test_folder_name

The following is an example concrete invocation of your program:
python svm_model_tester.py clubs_model.txt data test

If there are no files in train_folder_name folder with suits_generator.py output format file_names, then your program should stop and output NO TRAINING DATA. If there are no files in test_folder_name folder with suits_generator.py output format file_names, then your program should stop and output NO TEST DATA. If there is no correctly formatted model_file_name, it should output, as appropriate, either CAN'T FIND MODEL FILE or MODEL FILE IS NOT OF THE CORRECT FORMAT. Otherwise, your program should loop through the images in test_folder_name. For each image, it should compute what the SVM model would output (1 - in class / 0 - not in class) and compare that to the class given by the image file name. It should then output a line in the format:
TRIAL# Correct/False Positive/False Negative\n

depending on the situation for that image. Here are some example output lines:
1 Correct
2 Correct
3 False Positive
4 Correct
5 False Negative
...

After outputting these lines for each image in the folder, your program should output a line for the fraction of correct test items, the fraction of false positives, and the fraction of false negatives. For example, this might look like:
Fraction Correct: .96
Fraction False Positive: .03
Fraction False Negative: .01
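
The per-trial lines and the three summary fractions can be produced from two lists of 0/1 values. The sketch below assumes a hypothetical function name report, assumes the test set is non-empty (the NO TEST DATA case is handled earlier), and does not try to match any particular number formatting.

```python
def report(predicted, actual):
    """Print one line per trial, then the three summary fractions.

    predicted and actual are equal-length sequences of 0/1 values.
    """
    counts = {"Correct": 0, "False Positive": 0, "False Negative": 0}
    for trial, (p, a) in enumerate(zip(predicted, actual), start=1):
        if p == a:
            outcome = "Correct"
        elif p == 1:
            outcome = "False Positive"   # model said in class, label says not
        else:
            outcome = "False Negative"   # model said not in class, label says in
        counts[outcome] += 1
        print(f"{trial} {outcome}")
    total = len(actual)
    fractions = {name: count / total for name, count in counts.items()}
    for name, value in fractions.items():
        print(f"Fraction {name}: {value}")
    return fractions
```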

4. Experiments and Write-up. I would like you to conduct four small experiments and write them up in a file Hw2.pdf which you include in your Hw2.zip file. You should keep your testing and training sets separate. Your test set can be relatively small, say 1000 items. In the first experiment, I want you to vary just the amount of data trained on, say 1000, 5000, 10000, 15000, 20000, while training only for Clubs and fixing everything else. What is the effect of the training set size on the accuracy of the trained model? The second experiment should compare, for the same amount of training data, how accurate an SVM for each of C, D, H, S will be. Which symbols are the easiest/hardest to recognize? Why? The third experiment should vary the tweakable parameters at the top of suits_generator.py and see the effect of increasing distortion on the training. The last experiment should vary epsilon and see the effect of this on the accuracy of the trained models.
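
For the first experiment, the three command lines from parts 1-3 can be assembled per training-set size. The sketch below only builds and prints the commands; the folder and model file naming (data_1000, C_1000_model.txt) and the function name experiment_commands are assumptions, and it presumes a separate test folder has already been generated.

```python
import sys


def experiment_commands(num_train, epsilon=".01", max_updates="30000",
                        class_letter="C", test_folder="test"):
    """Return the argv lists for one generate/train/test run."""
    train_folder = f"data_{num_train}"                    # hypothetical name
    model_file = f"{class_letter}_{num_train}_model.txt"  # hypothetical name
    return [
        [sys.executable, "suits_generator.py", train_folder, str(num_train)],
        [sys.executable, "sk_train.py", epsilon, max_updates, class_letter,
         model_file, train_folder],
        [sys.executable, "svm_model_tester.py", model_file, train_folder,
         test_folder],
    ]


for size in (1000, 5000, 10000, 15000, 20000):
    for command in experiment_commands(size):
        print(" ".join(command[1:]))
```

Each list could then be passed to subprocess.run(command, check=True) to actually run the step.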

Vector operations needed for training and testing are done using Numpy operations on Numpy arrays: 1pt
Image processing and generation is done using Pillow: 1pt
Items (1-4) above are each worth 2pts, graded in fractions of 0.5: 8pts
Total: 10pts