Due date: Nov 18

Files to be submitted:
Hw4.zip

Purpose: To gain experience in programming in Javascript.

Related Course Outcomes:

CLO4 -- Write client-side scripts that validate HTML forms.

CLO5 -- Develop and deploy web applications that involve components, web services, and databases.

Specification:

Once upon a time, there was a CS 174 professor who was walking around the block, trying not to step in dog poop and pondering deep questions of the universe. He was thinking that it is not too hard to determine the most common words in a language: just count words in some corpus for that language, such as a Wikipedia dump. On the other hand, it seems harder to track down, for each word in a language, what fraction of native speakers know that word. Mmmm, he thought, this seems like a job for those intrepid CS 174 coders and the internet to solve. Students can make a command line tool QuizMaker.php to make quizzes based on folders of text files, and make a web app where the denizens of the web can take these quizzes and have their scores tallied.

The command line tool will be run with a syntax like:

php QuizMaker.php


This will look in the data folder of the project and make one quiz data file for each subfolder of data. The name of the quiz data file should be the name of the subfolder followed by .txt, so an english subfolder would get an english.txt quiz data file. A quiz data file should be a serialized associative array consisting of word => [number of occurrences of word in documents in folder, [array of 5-grams word appeared in as the middle word]] pairs.

To generate a quiz data file, QuizMaker.php should read each .txt file in a subfolder of data. For each such file, it should read the file into a string, strip any HTML/XML tags, and lowercase the string. Then it should split the string into sentences according to a list of common sentence-terminating symbols (for example, .!?). Any remaining punctuation should next be removed from each sentence. For each word in each sentence in the file, the number of occurrences of the word should be added to the appropriate entry in the associative array. Further, each 5-gram (five-word gram) in the sentence should be added to the entry for its middle word.

For example, if the sentence was "The quick brown fox jumped over the lazy dog.", then the word "the" is the middle word of two 5-grams: ([blank] [blank] the quick brown) and (jumped over the lazy dog). We would add each to the entry for the word "the" in the associative array.

Once done processing all the text files in a folder, the associative array should be sorted from most frequently occurring word to least before serializing and writing it to the quiz data file.
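
The 5-gram rule can be illustrated on the example sentence above. The sketch below is written in Javascript purely to show the logic for a single sentence; QuizMaker.php itself must be written in PHP. The function name middleWordGrams, the use of an empty string for [blank], and the exact punctuation pattern are assumptions, not part of the specification.

```javascript
// Illustration only: for each word of one sentence, build the 5-gram that
// has that word in the middle, padding past either end with a blank ("").
function middleWordGrams(sentence) {
    const words = sentence
        .toLowerCase()
        .replace(/[^\p{L}\p{N}\s]/gu, "") // drop remaining punctuation
        .split(/\s+/)
        .filter((w) => w !== "");
    const grams = new Map(); // word => list of 5-grams with word in the middle
    words.forEach((word, i) => {
        const gram = [];
        for (let j = i - 2; j <= i + 2; j++) {
            gram.push(j >= 0 && j < words.length ? words[j] : "");
        }
        if (!grams.has(word)) {
            grams.set(word, []);
        }
        grams.get(word).push(gram);
    });
    return grams;
}

const grams = middleWordGrams("The quick brown fox jumped over the lazy dog.");
console.log(JSON.stringify(grams.get("the")));
// prints [["","","the","quick","brown"],["jumped","over","the","lazy","dog"]]
```

The number of occurrences of a word in the sentence is simply the length of its list here, since every occurrence is the middle word of exactly one 5-gram.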

The web app consists of three types of pages:

Landing Page

# Language Quiz/Name of Quiz

Select the words that could be used to fill in the blank (at least one should work).

1. jumped over ___ lazy dog
...
20. dogs dont ___ people with
Quiz Page

# Language Quiz/Name of Quiz/Results

Word Rank Percentile    % Correct
5%                      99
5%-10%                  96
...                     ...
95%-100%                20
Quiz Results Page
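
The requirements that follow ask for client-side Javascript checks on the landing page (a quiz must be selected, plus years of experience for Start Quiz) and on the quiz page (every problem needs at least one checked box). A minimal sketch of those checks is given here. It is only one possible arrangement: the function names, the element ids, and the message wording are all assumptions, not part of the specification.

```javascript
// Landing page check. `action` is "start" or "results"; `quiz` and
// `experience` are the selected values ("" when nothing is chosen).
// Returns an error message, or null when the click may proceed.
function checkLanding(action, quiz, experience) {
    if (quiz === "") {
        return "Please select a quiz.";
    }
    if (action === "start" && experience === "") {
        return "Please select your years of experience.";
    }
    return null;
}

// Quiz page check. `problems` holds one array of booleans per problem
// (true = that checkbox is checked). Returns the 1-based numbers of the
// problems that have no box checked.
function unansweredProblems(problems) {
    const missing = [];
    problems.forEach((boxes, i) => {
        if (!boxes.some(Boolean)) {
            missing.push(i + 1);
        }
    });
    return missing;
}

// Browser wiring for the Start Quiz button only, skipped outside a browser.
// The ids "quiz", "experience", "start-quiz" and "message" are assumptions.
if (typeof document !== "undefined") {
    document.getElementById("start-quiz").addEventListener("click", (event) => {
        const message = checkLanding("start",
            document.getElementById("quiz").value,
            document.getElementById("experience").value);
        if (message !== null) {
            event.preventDefault(); // stay on the landing page
            document.getElementById("message").textContent = message;
        }
    });
}
```

Keeping the checks as plain functions, separate from the DOM wiring, makes them easy to try out on their own; the See Results button and the quiz form's submit handler could be wired up in the same way.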

Here are the requirements on your project:

1. The server-side part of your homework should be developed in PHP as a Composer project, so it should have a composer.json file.
2. Your project should be written using namespaces. You should only create variables, arrays, objects, define new classes, etc. in the namespace cool_name_for_your_group\hw4 and subnamespaces thereof.
3. The folder structure for your project should be the same as for HW3 and you should use the MVA design pattern for the web app part. As this project does not involve a database, you will not need a CreateDB.php file. As you will be using packagist.org, you need a vendor folder. You should also have an executables folder which will hold your command-line QuizMaker.php. Finally, you should have a folder data which will be used to store corpuses for quizzes along with word quizzes.
4. You should develop your whole project using git. If the grader does a diff between any two adjacent commits in the git log history, the number of lines of code that change should never be more than 100 lines.
5. You should have a file issues.txt where you split the project into issues. Each issue should have a number by it and an initial description. If you are working in a group of more than one person, the issues should be assigned to team members. Beneath the initial description, there should be bullet points for any discussion comments between team members (or between yourself if you are in a group of one).
6. Your program should use monolog/monolog to write a log message after QuizMaker.php processes each file in a folder.
7. QuizMaker.php should generate quiz data files as described above.
8. The landing page should make use of Javascript to check, when someone clicks on Start Quiz, that both a quiz and a number of years of experience have been selected. If not, a message should be displayed; if so, the user should be taken to the appropriate quiz.
9. The landing page should make use of Javascript to check, when someone clicks on See Results, that at least a quiz has been selected before taking the user to the appropriate quiz results. If not, a message should be displayed.
10. The controller for a quiz page should call a QuizModel class's getQuizData method to read in the appropriate quiz info file.
11. A quiz should consist of 20 multiple choice questions. To generate a question, the controller should select a word uniformly at random from the quiz info data array keys. Since these keys are sorted by their rank (most common word rank 0, next rank 1, ...), it should determine the percentile range of the rank of the word to the nearest 5%. That is, is the word in the top 5% of most frequent words? Between the top 5% and top 10%? Etc. This should be included with the quiz problem as a hidden variable. Next, your program should choose uniformly at random, from the list of 5-grams you have for that word, a 5-gram to use for the quiz problem. Finally, it should choose three other words uniformly at random from the set of all words discovered.
12. When a user clicks submit on a quiz, you should use Javascript to check that for each problem at least one checkbox was checked.
13. To grade a quiz, your program should cycle over the quiz answers for each quiz problem. For each checked box, it should check that the checked word did appear as the middle word of a 5-gram with the other four words given in the quiz problem. For each unchecked box, it should check that the word did not appear as the middle word of such a 5-gram. If all four boxes were correctly checked or left unchecked, the problem is deemed correctly answered.
14. Your program should maintain a file QuizStatistics.txt in the data folder. It should consist of serialized array data with four subarrays for the different amounts of language experience a user may have:
["any" => [data for any],
"less10" => [data for less10],
"10-20" => [data for 10-20],
"20" => [data for >20]]

The data for each of these categories should be an associative array for each of the 5% ranges:
[0] => [num_correct, num_answered],