CS267 Fall 2019 Practice Midterm 1

Studying for one of my tests does involve some memorization, which I believe is an important skill. People often waste a lot of time yet fail to remember the things they are trying to memorize. Please use a technique that has been shown to work, such as the method of loci. Other memorization techniques can be found via the Wikipedia page for Moonwalking with Einstein. Given this, to study for the midterm I suggest you:

  • Know how to do (by heart) all the practice problems.
  • Go over your notes at least three times. On the second and third passes, try to see how much you can remember from the previous pass.
  • Go over the homework problems.
  • Try to create your own problems similar to the ones I have given and solve them.
  • Skim the relevant sections from the book.
  • If you want to study in groups, at this point you are ready to quiz each other.

The practice midterm is below. Here are some facts about the actual midterm: (a) It is closed book, closed notes. Nothing will be permitted on your desk except your pen (pencil) and test. (b) You should bring photo ID. (c) There will be more than one version of the test. Each version will be of comparable difficulty. (d) One problem (modulo typos) on the actual test will be taken from the practice test.

  1. Define the following terms (1pt each): (a) Probability Ranking Principle, (b) Zipf's law, (c) language model, (d) maximum likelihood estimate.
  2. Suppose the phrase book reviews occurs 1000 times in a corpus of 900,000 bigrams, whereas the term book appears 2000 times in our corpus. What is the 0th order bigram language model probability of book reviews? (1pt answer, 1pt work). What is the first order language model probability? (1pt answer, 1pt work). (A worked sketch appears after this list.)
  3. Give the algorithm from class for phrase search of the phrase given by terms t[1], ..., t[n] after a given location position, using our inverted index ADT. (4pts). (A sketch appears after this list.)
  4. Suggest a data structure that could be used to implement an inverted index in PHP (2pts), and give a working code snippet to show how to implement first($t) (1pt) and last($t) from our inverted index ADT. (A sketch appears after this list.)
  5. Suppose the schema-independent posting list for bob looks like 1, 3, 120, 1000, 1200, 1201, 1202, 1300. Explain how next(bob, 2) would be computed using galloping search as the implementation for next. (4pts). (A sketch appears after this list.)
  6. Define the following terms (1pt each): (a) docid index, (b) frequency index, (c) positional index, and (d) schema-independent index.
  7. Suppose our whole corpus consists of the two sentences: (a) fox news was quick to report the story. (b) the quick brown fox jumped over the lazy dog. Assume we are using the vector space model with TF-IDF scores for the components. Compute the TF-IDF vector for each document (1pt each). Compute their cosine similarity (2pts). (A worked sketch appears after this list.)
  8. Give the Proximity Ranking algorithm discussed in class.
  9. Suppose n = 4; what would be the character n-grams for the word salad? (1pt) What is stopping? (1pt) What is stemming? (1pt) Give an example of the kinds of rules used by a Porter stemmer. (A sketch appears after this list.)
  10. Briefly explain and give an example of each of the following (1pt each): (a) sort-based dictionary, (b) per-term index, (c) move-to-front heuristic, (d) merge-based index construction.
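
For Problem 2, the sketch below works through the arithmetic in PHP, assuming the textbook's conventions: the 0th order probability of a bigram is its raw count over the total number of bigrams, while the first order probability conditions the second term on the first. The counts are the ones given in the problem.

    <?php
    // Problem 2 arithmetic, under the assumed conventions above.
    $bigramCount  = 1000;    // occurrences of "book reviews"
    $totalBigrams = 900000;  // bigrams in the corpus
    $bookCount    = 2000;    // occurrences of "book"

    $zerothOrder = $bigramCount / $totalBigrams;  // 1000/900000, about 0.00111
    $firstOrder  = $bigramCount / $bookCount;     // P(reviews | book) = 0.5

    printf("0th order: %.5f\n1st order: %.3f\n", $zerothOrder, $firstOrder);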
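For Problem 3, here is a PHP sketch in the style of the textbook's nextPhrase algorithm. Since next and prev are PHP built-ins, the ADT functions are written as idxNext and idxPrev; linear-scan bodies and a tiny made-up index are included only so the sketch runs (the class implementation would use galloping search).

    <?php
    // A tiny schema-independent index so the sketch is runnable.
    $index = [
        "new"  => [2, 8, 30],
        "york" => [3, 9, 15],
    ];

    // next(t, pos): first position of $t after $pos, or INF if none.
    function idxNext(array $index, string $t, $pos) {
        foreach ($index[$t] ?? [] as $p) {
            if ($p > $pos) return $p;
        }
        return INF;
    }

    // prev(t, pos): last position of $t before $pos, or -INF if none.
    function idxPrev(array $index, string $t, $pos) {
        $best = -INF;
        foreach ($index[$t] ?? [] as $p) {
            if ($p < $pos) $best = $p; else break;
        }
        return $best;
    }

    // nextPhrase: first occurrence of the phrase t[1..n] after $position,
    // returned as [start, end], or [INF, INF] if there is none.
    function nextPhrase(array $index, array $terms, $position) {
        $n = count($terms);
        $v = $position;
        for ($i = 0; $i < $n; $i++) {           // march each term forward
            $v = idxNext($index, $terms[$i], $v);
        }
        if ($v == INF) return [INF, INF];
        $u = $v;
        for ($i = $n - 2; $i >= 0; $i--) {      // tightest window ending at $v
            $u = idxPrev($index, $terms[$i], $u);
        }
        if ($v - $u == $n - 1) return [$u, $v];  // adjacent: a phrase match
        return nextPhrase($index, $terms, $u);   // otherwise retry from $u
    }

    print_r(nextPhrase($index, ["new", "york"], 0));  // [2, 3]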
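For Problem 4, one natural data structure is a PHP associative array mapping each term to a sorted array of positions. The sketch below passes the index in explicitly (an answer could instead keep it in an object or a global); the sample postings are made up for illustration.

    <?php
    // A minimal in-memory inverted index: term => sorted positions.
    $index = [
        "fox"   => [1, 9, 42],
        "quick" => [4, 11],
    ];

    // first($t): smallest position at which $t occurs, or INF if absent.
    function first(array $index, string $t) {
        return isset($index[$t]) ? $index[$t][0] : INF;
    }

    // last($t): largest position at which $t occurs, or -INF if absent.
    function last(array $index, string $t) {
        return isset($index[$t]) ? end($index[$t]) : -INF;
    }

    echo first($index, "fox"), "\n";  // 1
    echo last($index, "fox"), "\n";   // 42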
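For Problem 5, the sketch below implements next() with galloping search: jump ahead in doubling steps until a posting greater than the target is bracketed, then binary search inside the bracket. On the given list the gallop stops immediately, since postings[1] = 3 > 2, so next(bob, 2) = 3. The class version also caches the offset from the previous call; that detail is omitted here.

    <?php
    // next() via galloping (doubling) search, without the cached offset.
    function nextGallop(array $postings, $current) {
        $n = count($postings);
        if ($n == 0 || $postings[$n - 1] <= $current) return INF;
        if ($postings[0] > $current) return $postings[0];
        $low = 0;
        $jump = 1;
        // Gallop: double the jump until we pass a posting > $current.
        while ($low + $jump < $n && $postings[$low + $jump] <= $current) {
            $low += $jump;
            $jump *= 2;
        }
        $high = min($low + $jump, $n - 1);
        // Binary search in ($low, $high] for the first posting > $current.
        while ($high - $low > 1) {
            $mid = intdiv($low + $high, 2);
            if ($postings[$mid] <= $current) { $low = $mid; } else { $high = $mid; }
        }
        return $postings[$high];
    }

    $bob = [1, 3, 120, 1000, 1200, 1201, 1202, 1300];
    echo nextGallop($bob, 2), "\n";  // 3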
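For Problem 7, this sketch computes raw-frequency TF-IDF vectors with IDF = log2(N/df) and then their cosine similarity; the course may use a different TF-IDF variant, which would change the numbers. With only these two documents, every shared term (fox, quick, the) has df = 2 and hence IDF = log2(2/2) = 0, so the dot product, and with it the cosine similarity, comes out 0.

    <?php
    // TF-IDF (tf * log2(N/df)) and cosine similarity for the two sentences.
    $docs = [
        "a" => ["fox","news","was","quick","to","report","the","story"],
        "b" => ["the","quick","brown","fox","jumped","over","the","lazy","dog"],
    ];
    $N = count($docs);

    $df = [];  // document frequency of each term
    foreach ($docs as $doc) {
        foreach (array_unique($doc) as $t) {
            $df[$t] = ($df[$t] ?? 0) + 1;
        }
    }

    $vectors = [];  // TF-IDF vector for each document
    foreach ($docs as $name => $doc) {
        foreach (array_count_values($doc) as $t => $tf) {
            $vectors[$name][$t] = $tf * log($N / $df[$t], 2);
        }
    }

    // Cosine similarity: shared terms all have idf = 0 here, so it is 0.
    $dot = 0.0; $na = 0.0; $nb = 0.0;
    foreach ($vectors["a"] as $t => $w) {
        $dot += $w * ($vectors["b"][$t] ?? 0.0);
        $na  += $w * $w;
    }
    foreach ($vectors["b"] as $w) {
        $nb += $w * $w;
    }
    echo $dot / (sqrt($na) * sqrt($nb)), "\n";  // 0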
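For the n-gram part of Problem 9: without boundary padding, the character 4-grams of salad are sala and alad; schemes that pad with boundary markers (producing grams like _sal) would give a different answer. A quick checker:

    <?php
    // Character n-grams of a word, with no boundary padding assumed.
    function charNgrams(string $word, int $n): array {
        $grams = [];
        for ($i = 0; $i + $n <= strlen($word); $i++) {
            $grams[] = substr($word, $i, $n);
        }
        return $grams;
    }

    print_r(charNgrams("salad", 4));  // sala, alad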