CS267 Spring 2021 Practice Midterm
For the midterm I would suggest you:
- Know how to do (by heart) all the practice problems.
-
Go over your notes at least three times. Second and third time try to see how much you can remember from the first time.
-
Go over the homework problems.
-
Try to create your own problems similar to the ones I have given and solve them.
-
If you want to study in groups, at this point you are ready to quiz each other.
The practice midterm is below. Here are some facts about the actual midterm:
- It is open book, open notes, open internet, however, you are not allowed to discuss problems on the midterm with classmates or other people until after the midterm submission period is over.
- It will be posted by the start of class time on the midterm day as one of the links on the side of the class homepage.
- You submit the midterm for grading using the same mechanism as to submit homeworks.
- It is designed to take an 1h15m, however, you can submit it up until midnight on the day it is posted.
- Some problems will be dependent on some personal information about you such as your ID, and you will not receive credit if you don't make use of this information correctly.
- One problem will be off the practice midterm.
Practice Problems
On practice midterm day, please post your problem solutions to the Practice Midterm Thread.
- Define the following IR terms and give an example: (a) the Probability Ranking Principle, (b) specificity, (c) exhaustivity, and (d) novelty.
- Pick your favorite (family-friendly) limerick. Give the probability of each term in this limerick according to a first order language model. Compute the MLE of the
second line.
- Consider the "As I was going to St Ives ..." riddle, explain how the nextPhrase algorithm would work on inputs: nextPhrase("had", "seven", 10). As there are variants to the riddle, tell me which one you are using as you go through the solution.
- Suggest a data structure that could be used to implement an inverted index in PHP, and give a working code snippet to show how to implement first($t), last($t) from our inverted index ADT.
- Give pseudo-code for an galloping search implementation of prev(t, pos), where prev is from out inverted index ADT.
- Define the following terms (1pt each): (a) docid index, (b) frequency index, (c) positional index, and (d) schema-independent index.
- In the following two documents fill in the blank with your home town and your favorite holiday respectively. (1) When I was little, I don't recall space aliens visiting ____. (2) Maybe the space aliens were waiting for ____. Assume we are using the vector space model with TF-IDF scores for the components. Compute the TF-IDF vectors for each document. Compute their cosine similarity.
- Give the nextSolution algorithm from class for positive boolean queries. Show how it would work on a concrete corpus and query.
- Suppose `n=3` what would be the character `n`-grams for the word toast? (1pt) What is stopping? (1pt) What is stemming (1pt)? Give an example of the kinds of rules used by a Porter stemmer.
- Briefly explain and give example of the following (1pt each): (a) sort-based dictionary, (b) per-term index, (c) move-to-front heuristic, (d) merge based index construction.
|