Chris Pollett> CS267
( Print View )

Student Corner:
[Lecture Notes]
[Discussion Board]

Course Info:
[Texts & Links]
[Description]
[Course Outcomes]
[Outcomes Matrix]
[Course Schedule]
[Grading]
[Requirements/HW/Quizzes]
[Class Protocols]
[Exam Info]
[Regrades]
[University Policies]
[Announcements]

HW Assignments:
[Hw1] [Hw2] [Hw3]
[Hw4] [Hw5] [Quizzes]

Practice Exams:
[Midterm] [Final]

CS267 Fall 2023Practice Final

To study for the final I would suggest you:

  1. Know how to do (by heart) all the practice problems.
  2. Go over your notes at least three times. Second and third time try to see how much you can remember from the first time.
  3. Go over the homework problems.
  4. Try to create your own problems similar to the ones I have given and solve them.
  5. Skim the relevant sections from the book.
  6. If you want to study in groups, at this point you are ready to quiz each other.

Here are some facts about the actual final:

  1. It is comprehensive.
  2. It is closed book, closed notes. Nothing will be permitted on your desk except your pen (pencil) and test.
  3. You should bring photo ID.
  4. There will be more than one version of the test. Each version will be of comparable difficulty.
  5. It is 10 problems (3pts each), 6 problems will be on materials since the second midterm, 4 problems will be from the topics of the midterm.
  6. Two problems will be exactly (less typos) off of the practice final, and one will be off of the practice midterm.

The practice final is below:

  1. Briefly give the API operations expected of an inverted index dictionary that supports indexing, queries, including prefix queries. Briefly describe the insert-at-back and move-to-front heuristic heuristics for dictionary term collisions.
  2. Describe the merge-based index construction algorithm (disk-based). How is it modified if logarithmic merging is done? Give an example.
  3. What is a generalized concordance list? Using the operators we had for such lists, express the query, "Return all intervals that begin with a <problem> and end with a </problem>" that contain the term "statement". Pick one of the GC-operators and explain how it could be implemented.
  4. What is Gap Compression? What is LLRUN? What is a Golomb code? Make sure to go into details. Give an example for each explanation.
  5. Suppose we have `m=4` many accumulators. Give examples that would trigger the QUIT and the CONTINUE accumulator pruning heuristics if term at a time processing was being used.
  6. Suppose there is a 500,000 word corpus. Document `d` is 250 words long. The terms grand and canyon occur 2000, 400 times respectively. In document `d` grad occurs thrice and canyon once. Let `\mu=7500`. What is the LMD score for `d` for the query "crand canyon"?
  7. Derive how the factor `P_2` is estimated in the DFR formula.
  8. Explain how hub and authority scores are calculated in HITS.
  9. Describe where/in what steps a distributed file system might be used in the computation of Page rank as a map reduce job.
  10. Briefly define size specific P@k and aggregate P@k and explain why they might be used.