CS267 Spring 2021Practice Final

  • Know how to do (by heart) all the practice problems.
  • Go over your notes at least three times. Second and third time try to see how much you can remember from the first time.
  • Go over the homework problems.
  • Try to create your own problems similar to the ones I have given and solve them.
  • If you want to study in groups, at this point you are ready to quiz each other.

The practice final is below. Here are some facts about the actual final:

  1. It is open book, open notes, open internet, however, you are not allowed to discuss problems on the final with classmates or other people until after the final submission period is over.
  2. The final will have 10 problems.
  3. It will be posted as one of the links on the side of the class homepage.
  4. You submit the final for grading using the same mechanism as to submit homeworks.
  5. It is designed to take an 2h15m, however, it will be posted by 12:15pm on May 21 and you can submit it up until May 22 at 11:59pm.
  6. Some problems will be dependent on some personal information about you such as your ID, and you will not receive credit if you don't make use of this information correctly.
  7. One problem will be off the practice midterm and two problems will be off the practice final.
  8. Six problems will be on material after the midterm, four will be one material before.

On practice final day, please post your problem solutions to the Practice Final Thread. Make sure to include the names of all your group's members.

  1. Prove `lim_(f_(t,d) -> infty) TF_(BM25)(t,d) = k_1 +1`. Briefly explain the Maxscore heuristic as used in document at a time query processing.
  2. Suppose we have `m=min`(5, length of your surname) many accumulators. Give examples that would trigger the QUIT and the CONTINUE accumulator pruning heuristics if term at a time processing was being used.
  3. Express the following as region algebra queries: (a) Your first name within 10 terms of your last name. (b) Your birthday in a <birthdate></birthdate> tags before the phrase "wins the lottery".
  4. Give a concrete example involving your name of how a Canonical Huffman code might be written as a preamble to encoding a string.
  5. Give an example of a parametric and of a non-parametric code suitable for GAP compression. Briefly explain and give an example of vByte and Simple-9.
  6. Explain what kind of incremental updates logarithmic merging is a suitable algorithm for. Explain how this algorithm works.
  7. Distinguish intra-query and inter-query parallelism as far as information retrieval goes. What bottlenecks exists to intra-query parallelism if partition-by-document is used.
  8. Explain one method to estimate P1 and one method to estimate P2 in the divergence-from-randomness approach to coming up with a relevance measure.
  9. Explain how hub and authority scores are calculated in HITS.
  10. What is Batch filtering? What is aggregate P@k?