CS267 Spring 2021 Lecture Notes

Topics in Database Systems

Videos of lectures are available.

Below are my lecture notes for the class so far. They should serve as a rough guide to what was covered on any given day. Frequently, however, I say more in class than is in these notes. Also, I tend to dynamically correct typos on the board that might appear in these lecture notes. So caveat emptor.

Week 1: [Jan 27 - Introduction to Information Retrieval]

Week 2: [Feb 1 - Text Formats, Tokenization, Term Distributions, Language Models] [Feb 3 - Language Modeling, Test Collections, Open-Source IR Systems, Inverted Indexes]

Week 3: [Feb 8 - Learning to Crawl] [Feb 10 - PHP]

Week 4: [Feb 15 - Finish PHP; Start Inverted Index ADT Implementation] [Feb 17 - Galloping/Exponential Search, Document-Oriented Indexes]

Week 5: [Feb 22 - VSM, Proximity Ranking] [Feb 24 - Finish Proximity Ranking, Boolean Retrieval]

Week 6: [Mar 1 - Evaluating Results, Token and Term Processing] [Mar 3 - Text Preprocessing and More PHP]

Week 7: [Mar 8 - PHP Autoloading - Yioop as an IR Library] [Mar 10 - Char-gramming, Language Processing]

Week 8: [Mar 15 - Static Inverted Indexes] [Mar 17 - Index Construction]

Week 9: [Mar 22 - Practice Midterm Review] [Mar 24 - Midterm]

Week 10: [Mar 29 - Spring Break] [Mar 31 - Spring Break]

Week 11: [Apr 5 - Query Processing] [Apr 7 - Accumulator Pruning, Concordance Lists]

Week 12: [Apr 12 - Finish GC-Lists, trec_eval, Start Index Compression] [Apr 14 - Huffman Coding]

Week 13: [Apr 19 - Arithmetic Coding, Gap Compression] [Apr 21 - Byte Aligned Codes, Dynamic Inverted Indexes]

Week 14: [Apr 26 - Ranking using Language Models] [Apr 28 - Divergence-from-randomness, Parallel Information Retrieval]

Week 15: [May 3 - More Parallel Information Retrieval] [May 5 - Document Quality Measures]

Week 16: [May 10 - Doc Quality, PageRank via MapReduce, Hadoop] [May 12 - More Categorization and Filtering]