CS267 Spring 2021 Sec1 Home Page/Syllabus

Topics in Database Systems

Instructor: Chris Pollett
Office: MH 214
Phone Number: (408) 924 5145
Email: chris@pollett.org
Office Hours: MW 4:30-5:30pm
(Via Zoom Meeting)
Office Hours Zoom Meeting ID: 986 6035 4618
Class Meets:
Sec1 MW 1:30-2:45pm online
(Via Zoom Meeting)
Class Zoom Meeting ID: 957 8679 2830
Meeting Password: 985049

Prerequisites

To take this class you must have taken:
CS157B
with a grade of C- or better.

Texts and Links

Required Texts: Information Retrieval: Implementing and Evaluating Search Engines. Buttcher, Clarke, and Cormack
Online References and Other Links: Yioop! Open Source Search Engine.
Nutch.
Wumpus.
Heritrix.

Description

From the catalog: Advanced topics in the area of database and information systems. Content differs in each offering. Possible topics include though not restricted to: Data Mining, Distributed Databases and Transaction Processing. For this section, we will study information retrieval systems. Information Retrieval is the study of how to represent, search, and manipulate large collections of text and human data. Modern search engines such as Google, Bing, Baidu, and Yandex are probably the most familiar examples of IR systems. Other examples are digital libraries (Melvyl), e-mail and technical report systems, plagiarism systems such as turnitin.com, and even desktop search systems. Such systems are databases; however, the typical implementations of their building blocks, such as indices and the ordering of result sets, differ from those of conventional databases. The focus of this class is on implementation techniques for information retrieval systems, and also on measuring how effective the results returned from such systems are.

Course Learning Outcomes (CLOs)

By the end of this course, a student should be able to:

CLO1 -- Code a basic inverted index capable of performing conjunctive queries.

CLO2 -- Be able to calculate by hand on small examples precision (fraction of returned results which are relevant), recall (fraction of relevant documents which are returned), and other IR statistics.

CLO3 -- Be able to explain where BM25, BM25F and divergence from randomness statistics come from.

CLO4 -- Give an example of how a posting list might be compressed using difference lists and gamma codes or Rice codes.

CLO5 -- Demonstrate with small examples how incremental index updates can be done with log merging.

CLO6 -- Be able to evaluate search results by hand and using TREC eval software.

CLO7 -- Know at least one Map Reduce algorithm (for example to calculate page rank).
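
To make CLO1 and CLO2 concrete, here is a minimal sketch of a toy inverted index with a conjunctive (AND) query, followed by a precision and recall calculation. It is only an illustration and not the approach required on any homework: the function names, the four sample documents, and the set of relevant documents are all made up for this example, and tokenization is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def conjunctive_query(index, terms):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def precision_recall(returned, relevant):
    """Precision = hits / number returned; recall = hits / number relevant."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical four-document collection.
docs = {
    1: "cheap flights to paris",
    2: "paris hotels and flights",
    3: "paris travel guide",
    4: "cheap hotels",
}
index = build_index(docs)

results = conjunctive_query(index, ["paris", "flights"])
relevant = {2, 3}  # hypothetical hand judgments for this query
precision, recall = precision_recall(results, relevant)
print(results, precision, recall)  # [1, 2] 0.5 0.5
```

Documents 1 and 2 are returned but only document 2 is relevant, so precision is 1/2; of the two relevant documents only one was returned, so recall is also 1/2.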

Course Schedule

Below is a tentative timetable for when we'll do things this semester:

Week 1: Jan 25, Jan 27 (First Day) -- Read Ch 1.1, 1.2. Introduction to IR
Week 2: Feb 1, Feb 3 -- Finish Ch 1
Week 3: Feb 8, Feb 10 -- Read Ch 2.1-2.2. Phrase search, inverted indexes, VSM
Week 4: Feb 15, Feb 17 -- Finish Ch 2. Recall and precision
Week 5: Feb 22 (Hw1), Feb 24 -- Read Ch 3. Stemming, stopping, and n-grams, will supplement with material on how to crawl
Week 6: Mar 1, Mar 3 -- Read Ch 4. Parts of inverted indexes and construction of them
Week 7: Mar 8, Mar 10 -- Finish Ch 4
Week 8: Mar 15, Mar 17 (Hw2) -- Read Ch 5. Query processing techniques
Week 9: Mar 22, Mar 24 (Midterm) -- Review
Week 10: Mar 29, Mar 31 -- Spring Break
Week 11: Apr 5, Apr 7 -- Ch 6, Ch 7.1, 7.2. Index compression, Incremental index update
Week 12: Apr 12, Apr 14 -- Ch 9. Ranking functions LMJM, LMD, pseudo-relevance feedback, DFR
Week 13: Apr 19, Apr 21 -- Read Ch 14. Map reduce algorithms
Week 14: Apr 26, Apr 28 -- Ch 15. Document Quality Measures, Web Search
Week 15: May 3, May 5 -- Ch 10.1, 10.2. Survey Categorization and Filtering
Week 16: May 10, May 12 -- Vertical Search Engines
Week 17: May 17 (Hw5) -- Review
The final will be Friday, May 21 from 12:15pm to 2:30pm PST; the department server will continue to accept your submission until May 22, 11:59pm.

Grading

HWs and Quizzes 50%
Midterm 20%
Final 30%
Total 100%

Grades will be calculated in the following manner: The person or persons with the highest aggregate score will receive an A+. A score of 55 will be the cut-off for a B-. The region between this high and low score will be divided into five equal-sized regions. From the top region to the low region, a score falling within a region receives the grade: A, A-, B+, B, B-. If the boundary between an A and an A- is 85, then the score 85 counts as an A-. Scores below 55 but above 50 receive the grade D. Those below 50 receive the grade F.
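
The grade computation above can be sketched in code. This is only an illustration of the arithmetic, not the script actually used for grading. The function name `letter_grade` is made up, and the sketch assumes two things the text leaves open: a score of exactly 55 earns a B-, and a score of exactly 50 earns a D. It also assumes the highest score is above 55.

```python
def letter_grade(score, high, cutoff=55.0, fail=50.0):
    """Map an aggregate score to a letter grade.

    `high` is the highest aggregate score in the class.
    Assumptions not stated in the syllabus: exactly `cutoff` earns a B-
    and exactly `fail` earns a D.
    """
    if score >= high:
        return "A+"  # the person(s) with the highest aggregate score
    if score < fail:
        return "F"
    if score < cutoff:
        return "D"
    width = (high - cutoff) / 5  # five equal-sized regions
    for k, grade in enumerate(["A", "A-", "B+", "B", "B-"]):
        lower = high - (k + 1) * width
        # A score exactly on a boundary receives the lower grade,
        # so the comparison is strict.
        if score > lower:
            return grade
    return "B-"  # score is exactly the cutoff


# With a top score of 92.5 the regions are 7.5 points wide, so the
# boundary between A and A- is 85, as in the example in the text.
for s in (92.5, 86, 85, 70, 55, 52, 40):
    print(s, letter_grade(s, high=92.5))
```

With `high=92.5`, a score of 86 maps to A, 85 to A-, 70 to B (it sits on the B+/B boundary), 55 to B-, 52 to D, and 40 to F.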

If you do better than an A- in this class and want me to write you a letter of recommendation, I will generally be willing provided you ask me within two years of taking my course. Be advised that I write better letters if I know you to some degree.

Course Requirements, Homework, Quiz Info, and In-class exercises

This semester we will have five homeworks, weekly quizzes, and weekly in-class exercises.

Every Monday this semester, except the first day of class, the Midterm Review Day, and holidays, there will be a quiz on the previous week's material. The answer to the quiz will either be multiple choice, true-false, or a simple numeric answer that does not require a calculator. Each quiz is worth a maximum of 1pt with no partial credit being given. Out of the total of twelve quizzes this semester, I will keep your ten best scores.

On Wednesdays, we will spend 15-20 minutes of class on an in-class exercise. You will be asked to post your solution to these exercises to the class discussion board. Doing so is worth 1 "insurance point/pre-point" towards your grade. An "insurance point/pre-point" can be used to get one missed point back on a midterm or final, up to half of the points you missed on that test. For example, if you scored 0/20 on the midterm and have 10 insurance points, you can use your insurance points, so that your midterm score is a 10. On the other hand, if you score 18/20 on the midterm, you can use at most 1 insurance point since half of what you missed (2pts) on the midterm is 1pt.
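
The two examples above follow from a simple rule, sketched below. The function name `apply_insurance` is made up, and the sketch assumes that scores are whole points and that half of the missed points is rounded down; the syllabus does not say how odd numbers of missed points are handled.

```python
def apply_insurance(score, max_score, insurance_points):
    """Return (adjusted score, insurance points used) for one test.

    At most half of the missed points can be recovered.
    Assumption: whole points only, with the half rounded down.
    """
    missed = max_score - score
    used = min(insurance_points, missed // 2)
    return score + used, used


print(apply_insurance(0, 20, 10))   # (10, 10)
print(apply_insurance(18, 20, 10))  # (19, 1)
```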

Links to the current list of homeworks and quizzes can be found on the left hand side of the class homepage. After an assignment has been returned, a link to its solution (based on the best student solutions) will be added to the assignment page. Material from assignments may appear on midterms and finals. For homeworks, you are encouraged to work in groups of up to three people. Only one person out of this group needs to submit the homework assignment; however, the members of the group need to be clearly identified in all submitted files.

Homeworks for this class will be submitted and returned completely electronically. To submit an assignment, click on the submit homework link for your section on the left hand side of the homepage and fill out the on-line form. Hardcopies or e-mail versions of your assignments will be rejected and not receive credit. Homeworks will always be due by midnight according to the departmental web server on the day they're due. Late homeworks will not be accepted and missed quizzes cannot be made up; however, your lowest score amongst the five homeworks and your quiz total will be dropped.

When doing the programming part of an assignment please make sure to adhere to the specification given as closely as possible. Names of files should be as given, etc. Failure to follow the specification may result in your homework not being graded and you receiving a zero for your work.

Classroom Protocol

I will start lecturing close to the official start time for this class modulo getting tangled up in any audio/visual presentation tools I am using. Once I start lecturing, please mute yourself in Zoom unless you have a question. I like to see live people's faces, so if you have the bandwidth I prefer that people show their video, but I understand if you cannot. If something I am talking about is unclear to you, feel free to unmute yourself to ask a question about it or type it into the chat. On different occasions throughout the semester, such as for In-class Exercises and practice test days, I may or may not use break-out sessions. If I do, I expect people to behave as if they were being watched in public. People should keep their clothes on, etc. I will immediately refer any instances of harassment to the appropriate university channels. This class also has an online class discussion board which can be used to post questions relating to the homework and tests. Please keep discussions on this board civil. This board will be moderated. Class participation, although not a component of your grade, will be considered if you ask me to write you a letter of recommendation.

Exams

The midterm and final will be online and submitted electronically using the same mechanism as the homeworks. They are open book/internet, but you are not allowed to interact with other students or individuals or question answering entities about the test while taking it. Each test will be different for each student in this class, with problems depending on your name, id, etc. All problems will be short answer and can involve coding. The midterm will be available on Mar 24 at the usual class time. This test should take about 1h15m; however, the department server will keep accepting your midterms until 11:59pm that day, the official end time of the midterm, and if you take longer than 1h15m you won't be penalized. Similarly, the final will be available Friday, May 21 from 12:15pm to 2:30pm PST; the department server will continue to accept your submission until May 22, 11:59pm. My expectation is that if the final had been offered in person it would take about 2h15m.

The final will cover material from the whole semester, although there will be an emphasis on material after the midterm. No make-up midterms will be given; in rare circumstances a make-up final might be given on the exam make-up day. The final exam may be scaled to replace a midterm grade if the midterm was missed under provably legitimate circumstances. These exams will test whether or not you have mastered the material either presented in class or assigned as homework during the semester. I try to avoid making tricky problems. The week before each exam I will give out a list of problems representative of the level of difficulty of problems the student will be expected to answer on the exam.

Regrades

If you believe an error was made in the grading of your program or exam, you may request in Zoom/person a regrade from me, Professor Pollett, during my office hours. I do not accept e-mail requests for regrades. A request for a regrade must be made no more than a week after the homework or a midterm is returned. If you cannot find me before the end of the semester and you would like to request a regrade of your final, you may see me in Zoom/person at the start of the immediately following semester.

University Policies and Procedures

Per University Policy S16-9, university-wide policy information relevant to all courses, such as student class time requirements and expectations, academic integrity, accommodations, etc., is available on the Office of Graduate and Undergraduate Programs' Syllabus Information web page at http://www.sjsu.edu/gup/syllabusinfo/. Below are some brief comments on some of these policies as they pertain to this class.

Academic Integrity

For this class, you should obviously not cheat on tests. For homeworks, you should not discuss or share code or problem solutions between groups! At a minimum, a violation will result in a 0 on the assignment or test. A student caught using resources like Rent-a-coder will receive an F for the course. Faculty members are required to report all infractions to the Office of Student Conduct and Ethical Development.

Accommodations

If you need a classroom accommodation for this class, and have registered with the Accessible Education Center, please come see me earlier rather than later in the semester to give me a heads up on how to be of assistance.