CS267 Fall 2011 Sec1 Home Page/Syllabus

Topics in Database Systems

Instructor: Chris Pollett
Office: MH 214
Phone Number: (408) 924 5145
Email: chris@pollett.org
Office Hours: M 6:00pm-7:30pm, MW 2:55pm-4:25pm
Class Meets:
Sec1 MW 1:30pm-2:45pm in MH 422

Prerequisites

To take this class you must have taken: CS157B with a grade of C- or better.

Texts and Links

Required Texts: Information Retrieval: Implementing and Evaluating Search Engines. Büttcher, Clarke, and Cormack.
Online References and Other Links: Yioop! Open Source Search Engine.
Nutch.
Wumpus.
Heritrix.

Topics and Outcomes

Information Retrieval (IR) is the study of how to represent, search, and manipulate large collections of text and human-language data. Modern search engines such as Google, Bing, Ask, Baidu, Yandex, and Blekko are probably the most familiar examples of IR systems. Other examples are digital libraries (Melvyl), e-mail and technical report systems, plagiarism detection systems such as turnitin.com, and even desktop search systems. Such systems are databases; however, the typical implementations of their building blocks, such as indices and the ordering of result sets, differ from those of conventional databases. The focus of this class is on implementation techniques for information retrieval systems, and also on measuring how effective the results returned from such systems are. By the end of this course, students should be able to:
(1) Code a basic inverted index capable of performing conjunctive queries.
(2) Calculate by hand, on small examples, precision (the fraction of returned results that are relevant), recall (the fraction of relevant results that are returned), and other IR statistics.
(3) Explain where the BM25, BM25F, and divergence from randomness (DFR) statistics come from.
(4) Give an example of how a posting list might be compressed using difference lists and gamma codes or Rice codes.
(5) Demonstrate with small examples how incremental index updates can be done with log merging.
(6) Evaluate search results by hand and using TREC eval software.
(7) Describe at least one MapReduce algorithm (for example, one to calculate PageRank).
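As a rough illustration of what outcomes (1) and (2) involve, here is a minimal Python sketch, not course-provided code. The function names, the toy document collection, and the relevance judgments are all hypothetical, and tokenization is simplified to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def conjunctive_query(index, terms):
    """Return the sorted ids of documents that contain every query term."""
    result = None
    for term in terms:
        ids = set(index.get(term.lower(), []))
        result = ids if result is None else result & ids
    return sorted(result or [])


def precision_recall(returned, relevant):
    """Precision = hits / |returned|; recall = hits / |relevant|."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection; the relevance judgments below are made up.
docs = {
    1: "open source search engine",
    2: "search engine index compression",
    3: "database index tuning",
    4: "web search crawler",
}
index = build_index(docs)
returned = conjunctive_query(index, ["search", "engine"])
precision, recall = precision_recall(returned, relevant={2, 4})
print(returned, precision, recall)  # [1, 2] 0.5 0.5
```

Here documents 1 and 2 match both query terms, but only document 2 is among the two assumed relevant documents, so one of two returned results is relevant (precision 0.5) and one of two relevant documents is returned (recall 0.5).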

Below is a tentative timetable for when we'll do things this semester:

Week 1: Aug 22, Aug 24 (First Day) Read Ch 1.1, 1.2 Introduction to IR
Week 2: Aug 29, Aug 31 Finish Ch 1
Week 3: Sep 5 (Labor Day), Sep 7 Read Ch 2.1-2.2, Phrase search, inverted indexes, VSM
Week 4: Sep 12, Sep 14 Finish Ch 2 Recall and precision
Week 5: Sep 19, Sep 21 Read Ch 3 Stemming, stopping, and n-grams, will supplement with material on how to crawl
Week 6: Sep 26, Sep 28 Read Ch 4. Parts of inverted indexes and construction of them
Week 7: Oct 3, Oct 5 (Banff)
Week 8: Oct 10, Oct 12 Finish Ch 4
Week 9: Oct 17, Oct 19 Read Ch 5. Query processing techniques
Week 10: Oct 24, Oct 26 Finish Ch 5. Start Ch 6. Index compression
Week 11: Oct 31, Nov 2 More Ch 6.
Week 12: Nov 7, Nov 9 (Oberwolfach)
Week 13: Nov 14, Nov 16 Finish Ch 6
Week 14: Nov 21, Nov 23 Read Ch 7.1, 7.2, Incremental index updates, Read Ch 9. DFR
Week 15: Nov 28, Nov 30 Read Ch 14. Map reduce algorithms
Week 16: Dec 5, Dec 7 Finish Ch 14
The final will be Tuesday, December 13, 12:15pm-2:30pm

Grading

HWs and Quizzes 40%
Midterm 1 15%
Midterm 2 15%
Final 30%
Total 100%

Grades will be calculated in the following manner: The person or persons with the highest aggregate score will receive an A+. Since this is a graduate class, the curve will be slightly higher than for an undergrad course taught by me. A score of 55 will be the cut-off for a B-. The region between this high and low score will be divided into five equal-sized regions. From the top region to the low region, a score falling within a region receives the grade: A, A-, B+, B, B-. If the boundary between an A and an A- is 85, then the score 85 counts as an A-. Scores below 55 but above 50 receive the grade D. Those below 50 receive the grade F.

If you do better than an A- in this class and want me to write you a letter of recommendation, I will generally be willing provided you ask me within two years of taking my course. Be advised that I write better letters if I know you to some degree.

Homework and Quiz Info

This semester we will have four homeworks and weekly quizzes. Every Monday this semester, except the first day of class, the Midterm Review Days, and holidays, there will be a quiz on the previous week's material. The answer to each quiz will be either multiple choice, true-false, or a simple numeric answer that does not require a calculator. Each quiz is worth a maximum of 1pt. Out of the total of thirteen quizzes this semester, I will keep your ten best scores.

Links to the current list of homeworks and quizzes can be found on the left hand frame of the class homepage. After an assignment has been returned, a link to its solution (based on the best student solutions) will be placed off the assignment page. Material from assignments may appear on midterms and finals. For homeworks you are encouraged to work in groups of up to three people. Only one person out of this group needs to submit the homework assignment; however, the members of the group need to be clearly identified in all submitted files. Homeworks for this class will be submitted and returned completely electronically. To submit an assignment, click on the submit homework link for your section on the left hand side of the homepage and fill out the on-line form. Hardcopies or e-mail versions of your assignments will be rejected and will not receive credit. Homeworks will always be due by the start of class on the day they're due. Late homeworks will not be accepted and missed quizzes cannot be made up; however, your lowest score amongst the four homeworks and your quiz total will be dropped.

When doing the programming part of an assignment please make sure to adhere to the specification given as closely as possible. Names of files should be as given, etc. Failure to follow the specification may result in your homework not being graded and you receiving a zero for your work.

Exams

The midterms will be during class time on: Oct 3 and Nov 7.

The final will be: Tuesday, December 13, 12:15pm-2:30pm.

All exams are closed book, closed notes, and in this classroom. You will be allowed only the test and your pen or pencil on your desk during these exams. The final will cover material from the whole semester, although there will be an emphasis on material after the last midterm. No make-ups will be given. The final exam may be scaled to replace a midterm grade if the midterm was missed under provably legitimate circumstances. These exams will test whether or not you have mastered the material presented in class or assigned as homework during the semester. My exams usually consist of a series of essay-style questions. I try to avoid making tricky problems. The week before each exam I will give out a list of problems representative of the level of difficulty of the problems you will be expected to answer on the exam. Any disputes concerning grades on exams should be directed to me, Professor Pollett.

Regrades

If you believe an error was made in the grading of your program or exam, you may request in person a regrade from me, Professor Pollett, during my office hours. I do not accept e-mail requests for regrades. A request for a regrade must be made no more than a week after the homework or a midterm is returned. If you cannot find me before the end of the semester and you would like to request a regrade of your final, you may see me in person at the start of the immediately following semester.

Academic Honesty

Your own commitment to learning, as evidenced by your enrollment at San Jose State University, and the University's Academic Integrity Policy require you to be honest in all your academic course work. Faculty members are required to report all infractions to the Office of Student Conduct and Ethical Development. The policy on academic integrity can be found at http://sa.sjsu.edu/student_conduct.

Specifically, for this class, you should obviously not cheat on tests. For homeworks, you should not discuss or share code or problem solutions between groups! At a minimum, a 0 on the assignment or test will be given. A student caught using resources like Rent-a-coder will receive an F for the course and be referred to the University for disciplinary action.

Additional Policies and Procedures

The campus policy to ensure compliance with the Americans with Disabilities Act is:
"If you need course adaptations or accommodations because of a disability, or if you need special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible, or see me during office hours. Presidential Directive 97-03 requires that students with disabilities register with DRC to establish a record of their disability."

More information about SJSU policies and procedures can be found at the following links: