CS267 Fall 2012 Sec2 Home Page/Syllabus

Topics in Database Systems

Instructor: Chris Pollett
Office: MH 214
Phone Number: (408) 924 5145
Email: chris@pollett.org
Office Hours: MW 2:45-4pm and 5:45-6:45pm
Class Meets:
Sec2 MW 1:30pm-2:45pm in MH223

Prerequisites

To take this class you must have taken: CS157B with a grade of C- or better.

Texts and Links

Required Texts: Information Retrieval: Implementing and Evaluating Search Engines, by Buttcher, Clarke, and Cormack.
Online References and Other Links: Yioop! Open Source Search Engine, Nutch, Wumpus, Heritrix.

Topics and Outcomes

Information Retrieval (IR) is the study of how to represent, search, and manipulate large collections of text and other human-language data. Modern search engines such as Google, Bing, Ask, Baidu, Yandex, and Blekko are probably the most familiar examples of IR systems. Other examples are digital libraries (Melvyl), e-mail and technical report systems, plagiarism detection systems such as turnitin.com, and even desktop search systems. Such systems are databases; however, the typical implementations of their building blocks, such as indices and the ordering of result sets, differ from those of conventional databases. The focus of this class is on implementation techniques for information retrieval systems, and also on measuring how effective the results returned from such systems are.

By the end of this course, students should be able to:
(1) Code a basic inverted index capable of performing conjunctive queries.
(2) Calculate by hand on small examples precision (the fraction of returned results that are relevant), recall (the fraction of relevant documents that are returned), and other IR statistics.
(3) Explain where the BM25, BM25F, and divergence from randomness statistics come from.
(4) Give an example of how a posting list might be compressed using difference lists and gamma codes or Rice codes.
(5) Demonstrate with small examples how incremental index updates can be done with log merging.
(6) Evaluate search results by hand and using TREC eval software.
(7) Describe at least one MapReduce algorithm (for example, to calculate PageRank).
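To make outcomes (1) and (2) concrete, here is a minimal Python sketch of an inverted index that answers a conjunctive (AND) query, followed by a precision and recall calculation on the result. It is only an illustration under stated assumptions, not the implementation expected in the homeworks: the function names, the four toy documents, and the relevance judgments are all hypothetical, and tokenization is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted posting list of the doc ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # set() so each doc id appears at most once per posting list
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return index

def intersect(a, b):
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def conjunctive_query(index, terms):
    """Return ids of documents that contain every query term."""
    postings = sorted((index.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]  # start from the shortest list
    for p in postings[1:]:
        result = intersect(result, p)
    return result

def precision_recall(returned, relevant):
    """Precision = hits / returned, recall = hits / relevant."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy collection and relevance judgments.
docs = {
    1: "inverted index construction",
    2: "index compression with gamma codes",
    3: "query processing on an inverted index",
    4: "map reduce page rank",
}
index = build_index(docs)
results = conjunctive_query(index, ["inverted", "index"])
relevant = {1, 2, 4}
p, r = precision_recall(results, relevant)
print(results, p, r)
```

Here the query returns documents 1 and 3. Only document 1 is among the three documents judged relevant, so precision is 1/2 and recall is 1/3.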

Below is a tentative timetable for when we'll do things this semester:

Week 1: Aug 20, Aug 22 (First Day) Read Ch 1.1, 1.2. Introduction to IR
Week 2: Aug 27, Aug 29 Finish Ch 1
Week 3: Sep 3 (Labor Day), Sep 5 Read Ch 2.1-2.2. Phrase search, inverted indexes, VSM
Week 4: Sep 10, Sep 12 (HW1 due) Finish Ch 2. Recall and precision
Week 5: Sep 17, Sep 19 Read Ch 3. Stemming, stopping, and n-grams; will supplement with material on how to crawl
Week 6: Sep 24, Sep 26 (Midterm) I will be in Rome
Week 7: Oct 1, Oct 3 Read Ch 4. Parts of inverted indexes and construction of them
Week 8: Oct 8, Oct 10 (HW2 due) Finish Ch 4
Week 9: Oct 15, Oct 17 Read Ch 5. Query processing techniques
Week 10: Oct 22, Oct 24 Finish Ch 5. Start Ch 6. Index compression
Week 11: Oct 29, Oct 31 More Ch 6
Week 12: Nov 5, Nov 7 (Midterm 2) Review
Week 13: Nov 12 (Holiday), Nov 14 HW3 due Nov 13. Finish Ch 6
Week 14: Nov 19, Nov 21 Read Ch 7.1, 7.2. Incremental index updates. Read Ch 9. DFR
Week 15: Nov 26, Nov 28 Read Ch 14. Map reduce algorithms
Week 16: Dec 3, Dec 5 (HW4 due) Finish Ch 14
Week 17: Dec 10, Dec 12 (No Class) Review
The final will be Monday, December 17 from 12:15-2:30pm

Grading

HWs and Quizzes 40%
Midterm 1 15%
Midterm 2 15%
Final 30%
Total 100%

Grades will be calculated in the following manner: The person or persons with the highest aggregate score will receive an A+. Since this is a graduate class, the curve will be slightly higher than for an undergraduate course taught by me. A score of 55 will be the cut-off for a B-. The range between the highest score and this cut-off will be divided into five equal-sized regions. From the top region to the bottom region, a score falling within a region receives the grade A, A-, B+, B, or B-. A score on the boundary between two regions receives the lower grade; for example, if the boundary between an A and an A- is 85, then the score 85 counts as an A-. Scores below 55 but above 50 receive the grade D. Those below 50 receive the grade F.
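As a rough illustration of the scheme above, here is a minimal Python sketch. The function name letter_grade and the example top score of 95 are hypothetical. The text does not say what a score of exactly 50 receives; the sketch assumes a D, and it also assumes the top score is above 55.

```python
def letter_grade(score, top_score, cutoff=55.0):
    """Sketch of the curve described above (not an official calculator)."""
    if score >= top_score:
        return "A+"              # highest aggregate score(s)
    if score < 50:
        return "F"
    if score < cutoff:
        return "D"               # ASSUMPTION: exactly 50 is treated as a D
    # Five equal-sized regions between the cut-off and the top score.
    width = (top_score - cutoff) / 5
    grades = ["B-", "B", "B+", "A-", "A"]
    for k, grade in enumerate(grades, start=1):
        # "<=" means a score on a boundary receives the lower grade.
        if score <= cutoff + k * width:
            return grade
    return "A"

# Hypothetical example: top score 95, so each region is 8 points wide
# and the A / A- boundary falls at 87.
for s in (95, 88, 87, 64, 55, 52, 40):
    print(s, letter_grade(s, top_score=95))
```

With a top score of 95 the boundaries fall at 63, 71, 79, and 87, so an 87 counts as an A- and an 88 as an A.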

If you do better than an A- in this class and want me to write you a letter of recommendation, I will generally be willing, provided you ask me within two years of taking my course. Be advised that I write better letters if I know you to some degree.

Homework and Quiz Info

This semester we will have four homeworks and weekly quizzes. Every Monday this semester, except the first day of class, the Midterm Review Days, and holidays, there will be a quiz on the previous week's material. The answer to each quiz will be either multiple choice, true-false, or a simple numeric answer that does not require a calculator. Each quiz is worth a maximum of 1pt. Out of the total of thirteen quizzes this semester, I will keep your ten best scores.
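The "ten best of thirteen" rule above amounts to sorting the quiz scores and summing the top ten. A minimal Python sketch, where the function name quiz_total and the sample scores are hypothetical:

```python
def quiz_total(scores, keep=10):
    """Sum of the `keep` highest quiz scores (each quiz is worth at most 1pt)."""
    return sum(sorted(scores, reverse=True)[:keep])

# Hypothetical semester: thirteen quizzes, two of them missed (scored 0).
scores = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
print(quiz_total(scores))
```

In this example the two zeros and one of the eleven 1pt scores are discarded, so the quiz total is 10.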

As part of your homework score, every Wednesday this semester, except the first day of class, the Midterm Review Days, and holidays, there will be one or two book problems which are due in class. Your solution should be handwritten (not typed unless you give me a reason from the DRC). Your answer does not have to be verbose, but it should be in complete sentences, should set up the problem, and should explain how you solved it. We will go over the solution to the problem in class. If you cannot attend on a given Wednesday, make sure someone turns in your homework for you. Each week at most one problem that you submitted will be graded and will contribute up to 1 point to the next homework score.

These book problems will constitute part of the four homeworks which will be due this semester. The remainder of each homework will likely be a small to medium coding project. Links to the current list of homeworks and quizzes can be found in the left-hand frame of the class homepage. After an assignment has been returned, a link to its solution (based on the best student solutions and less the book problems) will be placed on the assignment page. Material from assignments may appear on midterms and finals. The book portion of each homework must be your own individual work in your own words. For the coding portion of homeworks you are encouraged to work in groups of up to three people. Only one person out of this group needs to submit this portion of the homework assignment; however, the members of the group need to be clearly identified in all submitted files.

Homeworks for this class will be submitted and returned completely electronically. To submit an assignment, click on the submit homework link for your section on the left-hand side of the homepage and fill out the on-line form. Hardcopies or e-mail versions of your assignments will be rejected and will not receive credit. Homeworks will always be due by the start of class on the day they are due. Late homeworks will not be accepted and missed quizzes cannot be made up; however, your lowest score amongst the four homeworks and your quiz total will be dropped.

When doing the programming part of an assignment please make sure to adhere to the specification given as closely as possible. Names of files should be as given, etc. Failure to follow the specification may result in your homework not being graded and you receiving a zero for your work.

Exams

The midterms will be during class time on: Sep 26 and Nov 7.

The final will be: Monday, December 17 from 12:15-2:30pm.

All exams are closed book, closed notes, and held in this classroom. You will be allowed only the test and your pen or pencil on your desk during these exams. The final will cover material from the whole semester, although there will be an emphasis on material after the last midterm. No make-ups will be given. The final exam may be scaled to replace a midterm grade if the midterm was missed under provably legitimate circumstances. These exams will test whether or not you have mastered the material presented in class or assigned as homework during the semester. My exams usually consist of a series of essay-style questions. I try to avoid making tricky problems. The week before each exam I will give out a list of problems representative of the level of difficulty of the problems you will be expected to answer on the exam. Any disputes concerning grades on exams should be directed to me, Professor Pollett.

Regrades

If you believe an error was made in the grading of your program or exam, you may request in person a regrade from me, Professor Pollett, during my office hours. I do not accept e-mail requests for regrades. A request for a regrade must be made no more than a week after the homework or a midterm is returned. If you cannot find me before the end of the semester and you would like to request a regrade of your final, you may see me in person at the start of the immediately following semester.

Academic Honesty

Your own commitment to learning, as evidenced by your enrollment at San Jose State University, and the University's Academic Integrity Policy require you to be honest in all your academic course work. Faculty members are required to report all infractions to the Office of Student Conduct and Ethical Development. The policy on academic integrity can be found at http://www.sjsu.edu/studentaffairs/.

Specifically, for this class, you should obviously not cheat on tests. For homeworks, you should not discuss or share code or problem solutions between groups! At a minimum, a 0 on the assignment or test will be given. A student caught using resources like Rent-a-coder will receive an F for the course and be referred to the University for disciplinary action.

Additional Policies and Procedures

The campus policy to ensure compliance with the Americans with Disabilities Act is:
"If you need course adaptations or accommodations because of a disability, or if you need special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible, or see me during office hours. Presidential Directive 97-03 requires that students with disabilities register with DRC to establish a record of their disability."

The university policy regarding credit hours for classes states:
Success in this course is based on the expectation that students will spend, for each unit of credit, a minimum of forty-five hours over the length of the course (normally 3 hours per unit per week with 1 of the hours used for lecture) for instruction or preparation/studying or course related activities including but not limited to internships, labs, clinical practica. Other course structures will have equivalent workload expectations as described in the syllabus.

More information about SJSU policies and procedures can be found at the following links: