CS267 Spring 2016 Sec1 Home Page/Syllabus

Topics in Database Systems

Instructor: Chris Pollett
Office: MH 214
Phone Number: (408) 924 5145
Email: chris@pollett.org
Office Hours: MW 3:00-4:15pm
Class Meets:
Sec1 MW 4:30pm-5:45pm in MH225

Prerequisites

To take this class you must have taken: CS157B with a grade of C- or better.

Texts and Links

Required Texts: Information Retrieval: Implementing and Evaluating Search Engines
Online References and Other Links: Yioop! Open Source Search Engine.
Nutch.
Wumpus.
Heritrix.

Description

Information Retrieval (IR) is the study of how to represent, search, and manipulate large collections of text and human-language data. Modern search engines such as Google, Bing, Baidu, and Yandex are probably the most familiar examples of IR systems. Other examples are digital libraries (Melvyl), e-mail and technical report systems, plagiarism detection systems such as turnitin.com, and even desktop search systems. Such systems are databases; however, the typical implementations of their building blocks, such as indices and the ordering of result sets, differ from those of conventional databases. The focus of this class is on implementation techniques for information retrieval systems, and also on measuring how effective the results returned from such systems are.

Course Learning Outcomes (CLOs)

By the end of this course, a student should be able to:

CLO1 -- Code a basic inverted index capable of performing conjunctive queries.

CLO2 -- Be able to calculate by hand on small examples precision (fraction of returned results which are relevant), recall (fraction of relevant results which are returned), and other IR statistics.

CLO3 -- Be able to explain where BM25, BM25F and divergence from randomness statistics come from.

CLO4 -- Give an example of how a posting list might be compressed using difference lists and gamma codes or Rice codes.

CLO5 -- Demonstrate with small examples how incremental index updates can be done with log merging.

CLO6 -- Be able to evaluate search results by hand and using TREC eval software.

CLO7 -- Know at least one MapReduce algorithm (for example, to calculate PageRank).
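
As a small illustration of the statistics named in CLO2, the following is a minimal Python sketch, not course-provided code. It uses the standard definitions: precision is the fraction of returned results that are relevant, and recall is the fraction of relevant documents that are returned. The function name and the document IDs are made up for the example.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for one query.

    returned: IDs of the documents the system returned (hypothetical data)
    relevant: IDs of the documents judged relevant (hypothetical data)
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    # Documents that were both returned and judged relevant
    hits = len(returned_set & relevant_set)
    # Guard against empty sets to avoid dividing by zero
    precision = hits / len(returned_set) if returned_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Four documents returned, five relevant overall, two in common
p, r = precision_recall(returned=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
print(p, r)
```

Here two of the four returned documents are relevant, so precision is 2/4, and two of the five relevant documents were found, so recall is 2/5.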

Course Schedule

Below is a tentative timetable for when we'll do things this semester:

Week 1: Feb 1, Feb 3 Read Ch 1.1, 1.2 Introduction to IR
Week 2: Feb 8, Feb 10 Finish Ch 1
Week 3: Feb 15, Feb 17 Read Ch 2.1-2.2, Phrase search, inverted indexes, VSM
Week 4: Feb 22, (HW1 due) Feb 24 Finish Ch 2 Recall and precision
Week 5: Feb 29, Mar 2 Read Ch 3 Stemming, stopping, and n-grams, will supplement with material on how to crawl
Week 6: Mar 7, Mar 9 Read Ch 4. Parts of inverted indexes and construction of them
Week 7: Mar 14, Mar 16 (HW2 due) Finish Ch 4
Week 8: Mar 21, Mar 23 (Midterm) Review
Week 9: Mar 28, Mar 30 March Break
Week 10: Apr 4, Apr 6 Read Ch 5. Query processing techniques
Week 11: Apr 11, (HW3 due) Apr 13 Finish Ch 5. Start Ch 6. Index compression
Week 12: Apr 18, Apr 20 More Ch 6.
Week 13: Apr 25 (HW4 due), Apr 27 Finish Ch 6
Week 14: May 2, May 4 Read Ch 7.1, 7.2, Incremental index updates, Read Ch 9. DFR
Week 15: May 9, May 11 Read Ch 14. Map reduce algorithms
Week 16: May 16, May 18 (No Class) Finish Ch 14
The final will be Thursday, May 19 from 2:45-5:00pm

Grading

HWs and Quizzes 50%
Midterm 20%
Final 30%
Total 100%

Grades will be calculated in the following manner: The person or persons with the highest aggregate score will receive an A+. Since this is a graduate class, the curve will be slightly higher than for an undergrad course taught by me. A score of 55 will be the cut-off for a B-. The region between this high and low score will be divided into five equal-sized regions. From the top region to the low region, a score falling within a region receives the grade: A, A-, B+, B, B-. If the boundary between an A and an A- is 85, then the score 85 counts as an A-. Scores below 55 but above 50 receive the grade D. Those below 50 receive the grade F.
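
The curve described above can be sketched in Python as follows. This is only an illustration of the stated rules, not the instructor's actual grading code. The function name and the top score of 92.5 are hypothetical, and the handling of a score of exactly 50 (treated here as a D) is an assumption, since the text does not specify it.

```python
def letter_grade(score, top_score, cutoff=55.0):
    """Map an aggregate score to a letter grade under the curve above.

    top_score: the highest aggregate score in the class (hypothetical input)
    cutoff: the lowest score that still earns a B-
    """
    if score >= top_score:
        return "A+"  # the highest aggregate score receives an A+
    if score < 50:
        return "F"
    if score < cutoff:
        return "D"  # assumption: a score of exactly 50 is treated as a D
    # Split the range from the cutoff to the top score into five equal regions
    width = (top_score - cutoff) / 5
    grades = ["B-", "B", "B+", "A-", "A"]
    for i, grade in enumerate(grades):
        upper = cutoff + (i + 1) * width
        # A score sitting on a boundary receives the lower of the two grades
        if score <= upper:
            return grade
    return "A"  # fallback in case of floating-point rounding


# With a hypothetical top score of 92.5, the A/A- boundary falls at 85
for s in (92.5, 90, 85, 70, 55, 52, 40):
    print(s, letter_grade(s, top_score=92.5))
```

With these numbers each region is 7.5 points wide, so the boundaries fall at 62.5, 70, 77.5, and 85, and a score of 85 receives an A-, matching the example in the text.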

If you do better than an A- in this class and want me to write you a letter of recommendation, I will generally be willing provided you ask me within two years of taking my course. Be advised that I write better letters if I know you to some degree.

Course Requirements, Homework and Quiz Info

The university policy regarding credit hours for classes states:
"SJSU classes are designed such that in order to be successful, it is expected that students will spend a minimum of forty-five hours for each unit of credit (normally three hours per unit per week), including preparing for class, participating in course activities, completing assignments, and so on. More details about student workload can be found in
University Policy S12-3 [PDF]."

This semester we will have five homeworks and weekly quizzes. Every Monday this semester, except the first day of class, the Midterm Review Day, and holidays, there will be a quiz on the previous week's material. The answer to each quiz will be either multiple choice, true-false, or a simple numeric answer that does not require a calculator. Each quiz is worth a maximum of 1pt, with no partial credit given. Out of the total of twelve quizzes this semester, I will keep your ten best scores.

Links to the current list of homeworks and quizzes can be found on the left hand frame of the class homepage. After an assignment has been returned, a link to its solution (based on the best student solutions) will be placed on the assignment page. Material from assignments may appear on midterms and finals. For homeworks you are encouraged to work in groups of up to three people. Only one person out of this group needs to submit the homework assignment; however, the members of the group need to be clearly identified in all submitted files.

Homeworks for this class will be submitted and returned completely electronically. To submit an assignment, click on the submit homework link for your section on the left hand side of the homepage and fill out the on-line form. Hardcopies or e-mail versions of your assignments will be rejected and will not receive credit. Homeworks will always be due by the start of class on the day they're due. Late homeworks will not be accepted and missed quizzes cannot be made up; however, your lowest score amongst the five homeworks and your quiz total will be dropped.

For this class, I expect each student to have available a laptop with Apache, PHP, and MySQL installed. Your laptop will be used whenever you want to show me something in my office concerning one of your projects.

When doing the programming part of an assignment please make sure to adhere to the specification given as closely as possible. Names of files should be as given, etc. Failure to follow the specification may result in your homework not being graded and you receiving a zero for your work.

NOTE that University policy F69-24 [PDF] states that "Students should attend all meetings of their classes, not only because they are responsible for material discussed therein, but because active participation is frequently essential to insure maximum benefit for all members of the class. Attendance per se shall not be used as a criterion for grading."

Classroom Protocol

I will start lecturing close to the official start time for this class modulo getting tangled up in any audio/visual presentation tools I am using. Once I start lecturing, please refrain from talking to each other, answering your cell phone, etc. If something I am talking about is unclear to you, feel free to ask a question about it. Typically, on practice test days, you will get to work in groups, and in so doing, turn your desks facing each other, etc. Please return your desks to the way they were at the end of class. This class has an online class discussion board which can be used to post questions relating to the homework and tests. Please keep discussions on this board civil. This board will be moderated. Class and discussion board participation, although not a component of your grade, will be considered if you ask me to write you a letter of recommendation.

Exams

The midterm will be during class time on: Mar 23.

The final will be: Thursday, May 19 from 2:45-5:00pm.

All exams are closed book, closed notes and in this classroom. You will be allowed only the test and your pen or pencil on your desk during these exams. The final will cover material from the whole semester, although there will be an emphasis on material after the midterm. No make-ups will be given. The final exam may be scaled to replace a midterm grade if the midterm was missed under provably legitimate circumstances. These exams will test whether or not you have mastered the material either presented in class or assigned as homework during the semester. My exams usually consist of a series of essay style questions. I try to avoid making tricky problems. The week before each exam I will give out a list of problems representative of the level of difficulty of the problems you will be expected to answer on the exam. Any disputes concerning grades on exams should be directed to me, Professor Pollett.

Regrades

If you believe an error was made in the grading of your program or exam, you may request in person a regrade from me, Professor Pollett, during my office hours. I do not accept e-mail requests for regrades. A request for a regrade must be made no more than a week after the homework or a midterm is returned. If you cannot find me before the end of the semester and you would like to request a regrade of your final, you may see me in person at the start of the immediately following semester.

University Policies and Procedures

General Expectations, Rights and Responsibilities of the Student

As members of the academic community, students accept both the rights and responsibilities incumbent upon all members of the institution. Students are encouraged to familiarize themselves with SJSU's policies and practices pertaining to the procedures to follow if and when questions or concerns about a class arise. See University Policy S90-5 [PDF]. More detailed information on a variety of related topics is available in the SJSU catalog. In general, it is recommended that students begin by seeking clarification or discussing concerns with their instructor. If such conversation is not possible, or if it does not serve to address the issue, it is recommended that the student contact the Department Chair as a next step.

Academic Integrity

Your own commitment to learning, as evidenced by your enrollment at San Jose State University, and the University's Academic Integrity Policy requires you to be honest in all your academic course work. Faculty members are required to report all infractions to the Office of Student Conduct and Ethical Development. The policy on academic integrity can be found at http://www.sjsu.edu/studentconduct/.

Specifically, for this class, you should obviously not cheat on tests. For homeworks, you should not discuss or share code or problem solutions between groups! At a minimum, a 0 on the assignment or test will be given. A student caught using resources like Rent-a-coder will receive an F for the course and be referred to the University for disciplinary action.

Campus Policy to Ensure Compliance with the Americans with Disabilities Act

The campus policy to ensure compliance with the Americans with Disabilities Act is:
"If you need course adaptations or accommodations because of a disability, or if you need special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible, or see me during office hours. Presidential Directive 97-03 requires that students with disabilities register with Accessible Education Center to establish a record of their disability."

Dropping and Adding

Students are responsible for understanding the policies and procedures about add/drop, grade forgiveness, etc. Refer to the current semester's Catalog Policies section. Add/drop deadlines can be found on the current academic year calendars document on the Academic Calendars webpage. The Late Drop Policy is available at http://www.sjsu.edu/aars/policies/latedrops/policy/. Students should be aware of the current deadlines and penalties for dropping classes.

Information about the latest changes and news is available at the Advising Hub.

Consent for Recording of Class and Public Sharing of Instructor Material

University Policy S12-7 [PDF] requires students to obtain the instructor's permission to record the course, and requires the following items to be included in the syllabus:

  • "Common courtesy and professional behavior dictate that you notify someone when you are recording him/her. You must obtain the instructor's permission to make audio or video recordings in this class. Such permission allows the recordings to be used for your private, study purposes only. The recordings are the intellectual property of the instructor; you have not been given any rights to reproduce or distribute the material."
    • It is suggested that the greensheet include the instructor's process for granting permission, whether in writing or orally and whether for the whole semester or on a class by class basis.
    • In classes where active participation of students or guests may be on the recording, permission of those students or guests should be obtained as well.
  • "Course material developed by the instructor is the intellectual property of the instructor and cannot be shared publicly without his/her approval. You may not publicly share or upload instructor generated material for this course such as exam questions, lecture notes, or homework solutions without instructor consent."

To request permission to use materials for this class, please send me an email making the request and saying what the requested material will be used for.