Information Retrieval (IR) is the study of how to represent, search, and manipulate
large collections of text and human data. Modern search engines such as Google, Bing, Ask, Baidu, Yandex, and Blekko
are probably the most familiar examples of IR systems. Other examples are digital libraries (Melvyl),
e-mail and technical report systems, plagiarism detection systems such as turnitin.com, and even desktop search systems.
Such systems are databases; however, the typical implementations of their building blocks, such as indices and the ordering of
result sets, differ from those of conventional databases. The focus of this class is on implementation
techniques for information retrieval systems, and also on measuring how effective the results returned from such
systems are. By the end of this course, students should be able to:
(1) Code a basic inverted index capable of performing conjunctive queries.
(2) Calculate by hand, on small examples, precision (the fraction of returned results that are relevant), recall (the fraction of relevant documents that are returned), and other IR statistics.
(3) Explain where the BM25, BM25F, and divergence from randomness statistics come from.
(4) Give an example of how a posting list might be compressed using difference lists and gamma codes or Rice codes.
(5) Demonstrate with small examples how incremental index updates can be done with log merging.
(6) Evaluate search results by hand and using the trec_eval software.
(7) Know at least one MapReduce algorithm (for example, one to calculate PageRank).
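
To give a feel for the first two outcomes, here is a minimal sketch of an in-memory inverted index with conjunctive (AND) queries, followed by a precision and recall calculation. The function names, the toy documents, and the relevance judgments are all made up for illustration; this is not the design required by any assignment, and a real index would also handle tokenization, on-disk storage, and compression.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to a sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Merge two sorted posting lists, keeping only ids present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def conjunctive_query(index, terms):
    """Return ids of documents that contain every query term."""
    # Start from the shortest posting list so intermediate results stay small.
    lists = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


def precision_recall(returned, relevant):
    """Precision = hits / |returned|, recall = hits / |relevant|."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection.
docs = {
    1: "inverted index for text search",
    2: "web search engines rank results",
    3: "compressing an inverted index",
    4: "text search with an index",
}
index = build_index(docs)
results = conjunctive_query(index, ["index", "search"])  # documents 1 and 4

# Hypothetical relevance judgments: suppose documents 1, 2, and 3 are relevant.
precision, recall = precision_recall(results, {1, 2, 3})
print(results, precision, recall)
```

With these made-up judgments, one of the two returned documents is relevant (precision 1/2), and one of the three relevant documents is returned (recall 1/3).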

Below is a tentative time table for when we'll do things this semester:

Grades will be calculated in the following manner: The person or persons with the highest aggregate score
will receive an A+. Since this is a graduate class, the curve will be slightly higher than for an undergrad
course taught by me. A score of 55 will be the cut-off for a B-. The region between this high and low score
will be divided into five equal-sized regions. From the top region to the low region, a score falling within
a region receives the grade: A, A-, B+, B, B-. If the boundary between an A and an A- is 85, then the score
85 counts as an A-. Scores below 55 but above 50 receive the grade D. Those below 50 receive the grade F.
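
The curve described above can be sketched in a few lines of Python. This is only an illustration, not the official grading script: the function name `letter_grade` is hypothetical, and it assumes that a score of exactly 55 earns a B- and that a score of exactly 50 earns a D, two cases the text above does not spell out.

```python
import math


def letter_grade(score, top_score, cutoff=55.0):
    """Sketch of the curve: A+ for the top score, five equal regions down to the cutoff."""
    if score >= top_score:
        return "A+"
    if score < 50:
        return "F"
    if score < cutoff:
        return "D"  # assumption: a score of exactly 50 is treated as a D
    width = (top_score - cutoff) / 5
    grades = ["B-", "B", "B+", "A-", "A"]
    # A score sitting exactly on a boundary falls into the lower region,
    # matching the "85 counts as an A-" rule.
    region = math.ceil((score - cutoff) / width) - 1
    region = min(max(region, 0), 4)  # assumption: exactly 55 is a B-
    return grades[region]


# Example with a hypothetical top score of 95: regions are 8 points wide,
# so the boundaries fall at 63, 71, 79, and 87.
for s in (95, 88, 87, 64, 63, 55, 52, 49):
    print(s, letter_grade(s, top_score=95))
```

With a top score of 95, a score of 87 sits on the A/A- boundary and receives an A-, while 88 receives an A.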

If you do better than
an A- in this class and want me to write you a letter of recommendation, I will generally
be willing provided
you ask me within two years of taking my course.
Be advised that I write better letters if I know you to some degree.

This semester we will have four homeworks and weekly quizzes. Every Monday this semester, except the first day of class, the Midterm Review Days, and holidays, there will be a quiz on the previous week's material. The answer to the quiz will be either multiple choice, true-false, or a simple numeric answer that does not require a calculator. Each quiz is worth a maximum of 1pt. Out of the total of thirteen quizzes this semester, I will keep your ten best scores.

As part of your homework score, every Wednesday this semester, except the first day of class, the Midterm Review Days, and holidays, there will be one or two book problems which are due in class. Your solution should be **handwritten** (not typed unless you give me a reason from the DRC). Your answer does not have to be verbose, but it should be in complete sentences, should set up the problem, and should explain how you solved it. We will go over the solution to the problem in class. If you cannot attend on a given Wednesday, make sure someone turns in your homework for you. Each week at most one problem that you submitted will be graded and will contribute up to 1 point to the next homework score.

These book problems will constitute part of the four homeworks which will be due this semester. The remainder of these homeworks will likely be small to medium coding projects.
Links to the current list of homeworks and quizzes can be found on the left hand frame of the class homepage. After an assignment has been returned, a link to its solution (based on the best student solutions and less the book problems) will be placed on the assignment page. Material from assignments may appear on midterms and
finals. **The book portion of each homework must be your own individual work in your own words. For the coding portion of
homeworks you are encouraged to work in groups of up to three people.
Only one person out of this group needs to submit this portion of the homework assignment; however,
the members of the group need to be clearly identified in all submitted files.**
Homeworks for this class will be submitted and returned completely electronically. To submit an assignment, click on the submit homework
link for your section on the left hand side of the homepage and fill out the on-line
form. Hardcopies or e-mail versions of your assignments will be rejected and will not receive
credit. Homeworks will always be due by the start of class on
the day they are due. Late homeworks will not be accepted and missed quizzes cannot be made up; however, your lowest score amongst the four homeworks and your quiz total will be dropped.

When doing the programming part of an assignment please make sure to adhere to the
specification given as closely as possible. Names of files should be as given, etc.
Failure to follow the specification may result in your homework not being graded and
you receiving a zero for your work.

The midterms will be during class time on:
Sep 26 and Nov 7.

The final will be: Monday, December 17 from 12:15-2:30pm.

All exams are closed book, closed notes, and held in this classroom. You will
be allowed only the test and your pen or pencil on your desk during these
exams. The final will cover material from the whole semester, although there will be an emphasis on
material after the last midterm. No make-ups will be given. The final exam
may be scaled to replace a midterm grade if the midterm was missed under provably
legitimate circumstances. These exams will test whether or not you
have mastered the material presented in class or assigned as homework during the semester.
My exams usually consist of a series of essay-style questions. I try to avoid
making tricky problems. The week before each exam I will give out a
list of problems representative of the level of difficulty of the problems
you will be expected to answer on the exam. Any disputes concerning
grades on exams should be directed to me, Professor Pollett.

Specifically, for this class, you should obviously not cheat on tests. For homeworks,
you should not discuss or share code or problem solutions between groups!
At a minimum, a 0 on the assignment or test will be given for a violation. A student
caught using resources like Rent-a-coder will receive
an F for the course and be referred to the University for disciplinary action.

The campus policy to ensure compliance with the Americans with Disabilities Act is:

"If you need course adaptations or accommodations because of a disability, or if you need special arrangements in case the building must be evacuated, please make an
appointment with me as soon as possible, or see me during office hours. Presidential Directive 97-03 requires that students with disabilities register with DRC to
establish a record of their disability."

The university policy regarding credit hours for classes states:

Success in this course is based on the expectation that students will spend, for each unit of credit, a minimum of forty-five hours over the length of the course (normally 3 hours per unit per week with 1 of the hours used for lecture) for instruction or preparation/studying or course related activities including but not limited to internships, labs, clinical practica. Other course structures will have equivalent workload expectations as described in the syllabus.

More information about SJSU policies and procedures can be found at the following links: