Outline
- Slidy
- Information Retrieval and Applications
- Basic Architecture
Using Slidy
- The slides for this class are HTML files which both validate as HTML 5 and pass the WAVE Accessibility Checker.
- They are made to look like slides using a JavaScript library called Slidy.
- The following keystrokes do useful things in Slidy:
- h - help (see all the commands)
- f - fullscreen (gets rid of the links at the bottom of the window)
- space - advance a slide
- left/right arrows - back or forward a slide
- up/down arrows - scroll within a slide
- a - show all slides at once for printing
- u - up to the list of lectures
Information Retrieval and Applications
- IR is concerned with representing, searching, and manipulating large collections of text and data.
- Common applications include:
- Web search
- Desktop, email, and filesystem search
- Document Management Systems for businesses
- Digital Libraries
- Document Filtering such as News Aggregation
- Text Clustering and Categorization
- Summarization
- Question Answering
- Multimedia Retrieval
Basic Architecture
- A user has an information need, called a topic, and issues a query to
the IR system to try to satisfy this need.
- A query typically consists of a small number of search terms (these need not be words; they could be dates, wildcards, etc.).
- The user's query is processed by a search engine. The engine may run on the user's own machine or on one or more machines
accessed over a network.
- A major task of the search engine is to maintain and provide access to an inverted index for a document collection,
the "database" of the search engine. This index maps each term to the documents that contain it. The
inverted index is typically about the same size as the stored documents themselves, so the engine has to be very clever to get
information out of it quickly.
- The search engine processes the query and computes a ranked list of results according to a score calculated for each document found.
It then attempts to remove duplicates from these results and returns what is left to the user.
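
The pipeline described above (inverted index, scoring, ranking, duplicate removal) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `build_inverted_index` and `search`, the toy score (a count of query-term occurrences), and the exact-text test for duplicates are hypothetical stand-ins, not how any particular search engine works.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(query, docs, index):
    """Return doc ids ranked by a toy score, skipping duplicate texts."""
    terms = query.lower().split()
    # Only documents listed in the index for some query term are scored.
    candidates = set()
    for term in terms:
        candidates |= index.get(term, set())
    # Toy score (an assumption): total occurrences of the query terms.
    scores = {}
    for doc_id in candidates:
        counts = Counter(docs[doc_id].lower().split())
        scores[doc_id] = sum(counts[t] for t in terms)
    # Highest score first; ties are broken by doc id for a stable order.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    # Drop any result whose text exactly repeats an earlier result.
    seen, results = set(), []
    for doc_id in ranked:
        if docs[doc_id] not in seen:
            seen.add(docs[doc_id])
            results.append(doc_id)
    return results


# A tiny made-up collection; document 3 is an exact duplicate of document 1.
docs = {
    1: "web search engines index the web",
    2: "desktop search and email search",
    3: "web search engines index the web",
    4: "digital libraries store documents",
}
index = build_inverted_index(docs)
results = search("web search", docs, index)
print(results)  # [1, 2]
```

Note that `search` never scans the whole collection: it only touches the documents that the index lists for the query terms, which is the reason for keeping an inverted index in the first place.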