Deliverable 3
Goal
The goal of this deliverable was to understand how the BM25F algorithm works and to implement it in PHP from scratch.
BM25 Algorithm
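In information retrieval, Okapi BM25 is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others. BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of the inter-relationship between the query terms within a document (e.g., their relative proximity). It is not a single function, but a whole family of scoring functions with slightly different components and parameters. One of the most prominent instantiations of the function is as follows. Given a query $Q$ containing keywords $q_1, \dots, q_n$, the BM25 score of a document $D$ is

$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$

where $f(q_i, D)$ is the frequency of $q_i$ in $D$, $|D|$ is the length of $D$ in words, $\mathrm{avgdl}$ is the average document length in the collection, $\mathrm{IDF}(q_i)$ is the inverse document frequency weight of $q_i$, and $k_1$ and $b$ are free parameters.
BM25F Algorithm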
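Since BM25F builds on it, the plain BM25 score is worth making concrete first. The following is a minimal Python sketch, not the PHP implementation from this deliverable. It assumes the commonly cited instantiation with term-frequency saturation `k1` and length normalisation `b`, and an IDF of the form `ln((N - n + 0.5) / (n + 0.5))`. The function names, the default values `k1=1.2` and `b=0.75`, and the list-of-tokens document representation are all illustrative assumptions.

```python
import math


def idf(num_docs, doc_freq):
    """Assumed IDF variant: ln((N - n + 0.5) / (n + 0.5)).

    Note that it goes negative for terms found in more than half the documents.
    """
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query_terms, doc_terms, avg_doc_len, num_docs, doc_freqs,
               k1=1.2, b=0.75):
    """Sum, over the query terms, of IDF times a saturated term frequency.

    doc_terms  : list of tokens in the document (bag of words, order ignored)
    doc_freqs  : dict mapping term -> number of documents containing it
    """
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        # Longer-than-average documents are penalised through b.
        norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf(num_docs, doc_freqs[term]) * tf * (k1 + 1.0) / (tf + norm)
    return score
```

With these assumptions, a term that occurs once in a document of exactly average length scores exactly its IDF, which is a handy sanity check when tuning `k1` and `b`.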
BM25F is a newer variant of BM25 that can take document structure and anchor text into account. The first step is to obtain the accumulated weight of a term over all the fields of a document.
BM25F Simulator
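To make the accumulated field weight of BM25F concrete, here is a small Python sketch of one commonly used formulation, which may differ in detail from the one implemented for this deliverable. Each field contributes its term frequency, scaled by a per-field boost and normalised by the field's length relative to the average; the accumulated weight is then saturated with `k1` and multiplied by an IDF. The function names, the dictionary-based inputs, and the default `k1=1.2` are assumptions made for illustration.

```python
def accumulated_weight(term_freqs, field_lens, avg_field_lens, boosts, b):
    """Accumulated weight of one term in one document, over all fields.

    weight = sum over fields c of
             boost_c * tf_c / ((1 - b_c) + b_c * len_c / avg_len_c)

    All arguments are dicts keyed by field name (e.g. "title", "body").
    """
    weight = 0.0
    for field, tf in term_freqs.items():
        norm = (1.0 - b[field]) + b[field] * field_lens[field] / avg_field_lens[field]
        weight += boosts[field] * tf / norm
    return weight


def bm25f_term_score(weight, idf, k1=1.2):
    """Saturate the accumulated weight and scale it by the term's IDF."""
    return idf * weight / (k1 + weight)
```

For example, with a title boost of 3 and a body boost of 1, one title occurrence plus two body occurrences in average-length fields accumulate to a weight of 5:

```python
w = accumulated_weight(
    term_freqs={"title": 1, "body": 2},
    field_lens={"title": 2, "body": 4},
    avg_field_lens={"title": 2.0, "body": 4.0},
    boosts={"title": 3.0, "body": 1.0},
    b={"title": 0.5, "body": 0.75},
)
```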
To simulate the BM25F algorithm, a tool was created in PHP that takes 100 websites as input and extracts all the terms that occur in them. It then computes the BM25F score for each of these terms based on where the term occurs in the document. The scores are stored in a table, which the tool consults when it accepts a user query. Based on the query, the tool generates the combined BM25F score of the query for each of the 100 pages. Below are the snapshots of the tool for a query.
Source Code
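The original PHP source is not reproduced here. Purely as an illustration of the two-stage flow described for the simulator (precompute a per-term score table, then combine scores at query time), a Python sketch might look like the following. It works on pages already held in memory rather than fetching websites, and the field names, boosts, `b` values, `k1`, and the IDF variant are all hypothetical choices rather than the settings used by the tool.

```python
import math
from collections import defaultdict

# Hypothetical per-field settings; the original tool's values are not known.
BOOST = {"title": 3.0, "body": 1.0}
B = {"title": 0.5, "body": 0.75}
K1 = 1.2


def tokenize(text):
    return text.lower().split()


def build_score_table(pages):
    """pages: {page_id: {field: text}} -> {term: {page_id: BM25F score}}."""
    n = len(pages)
    tokens = {pid: {f: tokenize(fields.get(f, "")) for f in BOOST}
              for pid, fields in pages.items()}
    avg_len = {f: sum(len(tokens[pid][f]) for pid in tokens) / n for f in BOOST}

    # Document frequency: number of pages containing the term in any field.
    doc_freq = defaultdict(int)
    for fields in tokens.values():
        for term in set().union(*fields.values()):
            doc_freq[term] += 1

    table = defaultdict(dict)
    for pid, fields in tokens.items():
        for term in set().union(*fields.values()):
            weight = 0.0
            for f, toks in fields.items():
                tf = toks.count(term)
                if tf:
                    norm = (1.0 - B[f]) + B[f] * len(toks) / avg_len[f]
                    weight += BOOST[f] * tf / norm
            idf = math.log((n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            table[term][pid] = idf * weight / (K1 + weight)
    return table


def score_query(table, query):
    """Sum the stored per-term scores and return pages ranked best first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for pid, s in table.get(term, {}).items():
            scores[pid] += s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because the table is built once, answering a query only requires looking up each query term and adding the stored scores, which mirrors the table-driven design described above.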