CS297 Proposal

An Online Version of a HITS-based Search Engine

Amith Kollam Chandranna (amithkc@gmail.com)

Advisor: Dr. Chris Pollett

Description:

This CS297 proposal aims to implement an efficient and fast "online" HITS-based search engine. HITS stands for "Hyperlink-Induced Topic Search", a link analysis algorithm also known as "Hubs and Authorities". The original HITS algorithm requires the crawl to be completed first; the hub and authority scores are calculated afterwards. In this project, we plan to calculate the hub and authority scores online, i.e., at the time of crawling.

We would also like to understand the OPIC (On-line Page Importance Computation) algorithm, because OPIC already implements an online score calculation, but for the "PageRank" algorithm. This understanding should carry over directly to an online version of HITS.

The project is proposed to be implemented in PHP and MySQL. Existing search engines (such as Nutch) will be analyzed to gain a good understanding of current search techniques, which should help in developing more efficient algorithms for retrieving and storing data. After the project is implemented, we also propose to carry out performance comparisons with other search engines.

Schedule:
Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Understanding of the PageRank, OPIC and HITS algorithms.
2. Understanding of the implementation of existing search engines.
3. Decision on the algorithms to be implemented in CS298.
4. High-level design document.
5. Tentative project schedule for CS298.

References:

[Lieb 2009] The Truth About Search Engine Optimization. Rebecca Lieb. Upper Saddle River, N.J.: FT Press. 2009.

[Levitin 2007] Introduction to the Design and Analysis of Algorithms. Anany Levitin. Boston: Pearson Addison-Wesley. 2007.

[Langville 2006] Google's PageRank and Beyond: The Science of Search Engine Rankings. Amy N. Langville and Carl D. Meyer. Princeton, N.J.: Princeton University Press. 2006.
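
Note on the description: the batch form of HITS, in which scores are computed only after the crawl has finished, can be sketched as follows. This is a minimal illustration in Python, not the proposed PHP/MySQL implementation. The function name `hits`, the fixed iteration count, and the four-page example graph are all assumptions made for this sketch.

```python
import math


def hits(graph, iterations=50):
    """Batch HITS on a fully crawled link graph (illustrative sketch).

    graph maps each page to the list of pages it links to.
    Returns (hub, authority) score dictionaries, each L2-normalized.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: sum the authority scores of all pages linked to.
        new_hub = {p: sum(auth[d] for d in graph.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical four-page graph, used only for illustration.
example_graph = {
    "a": ["c", "d"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}
hub_scores, auth_scores = hits(example_graph)
```

In this toy graph, page "d" receives the most in-links and ends up with the highest authority score, while "a" and "b" end up as the strongest hubs. Every iteration needs the whole link graph, which is why this form can only run after the crawl; the online variant proposed here would instead update scores as pages are fetched.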