CS298 Proposal
An Online version of Hyperlink-Induced Topic Search (HITS) based search engine
Amith Kollam Chandranna (amithkc@gmail.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Tsau Young Lin (tylin@cs.sjsu.edu) and Prof. Mark Stamp (stamp@cs.sjsu.edu)

Abstract:
In general, search engines rank web pages in an offline mode, that is, after the pages have been retrieved and stored in a database. The existing Hyperlink-Induced Topic Search (HITS) algorithm calculates its page ranks in this offline mode. In this project, we will implement an online mode of page ranking for this algorithm. Online ranking offers several benefits. Retrieving web pages and calculating their ranks can be implemented using parallel processing. Disk input/output time is also minimized, because the ranking is done at runtime by accessing the graph matrix in main memory. This should improve the overall efficiency of the algorithm. The project also includes creating small test models, researching related journal articles, analyzing reference guides, generating test cases, and comparing test results with other models. This work will help us understand the overall process involved in successfully implementing the modified HITS algorithm.

CS297 Results:
Proposed Schedule:
Key Deliverables:
Innovations and Challenges:
References:
Langville, Amy N., & Meyer, Carl D. (2006). Google's PageRank and Beyond. Princeton University Press.
Nomura, Saeko, Ishida, Toru, Oyama, Satoshi, & Hayamizu, Tetsuo. (2004). Analysis and Improvement of HITS Algorithm for Detecting Web Communities. [Electronic version]. ACM Systems and Computers in Japan, Vol. 35, Issue 13, 32 - 42.
Borodin, Allan, Roberts, Gareth O., Rosenthal, Jeffrey S., & Tsaparas, Panayiotis. (2005). Link analysis ranking: algorithms, theory, and experiments. [Electronic version]. ACM Transactions on Internet Technology (TOIT), Vol. 5, Issue 1, 231 - 297.
Lempel, R., & Moran, S. (2001). SALSA: The Stochastic Approach for Link-Structure Analysis. [Electronic version]. ACM Transactions on Information Systems, Vol. 19, 131 - 160.
Nutch Tutorial. (n.d.). Retrieved May 07, 2010 from Apache's web site: http://lucene.apache.org/nutch/tutorial.html
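
Background sketch: as context for the abstract, the standard offline HITS iteration that the project proposes to adapt can be illustrated as below. This is only a minimal sketch of the textbook formulation, not the project's online variant or its actual source code. The function name `hits`, the dictionary-based link graph, the iteration cap, and the tolerance are all assumptions made for illustration.

```python
import math


def hits(graph, iterations=50, tol=1e-9):
    """Textbook offline HITS on a dict mapping each page to the pages it links to.

    Hypothetical helper for illustration; returns (hub, authority) score dicts.
    """
    # Collect every page that appears as a link source or a link target.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores):
        # Scale to unit Euclidean length; leave an all-zero vector untouched.
        norm = math.sqrt(sum(v * v for v in scores.values()))
        if norm == 0.0:
            return dict(scores)
        return {n: v / norm for n, v in scores.items()}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking to a page.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        new_auth = normalize(new_auth)

        # Hub update: sum the authority scores of all pages a page links to.
        new_hub = {n: sum(new_auth[d] for d in graph.get(n, ())) for n in nodes}
        new_hub = normalize(new_hub)

        # Stop early once both score vectors have stopped changing.
        change = sum(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n])
                     for n in nodes)
        hub, auth = new_hub, new_auth
        if change < tol:
            break

    return hub, auth


if __name__ == "__main__":
    # Tiny assumed example: pages "a" and "b" both link to page "c".
    hub_scores, auth_scores = hits({"a": ["c"], "b": ["c"], "c": []})
    print("hubs:", hub_scores)
    print("authorities:", auth_scores)
```

In this offline form the whole link graph must be available before the loop starts, which is the constraint the proposed online mode aims to relax by ranking while pages are still being retrieved.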