CS298 Proposal
Text Summarization for Compressed Inverted Indexes and Snippets
Mangesh Dahale (mangeshadahale@gmail.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Sami Khuri, Prof. Ronald Mak

Abstract:
Text summarization is a technique for generating a concise summary of a large text. The major challenge lies in distinguishing the more informative parts of a document from the less informative ones. In search engines, text summarization can be used to generate compressed descriptions of web pages. During indexing, these compressed descriptions can be used in place of the whole pages when building inverted indexes. For query results, the summaries can be used during snippet generation. In CS297, I implemented three different text summarization techniques, evaluated them, and found the technique based on the centroid method to be the best. In this project, I will implement this method in Yioop. After integrating it, I will test Yioop to see whether the technique improves query results. I will also measure its impact on query processing speed and indexing speed.

CS297 Results
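To make the centroid method named in the abstract more concrete, here is a minimal Python sketch of one common formulation: each sentence is treated as a bag of words weighted by term frequency times inverse sentence frequency, the centroid is the average of those vectors, and the sentences most similar to the centroid are kept in their original order. This is an illustration under stated assumptions, not the implementation evaluated in CS297 or the one planned for Yioop. The function names (`split_sentences`, `tokenize`, `centroid_summary`), the naive sentence splitter, the cosine scoring, and the `max_sentences` parameter are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def centroid_summary(text, max_sentences=2):
    """Return up to max_sentences sentences closest to the document centroid."""
    sentences = split_sentences(text)
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Inverse sentence frequency: terms found in fewer sentences weigh more.
    sentence_freq = Counter()
    for tokens in token_lists:
        sentence_freq.update(set(tokens))
    isf = {t: math.log(n / sf) + 1.0 for t, sf in sentence_freq.items()}

    # Centroid: average TF * ISF weight of each term over all sentences.
    centroid = Counter()
    for tokens in token_lists:
        for term, tf in Counter(tokens).items():
            centroid[term] += tf * isf[term] / n
    norm_c = math.sqrt(sum(w * w for w in centroid.values()))

    def cosine_to_centroid(tokens):
        vec = {t: tf * isf[t] for t, tf in Counter(tokens).items()}
        norm_v = math.sqrt(sum(w * w for w in vec.values()))
        if norm_v == 0 or norm_c == 0:
            return 0.0
        dot = sum(w * centroid[t] for t, w in vec.items())
        return dot / (norm_v * norm_c)

    scores = [cosine_to_centroid(tokens) for tokens in token_lists]
    # Rank by score, keep the best ones, then restore document order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    page = (
        "Search engines index web pages. "
        "Search engines rank web pages for queries. "
        "Bananas are yellow. "
        "Search engines build an index of web pages."
    )
    for sentence in centroid_summary(page, max_sentences=2):
        print(sentence)
```

In a search-engine setting, the returned sentences would stand in for the full page text at indexing time and could also serve as candidate snippet text. How a real system tokenizes, weights terms, and caps summary length is a design decision this sketch does not settle.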
Proposed Schedule
Key Deliverables:
Innovations and Challenges
References:
[Jones2007] Automatic summarising: a review and discussion of the state of the art. Jones, K. University of Cambridge Computer Laboratory. 2007.
[Mihalcea2004] TextRank: Bringing order into texts. Mihalcea, R., and Tarau, P. In Proceedings of EMNLP. 2004.
[Radev2004] Centroid-based summarization of multiple documents. Radev, D. R., Jing, H., Stys, M., and Tam, D. Information Processing & Management, 40(6), 919-938. 2004.
[Alguliev2013] Formulation of document summarization as a 0-1 nonlinear programming problem. Alguliev, R. M., Aliguliyev, R. M., and Isazade, N. R. Computers & Industrial Engineering, 64(1), ISSN 0360-8352. 2013.