CS298 Proposal
Text Summarization for Compressed Inverted Indexes and Snippets
Mangesh Dahale (mangeshadahale@gmail.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Sami Khuri, Prof. Ronald Mak.
Abstract:
Text summarization is a technique for generating a concise summary of a large text. The major challenge lies in distinguishing the more informative parts of a document from the less informative ones. In search engines, text summarization can be used to generate compressed descriptions of web pages. During indexing, these compressed pages can be used instead of the whole pages when building inverted indexes. For query results, the summaries can be used during snippet generation. In CS297, I implemented and evaluated three different text summarization techniques and found the one based on the centroid method to be the best. In this project, I will implement this method in Yioop. After integrating it, I will test Yioop to see whether the technique improves query results. I will also measure its impact on query processing speed and indexing speed.
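To make the centroid idea concrete, the following is a minimal Python sketch, not the CS297 implementation and not Yioop code. It makes several simplifying assumptions: sentences are split with a naive regular expression, each sentence is treated as a "document" when computing inverse document frequency, no stop words are removed, and the positional and first-sentence-overlap features that Radev et al. combine with the centroid score are left out. The function names and the `centroid_size` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def centroid_summary(text, num_sentences=2, centroid_size=10):
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Term frequency over the whole document.
    tf = Counter(w for words in tokenized for w in words)
    # Sentence frequency: number of sentences that contain each term.
    sf = Counter(w for words in tokenized for w in set(words))

    # TF-IDF-style weight, with sentences standing in for documents (assumption).
    weights = {w: tf[w] * math.log(1 + n / sf[w]) for w in tf}

    # The centroid keeps only the highest-weighted terms (ties broken alphabetically).
    top = sorted(weights, key=lambda w: (-weights[w], w))[:centroid_size]
    centroid = {w: weights[w] for w in top}

    # Score each sentence by the summed centroid weight of its distinct terms.
    scores = [sum(centroid.get(w, 0.0) for w in set(words)) for words in tokenized]

    # Pick the best-scoring sentences, then restore document order.
    best = sorted(range(n), key=lambda i: (-scores[i], i))[:num_sentences]
    return [sentences[i] for i in sorted(best)]
```

Calling `centroid_summary(page_text, num_sentences=3)` returns the three highest-scoring sentences in their original order. Because scores are unnormalized sums, this sketch favors longer sentences; dividing by sentence length is one possible adjustment.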
CS297 Results
- Studied different methods of text summarization.
- Implemented three text summarization methods from three different research papers.
- Evaluated the performance of the three methods and found the centroid-based method to be the best.
Proposed Schedule
Week 1 | Jan. 28-Feb. 04 | Prepare the CS298 proposal and upload it. |
Weeks 2-3 | Feb. 05-Feb. 18 | Deliverable #1: Optimize the code for the centroid method. |
Weeks 4-5 | Feb. 19-Mar. 04 | Study the complete working of the Yioop search engine. |
Weeks 6-7 | Mar. 05-Mar. 18 | Deliverable #2: Integrate the centroid-based summarizer into Yioop. |
Week 8 | Mar. 19-Mar. 25 | Test the code after integrating the centroid method. |
Weeks 9-10 | Mar. 26-Apr. 08 | Deliverable #3: Perform tests to measure the improvements after integration of the code. |
Week 11 | Apr. 09-Apr. 15 | Work on the CS298 report. |
Week 12 | Apr. 16-Apr. 22 | Create a first draft of the CS298 report. |
Weeks 13-14 | Apr. 23-May 06 | Create the final CS298 report and submit it to the advisor and committee members. |
Week 15 | May 07-May 13 | Defense |
Key Deliverables:
- Software
  - Optimized text summarizer based on the centroid method
  - Implementation of the centroid method in Yioop
- Report
  - CS298 report
  - Project code and test result documentation
Innovations and Challenges
- Optimizing the centroid-based summarizer so that it generates a summary in less than one third of a second will be a challenge.
- Generating and displaying a word cloud in search results will be interesting and challenging.
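The one-third-of-a-second target in the first challenge can be checked with a small timing harness. The sketch below is only an illustration in Python: `timed_call`, `within_budget`, and `BUDGET_SECONDS` are hypothetical names, and the summarizer is passed in as an arbitrary callable rather than taken from Yioop.

```python
import time

# Target from the proposal: a summary in under one third of a second.
BUDGET_SECONDS = 1 / 3


def timed_call(func, *args, **kwargs):
    # Run func once and return its result together with the elapsed wall time.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def within_budget(func, *args, repeats=5, budget=BUDGET_SECONDS, **kwargs):
    # Use the worst observed time so one fast run cannot hide a slow one.
    worst = max(timed_call(func, *args, **kwargs)[1] for _ in range(repeats))
    return worst < budget, worst
```

For a summarizer function `summarize` (a placeholder name), `within_budget(summarize, page_text)` returns a pair: whether every run finished under the budget, and the slowest time observed in seconds.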
References:
[Jones2007] Automatic summarising: a review and discussion of the state of the art. Jones, K. University of Cambridge Computer Laboratory. 2007.
[Mihalcea2004] TextRank: Bringing order into texts. Mihalcea, R., and Tarau, P. In Proceedings of EMNLP. 2004.
[Radev2004] Centroid-based summarization of multiple documents. Radev, D. R., Jing, H., Stys, M., and Tam, D. Information Processing & Management, 40(6), 919-938. 2004.
[Alguliev2013] Formulation of document summarization as a 0-1 nonlinear programming problem. Alguliev, R. M., Aliguliyev, R. M., and Isazade, N. R. Computers & Industrial Engineering, Volume 64, Issue 1, ISSN 0360-8352. 2013.