CS297 Proposal
Text Summarization for Compressed Inverted Indexes and Snippets
Mangesh Dahale (mangeshadahale@gmail.com)
Advisor: Dr. Chris Pollett
Description:
The major challenge in summarization lies in distinguishing the more informative parts of a document from the less informative ones. Text summarization is a technique for generating a concise summary of a larger text. In search engines, text summarization can be used to generate compressed descriptions of web pages. For indexing, these summaries can be used in place of whole pages when building inverted indexes. For query results, summaries can be used for snippet generation. In this project, I will design and implement new summarization techniques for the Yioop search engine. In the initial stage of the project, I will implement three different text summarization methods (one based on an intersection function, one on the centroid algorithm, and one on TF-ISF) and determine the best one for Yioop by evaluating their performance.
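As a rough illustration of the kind of extractive method planned here, the sketch below scores each sentence by TF-ISF (term frequency times inverse sentence frequency) and keeps the top-scoring sentences in their original order. It is a minimal sketch in Python under stated assumptions, not the Yioop implementation: the function names (`split_sentences`, `tokenize`, `tf_isf_scores`, `summarize`), the naive sentence splitter, and the choice to average TF-ISF over a sentence's words are all hypothetical choices made for illustration. One common formulation of TF-ISF is assumed; the version finally implemented may differ.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_isf_scores(sentences):
    """Score each sentence by the mean TF-ISF of its word occurrences.

    TF-ISF(w, s) = tf(w, s) * log(N / sf(w)), where N is the number of
    sentences and sf(w) is the number of sentences containing w.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Sentence frequency: in how many sentences does each word appear?
    sf = Counter()
    for tokens in token_lists:
        sf.update(set(tokens))

    scores = []
    for tokens in token_lists:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(count * math.log(n / sf[word]) for word, count in tf.items())
        scores.append(total / len(tokens))
    return scores


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

For example, `summarize(page_text, max_sentences=3)` would return up to three sentences of a hypothetical `page_text` string. A summary like this could then stand in for the full page when building an index entry or a snippet.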
Schedule:
Week 1 | Aug. 26-Sep. 1 | Read about text summarization from [Soumya2011] and [Das2007].
Weeks 2-3 | Sep. 2-15 | Deliverable #1: Study different methods used for text summarization.
Weeks 4-5 | Sep. 16-29 | Code a summarizer using the intersection function.
Weeks 6-7 | Sep. 30-Oct. 13 | Code a summarizer using the centroid algorithm.
Weeks 8-9 | Oct. 14-27 | Code a summarizer using the TF-ISF algorithm.
Week 10 | Oct. 28-Nov. 3 | Deliverable #2: Code three methods so that we can evaluate their performance.
Week 11 | Nov. 4-10 | Create document samples to which the three methods will be applied to create summaries.
Weeks 12-13 | Nov. 11-24 | Deliverable #3: Evaluate the performance of these three methods to find the best summarization method.
Week 14 | Nov. 25-Dec. 1 | Decide which method to implement in the Yioop search engine.
Week 15 | Dec. 2-8 | Deliverable #4: CS297 final report.
Deliverables:
The full project will be done when CS298 is completed. The following will
be done by the end of CS297:
1. Study different methods of text summarization.
2. Code three methods so that we can evaluate their performance.
3. Evaluate the performance of these three methods to find the best summarization method.
4. CS297 final report.
References:
[Soumya2011] Automatic text summarization. Soumya, S., Kumar, G., Naseem, R., and Mohan, S. Computational Intelligence and Information Technology. 2011.
[Hassel2007] Resource Lean and Portable Automatic Text Summarization. Hassel, M. School of Computer Science and Communication, KTH, ISBN-978-917178-704-0. 2007.
[Thakkar2011] Test Model for Text Categorization and Text Summarization. Thakkar, K., and Shrawankar, U. International Journal on Computer Science and Engineering (IJCSE). 2011.
[Das2007] A Survey on Automatic Text Summarization. Das, D., and Martins, A. F. T. Language Technologies Institute, Carnegie Mellon University. 2007.
[Mihalcea2004] TextRank: Bringing order into texts. Mihalcea, R., and Tarau, P. In Proceedings of EMNLP. 2004.