CS297 Proposal

Text Summarization for Compressed Inverted Indexes and snippets

Mangesh Dahale (mangeshadahale@gmail.com)

Advisor: Dr. Chris Pollett

Description:

Text summarization is a technique for generating a concise summary of a larger text. Its major challenge lies in distinguishing the more informative parts of a document from the less informative ones. In search engines, text summarization can be used to generate compressed descriptions of web pages. For indexing, these summaries can be used in place of whole pages when building inverted indexes. For query results, summaries can be used for snippet generation. In this project, I will try to design and implement new summarization techniques for the Yioop search engine. In the initial stage of the project, I will implement three different text summarization methods and evaluate their performance to find the best method for Yioop.
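
As a rough illustration of how a summarizer might separate informative sentences from less informative ones, the following Python sketch scores sentences with TF-ISF (term frequency times inverse sentence frequency), one of the three methods listed in the schedule. It is a minimal sketch under stated assumptions, not the implementation planned for Yioop: the function names, the naive sentence splitter, the plain `log(N / sf)` weighting, and the averaging of term weights per sentence are all illustrative choices that do not come from this proposal.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokens; no stemming or stop-word removal in this sketch."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tf_isf_scores(sentences):
    """Score each sentence by the mean TF-ISF weight of its tokens."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Sentence frequency: number of sentences that contain each term.
    sf = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)
            continue
        tf = Counter(toks)
        # A term that occurs in every sentence gets weight log(1) = 0.
        total = sum(count * math.log(n / sf[term]) for term, count in tf.items())
        scores.append(total / len(toks))
    return scores


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, emitted in their original order."""
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(page_text, max_sentences=3)` on the text of a page would return its three highest-scoring sentences. Terms that appear in nearly every sentence contribute little to a score, so sentences built from rarer terms rank higher.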

Schedule:

Week 1 (Aug. 26-Sep. 1): Read about text summarization from [Soumya2011] and [Das2007].
Weeks 2-3 (Sep. 2-15): Deliverable #1: Study different methods used for text summarization.
Weeks 4-5 (Sep. 16-29): Code a summarizer using the intersection function.
Weeks 6-7 (Sep. 30-Oct. 13): Code a summarizer using the centroid algorithm.
Weeks 8-9 (Oct. 14-27): Code a summarizer using the TF-ISF algorithm.
Week 10 (Oct. 28-Nov. 3): Deliverable #2: Code the three methods so that their performance can be evaluated.
Week 11 (Nov. 4-10): Create the document samples to which the three methods will be applied to create summaries.
Weeks 12-13 (Nov. 11-24): Deliverable #3: Evaluate the performance of the three methods to find the best summarization method.
Week 14 (Nov. 25-Dec. 1): Decide which method to implement in the Yioop search engine.
Week 15 (Dec. 2-8): Deliverable #4: CS297 final report.
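
The intersection-based summarizer scheduled for weeks 4 and 5 could be sketched as follows. This assumes that "intersection function" means the normalized word overlap between two sentences, with each sentence scored by its total overlap with every other sentence. The proposal does not define the function, so the normalization and all names below are hypothetical.

```python
import re


def tokens(sentence):
    """Set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def intersection(a, b):
    """Shared tokens divided by the average size of the two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / ((len(a) + len(b)) / 2)


def intersection_scores(sentences):
    """Score each sentence by its summed overlap with all other sentences."""
    sets = [tokens(s) for s in sentences]
    return [
        sum(intersection(a, b) for j, b in enumerate(sets) if j != i)
        for i, a in enumerate(sets)
    ]


def best_sentence(sentences):
    """Return the sentence that overlaps most with the rest (first one on ties).

    Assumes a non-empty list of sentences.
    """
    scores = intersection_scores(sentences)
    return sentences[max(range(len(sentences)), key=lambda i: scores[i])]
```

Under this assumption, a sentence that shares many words with the rest of the document is treated as representative of it, so the top-scoring sentences form the summary.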

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Study different methods of text summarization.

2. Code the three methods so that their performance can be evaluated.

3. Evaluate the performance of the three methods to find the best summarization method.

4. CS297 final report.

References:

[Soumya2011] Automatic text summarization. Soumya, S., Kumar, G., Naseem, R., and Mohan, S. Computational Intelligence and Information Technology. 2011.

[Hassel2007] Resource Lean and Portable Automatic Text Summarization. Hassel, M. School of Computer Science and Communication, KTH, ISBN-978-917178-704-0. 2007.

[Thakkar2011] Test Model for Text Categorization and Text Summarization. Thakkar, K., and Shrawankar, U. International Journal on Computer Science and Engineering (IJCSE). 2011.

[Das2007] A Survey on Automatic Text Summarization. Das, D., and Martins, A. F. T. Language Technologies Institute, Carnegie Mellon University. 2007.

[Mihalcea2004] TextRank: Bringing order into texts. Mihalcea, R., and Tarau, P. In Proceedings of EMNLP. 2004.