CS297 Proposal

Experiments with and Implementation of a Context Sensitive Text Summarizer

Charles Bocage

Advisor: Dr. Chris Pollett

Description:

This project focuses on text summarization, which is the task of extracting the key ideas from a text passage using as few words as possible. Dr. Pollett's search engine, Yioop, uses a centroid-based summarizer (CBS) to summarize the documents it crawls. A CBS relies on a centroid, a set of words that are statistically important to the document, to capture the document's main idea. It then computes term frequencies and cosine similarities to build the summary. This project will attempt to improve the existing CBS by weighting sentences according to their location in the content. For example, a sentence within an H1 tag will receive a greater weight than a sentence within an H2 tag. The results of the current CBS and the improved version will be compared using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) software package, which is currently the de facto standard for computing summarization metrics.

Schedule:
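To make the proposed change concrete, here is a minimal Python sketch of centroid-based sentence scoring with the location (H1/H2) weighting, plus a small ROUGE-1 recall function for comparing the baseline and weighted variants. It is not Yioop's implementation (Yioop is written in PHP). The centroid definition (the `centroid_size` most frequent terms weighted by raw frequency, with no idf), the tokenizer, the `TAG_WEIGHTS` values, and all function names are hypothetical choices for illustration. `rouge_1_recall` computes only unigram recall and is not a substitute for the ROUGE package.

```python
import math
import re
from collections import Counter

# Hypothetical weights for the proposed location-based scoring. The proposal
# only states that H1 should outweigh H2, so these values are illustrative.
TAG_WEIGHTS = {"h1": 2.0, "h2": 1.5}
DEFAULT_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, num_sentences=2, centroid_size=10, use_tags=True):
    """Pick the sentences most similar to a simple document centroid.

    `sentences` is a list of (text, tag) pairs, e.g. ("Intro", "h1").
    Assumption: the centroid is the `centroid_size` most frequent terms
    in the whole document, weighted by raw term frequency.
    """
    vectors = [Counter(tokenize(text)) for text, _ in sentences]
    doc_counts = sum(vectors, Counter())
    centroid = Counter(dict(doc_counts.most_common(centroid_size)))

    scored = []
    for index, ((_, tag), vector) in enumerate(zip(sentences, vectors)):
        score = cosine(vector, centroid)
        if use_tags:
            # Proposed change: scale the score by the sentence's HTML tag weight.
            score *= TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT)
        scored.append((score, index))

    # Keep the highest-scoring sentences (ties go to the earlier sentence),
    # then return them in their original document order.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    return [sentences[index][0] for _, index in sorted(best, key=lambda p: p[1])]


def rouge_1_recall(candidate, reference):
    """Unigram recall: clipped overlapping terms / terms in the reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum(min(count, cand[term]) for term, count in ref.items())
    return overlap / max(sum(ref.values()), 1)
```

For example, with `page = [("Search engine basics and tips", "h2"), ("The search engine stores search summaries", "p")]` and `centroid_size=2`, the unweighted call `summarize(page, 1, 2, use_tags=False)` returns `['The search engine stores search summaries']`, while the weighted call `summarize(page, 1, 2)` returns `['Search engine basics and tips']`, because the H2 boost lifts that sentence's score above the paragraph sentence. Running both variants over a corpus with reference summaries and comparing their recall scores mirrors, in miniature, the comparison the proposal plans to do with ROUGE.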
Deliverables:

The full project will be completed by the end of CS298. The following will be done by the end of CS297:

1. Clone the Seek Quarry Git repository, perform a simple crawl, and test its summarizers.
2. Implement a Dutch summarizer that can be used in the Yioop search engine.
3. Find a paper on summarization that contains an algorithm and implement it.
4. Implement Dr. Pollett's summarizer algorithm.
5. Complete the CS297 report.