CS 299 End of Fall 2015 Semester Summary

At the end of my first CS299 semester, it turned out that I had bitten off more than I could chew. The plan for deliverable 3 was to use the Lanczos algorithm to generate the summaries. During the experiment we found that when the Lanczos algorithm computes its Singular Value Decomposition (SVD), it inherently suffers from numerical instability. In other words, slight alterations in the text being summarized cause its calculations to generate numbers that grow toward infinity, rendering the algorithm unusable.
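To make the instability concrete, here is a minimal sketch in Python of one well-known way Lanczos iteration degrades in floating point: the Lanczos vectors gradually lose their orthogonality. This is an illustration only, not the Java or PHP code from the project, and it is an assumption that this is related to the failure we saw. The function names (lanczos_basis, max_overlap) and the test matrix are hypothetical. The sketch also shows full reorthogonalization, a standard but more expensive remedy.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def lanczos_basis(A, v0, steps, reorthogonalize=False):
    """Run Lanczos on a symmetric matrix A and return the Lanczos vectors."""
    norm = math.sqrt(dot(v0, v0))
    q = [x / norm for x in v0]
    prev = [0.0] * len(v0)
    beta = 0.0
    basis = [q]
    for _ in range(steps - 1):
        w = matvec(A, q)
        alpha = dot(w, q)
        # Three-term recurrence: remove the two most recent directions.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, prev)]
        if reorthogonalize:
            # Full reorthogonalization: project out every earlier vector.
            for b in basis:
                c = dot(w, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:
            break  # Krylov subspace exhausted; dividing by beta would blow up
        prev, q = q, [wi / beta for wi in w]
        basis.append(q)
    return basis

def max_overlap(basis):
    """Largest |q_i . q_j| for i != j; zero for a perfectly orthogonal basis."""
    return max(abs(dot(basis[i], basis[j]))
               for i in range(len(basis)) for j in range(i))

# Hypothetical test matrix: diagonal, with one eigenvalue far from the rest.
n = 100
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = float(i + 1)
A[n - 1][n - 1] = 1000.0
start = [1.0] * n

plain = lanczos_basis(A, start, 40)
stable = lanczos_basis(A, start, 40, reorthogonalize=True)
print("overlap without reorthogonalization:", max_overlap(plain))
print("overlap with reorthogonalization:   ", max_overlap(stable))
```

In exact arithmetic both runs would produce orthogonal vectors. In floating point, the plain run drifts once the dominant eigenvalue has converged, while the reorthogonalized run stays orthogonal at the cost of extra work per step.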

The original code was written in Java by Youn Kim, a former Master’s student of Dr. Pollett. I was to take his Java code, port it to PHP, and incorporate it into Dr. Pollett’s Yioop search engine. It was not until I had converted the code that we found out that both the Java code and the PHP code suffer from the numerical instability. Youn Kim’s Java code faithfully followed Lanczos’s approach to text summarization, and that is where we ended the semester. At this point we have two choices: implement a variant of the method that does not suffer from the numerical instability problem, or call this experiment a bust and move on. If we cannot find a method that is numerically stable, I will write up the deliverable 3 bust report and move on to experimenting with sentence compression. The goal of compressing the sentences is to summarize each sentence with minimal information loss. In other words, after we have summarized the document, we would also summarize each sentence, which would let us fit more content that is meaningful to the reader into the summary.
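As a rough illustration of what compressing a sentence could mean, here is a toy sketch in Python. It is not the method planned for deliverable 4, which has not been chosen yet. The rules (drop parenthetical asides, drop words from a small filler list) and the names FILLER_WORDS and compress_sentence are assumptions made up for this example.

```python
import re

# Hypothetical list of words that rarely carry essential information.
FILLER_WORDS = {"very", "really", "basically", "actually", "quite", "just"}

def compress_sentence(sentence):
    """Shorten a sentence by removing parenthetical asides and filler words."""
    # Remove text in parentheses together with the whitespace before it.
    without_asides = re.sub(r"\s*\([^)]*\)", "", sentence)
    kept = [
        word for word in without_asides.split()
        # Compare without surrounding punctuation and ignoring case.
        if word.strip(".,;:!?").lower() not in FILLER_WORDS
    ]
    return " ".join(kept)

print(compress_sentence(
    "The Lanczos code (written in Java) is actually very unstable."))
```

A real compressor would need to check that the shortened sentence is still grammatical and still says the same thing. This sketch only shows the idea of trading a few words for room in the summary.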