CS 299 End of Fall 2015 Semester Summary

At the end of my first CS 299 semester, it turned out I had bitten off more than I could chew. The plan for deliverable 3 was to use the Lanczos algorithm to generate the summaries. During the experiment we found that when the Lanczos algorithm calculates its Singular Value Decomposition (SVD), it inherently suffers from cases where its calculations are numerically unstable. In other words, slight alterations in the text it is summarizing cause its calculations to generate numbers that diverge toward infinity, rendering the algorithm unusable.
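To illustrate one way this kind of blow-up can arise, here is a minimal sketch of the plain Lanczos iteration in Python. It is not the Java or PHP code from this project. The function names and the tolerance value are hypothetical, and the symmetric input matrix stands in for whatever matrix the summarizer builds from the text. The iteration divides by a norm (beta) at every step. If that norm is zero or nearly zero, the division produces huge or undefined numbers unless it is guarded. Whether this breakdown is the exact failure we hit is an assumption, not something we confirmed.

```python
import math


def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def lanczos(A, v0, steps, tol=1e-12):
    """Plain Lanczos iteration on a symmetric matrix A, no reorthogonalization.

    Returns (alphas, betas, breakdown). breakdown is True when beta fell
    below tol, the point where an unguarded version would divide by a
    (near-)zero value. tol is an illustrative choice, not a tuned one.
    """
    norm = math.sqrt(dot(v0, v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(A, v)
        alpha = dot(w, v)
        # Remove the components along the current and previous Lanczos vectors.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(dot(w, w))
        if beta < tol:
            # Dividing by beta here is what would blow up; stop instead.
            return alphas, betas, True
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas, False
```

For example, calling lanczos on the diagonal matrix [[2.0, 0, 0], [0, 3.0, 0], [0, 0, 5.0]] with the start vector [1.0, 0, 0] reports a breakdown on the first step, because the start vector is already an eigenvector. With the start vector [1.0, 1.0, 1.0] the same matrix runs two steps without a breakdown. A small change in the input is enough to move between the two cases.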

The original code was written in Java by Youn Kim, a former Master’s student of Dr. Pollett. I was to take his Java code, port it to PHP, and incorporate it into Dr. Pollett’s Yioop search engine. It was not until I had converted the code that we found out that the Java code and the PHP code both suffer from the numerical instability. Youn Kim’s Java code faithfully followed the Lanczos approach to text summarization, and that is where we ended the semester. At this point we have two choices: implement another method that does not suffer from the numerical instability problem, or call this experiment a bust and move on. If we cannot find a method that is numerically stable, I will write up the deliverable 3 bust report and move on to experimenting with compressing the sentences.

The goal of compressing the sentences is to summarize each sentence with minimal information loss. In other words, after we have summarized the document, we would also summarize each sentence. That way, we would be able to fit more content that is meaningful to the reader into the summary.