#
CS298 Proposal

#
Text Summarization using SVD (Singular Value Decomposition) and Lanczos Algorithm

Youn, Kim (youn.kim@students.sjsu.edu)

**Advisor:** Dr. Chris Pollett

**Committee Members:** Dr. Jon Pearce(pearce@cs.sjsu.edu) and Dr. Tseng, Chris(tseng@cs.sjsu.edu)

## Abstract:

When browsing the Internet, one can easily get overwhelmed by the flood of text available. No one wants to waste time reading long texts on a web page.
By summarizing a website, one can quickly read an accurate representation of its contents and save time by knowing what is worth reading and what is not.
For this reason, an efficient text summarizer is needed that will provide a non-redundant extract from the original site.
All the major search engines (Google, Yahoo, etc.) use automated text summarization and present condensed descriptions of the search results.
My project also deals with text summarization. Though many summarization methods exist, I will make use of the Lanczos algorithm for the computation of a few eigenvalues and eigenvectors of a large sparse matrix and SVD (Singular Value Decomposition), taking a high-dimensional set of data and reducing it to a lower-dimensional set of data.
This makes it possible to identify the best approximation of the original text by using the information acquired from the Lanczos algorithm earlier.

For CS 298, I will extend the deliverables I developed in CS 297 to create a program that generates a text summary of a web site.
Since I will deal with large data sets of words and sentences from different pages in a web site, it is natural that I will use a database.
Thus I will create SQL scripts that will calculate SVD using the Lanczos algorithm, and will be able to extract a summary from a web site.

##
CS297 Results

- I learned how to calculate SVD, eigenvalues, and eigenvectors.
- I learned how to convert a symmetric matrix into a tridiagonal matrix by implementing the Lanczos algorithm.
- I learned to incorporate the Lanczos algorithm into calculating SVD and different techniques to make my implementation more stable.
- I learned how all of the previous deliverables could be put together to reach my final objective-text summarization.

##
Proposed Schedule

[1983] Golub, G. H. and C. F. Van Loan, Matrix Computations, The John Hopkins University Press, Baltimore, 1983.

[2002] Cullum, J., and Willoughby, R. (2002). Lanczos Algorithms for Large Symmetric Eigenvalue Computations Vol.1: Theory. SIAM, Philadelphia.