CS297 Proposal
Text Summarization
Youn, Kim (youn.kim@students.sjsu.edu)
Advisor: Dr. Chris Pollett
Description:
With the growth of the Internet, one can easily be overwhelmed by the flood of textual information.
No one wants to waste time reading an entire long text on a web page. A summary of a web page gives a
non-redundant extract of the original that accurately represents its content, so a reader can quickly see
what is worth reading and what is not, and save time.
Thus, an efficient text summarizer is needed. Major search engines (Google, Yahoo, etc.) use
automated text summarization to present condensed descriptions of their search results.
My project also deals with text summarization. For CS 297, we will write sample programs that use the Lanczos algorithm to compute the
SVD (Singular Value Decomposition); the Lanczos algorithm makes it feasible to compute the dominant singular values and vectors of very large sparse matrices. The goal is to become familiar with the Lanczos algorithm.
For CS 298/299, we will build such a text summarizer in pure SQL (Structured Query Language).
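The Lanczos step described above can be sketched roughly as follows. Given a symmetric matrix A, k steps of the Lanczos iteration build orthonormal vectors q_1, ..., q_k and a tridiagonal matrix T_k = Q_k^T A Q_k (diagonal entries alpha, off-diagonal entries beta) whose eigenvalues approximate the extreme eigenvalues of A. For the SVD of a matrix M, one would run it on A = M^T M and take square roots of the eigenvalues of T_k as approximations to the largest singular values of M. This is a minimal sketch under stated assumptions: the function name `lanczos`, the all-ones start vector, the dense list-of-rows storage, and the full reorthogonalization are illustrative choices, not the project's actual implementation (a real version would use a sparse matrix-vector product).

```python
import math


def lanczos(A, k, v0=None):
    """Hypothetical sketch: k steps of Lanczos on a symmetric matrix A (list of rows).

    Returns (alphas, betas, Q): alphas is the diagonal of T_k, betas its
    off-diagonal, and Q the list of orthonormal Lanczos vectors.
    """
    n = len(A)

    def matvec(x):
        # Dense product A x; a sparse version would only touch nonzeros.
        return [sum(a * b for a, b in zip(row, x)) for row in A]

    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    # Start vector: all ones unless one is given (an assumption), normalized.
    q = list(v0) if v0 is not None else [1.0] * n
    norm = math.sqrt(dot(q, q))
    q = [x / norm for x in q]

    Q, alphas, betas = [q], [], []
    q_prev, beta = [0.0] * n, 0.0
    for j in range(k):
        w = matvec(q)
        alpha = dot(w, q)
        alphas.append(alpha)
        # Three-term recurrence: w = A q_j - alpha_j q_j - beta_{j-1} q_{j-1}
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        # Full reorthogonalization against all previous vectors, which guards
        # against the loss of orthogonality that occurs in floating point.
        for u in Q:
            c = dot(w, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        if j == k - 1:
            break
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:
            break  # invariant subspace reached; T is already complete
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
        Q.append(q)
    return alphas, betas, Q
```

With k equal to the size of A (and no early breakdown), T_k is orthogonally similar to A, so its trace and Frobenius norm match those of A; in practice k is kept much smaller than the matrix size, which is what makes the method attractive for large sparse matrices.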
Schedule:
Week | Dates         | Task
1    | Jan.25-Jan.29 | Write CS 297 proposal
2    | Feb.1-Feb.5   | Read about eigenvalues and eigenvectors
3    | Feb.8-Feb.12  | Deliverable 1 due
4    | Feb.15-Feb.19 | Read about Singular Value Decomposition
5    | Feb.22-Feb.26 | Read about the Lanczos algorithm
6    | Mar.1-Mar.5   | Work on deliverable 2
7    | Mar.8-Mar.12  | Work on deliverable 2
8    | Mar.15-Mar.19 | Deliverable 2 due
9    | Mar.22-Mar.26 | Read about text summarization
10   | Mar.29-Apr.2  | Work on deliverable 3
11   | Apr.5-Apr.9   | Deliverable 3 due
12   | Apr.12-Apr.16 | Read about SQL
13   | Apr.19-Apr.23 | Work on deliverable 4
14   | Apr.26-May.2  | Deliverable 4 due
15   | May.10-May.14 | Write report
16   | May.17-May.21 | Deliverable 5 due
Deliverables:
The full project will be completed in CS 298. The following will
be completed by the end of CS 297:
1. Sample program that computes the SVD by brute force
2. Sample program that generates a tridiagonal matrix using the Lanczos algorithm
3. Sample program that computes the SVD using the Lanczos algorithm
4. Sample program that extracts a summary from the original content using the Lanczos algorithm
5. CS 297 report
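Deliverable 4 could, for example, follow a latent semantic analysis (LSA) style of extraction: build a term-by-sentence matrix A, find its dominant right singular vector, and keep the sentences with the largest weights in that vector. The minimal sketch below assumes raw term counts, a simple lowercase word tokenizer, and only the single top singular vector. To stay short it substitutes plain power iteration on A^T A for the Lanczos algorithm; a Lanczos-based version would replace that loop. The function `summarize` and its parameters are hypothetical, not the project's design.

```python
import math
import re
from collections import Counter


def summarize(sentences, num_sentences=1, iters=100):
    """Hypothetical sketch: pick sentences by their weight in the top right
    singular vector of a term-by-sentence count matrix (LSA style)."""
    # Term-by-sentence matrix A: rows are terms, columns are sentences.
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    terms = sorted(set().union(*bags))
    A = [[bag[t] for bag in bags] for t in terms]
    m = len(sentences)

    # B = A^T A (sentence by sentence); its top eigenvector is the
    # top right singular vector of A.
    B = [[sum(A[t][i] * A[t][j] for t in range(len(terms))) for j in range(m)]
         for i in range(m)]

    # Power iteration, swapped in for Lanczos to keep the sketch short.
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # Rank sentences by |v_i| and return the chosen ones in original order.
    top = sorted(range(m), key=lambda i: abs(v[i]), reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

For example, with the sentences "The cat sat on the mat.", "The cat ate the fish." and "Stock markets fell sharply today.", `summarize(sentences, 1)` returns `["The cat sat on the mat."]`: the two cat sentences share terms and dominate the top singular vector, while the unrelated sentence gets a weight near zero.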
References:
[Wiki] Lanczos algorithm. Wikipedia. Online at http://en.wikipedia.org/wiki/Lanczos_algorithm
[2005] Sang Min Oh. A Note on Singular Value Decomposition. College of Computing, Georgia Institute of Technology. 2005.
[2002] A Singularly Valuable Decomposition: The SVD of a Matrix. The American University. 2002.
[2004] Jing Gao, Jun Zhang. Clustered SVD strategies in latent semantic indexing. Laboratory for High Performance Scientific Computing and Computer Simulation, Department of Computer Science, University of Kentucky.