Enhancing Viewability of Images of Text in PDF in Mobile Devices
Long N Vuong (firstname.lastname@example.org)
Advisor: Dr. Chris Pollett
On mobile devices, standard PDF readers such as Adobe Reader for Mobile Devices enable one to view PDF files provided they contain mainly text and small images. However, if a PDF file contains larger images, equations, text scanned to a large image, etc., these readers simply display the whole image. This is especially bad when the image is larger than the screen, as one has to scroll both vertically and horizontally to try to understand the document. In this project, we will develop a reader that addresses two of these cases: text scanned to large images and equations. Our reader will detect words and equations in large images using the surrounding white space. It will then extract them as smaller images, which can be flowed to fit a small screen. Our system should be robust, as it only needs to detect white space rather than apply more elaborate techniques such as optical character recognition, which might be hard in the case of math equations, handwriting, or nonstandard scripts.
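As an illustration of detecting words by their surrounding white space, the following is a minimal sketch and not the project's actual implementation. It assumes a single text line has already been binarized into a boolean grid (true = ink), and it treats any run of at least minGap blank columns as a word boundary. The class name WhitespaceSegmenter, the method names, and the minGap parameter are hypothetical. The sketch uses standard Java collections, so it would need adapting for a J2ME profile.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSegmenter {

    /** Marks each column that contains at least one ink (true) pixel. */
    public static boolean[] columnsWithInk(boolean[][] ink) {
        int width = ink.length == 0 ? 0 : ink[0].length;
        boolean[] result = new boolean[width];
        for (boolean[] row : ink) {
            for (int x = 0; x < width; x++) {
                if (row[x]) {
                    result[x] = true;
                }
            }
        }
        return result;
    }

    /**
     * Splits a column profile into [start, end) runs of ink.
     * Blank gaps narrower than minGap columns (assumed to be the spacing
     * between letters) are merged into the surrounding run.
     */
    public static List<int[]> findRuns(boolean[] hasInk, int minGap) {
        List<int[]> runs = new ArrayList<>();
        int start = -1;    // first column of the current run, -1 if none
        int lastInk = -1;  // most recent ink column in the current run
        for (int x = 0; x < hasInk.length; x++) {
            if (!hasInk[x]) {
                continue;
            }
            if (start >= 0 && x - lastInk - 1 >= minGap) {
                // The blank gap is wide enough: close the run, start a new one.
                runs.add(new int[] {start, lastInk + 1});
                start = x;
            } else if (start < 0) {
                start = x;
            }
            lastInk = x;
        }
        if (start >= 0) {
            runs.add(new int[] {start, lastInk + 1});
        }
        return runs;
    }

    public static void main(String[] args) {
        // Hypothetical one-line "scan": '#' is ink, '.' is white space.
        String[] art = {
            "##..#....##",
            ".#.......#."
        };
        boolean[][] ink = new boolean[art.length][art[0].length()];
        for (int y = 0; y < art.length; y++) {
            for (int x = 0; x < art[y].length(); x++) {
                ink[y][x] = art[y].charAt(x) == '#';
            }
        }
        for (int[] run : findRuns(columnsWithInk(ink), 3)) {
            System.out.println("word columns [" + run[0] + ", " + run[1] + ")");
        }
    }
}
```

The same run-finding routine can be applied to row profiles to separate text lines before splitting each line into words. Choosing minGap is the delicate part, since it has to distinguish spacing between letters from spacing between words.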
The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. Write a simple "Hello World" J2ME program.
2. Write a program which extracts text from PDF files and converts each word to a JPG image. The text keeps its font and format.
3. Write a program which extracts images from PDF files and displays them.
4. Write a program which extracts words from a JPG image as separate images.
5. CS297 report.
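Once words are available as separate images (deliverable 4), they still have to be flowed to the small screen. The sketch below shows one simple way this could work, a greedy left-to-right layout that wraps to a new line when the next word image does not fit. It is an assumption for illustration, not the layout the project commits to. The class name WordFlow, the method layout, and the spacing parameter are hypothetical.

```java
public class WordFlow {

    /**
     * Computes the top-left (x, y) position of each word image.
     * Words are placed left to right; a word that would overflow
     * screenWidth starts a new line below the tallest word of the
     * current line. A word wider than the screen is placed alone
     * at x = 0 (it would still need scaling or scrolling).
     */
    public static int[][] layout(int[] widths, int[] heights,
                                 int screenWidth, int spacing) {
        int[][] pos = new int[widths.length][2];
        int x = 0;
        int y = 0;
        int lineHeight = 0;
        for (int i = 0; i < widths.length; i++) {
            if (x > 0 && x + widths[i] > screenWidth) {
                // Wrap: move below the current line and reset to the left edge.
                y += lineHeight + spacing;
                x = 0;
                lineHeight = 0;
            }
            pos[i][0] = x;
            pos[i][1] = y;
            x += widths[i] + spacing;
            lineHeight = Math.max(lineHeight, heights[i]);
        }
        return pos;
    }

    public static void main(String[] args) {
        // Hypothetical word-image sizes in pixels on a 100-pixel-wide screen.
        int[] widths = {40, 50, 30, 90};
        int[] heights = {10, 12, 10, 10};
        int[][] pos = layout(widths, heights, 100, 5);
        for (int i = 0; i < pos.length; i++) {
            System.out.println("word " + i + " at (" + pos[i][0] + ", " + pos[i][1] + ")");
        }
    }
}
```

With this layout the reader only needs vertical scrolling, provided no single word image is wider than the screen.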
 Practical Algorithms for Image Analysis: Descriptions, Examples, and Code. Michael Seul, Lawrence O'Gorman, Michael J. Sammon. Cambridge University Press. April 15, 2000.
 Document Image Analysis. H. Bunke, P. S. P. Wang, Henry S. Baird. World Scientific Publishing Company. December 1994.
 Acrobat SDK User's Guide. Adobe. http://partners.adobe.com/public/developer/en/acrobat/sdk/pdf/intro_to_sdk/UserGuide.pdf.
 Acrobat and PDF Library API Reference. Adobe. http://partners.adobe.com/public/developer/en/acrobat/sdk/pdf/plugins/APIReference.pdf.
 Core J2ME Technology. John W. Muchow. Prentice Hall PTR. December 21, 2001.
 Finding Text In Images. V. Wu, R. Manmatha and E. M. Riseman. http://www.cs.umass.edu/Dienst/UI/2.0/Describe/ncstrl.umassa_cs%2FUM-CS-1997-009.
 Extraction of Text from Images. Pooja Nath. http://www.cse.iitk.ac.in/research/btp2002/98263.html.