CS297 Proposal
Enhancing Viewability of Images of Text in PDF in Mobile Devices
Long N Vuong (longnvuong@yahoo.com)
Advisor: Dr. Chris Pollett
Description:
On mobile devices, standard PDF readers such as Adobe Reader
for Mobile Devices enable one to view PDF files provided they contain mainly
text and small images.
However, if a PDF file contains larger images, equations,
text scanned into a large image, and so on, these readers just display the whole image. This is
especially inconvenient when the image is larger than the screen, because one has
to scroll both vertically and horizontally to read the document.
In this project, we will develop a reader that addresses two
problems: text scanned into large images and the display
of equations. Our reader will detect words and equations in large images using the
surrounding white space. It will then extract them as smaller images that can be
flowed to fit a small screen. Our system should be robust because it only needs to detect white space
rather than apply more elaborate techniques such as optical character recognition, which can be
hard for math equations, handwriting, or nonstandard scripts.
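The two steps described above (detecting words by their surrounding white space, then flowing the extracted pieces to a narrow screen) can be illustrated with a minimal sketch. It is not the project's implementation: the class name `WhitespaceSegmenter`, the method names, the `minGap` threshold, and the assumption that a text line has already been binarized into a `boolean[][]` grid are all hypothetical. It is also written in plain Java SE for readability, so a J2ME version would need to avoid generics and the collections used here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: whitespace-based word detection and a simple reflow. */
class WhitespaceSegmenter {

    /**
     * ink[y][x] is true where a pixel is dark (text) and false where it is white.
     * Returns {startColumn, endColumnExclusive} for each detected word.
     * A run of at least minGap blank columns separates two words; narrower
     * gaps are treated as spacing between letters of the same word.
     */
    static List<int[]> findWordSpans(boolean[][] ink, int minGap) {
        List<int[]> spans = new ArrayList<>();
        if (ink.length == 0) {
            return spans;
        }
        int width = ink[0].length;
        int start = -1;   // first column of the word being built, -1 if none
        int lastInk = -1; // most recent column that contained any ink
        for (int x = 0; x < width; x++) {
            boolean hasInk = false;
            for (int y = 0; y < ink.length && !hasInk; y++) {
                hasInk = ink[y][x];
            }
            if (!hasInk) {
                continue;
            }
            if (start < 0) {
                start = x; // first inked column opens the first word
            } else if (x - lastInk - 1 >= minGap) {
                // The blank run was wide enough: close the word, open a new one.
                spans.add(new int[] {start, lastInk + 1});
                start = x;
            }
            lastInk = x;
        }
        if (start >= 0) {
            spans.add(new int[] {start, lastInk + 1});
        }
        return spans;
    }

    /**
     * Greedy reflow: assigns each word image (given by its pixel width) to a
     * display line so that a line does not exceed screenWidth. A word wider
     * than the screen is placed alone on its own line.
     * Returns the display line index for each word.
     */
    static int[] flow(int[] wordWidths, int spacing, int screenWidth) {
        int[] lineOf = new int[wordWidths.length];
        int line = 0;
        int used = 0; // pixels already occupied on the current line
        for (int i = 0; i < wordWidths.length; i++) {
            int needed = (used == 0) ? wordWidths[i] : used + spacing + wordWidths[i];
            if (used > 0 && needed > screenWidth) {
                line++; // word does not fit, so start a new display line
                needed = wordWidths[i];
            }
            used = needed;
            lineOf[i] = line;
        }
        return lineOf;
    }
}
```

In this sketch the only tunable decision is how wide a blank run must be to count as a word boundary. No character shapes are interpreted, which is why the same approach could in principle apply to equations or handwriting. Splitting a page into text lines would use the same idea on blank rows instead of blank columns.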
Schedule:
Week 1 | Aug 29-Sep 4 | Read the book "Core J2ME Technology", parts 1 and 2
Week 2 | Sep 5-11 | Work on Deliverable 1
Week 3 | Sep 12-18 | Deliverable 1 due
Week 4 | Sep 19-25 | Read "Acrobat and PDF Library API Reference" and "Acrobat SDK User's Guide"
Week 5 | Sep 26-Oct 2 | Work on Deliverable 2
Week 6 | Oct 3-9 | Deliverable 2 due
Week 7 | Oct 10-16 | Read the book "Practical Algorithms for Image Analysis: Descriptions, Examples, and Code"
Week 8 | Oct 17-23 | Work on Deliverable 3
Week 9 | Oct 24-30 | Deliverable 3 due
Week 10 | Oct 31-Nov 6 | Read the book "Document Image Analysis"
Week 11 | Nov 7-13 | Read "Finding Text In Images" and "Extraction of Text from Images"
Week 12 | Nov 14-20 | Work on Deliverable 4
Week 13 | Nov 21-27 | Deliverable 4 due
Week 14 | Nov 28-Dec 4 | Work on CS297 report
Week 15 | Dec 5-11 | CS297 report due
Deliverables:
The full project will be done when CS298 is completed. The following will
be done by the end of CS297:
1. Write a simple "Hello World" J2ME program.
2. Write a program that extracts text from PDF files and converts each word to a JPG image, preserving the text's font and formatting.
3. Write a program that extracts images from PDF files and displays them.
4. Write a program that extracts words from a JPG image as separate images.
5. CS297 report.
References:
[2000] Practical Algorithms for Image Analysis: Descriptions, Examples, and Code. Michael Seul, Lawrence O'Gorman, Michael J. Sammon. Cambridge University Press. April 15, 2000.
[1994] Document Image Analysis. H. Bunke, P. S. P. Wang, Henry S. Baird. World Scientific Publishing Company. December 1994.
[2005] Acrobat SDK User's Guide. Adobe. http://partners.adobe.com/public/developer/en/acrobat/sdk/pdf/intro_to_sdk/UserGuide.pdf.
[2005] Acrobat and PDF Library API Reference. Adobe. http://partners.adobe.com/public/developer/en/acrobat/sdk/pdf/plugins/APIReference.pdf.
[2001] Core J2ME Technology. John W. Muchow. Prentice Hall PTR. December 21, 2001.
[1997] Finding Text In Images. V. Wu, R. Manmatha and E. M. Riseman. http://www.cs.umass.edu/Dienst/UI/2.0/Describe/ncstrl.umassa_cs%2FUM-CS-1997-009.
[2002] Extraction of Text from Images. Pooja Nath. http://www.cse.iitk.ac.in/research/btp2002/98263.html.