CS297 Proposal
Bookmarklet Builder for Offline Data Retrieval
Sheetal Naidu (sheetalnaidu@yahoo.com)
Advisor: Dr. Chris Pollett

Description:
The goal of this project is to develop a tool that can save entire web page applications as bookmarklets, so that users can run these applications even when they are not connected to the Internet. Beyond JavaScript, the main technology needed to do this is the data: URI scheme, which allows images, Flash, applets, PDFs, and other resources to be embedded directly in a web page as base64-encoded text. This URI scheme is supported by all major browsers other than Internet Explorer. Our program will obfuscate the resulting JavaScript so that these complete applications could potentially be sold without being easily reverse engineered. The tool will be made available online to its users, typically website owners who would like to let their own users run the applications offline.

Schedule:
Deliverables:
The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. Write simple JavaScript to read from an HTML file and enhance its appearance.
2. Study and understand how web crawlers browse the World Wide Web. Case study: Nutch. Use Nutch to crawl a sample website.
3. Study different code obfuscation techniques and pick one to use in our project.
4. Write PHP code to convert web applications containing text to the data: URI scheme.
5. CS297 Proposal report

References:
[2004] JavaScript Bible. Danny Goodman with Michael Morrison. Wiley. 2004.
[2006] JavaScript: The Definitive Guide. David Flanagan. O'Reilly. 2006.
[2006] Programming PHP. Rasmus Lerdorf, Kevin Tatroe, and Peter MacIntyre. O'Reilly. 2006.
[2005] RFC 3986. Uniform Resource Identifier (URI): Generic Syntax. Network Working Group. 2005. http://gbiv.com/protocols/uri/rfc/rfc3986.html
[2007] Official page of the Nutch project. 2007. http://lucene.apache.org/nutch/
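The core idea of the project (embedding a page's resources as base64 text through data: URIs, as in deliverable 4, and then packaging the self-contained page as a bookmarklet) can be illustrated with a short sketch. It is written in JavaScript and assumes Node.js for the Buffer class, whereas the proposal plans to do the conversion in PHP. The function names toDataUri, inlineResources, and toBookmarklet, and the shape of the resources map, are hypothetical. The sketch only handles double-quoted src attributes and performs no obfuscation.

```javascript
// Minimal sketch, assuming Node.js (for Buffer). The proposal plans to do the
// conversion in PHP; the function names and the `resources` map shape below
// are hypothetical and used only for illustration.

// Encode raw bytes as a base64 data: URI of the form
// data:<mediatype>;base64,<data> (the syntax defined in RFC 2397).
function toDataUri(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Replace every double-quoted src="..." attribute whose URL is a key of
// `resources` with an equivalent data: URI. Unknown URLs are left unchanged.
// (A real tool would use an HTML parser rather than a regular expression.)
function inlineResources(html, resources) {
  return html.replace(/src="([^"]+)"/g, (match, url) => {
    const res = resources[url];
    return res ? `src="${toDataUri(res.bytes, res.mimeType)}"` : match;
  });
}

// Wrap a self-contained page in a javascript: bookmarklet that replaces the
// current document with it. JSON.stringify yields a correctly escaped string
// literal; encodeURIComponent keeps characters such as % and # safe in a URL.
function toBookmarklet(html) {
  const body =
    "(function(){document.open();document.write(" +
    JSON.stringify(html) +
    ");document.close();})()";
  return "javascript:" + encodeURIComponent(body);
}

// Usage: inline a (truncated, four-byte) PNG header as an example resource.
const page = '<html><body><img src="logo.png"></body></html>';
const resources = {
  "logo.png": { bytes: [0x89, 0x50, 0x4e, 0x47], mimeType: "image/png" },
};
const inlined = inlineResources(page, resources);
console.log(inlined);
// prints <html><body><img src="data:image/png;base64,iVBORw=="></body></html>
```

The bookmarklet in this sketch writes the page into the current document instead of navigating to a data: URI, since some current browsers restrict navigation to data: URLs; a production tool would need to check the behavior of each target browser.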