CS298 Proposal

Enhancing XML Support in PostgreSQL

Khang Nguyen (KhangNg@yahoo.com)

Advisor: Dr. Chris Pollett

Committee Members: Dr. Jon Pearce and Dr. Robert Chun.

Abstract:

PostgreSQL is a database management system derived from the POSTGRES system developed at UC Berkeley, and it is one of the most popular open source databases available today. Tools currently exist to import XML data into PostgreSQL tables, but PostgreSQL does not support XML data natively. The goal of this project is to extend PostgreSQL to store XML data natively in GiST (Generalized Search Tree) structures and to implement some recent algorithms for efficient XPath-based retrieval of the XML data stored in those structures. At the end, we will run timing tests to compare the performance of the new methods of storing and retrieving XML data in tree-like structures against the traditional methods implemented in relational databases.
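To make "XPath-based retrieval" concrete, the following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports a limited XPath subset. It only illustrates the kind of query the project aims to support; it is not the proposed PostgreSQL implementation, and the sample document, element names, and the function titles_by_author are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names are illustrative only.
SAMPLE = """
<library>
  <record id="1"><title>First Title</title><author>Author One</author></record>
  <record id="2"><title>Second Title</title><author>Author Two</author></record>
</library>
"""


def titles_by_author(xml_text, author):
    """Return the titles of all records whose <author> child equals `author`."""
    root = ET.fromstring(xml_text.strip())
    # XPath predicate [author='...'] keeps records that have a matching child.
    # Note: the value is interpolated directly, so it must not contain quotes.
    records = root.findall(f"./record[author='{author}']")
    return [rec.findtext("title") for rec in records]


if __name__ == "__main__":
    print(titles_by_author(SAMPLE, "Author Two"))
```

Here the whole document is parsed into an in-memory tree before the path is evaluated. The project instead targets evaluating such paths against XML stored inside the database.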

CS297 Results

  • Deliverable 1 was a Java program that processed Dr. Pollett's XML file containing digital library document records. The program removed invalid XML characters and then imported the document records into parent and child tables residing in a PostgreSQL database. The program maintained the referential integrity of data between the parent and child tables by using primary and foreign keys.
  • Deliverable 2 was a web-based program running on the Apache HTTP server, integrated with the PHP scripting language and the PostgreSQL database used in Deliverable 1. The program collected search criteria from users, searched the PostgreSQL database for matching records, and displayed the results.
  • Deliverable 3 was to become familiar with GiST indexing schemes by implementing a set of functions dealing with tetrahedron objects.
  • CS297 Report.
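The cleaning and import steps of Deliverable 1 can be sketched as follows. This is an illustration under stated assumptions, not the original Java program: it is written in Python, it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server, and the table names (document, author), column names, and XML element names are hypothetical. The character filter follows the Char production of the XML 1.0 specification.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Matches every character that is NOT allowed by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that would make an XML 1.0 parser reject the input."""
    return _INVALID_XML.sub("", text)


def import_records(conn, xml_text):
    """Load <record> elements into a parent table and a child table."""
    # SQLite only enforces foreign keys when this pragma is on
    # (PostgreSQL enforces them by default).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE document ("
        " doc_id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE author ("
        " author_id INTEGER PRIMARY KEY,"
        " doc_id INTEGER NOT NULL REFERENCES document(doc_id),"
        " name TEXT NOT NULL)"
    )
    root = ET.fromstring(strip_invalid_xml_chars(xml_text))
    for rec in root.findall("record"):
        cur = conn.execute(
            "INSERT INTO document (title) VALUES (?)", (rec.findtext("title"),)
        )
        doc_id = cur.lastrowid  # primary key of the parent row just inserted
        for author in rec.findall("author"):
            conn.execute(
                "INSERT INTO author (doc_id, name) VALUES (?, ?)",
                (doc_id, author.text),
            )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_records(
        conn,
        "<library><record><title>A\x0bB</title>"
        "<author>X</author><author>Y</author></record></library>",
    )
    print(conn.execute("SELECT * FROM document").fetchall())
```

Each child row carries the key of its parent, so the database itself rejects an author row that points at a nonexistent document.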

Proposed Schedule

Week 1: 08/27/06-09/02/06  Write up and submit CS298 Proposal
Week 2: 09/03/06-09/09/06  Read Ch5 of [KD06] (PostgreSQL Programming). Code deliverable #1
Week 3: 09/10/06-09/16/06  Read Ch6 of [KD06] (Extending PostgreSQL). Code deliverable #1
Week 4: 09/17/06-09/23/06  Code deliverable #1
Week 5: 09/24/06-09/30/06  Code deliverable #1
Week 6: 10/01/06-10/07/06  Code deliverable #1
Week 7: 10/08/06-10/14/06  Read Ch10 of [EC01] (XSLT). Code deliverable #2
Week 8: 10/15/06-10/21/06  Read Ch11 of [EC01] (XPath). Code deliverable #2
Week 9: 10/22/06-10/28/06  Code deliverable #2
Week 10: 10/29/06-11/04/06  Code deliverable #2
Week 11: 11/05/06-11/11/06  Write Final Report (deliverable #4)
Week 12: 11/12/06-11/18/06  Write Final Report (deliverable #4)
Week 13: 11/19/06-11/25/06  Submit Final Report at the beginning of the week. Code & execute deliverable #3
Week 14: 11/26/06-12/02/06  Code & execute deliverable #3
Week 15: 12/03/06-12/09/06  Defend CS298 Project

Key Deliverables:

  • Software
    • 1. A set of functions to store XML data using the GiST indexing schemes of the PostgreSQL database.
    • 2. A set of functions that support XPath-based queries against the XML data stored in the PostgreSQL database.
    • 3. A set of performance tests comparing the new methods with the traditional methods of storing and retrieving XML data.
  • Report
    • 4. CS298 Report
    • 5. Documentation for all code produced
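The performance comparison in deliverable 3 could be organized around a small timing harness like the one below. This is only a sketch of one possible approach: the function names time_query and compare are hypothetical, and the callables passed in stand for whatever code runs the same query through the new tree-based method and through the traditional relational method.

```python
import time


def time_query(fn, repeats=5):
    """Call fn() `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def compare(methods, repeats=5):
    """Time each zero-argument callable in `methods` (a name -> callable dict).

    Raises ValueError if the methods disagree on the result, since a timing
    comparison is only meaningful when both return the same answer.
    """
    timings = {}
    have_expected = False
    expected = None
    for name, fn in methods.items():
        seconds, result = time_query(fn, repeats)
        if not have_expected:
            expected, have_expected = result, True
        elif result != expected:
            raise ValueError(f"{name} returned a different result")
        timings[name] = seconds
    return timings


if __name__ == "__main__":
    # Placeholder callables; real tests would issue database queries here.
    print(compare({
        "tree": lambda: sorted([3, 1, 2]),
        "relational": lambda: [1, 2, 3],
    }))
```

Taking the best of several runs reduces noise from caching and other system activity, and checking that the results agree guards against comparing queries that are not equivalent.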

Innovations and Challenges

  • The project extends a large open source database in a nontrivial way.
  • Changing PostgreSQL to store XML data natively in tree-like structures is innovative.
  • Changing PostgreSQL to support XPath-based queries is challenging.
  • The knowledge required to implement the new features is diverse.

References:

  1. [KD06] Korry Douglas, Susan Douglas. PostgreSQL. Sams Publishing. 2006.
  2. [PDG05] PostgreSQL Global Development Group. PostgreSQL 8.1.0 Documentation. 2005.
  3. [SL02] Shiyong Lu, Yezhou Sun, Mustafa Atay, Farshad Fotouhi. A New Inlining Algorithm for Mapping XML DTDs to Relational Schemas. ANSI. 2002. http://wwwedit.cs.wayne.edu:8080/~shiyong/papers/xsdm02.pdf
  4. [IT02] Igor Tatarinov, Stratis D. Viglas. Storing and Querying Ordered XML Using a Relational Database System. ACM SIGMOD. 2002.
  5. [EC01] Elizabeth Castro. XML For The World Wide Web. Peachpit Press. 2001.