CS298 Proposal
Efficient Replication of XML Documents with BLOB data
Preethi Vishwanath (prithari@yahoo.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Chris Pollett,
Dr. Robert Chun, Mr. Tuong Truong
(pollett@cs.sjsu.edu) (rchun@cs.sjsu.edu) (tctruong@us.ibm.com)
Abstract:
With the rapid increase in streaming media content, there has been renewed interest
in the delivery mechanism (e.g. Mobile Ad Hoc Network Security, BitTorrent) of these
in a distributed environment. There is a scope for improvement of efficiency for
certain mechanisms of streaming media data delivery. For example, in MANETs
(Mobile Ad Hoc Networks) where nodes are connected to each other through wireless
links with each node acting as a router forwarding data to its peers would cause
substantial amount of wasted bandwidth and data replication at intermediate nodes.
Our project is to create a mechanism to efficiently replicate XML documents and
Binary Large Objects (BLOBs) that these contain across a distributed database network.
For better explanation, let us consider a scenario wherein we have XML documents
consisting of BLOBs represented by playlists. A user accessing the streaming media
content (usually represented as an
XML file) may be interested only in some part of the music content (individual BLOBs),
but is forced to download all of them, since they are part of the same playlist
(XML document). As part of CS297, we have successfully created a distributed
(Java RMI based) infrastructure where multiple machines across a network interact with
each other to agree on a location for a specific BLOB. We will seek to now use the
infrastructure to arrive at a solution for the real-world problem of (a) identifying
the minimal subset of blobs within the set of XML documents (playlists) that need be
replicated such that all users in the network have access to all their requested BLOBs,
and (b) the ideal location for storing this composite set of BLOBs (playlist).
We believe that collectively, the algorithms proposed here that will be implemented
on the infrastructure we developed as part of CS297, have real world relevance,
for e.g. these can be applied in playlist formation, and exchange of BLOBS
like music/video files in emerging devices such as cell phones and MP3 players.
CS297 Results
- Added XML parsing functionality to Postgres
- Implementation of Byzantine algorithm on a distributed system.
- Extending Byzantine Algorithm for multiple BLOBs.
- Collection of information on MANET, Bit Torrent, Google Videos etc.
Proposed Schedule
Week 1-2:
1/25-2/06 | 1/25-2/06 |
Week 3-4:
Calender_Date_2 | Implement a program to perform Byzantine agreement
such that all the voters would agree on the machine, should the BLOB be replaced such that,
voters who are located at machines far away from the original machine can easily access the
same. This would be performed on a token ring model. |
Week 5-6:
2/13-2/20 | Implement the above arrangement on an IP address model |
Week 7-8:
2/20-3/06 | Implement a program which checks frequency of requests for
BLOBs. Mechanisms such as setting timer / counters can be used to check when each BLOB has
been accessed |
Week 9:
3/06-3/21 | Statistics for token ring network and IP address. Creation
of testing framework and perform testing on the implementation performed. |
Week 10-12:
3/21-4/10 | Start writing report |
Week 13:
4/10-4/17 | Submit draft to committee |
Week 14-15:
4/17-5/2 | Prepare for presentation |
Week 16:
5/2 ? 5/4 | Final Presentation |
Key Deliverables:
- Software
- A program which examines all the voters and their distance from the BLOB that they need
to access. Based on the frequency of access and the distance of the voters from the machines
on which the BLOBs are located, all the voters perform Byzantine to come on agreement on which machine the BLOB should be replicated to.
- The above deliverables were performed considering a token ring model. This needs to be
modified to Internet Protocol Model.
- Implement a program which checks frequency of requests for BLOBs. In case of a distributed network scenario, there could be various users situated across various machines requesting for certain BLOBs. Few of them could be more commonly in demand in comparison to others. The attempt of this deliverable is to come up with an algorithm which would check how frequently each BLOB is accessed by various machines across the network and determine which the most commonly accessed ones are.
- Obtain statistics for the above performed work. Deliver a testing framework for testing above
deliverable work performed. A rough model of the framework would be a group, consisting of multiple
members collaborating together to make lets say a movie; wherein data is exchanged via cell phones.
- Report
- Detailed description of the architecture of software
- Detailed description of results obtained during the course of the semester
Innovations and Challenges
- Deciding an algorithm to check which frequency is good enough for a BLOB to be considered important
enough requires certain amount of analyses and research.
- Designing an algorithm to implement Byzantine across an IP address model is challenging. One of the
approache would be to take each quad of the IP address and perform Byzantine on the same. Each following
quad would depend on the address which has been obtained in the previous operations.
- Since this is a distributed system, designing a mechanism to achieve synchronization between voters
is not an easy task.
- Creating a testing framework across a distributed network would be a challenging task.
References:
[Hall1999] Principles of Distributed Database Systems (2nd edition). M. Tamer Ozsu and Patrick
Valduriez. Prentice Hall. 1999
[Serge2003] Dynamic XML Documents with distribution and replication. Serge Abiteboul, Angela Bonifati,
Gregory Cobena, Ioana Manolescu, Tova Milo. Proceedings of the 2003 ACM SIGMOD international conference
on Management of data. ACM Press. 2003. Pages 527-538
[Alonso2000] Database replication techniques: a three parameter classification. M Wiesmann, F Pedone,
A Schiper, B Kemme, G Alonso. Proceedings of the 19th IEEE Symposium on Reliable Distributed Systems.
IEEE Computer Society. 2000. Page 206
[Hara2004] Dynamic data replication schemes for mobile ad-hoc network based on aperiodic updates.
T. Hara and S.K. Madria, Proc. Int'l Conf. on Database Systems for Advanced Applications (DASFAA 2004),
pp.869-881, 2004.
[Sailhan2003] Cooperative caching in ad hoc networks.F. Sailhan and V. Issarny. Proc. Int'l Conf.
on Mobile Data Management (MDM'03), pp.13-28, 2003.
[Wang2002] Efficient and guaranteed service coverage in partitionable mobile ad-hoc networks.
K. Wang and B. Li. Proc. IEEE Infocom'02, Vol.2, pp.1089-1098, 2002.
|