CS298 Proposal
High performance distributed file system based on blockchain
Ajinkya Rajguru (akrajguru@gmail.com)
Advisor: Dr. Chris Pollett
Abstract:
Distributed filesystem architectures make it possible to store data at large scale on commodity hardware while maintaining high consistency and availability. Blockchain provides a tamper-resistant record of information and makes it possible to add incentives to a traditional decentralized storage system. The goal of this project is to implement a decentralized filesystem that leverages blockchain to keep a record of all transactions. A conventional distributed filesystem such as GFS [1] or HDFS [2] stores data on designated servers owned by a single organization and is governed by a master server. This project instead aims to remove that single point of failure by storing data on the machines of participating users. The first task is to implement a distributed hash table that spreads data evenly across multiple machines and efficiently tracks, maintains, and persists it. Second, the data must be encrypted, yet remain readily accessible to its owner, with its soundness verifiable. Finally, smart contracts will provide incentives and improve the integrity and reliability of the stored data by recording all transactions. The system will function autonomously, using multiple participant machines in the network to store data and allowing participants to monetize unused storage space. It will support the basic functions of a filesystem, namely creating, deleting, and reading files, and will organize them in a hierarchical, folder-like structure.
Keywords - Distributed Filesystem, Blockchain, Chord, DHT, Smart Contracts
CS297 Results:
1. Implemented a simple CHORD [4] distributed hash table.
2. Carried out a comparative study of existing decentralized storage systems that
make use of blockchain, namely IPFS [5], Storj [8], SWARM [7] and SIA [9].
3. Developed a basic filesystem structure that can be used with a distributed
hash table. It partitions files and maintains the integrity of the data
using Merkle trees.
4. Created a simple gambling game using smart contracts on the Ethereum blockchain to get a
better understanding of Solidity [12] and smart contracts.
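To illustrate the idea behind result 3, the following is a minimal sketch of partitioning a file into chunks and computing a Merkle root over them. It is not the CS297 implementation: the function names, the default chunk size, the use of SHA-256, and the choice to duplicate the last hash on odd-sized levels are all assumptions made for illustration.

```python
import hashlib
from typing import List

# Assumed chunk size for illustration; the proposal does not specify one.
DEFAULT_CHUNK_SIZE = 1024


def split_into_chunks(data: bytes, size: int = DEFAULT_CHUNK_SIZE) -> List[bytes]:
    """Partition data into fixed-size chunks (the last one may be shorter)."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return chunks or [b""]  # an empty file still yields one (empty) chunk


def merkle_root(chunks: List[bytes]) -> bytes:
    """Hash each chunk, then hash pairs upward until one root remains."""
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last hash when a level has odd size.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because every chunk hash feeds into the root, changing any single chunk changes the root, so comparing roots is enough to detect a corrupted or tampered file.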
Schedule:
Week 1 | Jan 30 - Feb 6 | First Week Meeting and Reviewing CS298 Proposal
Week 2 | Feb 6 - Feb 13 | Start implementing CHORD protocol
Week 3 | Feb 13 - Feb 20 | Continue with the CHORD implementation
Week 4 | Feb 20 - Feb 27 | Integrate CHORD into the filesystem structure
Week 5 | Feb 27 - Mar 6 | Work on the edge cases and introduce encryption of data bits
Week 6 | Mar 6 - Mar 13 | Ensure end to end working of file storage using CHORD with multiple devices
Week 7 | Mar 13 - Mar 20 | Start working on the smart contract
Week 8 | Mar 20 - Mar 27 | Continue working on the smart contract along with web3 implementation
Weeks 9 and 10 | Mar 27 - Apr 10 | Integrate smart contract in the filesystem and ensure working of CHORD along with blockchain
Week 11 | Apr 10 - Apr 17 | End to end testing of the filesystem using blockchain
Week 12 | Apr 17 - Apr 24 | Result analysis and start working on the report
Week 13 | Apr 24 - May 1 | Continue to work on the report and get it reviewed by Chris
Week 14 | May 1 - May 8 | Finish report, start working on slides and get it reviewed by the project committee
Week 15 | May 8 - May 15 | Finish Slides for presentation
Week 16 | May 15 - May 22 | Wrap up all work
Deliverables:
1. Successful storage of small (a few KB) and large (a few GB) files in the
filesystem, with no single point of failure, using a DHT (preferably
CHORD). Files are stored in a folder structure and can be deleted as required.
2. Quick retrieval of files stored in the filesystem, with the soundness of
the data verified using a content ID that is generated when the data is
stored in the filesystem.
3. Easy access to an incentivization layer through which any user can rent or
provide storage space using cryptocurrency.
4. An extra layer of data integrity, achieved by recording transactions on the
blockchain so that every user is accountable for the data stored on their machine.
5. CS298 report.
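Deliverable 2 can be illustrated with a minimal sketch of content-addressed storage, in which the content ID is derived from the data itself and re-checked on retrieval. The `ContentStore` class is hypothetical, the in-memory dictionary stands in for storage on remote nodes, and using a SHA-256 hex digest as the content ID is an assumption, since the proposal does not fix a format.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Hypothetical in-memory store keyed by a content ID (illustration only)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content ID (assumed: SHA-256 hex digest)."""
        cid = hashlib.sha256(data).hexdigest()
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        """Return the data for cid, refusing data that no longer matches it."""
        data = self._blobs[cid]
        if hashlib.sha256(data).hexdigest() != cid:
            raise ValueError("stored data does not match its content ID")
        return data
```

With this scheme the reader does not need to trust the machine that served the data: any modification changes the hash, so the mismatch is detected locally.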
Innovations and Challenges:
1. Conventionally, data is stored on commodity servers owned by organizations such as Google, Amazon, and Microsoft. Our filesystem will instead allow any user with a personal device to earn income by renting unused storage space to other users of the system.
2. A fully autonomous filesystem with no master service governing either the control plane or the data plane.
3. Robust error handling that covers the majority of edge cases will be crucial when storing data on multiple servers.
4. Introducing replication, and tracking the location of replicas, will be challenging in a self-governed CHORD protocol.
5. It will be important to ensure that each user spends only a limited amount of gas when recording transactions on the blockchain.
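One possible approach to the replication challenge in item 4 is to place replicas of a key on its successor and the nodes that follow it on the identifier ring. The sketch below shows only that placement rule, under stated assumptions: the function names and the 16-bit identifier space are hypothetical, and the lookup assumes a global view of all node IDs, whereas a real CHORD node would route through finger tables.

```python
import hashlib
from bisect import bisect_left
from typing import List

M_BITS = 16  # assumed identifier-space size for illustration


def chord_id(name: str, m_bits: int = M_BITS) -> int:
    """Map a node address or file key onto the ring using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m_bits)


def replica_nodes(key_id: int, node_ids: List[int], replicas: int) -> List[int]:
    """Return the key's successor followed by the next nodes on the ring."""
    ring = sorted(set(node_ids))
    if not ring:
        raise ValueError("the ring has no nodes")
    # The successor is the first node whose ID is >= key_id, wrapping to the start.
    start = bisect_left(ring, key_id) % len(ring)
    count = min(replicas, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]
```

For example, on a ring with nodes 10, 50, and 200, `replica_nodes(60, [10, 50, 200], 2)` selects nodes 200 and 10: node 200 is the successor of key 60, and the second replica wraps around to node 10.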
References:
[1] [2020] When Blockchain Meets Distributed File Systems: An Overview, Challenges, and Open Issues. H. Huang, J. Lin, B. Zheng, Z. Zheng and J. Bian. IEEE Access, vol. 8, pp. 50574-50586, 2020, doi: 10.1109/ACCESS.2020.2979881.
[2] https://ethereum.org/en/developers/
[3] [2018] A Detailed and Real-Time Performance Monitoring Framework for Blockchain Systems. P. Zheng, Z. Zheng, X. Luo, X. Chen and X. Liu, 2018 IEEE/ACM 40th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP), 2018, pp. 134-143.
[4] [2021] Blockchain smart contracts: Applications, challenges, and future trends. Khan, S.N., Loukil, F., Ghedira-Guegan, C. et al. Peer-to-Peer Netw. Appl. 14, 2901–2925 (2021). https://doi.org/10.1007/s12083-021-01127-0
[5] [2003] Stoica et al., "Chord: a scalable peer-to-peer lookup protocol for Internet applications," in IEEE/ACM Transactions on Networking, vol. 11, no. 1, pp. 17-32, Feb. 2003, doi: 10.1109/TNET.2002.808407.