CS297 Proposal
High-performance document store implementation in Rust
Ishaan Aggarwal (ishaan.aggarwal@sjsu.edu)
Advisor: Dr. Chris Pollett
Description:
The aim of this project is to implement a high-performance document store that is both robust and memory efficient. This will be done by migrating the existing PHP-based data storage implementation of the Yioop! open source search engine to a Rust-based implementation. Rust was chosen because its ownership and borrowing model provides memory safety without a garbage collector, and because it compiles to native code. These properties are expected to give more efficient memory management, easier maintenance, greater robustness and faster performance than the current PHP implementation.
Schedule:
Week | Dates | Tasks
1 | 02-02-2021 to 02-09-2021 | Overview of the project topic; finalize the deliverables; draft the proposal
2 | 02-09-2021 to 02-16-2021 | Environment setup, hands-on Rust practice, and study of relevant research papers
3 | 02-16-2021 to 02-23-2021 | Begin work on Deliverable 1; study relevant research papers for Deliverable 3
4 | 02-23-2021 to 03-02-2021 | Finish Deliverable 1
5 | 03-02-2021 to 03-09-2021 | Begin work on Deliverable 2
6 | 03-09-2021 to 03-16-2021 | Continue work on Deliverable 2; study relevant research papers for Deliverable 4
7 | 03-16-2021 to 03-23-2021 | Finish Deliverable 2
8 | 03-23-2021 to 03-30-2021 | Begin work on Deliverable 3
9 | 03-30-2021 to 04-06-2021 | Spring break
10 | 04-06-2021 to 04-13-2021 | Continue work on Deliverable 3; prepare slides on findings for Deliverable 4
11 | 04-13-2021 to 04-20-2021 | Finish Deliverable 3
12 | 04-20-2021 to 04-27-2021 | Begin work on Deliverable 4
13 | 04-27-2021 to 05-04-2021 | Continue work on Deliverable 4; look ahead to the final report
14 | 05-04-2021 to 05-11-2021 | Finish Deliverable 4; start the final CS297 report
15 | 05-11-2021 to 05-18-2021 | Finals week; report due
Deliverables:
The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. A single-node server that receives a request for a document by key and returns the corresponding document.
2. An implementation of linear hashing in Rust, which will be used in Deliverable 4.
3. Reading and writing documents from/to WARC (Web ARChive) files.
4. An implementation of the key-value store using consistent hashing.
5. TBD: migration of some portion of the PHP code to Rust.
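To make Deliverable 1 concrete, below is a minimal sketch of a single-node document server built only on `std::net`. The proposal does not define a wire protocol or a storage backend. Everything here is an assumption: a hypothetical line-based protocol (`GET <key>` answered by `FOUND <len>` plus the document bytes, or `NOT_FOUND`), an in-memory `HashMap` standing in for the document store, and the hypothetical function names `handle_client` and `serve`.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::Arc;
use std::thread;

/// Serves one client (hypothetical protocol). Each request line is "GET <key>".
/// The response is "FOUND <len>\n" followed by exactly <len> document bytes,
/// or "NOT_FOUND\n" / "ERROR\n".
pub fn handle_client(stream: TcpStream, store: &HashMap<String, Vec<u8>>) -> io::Result<()> {
    let mut writer = stream.try_clone()?;
    let reader = BufReader::new(stream);
    for line in reader.lines() {
        let line = line?;
        match line.strip_prefix("GET ") {
            Some(key) => match store.get(key.trim()) {
                Some(doc) => {
                    // Length prefix tells the client where the document ends.
                    write!(writer, "FOUND {}\n", doc.len())?;
                    writer.write_all(doc)?;
                }
                None => writer.write_all(b"NOT_FOUND\n")?,
            },
            None => writer.write_all(b"ERROR\n")?,
        }
        writer.flush()?;
    }
    Ok(()) // client closed its side of the connection
}

/// Accepts connections forever, handling each client on its own thread.
/// The store is shared read-only through an Arc.
pub fn serve(listener: TcpListener, store: Arc<HashMap<String, Vec<u8>>>) -> io::Result<()> {
    for stream in listener.incoming() {
        let stream = stream?;
        let store = Arc::clone(&store);
        thread::spawn(move || {
            let _ = handle_client(stream, &store);
        });
    }
    Ok(())
}
```

One thread per connection keeps the sketch short. A real server for this project might use a thread pool or an async runtime instead, and would back the map with the on-disk store from the later deliverables.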
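Deliverable 2 calls for linear hashing. The sketch below is a minimal in-memory version of the classic scheme, not the project's final design. Addresses use h_i(k) = hash(k) mod (N * 2^i). Buckets below the split pointer have already been split in the current round and use h_{i+1}. Crossing a load-factor threshold splits exactly one bucket. The type name `LinearHashTable`, its methods, the use of `DefaultHasher` and the 0.75 threshold are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal in-memory linear hash table. Hypothetical sketch: the type name,
/// methods, and the 0.75 split threshold are illustrative assumptions.
pub struct LinearHashTable<K, V> {
    buckets: Vec<Vec<(K, V)>>,
    initial_buckets: usize, // N: number of buckets at level 0
    level: u32,             // current splitting round i
    next: usize,            // split pointer: next bucket to split
    len: usize,             // number of stored entries
    max_load: f64,          // split one bucket when len / buckets exceeds this
}

impl<K: Hash + Eq, V> LinearHashTable<K, V> {
    pub fn new(initial_buckets: usize) -> Self {
        assert!(initial_buckets > 0, "need at least one bucket");
        let mut buckets = Vec::new();
        buckets.resize_with(initial_buckets, Vec::new);
        LinearHashTable { buckets, initial_buckets, level: 0, next: 0, len: 0, max_load: 0.75 }
    }

    fn hash_key(key: &K) -> u64 {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        h.finish()
    }

    /// h_i(k) = hash(k) mod (N * 2^i). Buckets below the split pointer were
    /// already split this round, so keys landing there use h_{i+1}.
    fn address(&self, key: &K) -> usize {
        let h = Self::hash_key(key);
        let modulus = (self.initial_buckets << self.level) as u64;
        let a = (h % modulus) as usize;
        if a < self.next { (h % (modulus * 2)) as usize } else { a }
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        self.buckets[self.address(key)]
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v)
    }

    /// Inserts or updates. Returns the previous value if the key existed.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        let a = self.address(&key);
        let bucket = &mut self.buckets[a];
        if let Some(slot) = bucket.iter_mut().find(|(k, _)| *k == key) {
            return Some(std::mem::replace(&mut slot.1, value));
        }
        bucket.push((key, value));
        self.len += 1;
        if self.len as f64 / self.buckets.len() as f64 > self.max_load {
            self.split();
        }
        None
    }

    /// Splits bucket `next` into `next` and `next + N * 2^i` using h_{i+1}.
    fn split(&mut self) {
        let modulus = (self.initial_buckets << self.level) as u64;
        self.buckets.push(Vec::new()); // new bucket has index next + modulus
        let old = std::mem::take(&mut self.buckets[self.next]);
        for (k, v) in old {
            let a = (Self::hash_key(&k) % (modulus * 2)) as usize;
            self.buckets[a].push((k, v));
        }
        self.next += 1;
        if self.next as u64 == modulus {
            // Every bucket of this round has been split: start the next round.
            self.level += 1;
            self.next = 0;
        }
    }

    pub fn len(&self) -> usize { self.len }
    pub fn bucket_count(&self) -> usize { self.buckets.len() }
}
```

Each insert triggers at most one bucket split, instead of rehashing the whole table when it doubles. Resizing cost is therefore spread evenly across inserts, which is the property that makes linear hashing attractive for a growing document store.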
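For Deliverable 3, the sketch below reads and writes uncompressed WARC/1.0 records. Each record is a version line, then named header fields (WARC-Type, WARC-Record-ID, WARC-Date and Content-Length are mandatory), then a blank line, the content block, and two CRLFs. This is a sketch under stated assumptions, not a complete WARC implementation:
- It does not validate mandatory fields or generate record IDs.
- It does not handle gzip-compressed `.warc.gz` files, which would need a decompression crate.
- The names `WarcRecord`, `write_record` and `read_record` are hypothetical.

```rust
use std::io::{self, BufRead, Read, Write};

/// One uncompressed WARC record (hypothetical type): named header fields plus
/// a content block. Content-Length is derived from `body`, not kept in `headers`.
#[derive(Debug, PartialEq)]
pub struct WarcRecord {
    pub headers: Vec<(String, String)>,
    pub body: Vec<u8>,
}

impl WarcRecord {
    /// Case-insensitive header lookup.
    pub fn header(&self, name: &str) -> Option<&str> {
        self.headers
            .iter()
            .find(|(k, _)| k.eq_ignore_ascii_case(name))
            .map(|(_, v)| v.as_str())
    }
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Writes: version line, header fields, Content-Length, blank line, block, CRLF CRLF.
pub fn write_record<W: Write>(out: &mut W, rec: &WarcRecord) -> io::Result<()> {
    out.write_all(b"WARC/1.0\r\n")?;
    for (name, value) in &rec.headers {
        write!(out, "{}: {}\r\n", name, value)?;
    }
    write!(out, "Content-Length: {}\r\n\r\n", rec.body.len())?;
    out.write_all(&rec.body)?;
    out.write_all(b"\r\n\r\n")
}

/// Reads the next record, or returns Ok(None) at a clean end of input.
pub fn read_record<R: BufRead>(input: &mut R) -> io::Result<Option<WarcRecord>> {
    let mut line = String::new();
    if input.read_line(&mut line)? == 0 {
        return Ok(None);
    }
    if !line.starts_with("WARC/") {
        return Err(invalid("missing WARC version line"));
    }
    let mut headers = Vec::new();
    let mut content_length: Option<usize> = None;
    loop {
        line.clear();
        if input.read_line(&mut line)? == 0 {
            return Err(invalid("truncated record header"));
        }
        let field = line.trim_end_matches(&['\r', '\n'][..]);
        if field.is_empty() {
            break; // a blank line ends the header block
        }
        let (name, value) = field.split_once(':').ok_or_else(|| invalid("malformed header field"))?;
        let (name, value) = (name.trim(), value.trim());
        if name.eq_ignore_ascii_case("Content-Length") {
            content_length = Some(value.parse::<usize>().map_err(|_| invalid("bad Content-Length"))?);
        } else {
            headers.push((name.to_string(), value.to_string()));
        }
    }
    let len = content_length.ok_or_else(|| invalid("missing Content-Length"))?;
    let mut body = vec![0u8; len];
    input.read_exact(&mut body)?;
    // Every record ends with two CRLFs after the content block.
    let mut trailer = [0u8; 4];
    input.read_exact(&mut trailer)?;
    if &trailer != b"\r\n\r\n" {
        return Err(invalid("missing record trailer"));
    }
    Ok(Some(WarcRecord { headers, body }))
}
```

Content-Length is used to read the block as raw bytes, so documents containing blank lines or binary data are handled correctly. Reading records one at a time from a `BufRead` also lets a large WARC file be streamed rather than loaded into memory.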
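For Deliverable 4, consistent hashing places nodes and keys on the same hash ring and assigns each key to the first node clockwise from it. Adding or removing a node therefore moves only the keys in that node's arcs. The sketch below uses a `BTreeMap` as the ring, with several virtual nodes per server to even out the load. The proposal does not fix these details, so the following are assumptions:
- The type `HashRing` and its methods are hypothetical.
- Each server gets 100 virtual nodes, named `"<node>#<i>"`.
- The ring uses `DefaultHasher`. Its algorithm is not guaranteed to stay the same across Rust releases, so a persistent store would need a fixed hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical consistent-hash ring: ring position -> node id.
pub struct HashRing {
    ring: BTreeMap<u64, String>,
    replicas: usize, // virtual nodes per physical node
}

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

impl HashRing {
    pub fn new(replicas: usize) -> Self {
        HashRing { ring: BTreeMap::new(), replicas }
    }

    /// Places `replicas` virtual nodes ("<node>#<i>") for `node` on the ring.
    pub fn add_node(&mut self, node: &str) {
        for i in 0..self.replicas {
            self.ring.insert(hash_of(&format!("{}#{}", node, i)), node.to_string());
        }
    }

    pub fn remove_node(&mut self, node: &str) {
        for i in 0..self.replicas {
            self.ring.remove(&hash_of(&format!("{}#{}", node, i)));
        }
    }

    /// Walks clockwise from the key's position to the first virtual node,
    /// wrapping around to the start of the ring if necessary.
    pub fn node_for<K: Hash + ?Sized>(&self, key: &K) -> Option<&str> {
        let h = hash_of(key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
    }
}
```

When a node is added, every key either keeps its previous owner or moves to the new node. Removing that node again restores the original assignment. A simple modulo scheme (hash mod number of nodes) would instead reshuffle most keys whenever the node count changes.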
References:
[1] Corbett, J., Dean, J., Epstein, M., et al. (2012). Spanner: Google's globally distributed database. In Proceedings of OSDI '12: Tenth Symposium on Operating System Design and Implementation, Hollywood, CA, October 2012.
[2] Khan, S., Liu, X., Ali, S. A., and Alam, M. (2019). Storage solutions for big data systems: A qualitative study and comparison. arXiv preprint arXiv:1904.11498.
[3] Okazaki, S. (2020). An experimental study of memory management in Rust programming for big data processing. Doctoral dissertation, Boston University.
[4] Rust programming best practices with examples. (2021). https://github.com/mre/idiomatic-rust
[5] Gjengset, J., Schwarzkopf, M., Behrens, J., Araujo, L. T., Ek, M., Kohler, E., ... and Morris, R. (2018). Noria: dynamic, partially-stateful data-flow for high-performance web applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pp. 213-231.