CS297 Proposal

High performance document store implementation in Rust

Ishaan Aggarwal (ishaan.aggarwal@sjsu.edu)

Advisor: Dr. Chris Pollett


The aim of this project is to implement a high-performance document store that is robust and memory efficient. This will be achieved by migrating the older PHP-based data storage implementation of the Yioop! open source search engine to a Rust-based implementation. Rust was chosen because it allows more efficient memory management, easier maintenance, robustness, and faster performance.


Week 1 (02-02-2021 - 02-09-2021): Overview of the project topic, Finalize the deliverables, Drafting the proposal
Week 2 (02-09-2021 - 02-16-2021): Environment setup, Rust hands-on, and study relevant research papers
Week 3 (02-16-2021 - 02-23-2021): Begin work on Deliverable 1 and study relevant research papers for Deliverable 3
Week 4 (02-23-2021 - 03-02-2021): Finish Deliverable 1
Week 5 (03-02-2021 - 03-09-2021): Begin work on Deliverable 2
Week 6 (03-09-2021 - 03-16-2021): Continue work on Deliverable 2 and study relevant research papers for Deliverable 4
Week 7 (03-16-2021 - 03-23-2021): Finish Deliverable 2
Week 8 (03-23-2021 - 03-30-2021): Begin work on Deliverable 3
Week 9 (03-30-2021 - 04-06-2021): Spring Break
Week 10 (04-06-2021 - 04-13-2021): Continue work on Deliverable 3 and prepare slides on findings for Deliverable 4
Week 11 (04-13-2021 - 04-20-2021): Finish Deliverable 3
Week 12 (04-20-2021 - 04-27-2021): Begin work on Deliverable 4
Week 13 (04-27-2021 - 05-04-2021): Continue work on Deliverable 4 and look-ahead to final report
Week 14 (05-04-2021 - 05-11-2021): Finish Deliverable 4 and start final CS297 report
Week 15 (05-11-2021 - 05-18-2021): Finals Week - Report due


The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. A single-node server that can receive requests for a document by key and return the corresponding document.

2. Implement linear hashing in Rust. This will be leveraged in Deliverable 4.

3. Read and write documents from/to warc files.

4. Implement the key-value store using consistent hashing.

5. TBD - Migrate some portion of the PHP code to Rust.
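
To make the idea behind Deliverable 4 concrete, the following is a minimal sketch of a consistent-hash ring in Rust. It is an illustration under stated assumptions, not the planned implementation: the names HashRing, ring_position, add_node, remove_node, and node_for are hypothetical, the node names are placeholders, and the standard library's DefaultHasher stands in for whatever hash function the project eventually selects. Each node is placed on the ring at several virtual points, and a document key is assigned to the first point at or after its own position, wrapping around at the end of the ring.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hash any hashable value to a position on the ring.
/// DefaultHasher is an assumption made for this sketch only.
fn ring_position<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A hypothetical consistent-hash ring mapping document keys to node names.
struct HashRing {
    // Ring position -> node name, kept sorted by position.
    ring: BTreeMap<u64, String>,
    // Number of virtual points placed on the ring for each node.
    replicas: u32,
}

impl HashRing {
    fn new(replicas: u32) -> Self {
        HashRing { ring: BTreeMap::new(), replicas }
    }

    /// Place `replicas` virtual points for `node` on the ring.
    fn add_node(&mut self, node: &str) {
        for i in 0..self.replicas {
            self.ring.insert(ring_position(&(node, i)), node.to_string());
        }
    }

    /// Remove every virtual point that belongs to `node`.
    fn remove_node(&mut self, node: &str) {
        for i in 0..self.replicas {
            self.ring.remove(&ring_position(&(node, i)));
        }
    }

    /// Return the node that owns `key`: the first ring point at or after
    /// the key's position, wrapping around to the smallest point if needed.
    /// Returns None when the ring has no nodes.
    fn node_for(&self, key: &str) -> Option<&str> {
        let pos = ring_position(&key);
        self.ring
            .range(pos..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
    }
}
```

The property of interest for the key-value store is that removing a node reassigns only the keys that node owned. Keys owned by the remaining nodes keep their placement, so a membership change moves a fraction of the documents rather than all of them.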


[1] [2012] Corbett, J., Dean, J., Epstein, M. et al. Spanner: Google's globally distributed database. In Proceedings of OSDI'12: Tenth Symposium on Operating System Design and Implementation, Hollywood, CA, October 2012.

[2] [2019] Khan, S., Liu, X., Ali, S. A., and Alam, M. (2019). Storage solutions for big data systems: A qualitative study and comparison. arXiv preprint arXiv:1904.11498.

[3] [2020] Okazaki, S. (2020). An experimental study of memory management in Rust programming for big data processing (Doctoral dissertation, Boston University).

[4] [2021] Rust programming best practices with examples: https://github.com/mre/idiomatic-rust

[5] [2018] Gjengset, J., Schwarzkopf, M., Behrens, J., Araujo, L. T., Ek, M., Kohler, E., ... and Morris, R. (2018). Noria: dynamic, partially-stateful data-flow for high-performance web applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (pp. 213-231).