CS297 Proposal
Evaluating the performance of NoSQL and Time Series databases using TSBS
Aarsh Patel (aarsh.patel@sjsu.edu)
Advisor: Dr. Chris Pollett
Description:
Time series are measurements or events that are tracked, monitored, down-sampled, and aggregated over time. Many time series databases have been developed specifically to store such data, but traditional NoSQL databases such as MongoDB and Cassandra can also be used for this purpose. The Time Series Benchmarking Suite (TSBS) is a recent open-source benchmarking suite for time series data that supports many time series and NoSQL databases; it is still under active development, with new features and updates added periodically. The goal of this project is to use TSBS to evaluate the performance of four databases (three time series databases and MongoDB) against a variety of queries. Metrics such as data storage footprint and read and write performance will form the basis for answering the research question: how do traditional NoSQL databases perform compared with time series databases when storing time series data?
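To make "down-sampled" and "aggregated over time" concrete, here is a minimal Python sketch that is not part of TSBS or any of the databases under study. It groups hypothetical (timestamp in seconds, value) readings into fixed-width time buckets and reports the mean and maximum of each bucket. The function name `downsample`, the 60-second bucket width, and the sample readings are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Group (timestamp, value) points into fixed-width time buckets.

    Returns a dict mapping each bucket's start time to (mean, max) of the
    values that fall in that bucket, ordered by bucket start time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of the bucket it belongs to.
        start = ts - ts % bucket_seconds
        buckets[start].append(value)
    # Aggregate each bucket; sorting keeps the result in time order.
    return {start: (mean(vals), max(vals)) for start, vals in sorted(buckets.items())}


# Hypothetical CPU-usage readings sampled every 10 seconds.
readings = [(0, 20.0), (10, 30.0), (20, 40.0), (60, 50.0), (70, 70.0)]
print(downsample(readings, 60))  # {0: (30.0, 40.0), 60: (60.0, 70.0)}
```

Time series databases generally run this kind of bucketed aggregation inside the database engine, so queries of this shape are the sort of read workload the benchmark is expected to exercise.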
Schedule:
Week 1:
Jan 31 - Feb 7 | Finalize project topic and deliverables |
Week 2:
Feb 7 - Feb 14 | Work on Deliverable #1. Read [1] |
Week 3:
Feb 14 - Feb 21 | Complete and summarize Deliverable #1 |
Week 4:
Feb 21 - Feb 28 | Start working on Deliverable #2 |
Week 5:
Feb 28 - Mar 7 | Research benchmarking tools/suites |
Week 6:
Mar 7 - Mar 14 | Finalize the benchmarking suite and the databases to be evaluated. Read [2] |
Week 7:
Mar 14 - Mar 21 | Complete and summarize Deliverable #2 |
Week 8:
Mar 21 - Mar 28 | Start working on Deliverable #3. Read [3][4] |
Week 9:
Mar 28 - Apr 4 | Understand the benchmarking tool (data generation, data loading). Read [5][6] |
Week 10:
Apr 4 - Apr 11 | Summarize Deliverable #3 |
Week 11:
Apr 11 - Apr 18 | Start working on Deliverable #4 |
Week 12:
Apr 18 - Apr 25 | Set up the environment: install the databases on the system and finalize the queries. |
Week 13:
Apr 25 - May 2 | Create a small dataset and implement the benchmark. |
Week 14:
May 2 - May 9 | Complete and summarize Deliverable #4 |
Week 15:
May 9 - May 16 | Finalize the Report |
Deliverables:
The full project will be done when CS298 is completed. The following will
be done by the end of CS297:
1. A research study on time series data and workloads, and their advantages.
2. Finalizing the benchmarking suite and the databases.
3. Research on the databases and a study of the benchmark code and its Go programs.
4. A local implementation of the benchmark using a small dataset and the chosen databases and queries.
5. CS 297 report.
References:
[1] S. N. Z. Naqvi, S. Yfantidou, and E. Zimanyi, “Time series databases and InfluxDB.” [Online]. Available: https://jira.lsstcorp.org/secure/attachment/37574/influxdb_2017.pdf. [Accessed: 10-Mar-2023].
[2] “DB-Engines ranking,” DB-Engines. [Online]. Available: https://db-engines.com/en/ranking/time+series+dbms/all. [Accessed: 10-Mar-2023].
[3] J. Han, H. E, G. Le, and J. Du, “Survey on NoSQL database,” 2011 6th International Conference on Pervasive Computing and Applications, pp. 363–366, 2011.
[4] V. Abramova and J. Bernardino, “NoSQL databases: MongoDB vs Cassandra,” Proceedings of the International C* Conference on Computer Science and Software Engineering, pp. 14–22, 2013.
[5] D. Paul, “Time Series Database (TSDB) guide: InfluxDB,” InfluxData, 09-Feb-2023. [Online]. Available: https://www.influxdata.com/time-series-database/. [Accessed: 10-Mar-2023].
[6] “Timescale docs,” TimescaleDB - Timeseries database for PostgreSQL. [Online]. Available: https://docs.timescale.com/. [Accessed: 10-Mar-2023].