CS297 Proposal

Evaluating the performance of NoSQL and Time Series databases using TSBS

Aarsh Patel (aarsh.patel@sjsu.edu)

Advisor: Dr. Chris Pollett

Description:

Time series are measurements or events that are tracked, monitored, down-sampled, and aggregated over time. Many time series databases have been developed specifically to store such data, but traditional NoSQL databases such as MongoDB and Cassandra can also be used to store it. The Time Series Benchmark Suite (TSBS) is a recent open source benchmarking suite for time series data that supports many time series and NoSQL databases and is still receiving periodic features and updates. The goal of the project is to evaluate the performance of four databases (three time series databases and MongoDB) against various queries. The research question is how traditional NoSQL databases perform against time series databases when storing time series data, measured by data storage footprint and by read and write performance.
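As an illustration of the three metrics named above, the following is a minimal sketch of how raw measurements from one benchmark run could be reduced to comparable numbers. It is not TSBS code. The function name, field names, and sample values are all hypothetical, and the choice of summary statistics is an assumption that may change once the queries are finalized.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class BenchmarkSummary:
    """Hypothetical per-database summary of one benchmark run."""
    rows_per_sec: float       # write performance: ingest throughput
    mean_latency_ms: float    # read performance: average query latency
    median_latency_ms: float  # read performance: median query latency
    max_latency_ms: float     # read performance: slowest query
    bytes_per_row: float      # storage footprint normalized by row count


def summarize(rows_written, load_seconds, query_latencies_ms, bytes_on_disk):
    """Reduce raw measurements to the metrics used for comparison."""
    if rows_written <= 0 or load_seconds <= 0:
        raise ValueError("rows_written and load_seconds must be positive")
    if not query_latencies_ms:
        raise ValueError("at least one query latency is required")
    return BenchmarkSummary(
        rows_per_sec=rows_written / load_seconds,
        mean_latency_ms=mean(query_latencies_ms),
        median_latency_ms=median(query_latencies_ms),
        max_latency_ms=max(query_latencies_ms),
        bytes_per_row=bytes_on_disk / rows_written,
    )


# Made-up numbers for illustration only; these are not measured results.
example = summarize(
    rows_written=1_000_000,
    load_seconds=20.0,
    query_latencies_ms=[4.0, 6.0, 5.0, 9.0],
    bytes_on_disk=48_000_000,
)
print(example)
```

Normalizing the storage footprint by row count is one possible design choice: it lets databases loaded with the same generated dataset be compared directly even if the dataset size changes between runs.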

Schedule:

Week 1 (Jan 31 - Feb 7): Finalize project topic and deliverables.
Week 2 (Feb 7 - Feb 14): Work on Deliverable #1. Read [1].
Week 3 (Feb 14 - Feb 21): Complete and summarize Deliverable #1.
Week 4 (Feb 21 - Feb 28): Start working on Deliverable #2.
Week 5 (Feb 28 - Mar 7): Research benchmarking tools and suites.
Week 6 (Mar 7 - Mar 14): Finalize the benchmarking suite and the databases to be evaluated. Read [2].
Week 7 (Mar 14 - Mar 21): Complete and summarize Deliverable #2.
Week 8 (Mar 21 - Mar 28): Start working on Deliverable #3. Read [3], [4].
Week 9 (Mar 28 - Apr 4): Understand the benchmarking tool (data generation, data loading). Read [5], [6].
Week 10 (Apr 4 - Apr 11): Summarize Deliverable #3.
Week 11 (Apr 11 - Apr 18): Start working on Deliverable #4.
Week 12 (Apr 18 - Apr 25): Set up the environment: install the databases on the system and finalize the queries.
Week 13 (Apr 25 - May 2): Create a small dataset and run the benchmark.
Week 14 (May 2 - May 9): Complete and summarize Deliverable #4.
Week 15 (May 9 - May 16): Finalize the report.

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. A research study on time series data, time series workloads, and the advantages of time series databases.

2. Selection of the benchmarking suite and the databases to be evaluated.

3. Research on the selected databases and a study of the benchmark code and its Go programs.

4. A local run of the benchmark on a small dataset, using the selected databases and queries.

5. CS297 report.

References:

[1] S. N. Z. Naqvi, S. Yfantidou, and E. Zimanyi, “Time series databases and influxdb.” [Online]. Available: https://jira.lsstcorp.org/secure/attachment/37574/influxdb_2017.pdf. [Accessed: 10-Mar-2023].

[2] “DB-Engines ranking,” DB-Engines. [Online]. Available: https://db-engines.com/en/ranking/time+series+dbms/all. [Accessed: 10-Mar-2023].

[3] J. Han, H. E, G. Le, and J. Du, “Survey on NoSQL database,” 2011 6th International Conference on Pervasive Computing and Applications, pp. 363–366, 2011.

[4] V. Abramova and J. Bernardino, “NoSQL databases: MongoDB vs Cassandra,” Proceedings of the International C* Conference on Computer Science and Software Engineering, pp. 14–22, 2013.

[5] D. Paul, “Time Series Database (TSDB) guide: Influxdb,” InfluxData, 09-Feb-2023. [Online]. Available: https://www.influxdata.com/time-series-database/. [Accessed: 10-Mar-2023].

[6] “Timescale docs,” TimescaleDB - Timeseries database for PostgreSQL. [Online]. Available: https://docs.timescale.com/. [Accessed: 10-Mar-2023].