HW#3 --- last modified March 25 2020 21:35:46.Due date: Mar 26 Files to be submitted: Purpose: To gain experience with query cost evaluation and query plan evaluation. To learn about database recovery techniques. Related Course Outcomes: The main course outcomes covered by this assignment are: CLO4 -- Tune queries and know how to perform query performance evaluations CLO5 -- Know database recovery techniques Specification: This homework will consist of two parts, a written part and a coding/experiment part. Both parts will be submitted in the Hw3.zip file. The written part should be in a file Hw3.pdf. For the written part of the homework you should write solutions for the following questions. In what you turn in, make sure to write the names and student ids for each group member. For each problem, first copy and paste the question, then write your solution to it beneath it.
For the coding/experiment part of the homework, I want you to see the actually plan a DBMS will choose for various kinds of queries. You can conduct your experiments using the DBMS of your choice from among sqlite, Mysql, Postgres, DB2, and Oracle. So that the grader has something to evaluate I want you to produce a file Experiment.pdf which has at its top a write up of your results followed by transcripts of each of the explain operation I am asking for as well as how you created the input tables. Each of the DBMSs mentioned above has a variant on the EXPLAIN command which tells you how the system would evaluate a query. For example, EXPLAIN SELECT * FROM USERS; or some variant will tell you how the database would perform the query select * from users; without actually performing the query. To do the experiments you will first need to create a program DataGenerator.java. This program will be run from the command line with a line with the following format: java DataGenerator start_value num wrap_number txtfile The program should output into txtfile num many rows of two columns, space separated. The first column starts with the value start_value and increments one with each row. The second column's value is a string of `a`'s of length cycling (via mod function) between 1 and wrap_number. If wrap_number > 10, this then wrap_number is set to 10. For example, filling in these values we might write: java DataGenerator 10 20 5 data.txt The program might output into data.txt the rows: 10 a 11 aa 12 aaa 13 aaaa 14 aaaaa 15 a 16 aa 17 aaa 18 aaaa 19 aaaaa 20 a 21 aa 22 aaa 23 aaaa 24 aaaaa 25 a 26 aa 27 aaa 28 aaaa 29 aaaaa You should include both the file Experiment.pdf and DataGenerator.java in the Hw3.zip file you submit. Create five tables R1(A,B), R2(C,B), R3(D,B), R4(E,B), R5(F,B). For each, the first column should be an integer, the second a VARCHAR(10) or CHAR(10). You should include the execution of the create tables in Experiment.pdf. Next generate five tables worth of data with the following lines: java DataGenerator 0 1000 5 R1data.txt java DataGenerator 1000 1000 5 R2data.txt java DataGenerator 2000 1000 5 R3data.txt java DataGenerator 3000 1000 10 R4data.txt java DataGenerator 4000 1000 10 R5data.txt Use the bulk loader facility of the database you chose to load these five files into their corresponding table. Copy the text or show a screenshot of performing the load operation into Experiment.pdf. Now consider the join of all five tables with the condition R1.B=R2.B and R2.B=R3.B and R3.B=R4.B and R4.B=R5.B and R5.B='aaaa'. Express this query in SQL in as a join that would look like a left-tree, bushy tree, and right-tree. For each way, use explain to find out how your DBMS would execute the query. Also, see if the answer changes when switching from using VARCHAR's for the second column versus CHAR's. Put all of this information into Experiment.pdf. Try executing each of your equivalent queries and check that the results match. Write up, again in Experiment.pdf, which you think is the best way to do the query. If you desire you can also experiment with indexes to see if it helps the execution speed. Point Breakdown
|