Chris Pollett> CS157b
( Print View )

Student Corner:
[Lecture Notes]
[Discussion Board]

Course Info:
[Texts & Links]
[Description]
[Course Outcomes]
[Outcomes Matrix]
[Course Schedule]
[Grading]
[Requirements/HW/Quizzes]
[Class Protocols]
[Exam Info]
[Regrades]
[University Policies]
[Announcements]

HW Assignments:
[Hw1] [Hw2] [Hw3]
[Hw4] [Hw5] [Quizzes]

Practice Exams:
[Midterm] [Final]

HW#3 --- last modified March 28 2023 12:46:21.

Solution set.

Due date: Apr 5

Files to be submitted:
Hw3.zip

Purpose: To gain experience with query cost evaluation and query plan evaluation. To learn about database recovery techniques.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

CLO4 -- Tune queries and know how to perform query performance evaluations

CLO5 -- Know database recovery techniques

Specification:

This homework will consist of two parts, a written part and a coding/experiment part. Both parts will be submitted in the Hw3.zip file. The written part should be in a file Hw3.pdf. For the written part of the homework you should write solutions for the following questions. In what you turn in, make sure to write the names and student ids for each group member. For each problem, first copy and paste the question, then write your solution to it beneath it.

For each of the following operations, write an iterator that uses an algorithm described in class to enumerate the output of the following operations: (a) project tuples onto attributes A,B,C, renaming C to D, (b) bag union. Make sure your iterator reads whole blocks at a time, but outputs tuples at time.
If B(R)=B(S)=150,000 and M=6000, what are the disk I/O requirements of: (a) two-pass set intersection using hashing, (b) sort-join from class.
Come up with additional query parsing rules to add to our rules from the Mar 8 Lecture to say what <Relations> and <Tuple> are. Make sure your rules can handle expressions for relations like Employee E.
Consider the table:

Y(c,d) Z(d,e)

T(Y)=1200 T(Z)=1400

V(Y,c)=30 V(Z,d)=90

V(Y,d)=70 V(Z,e)=60

Estimate the sizes of relations that are the results from the following queries: (a) `sigma_{e=95}(Z)`, (b) $Y \bowtie \sigma_{e=95}(Z)$.
Assume A=20,B=30 (here we imagine A and B are blocks that can hold 1 integer) are stored in a DB. Suppose a transaction does the following sequence of operations I(A), I(B), R(B,b), b:= b+5, R(A,a), W(B,b), a:= 2*a + b, W(A,a), O(A), O(B). Show the undo log records needed for this transaction.

For the coding/experiment part of the homework, I want you to see the actually plan a DBMS will choose for various kinds of queries. You can conduct your experiments using the DBMS of your choice from among sqlite, Mysql, Postgres, DB2, and Oracle. So that the grader has something to evaluate I want you to produce a file Experiment.pdf which at its top a write up of your results, followed by transcripts of each of the explain operation I am asking for as well as how you created the input tables. Each of the databases mentioned has a variant on the EXPLAIN command which tells you how the database would evaluate a query. For example,

EXPLAIN SELECT * FROM USERS;

or some variant will tell you how the database would perform the query select * from users; without actually performing the query. To do the experiments you will first need to create a program DataGenerator.java. This program will be run from the command line with a line with the following format:

java DataGenerator start_value num wrap_number1 wrap_number2 txtfile

It should output into txtfile num many rows of three columns. The columns should be space separated. The first column starts with the value start_value and increments one with each row. The second column's value is chosen randomly from between 1 and wrap_number1. Similarly, the third column's value is chosen randomly from between 1 and wrap_number2. As an example,

java DataGenerator 10 20 5 3 data.txt

might output into data.txt the rows:

You should include both the file Experiments.pdf and DataGenerator.java in the Hw3.zip file you submit. Create five tables R1(A,B,C), R2(D,B,C), R3(E,B,C), R4(F,B,C), R5(G,B,C). For each table, each of the three columns should be an integer. You should include the execution of the create tables in Experiments.pdf.

java DataGenerator 0 1000 5 10 R1data.txt
java DataGenerator 1000 1000 5 10 R2data.txt
java DataGenerator 2000 1000 5 10 R3data.txt
java DataGenerator 3000 1000 10 5 R4data.txt
java DataGenerator 4000 1000 10 5 R5data.txt

Use the bulk loader facility of the database you chose to load these five files into their corresponding table. Copy the text or show a screenshot of performing the load operation into Experiments.pdf.

Now consider the join of all five tables with the condition R1.B=R2.B and R1.C=R2.C and R2.B=R3.B and R2.C=R3.C and R3.B=R4.B and R3.C=R4.C and R4.B=R5.B and and R4.C=R5.C and R5.B=1 followed by a projection onto the columns B, C. Express this query in SQL in as a join (and a select onto the appropriate columns) that would look like a left-tree, bushy tree, and right-tree. For each way, use explain to find out how your DBMS would execute the query. For each of these examine the output and determine which of the join algorithms from class it uses. Are all of the plans the same or not? Execute each query using the DBMS's timing feature at least five times, computing the average and standard deviation. Is any particular join operation faster than the others? If you put indexes on columns can you improve the timings? Write up your results in Experiments.pdf. Make sure for each experiment you have a purpose, hypothesis, description of how the experiment was conducted, and a conclusions.

Point Breakdown

Written problems (Each 0 - wrong track, 0.5 - something correct/mostly correct, 1pt fully correct)	5pt
DataGenerator.java works as described	0.5pt
Transcripts of create table and bulk loading in Experiments.pdf	0.5pt
Transcripts of three different ways of writing joins and the explain command results	0.5pt
Determination of algorithm from class correspond to join plans and write-ups with conclusions (0.5pts each)	1.5pts
Timing tests including write-up as specified above (0.5 each query)	1.5pts
Effects of indices and write-up	0.5pts
Total	10pts