Chris Pollett> Old Classses >
CS157b

( Print View )

Student Corner:
[Submit Sec2]
[Grades Sec2]

[Online Final-PDF]

[Online Midterm-PDF]

[Lecture Notes]
[Discussion Board]

Course Info:
[Texts & Links]
[Description]
[Course Outcomes]
[Outcomes Matrix]
[Course Schedule]
[Grading]
[Requirements/HW/Quizzes]
[Class Protocols]
[Exam Info]
[Regrades]
[University Policies]
[Announcements]

HW Assignments:
[Hw1] [Hw2] [Hw3]
[Hw4] [Hw5] [Quizzes]

Practice Exams:
[Midterm] [Final]

HW#2 --- last modified March 11 2020 06:46:16.

Solution set.

Due date: Mar 6

Files to be submitted:
  Hw2.zip

Purpose:To become more familiar with database record structures, indexes, and B-trees and to experiment with these in a modern DMS

Related Course Outcomes:

The main course outcomes covered by this assignment are:

CLO1 -- Know common database record formats

CLO2 -- Given an index structure based on a B-tree or extensible hashing be able to figure out the effect of performing an insert or a delete

Specification:

This homework will consist of two parts, a written part and a coding part. Both parts will be submitted in the Hw2.zip file. The written part should be in a file Hw2.pdf. For the written part of the homework you should write solutions for the following questions. In what you turn in, make sure to write the names and student ids for each group member. For each problem, first copy and paste the question, then write your solution to it beneath it.

  1. Draw of the disk state and a memory state and sequence of record accesses such that: (a) automatic swizzling is more efficient (fewer I/Os) than swizzling on demand, (b) swizzling on demand is more efficient than automatic swizzling, (c) no swizzling is more efficient than automatic swizzling, (d) swizzling on demand is more efficient than no swizzling.
  2. Suppose a record has the following fields in this order: An 8 byte integer, character string of length 25, a SQL date, and a 16 bit Binary field. How many bytes does the record take if:
    1. Fields can start at any byte.
    2. Fields must start at a byte that is a multiple of 4.
  3. Suppose we are storing 64 byte records into a data file in our database. Each record has a 4 byte integer key. The data file is sorted by this key. Database blocks are 16384 bytes long. Block pointers are 6 bytes. A record pointer consists of a block pointer followed by two byte number indicating which record offset in the block header is for the record in question. Each block begins with two bytes indicating the number of records in a block, followed by a two byte offset pointing to the first element in the available space list, followed by a list of offsets for each record in the block. Suppose we want to store 150 million records. Calculate showing all work the number of blocks used by the data file and by an index on this data file if:
    1. the index is dense,
    2. the index is sparse.
  4. Create a text transcript of doing the following activities on a Mysql database from the Mysql shell: Create a Mysql table suitable for handling blog posts. Add a full text index on the columns you use for the text contents of your post. Insert a few blog posts. Determine what the current stop words are for your index. Do a search of your table on a word or phrase that is not a stop word, but is in one of the inserted posts. Do a search of your table on a word that is a stop word, but is in one of the inserted posts. Change the stop list to now include your stop word and redo the searches. Please consult Mysql Full Text Stopwords page, for more info about how to do this problem.
  5. For the following B-tree, show:
    1. step-by-step the blocks accessed in looking up 24 (a key not in the tree)
    2. step-by-step inserting a new entry for a record with key 4

    B-Tree for Homework

For the coding part of the assignment, I'd like you to programmatically implement quad-trees in Sqlite using a java program named QuadTreeManager.java. To keep things simple, we will assume our quad trees are used to store triple (x,y, label) data. To test your code, the grader will compile it from the command line with a line like:

javac QuadTreeManager.java

Your code should include thre other classes given by the files QuadPoint.java, QuadRectangle.java, and QuadRecord.java. Here is all the code you need for these classes:

//QuadPoint.java
public class QuadPoint
{
    public float x;
    public float y;
    public QuadPoint(float x, float y)
    {
        this.x = x;
        this.y = y;
    }
    public String toString()
    {
        return "(" + x + "," + y + ")";
    }
}
//QuadRectangle.java
public class QuadRectangle
{
    public int id;
    public QuadPoint top_left;
    public QuadPoint bottom_right;
    public QuadRectangle(int id, QuadPoint top_left, QuadPoint bottom_right)
    {
        this.id = id;
        this.top_left = top_left;
        this.bottom_right = bottom_right;
    }
    public String toString()
    {
        return top_left.toString() + ", " + bottom_right.toString();
    }
}
//QuadRecord.java
public class QuadRecord
{
    public String label;
    public QuadPoint point;
 
    public QuadRecord(String label, QuadPoint point)
    {
        this.label = label;
        this.point = point;
    }
    public QuadRecord(String label, float x, float y)
    {
        this.label = label;
        this.point = new QuadPoint(x, y);
    }
    public String toString()
    {
        return label + point.toString();
    }
}

Your program QuadTreeManager will be run from the command line with a line like:

java QuadTreeManager sqlite_database_name name_of_instruction_file

For example,

java QuadTreeManager test.sqlite instructions.txt

The above command should create a connection to the Sqlite database test.sqlite and then execute each of the instructions in instructions.txt on that database. An instruction file consists of a sequence of lines, each line being an instruction. These can be one of three types:

  1. A create quad tree line which creates a new quad tree. It has the format:
    c file_name low_x low_y high_x high_y
    
    This creates a new quad tree named file_name with rectangle given by the points (low_x, low_y) and (high_x, high_y). All points which we insert into this tree need to be within this rectangle to your code should output an error. Similarly, we require low_x to be less than high_x and low_y to be less than high_y. The following is an example of a create instruction:
    c MyAwesomeTree -5.2 0 10.6 32.7
    
  2. An insert quad tree point which inserts a record into a quad tree. It has the format:
    i file_name label x_value y_value
    
    This instruction should insert into the quad tree file_name (only if it exists) the record given by label x_value y_value. The following is an example of an insert instruction:
    i MyAwesomeTree SpatiallyInteresting -3.1 7.8
    
    This should insert into the quad tree MyAwesomeTree a record with label SpatiallyInteresting at location (-3.1, 7.8).
  3. A lookup quad tree points which looks up the records in the quad tree contained within a rectangle given by two points. It has the format:
    l file_name p1_x p1_y p2_x p2_y limit_offset limit_count
    
    This should first output to standard output (usually print to the terminal) the coordinates of the quad tree rectangles which intersect with the the rectangle given by the pair of points (p1_x, p1_y) and (p2_x p2_y). This portion of the output should be in the format:
    Query intersects with the following rectangles in the quad tree:
    Rect_1:(some_low_x1, some_low_y1), (some_high_x1, some_high_y1)
    ...
    Rect_n:(some_low_xn, some_low_yn), (some_high_xn, some_high_yn)
    
    For example, if we also add to our MyAwesomeTree the points (-3.1, 20) with label "blah", (-2.1, 21) with label "blah2", (-1.1, 22) with label "blah3", (-0.1, 25) with label "blah4" (so had to split the root quad), and then had the line:
    l MyAwesomeTree -4 1 0 18 0 5
    
    then this portion of the output would look like:
    Query intersects with the following rectangles in the quad tree:
    Rect_1:(-5.2, 0), (2.7, 16.35)
    Rect_2:(-5.2, 16.35), (2.7, 32.7)
    
    The rest of the output should look like:
    Quad tree records satisfying the query:
    (label_{limit_offset}, x_{limit_offset}, y_{limit_offset})
    ...
    (label_{limit_offset+limit_count}, x_{limit_offset+limit_count}, y_{limit_offset+limit_count})
    
    I.e., the string "Quad tree records satisfying the query:\n" followed by a sequence of lines listing up to limit_count many records in the quad tree file_name beginning limit_offsetth record found (starting at 0) that belongs to the rectangle given by the pair of points (p1_x, p1_y) and (p2_x p2_y). So for our example above, the rest of the output for this instruction would be:
    Quad tree records satisfying the query:
    (SpatiallyInteresting, -3.1, 7.8)
    

To implement the above instructions, the QuadTreeManager.java file should contain a class QuadTreeManager which has the following public methods which roughly correspond to each of these operations:

  1. QuadTreeManager(String databaseName) -- the constructor takes the name of the Sqlite database, the QuadTreeManager is supposed to manage the quad trees for. It then creates the following three tables in this database if they don't exist:
    1. QUAD_TREE(FILE_NAME VARCHAR(64), ROOT_RECT_ID INTEGER)
    2. QUAD_TREE_RECT(ID INTEGER AUTOINCREMENT, PARENT_ID INTEGER, X_LOW REAL, Y_LOW REAL, X_HIGH REAL, Y_HIGH REAL, PRIMARY_KEY(ID))
    3. QUAD_TREE_POINT(QUAD_RECT_ID INTEGER, X REAL, Y REAL, LABEL CHAR(16)))
    QUAD_TREE_RECT should probably also have an index on PARENT_ID
  2. boolean createQuadTree(String fileName, float lowX, float lowY, float highX, float highY). This method should insert a row into the table QUAD_TREE_RECT with values -1 for the PARENT_ID and lowX, lowY, highX, highY for the bounding coorindates of the quad tree. Don't specify the ID, but rely on AUTOINCREMENT. The meothod should get the ID of this added row after it is inserted. Then usinng this ID and filename it should insert a row into QUAD_TREE. The return value of this method should indicate if it succeeded to do the insert or not.
  3. boolean add(String fileName, QuadRecord record). This function should insert the passed QuadRecord into the quad tree given by fileName. To do this it should look up in QUAD_TREE to see if such a quad tree exists. If it does, it should get the QUAD_TREE_RECT row corresponding to ROOT_RECT_ID, and see if the record has a point that can be inserted into this quad tree. If it does it should do a query for QUAD_TREE_RECT rows with PARENT_ID having value ROOT_RECT_ID, and so on to trace down through the quad tree to where the record should be inserted. If a quad needs to be split (is a leaf with more than 4 records) the appropriate QUARD_TREE_RECT rows should be added to the table and existing QUAD_TREE_POINT rows in the now subdivided quad should have their QUAD_RECT_ID's updated. Finally, an appropriate new QUAD_TREE_POINT record should be added for record.
  4. QuadRectangle[] lookupRectangle(String fileName, QuadRectangle r) This function should return an array of QuadRectangle's (if they exists) from the quad tree fileName such that each rectangle non-trivially intersects with r such that the rectangle doesn't have child rectangles (QUAD_TREE_RECT's whose PARENT_ID is the ID of the rectangle).
  5. QuadRecord[] lookupPoint(String fileName, QuadPoint pt1, QuadPoint pt2, int limit_offset, int limit_count) This function should return an array of up to limit_count (if they exists) QuadRecord's from the quad tree fileName beginning with the limit_offsetth such record found (starting at 0) that lies in the rectangle given by the points pt1 and pt2.

This completes the description of the program you need to write.

Point Breakdown

Written problems 1 (1/2pt per sub-part).2pts
Written problems 2,3, 5 (1pt, 1/2pt/per sub-part).3pts
Written problems 4 (1/2pt table creation with index and inserts, 1/2pt queries).1pt
Program compiles and when run processes instruction file on sqlite database given by the command line arguments as described1pt
QuadTreeManager constructor works as described0.5pts
createQuadTree method works as described0.5pts
add method works as described1pt
Two lookup method work as described (1/2pt each)1pt
Total10pts