Chris Pollett> CS157b
( Print View )

Student Corner:
[Lecture Notes]
[Discussion Board]

Course Info:
[Texts & Links]
[Description]
[Course Outcomes]
[Outcomes Matrix]
[Course Schedule]
[Grading]
[Requirements/HW/Quizzes]
[Class Protocols]
[Exam Info]
[Regrades]
[University Policies]
[Announcements]

HW Assignments:
[Hw1] [Hw2] [Hw3]
[Hw4] [Hw5] [Quizzes]

Practice Exams:
[Midterm] [Final]

HW#2 --- last modified March 06 2023 23:47:49.

Solution set.

Due date: Mar 10

Files to be submitted:
  Hw2.zip

Purpose: To become more familiar with database record structures, indexes, and B-trees and to experiment with these in a modern DMBS

Related Course Outcomes:

The main course outcomes covered by this assignment are:

CLO1 -- Know common database record formats

CLO2 -- Given an index structure based on a B-tree or extensible hashing be able to figure out the effect of performing an insert or a delete

Specification:

This homework will consist of two parts, a written part and a coding part. Both parts will be submitted in the Hw2.zip file. The written part should be in a file Hw2.pdf. For the written part of the homework you should write solutions for the following questions. In what you turn in, make sure to write the names and student ids for each group member. For each problem, first copy and paste the question, then write your solution to it beneath it.

  1. Briefly explain for each of the following which is more efficient, completely stating your assumptions. (Give a concrete example why by drawing a disk state and a memory state and sequence for record accesses that illustrate your reasoning) (a) swizzling on demand versus automatic swizzling if you are both logging ATM transactions (keeping a ledger of them) and updating bank account balances, (b) swizzling on demand versus automatic swizzling for looking up a popular product and its reviews, (c) no swizzling versus automatic swizzling if you are doing a table scan, (d) swizzling on demand versus no swizzling for frequent B-tree looks ups of the same B-tree.
  2. Suppose a record has the following fields in this order: An 4 byte integer, character string of length 21, two SQL dates, and a 8 bit Binary field. How many bytes does the record take if:
    1. Fields can start at any byte.
    2. Fields must start at a byte that is a multiple of 4.
  3. Suppose we are storing 128 byte records into a data file in our database. Each record has a 8 byte integer key. The data file is sorted by this key. Database blocks are 8192 bytes long. Block pointers are 24 bytes. A record pointer consists of a block pointer followed by two byte number indicating which record offset in the block header is for the record in question. Each block begins with two bytes indicating the number of records in a block, followed by a two byte offset pointing to the first element in the available space list, followed by a list of offsets for each record in the block. Suppose we want to store 75 million records. Calculate showing all work the number of blocks used by the data file and by an index on this data file if:
    1. the index is dense,
    2. the index is sparse.
  4. Create a text transcript of doing the following activities on a Mysql database from the Mysql shell: Create a Mysql table suitable for handling online shopping product reviews. Add a full text index on the columns you use for the text contents of your post. Insert a few blog posts. Determine what the current stop words are for your index. Do a search of your table on a word or phrase that is not a stop word, but is in one of the inserted posts. Change the stop list to now include your non-stop word and redo the searches. Please consult Mysql Full Text Stopwords page, for more info about how to do this problem.
  5. For the following B-tree, show:
    1. step-by-step the blocks accessed in looking up 15 (a key not in the tree)
    2. step-by-step inserting a new entry for a record with key 38

    B-Tree for Homework

For the coding part of the assignment, I'd like you to programmatically implement 2d-trees (k=2 in kd-tree) in Sqlite using a Java program named TwoDimTreeManager.java. To keep things simple, we will assume our 2d-trees are used to store triple (x,y, label) data. To test your code, the grader will compile it from the command line with a line like:

javac TwoDimTreeManager.java

Your code should include other classes given by the files TreePart.java, TwoDimNode.java and TwoDimRecord.java. Here is all the code you need for these classes:

//TreePart.java
public class TreePart
{
    public boolean isLeaf;
    public int id;
    public int parent_id;
    public float value1;
    public float value2;
}
//TwoDimNode.java
public class TwoDimNode extends TreePart
{

    public boolean isX;
    public int childId0;
    public int childId1;
    public int childId2;
    
    public TwoDimNode(int id,  boolean isX, int child0, float value1, 
        int child1, float value2, int child2)
    {
        this.id = id;
        isLeaf = false;
        this.isX = isX;
        this.childId0 = childId0;
        this.value1 = value1;
        this.childId1 = childId1;
        this.value2 = value2;
        this.childId2 = childId2;
    }
    public String toString()
    {
        String out = "Internal Node Id:" + id + " Parent Id:" + parent_id;
        out += (isX) ? " query variable is X; " : " query variable is Y; ";
        out += "key values are: " + value1 + ", " + value2 + "; ";
        out += "Child IDs are: " + childId0 + ", " + childId1 + ", " + childId2;
        return out;
    }
}
//TwoDimRecord.java
public class TwoDimRecord extends TreePart
{
    public String label;
    public TwoDimRecord(int id, int parent_id, String label, float x, float y)
    {
        this.id = id;
        this.parent_id = parent_id;
        this.label = label;
        value1 = x;
        value2 = y;
        isLeaf = true;
    }
    public String toString()
    {
        return "Record: " + id + " Parent Id:" + parent_id + Label:" + 
            label + " (" + value1 + ", " + value + ")";
    }
}

Your program TwoDimTreeManager will be run from the command line with a line like:

java TwoDimTreeManager sqlite_database_name name_of_instruction_file

For example,

java TwoDimTreeManager test.sqlite instructions.txt

The above command should create a connection to the Sqlite database test.sqlite and then execute each of the instructions in instructions.txt on that database. An instruction file consists of a sequence of lines, each line being an instruction. These can be one of the following types:

  1. A create 2d tree line which creates a new 2d-tree. It has the format:
    c tree_name
    
    This creates a new 2d tree named tree_name as well as its root TwoDimNode node. We will say how this is represented in Sqlite tables in a minute. The root node by default will have isX false, until at least the second element has been inserted. At this point, no elements are inserted, so both value1 and value2 in this node should get values Float.NaN. The following is an example of a create instruction:
    c MyAwesomeTree
    
  2. An insert 2d-tree point which inserts a record into a 2d-tree. It has the format:
    i tree_name label x_value y_value
    
    This instruction should insert into the 2d tree file_name (only if it exists) the record given by label x_value y_value. The following is an example of an insert instruction:
    i MyAwesomeTree SpatiallyInteresting -3.1 7.8
    
    This should insert into the 2d tree MyAwesomeTree a record with label SpatiallyInteresting at location (-3.1, 7.8).
  3. A lookup 2d-tree points which looks up the records in the record associated with a point (if it exists). It has the format:
    l tree_name p_x p_y
    
    This should to standard output (usually print to the terminal) the corresponding record using its toString() method. I.e., it should output something like
    Record: id Label: label (x, y)
    
    For example:
    l MyAwesomeTree -3.1 7.8
    
    on the tree built so far, would output
    Record:2 Parent Id:1 Label:SpatiallyInteresting (-3.1, 7.8)
    
    If no record exists for the given point, the output should be: were (x,y) is filled in with the value that was attempted to be looked up.
  4. A pretty print tree line this is used to used to print out all the TreeParts of a 2d-tree. It has the format:
    p tree_name
    
    For example,
    p MyAwesomeTree 
    
    which should print out something like:
    MyAwesomeTree
    Internal Node id: 1 Parent Id:-1 query variable is Y; key values are: 7.8, NaN; Child IDs are: 2, -1, -1;
    Record:2 Parent Id:1 Label:SpatiallyInteresting (-3.1, 7.8)
    

To implement the above instructions, the TwoDimTreeManager.java file should contain a class TwoDimTreeManager which has the following public methods which roughly correspond to each of these operations:

  1. TwoDimTreeManager(String databaseName) -- the constructor takes the name of the Sqlite database, theTwoDimTreeManager is supposed to manage the 2d trees for. It then creates the following three tables in this database if they don't exist:
    1. TWO_DIM_TREE(FILE_NAME VARCHAR(64), ROOT_ID INTEGER)
    2. TWO_DIM_TREE_PART(NODE_ID INTEGER AUTOINCREMENT, PARENT_ID INTEGER, IS_LEAF BOOLEAN, VALUE1 REAL, VALUE2 REAL, PRIMARY_KEY(ID))
    3. TWO_DIM_AUX_NODE_DATA(NODE_ID INTEGER, IS_X BOOLEAN, CHILD0 INTEGER, CHILD1 INTEGER, CHILD2 INTEGER, PRIMARY_KEY(NODE_ID))
    4. TWO_DIM_AUX_RECORD_DATA(NODE_ID INTEGER, LABEL CHAR(16), PRIMARY_KEY(NODE_ID))
    TWO_DIM_TREE_PART should probably also have an index on PARENT_ID.
  2. boolean createTwoDimTree(String fileName). This method should insert a row into the table TWO_DIM_TREE_PART with values -1 for the PARENT_ID, IS_LEAF false, VALUE1 and VALUE2 both NULL (for a real column Sqlite treats null like a NaN). Don't specify the ID, but rely on AUTOINCREMENT. The method should get the ID of this added row after it is inserted. Then using this ID and filename it should insert a row into TWO_DIM_TREE. The return value of this method should indicate if it succeeded to do the insert or not.
  3. boolean add(String fileName, TwoDimRecord record). This function should insert the passed TwoDimRecord into the 2D-tree given by fileName. To do this it should look up in TWO_DIM_TREE to see if such a 2d tree exists. If it does, it should get the TWO_DIM_TREE_PART row corresponding to ROOT_ID, and use it to traverse 2d tree and insert the record. If an internal node needs to be split (is a leaf with more than 4 records) the splitting should be done according to the algorithm from class.
  4. TwoDimRecord lookup(String fileName, float x, float y) This function should return a TwoDimRecord (if exists, or null if not) from the 2D tree fileName with the given x,y value.
  5. printTree(String fileName) This function should do a tree traversal of the 2d-tree given by fileName, and pretty print each of its nodes and records to System.out.

This completes the description of the program you need to write.

Point Breakdown

Written problems 1 (1/2pt per sub-part).2pts
Written problems 2,3, 5 (1pt, 1/2pt/per sub-part).3pts
Written problems 4 (1/2pt table creation with index and inserts, 1/2pt queries).1pt
Program compiles and when run processes instruction file on a sqlite database given by the command line arguments as described1pt
TwoDimTreeManager constructor works as described0.5pts
createTwoDimTree method works as described0.5pts
add method works as described1pt
lookup and printTree methods work as described (1/2pt each)1pt
Total10pts