Chris Pollett > Students >
Bui

    ( Print View)

    [Bio]

    [Blog]

    [CS 297 Proposal]

    [Dynamic Hashing Schemes - PDF]

    [WARC Files - PDF]

    [Deliverable 1]

    [Deliverable 2]

    [Deliverable 3]

    [Deliverable 4]

    [CS 297 Report - PDF]

    [CS 298 Proposal]

    [WARC-KIT Code]

    [CS 298 Report - PDF]

    [CS 298 Presentation - PDF]

A persistent key-value store with Linear Hashing

David Bui (david.bui01@sjsu.edu)

Purpose: The purpose of this deliverable is to extend the simple key value value store in deliverable 1 to implement as dynamic hashing scheme known as Linear Hashing.

Linear Hashing Linear Hashing is a dynamic hashing technique that grows the number of initial buckets 1 bucket at a time according to some criteria. Hence it's name Linear Hashing.

  • A hash function's output will always give a fixed number of bits. Let's say our hash function gives a 32 bit hash output fro some key. In Linear Hashing however, we will only use the first I bits to address to N initial buckets.
  • If we start with N =2 bucket, then I = 1 bit. So, we will only use the first bit of the hash function's 32 bit output to map to a bucket.
  • Let our criteria for adding a bucket be passing a load factor thresshold i.e number of items / (number of buckets * average items in each bucket). Once the number of insertions exceeds our threshold we add a bucket to N. If N becomes another power of 2: N > (2^I -1) we increment I to address the new buckets.
  • When any bucket is added we split the bucket at an index S. S is initially the first bucket. When we split a bucket we rehash all the keys at bucket S add if the keys rehash to the address of the newly added bucket, we move the key there. Once N buckets has doubled from it's initial position we reset the S index to 0.

Client: The client webpage is extend from deliverable one to include buttons that perform automatic tests. An example is one of these test perform 1000 random KV pairs inserts then executes 10 random get requests to see if the pairs were successfully inserted.

Client 2 Webpage App.js

picture of client code

Server: The server instead of using the routes to interact with the Key value store instead uses the routes to call a implemented persistent asynchronous Linear Hash table data structure to perform the various put/get operations of a hash table. The Linear Hash table itself handles the writing and writing of any KV pairs to text files, which represent buckets, on disk. The Linear hash table also keeps track of the load factor and will create additional buckets when the threshold is crossed.

get function of LinearHashTable.js


    /**
     * Gets the value with the corresponding key
     * 
     * @param {string} key the key of the value we are searching for
     * @return {any} the value of the key
     */
    get =  (key) => {
        let hashInt = this.getHash(key);

        // search through each level to find if the hash
        for(let i = this.initialBits; i "<"= this.iBits; i++) {
            let bucketInd = this.getBucketInd(hashInt, i);
            let currentBucket = this.hashTable[bucketInd];

            if(typeof(currentBucket[key]) !== 'undefined') {
                return currentBucket[key];
            }
        }
        let value;
        return value;
    }


put function LinearHashTable.js


    /**
     * Adds the KV pair to the hash table
     * @param {string} key will be a property in a json
     * @param {any} value corresponding value of this key
     */
    put = (key, value) => {
        let hashInt = this.getHash(key)
        let bucketInd = this.getBucketInd(hashInt, this.iBits)
        let bucket = this.hashTable[bucketInd];
        bucket[key] = value
        this.hashTable[bucketInd] = bucket;

        this.numItems = this.numItems + 1
        if(this.computeLoadFactor() > this.loadFactor) {
            this.addBucket();
        }
    }

addBucket function of LinearHashTable.js


    /**
     * Adds an additional bucket to this Hash table and splits keys
     * as needed
     * Modifies number of bits needed to index if criteria is reached
     */
    addBucket = () => {
        let oldSplitInd = this.splitInd
        this.splitInd += 1

        let oldBucket = this.hashTable[oldSplitInd.toString()]
        let newBucket = {};
        
        this.N = this.N + 1;
        let newBucketInd = this.N - 1;


        // if power of 2 buckets has been reached 
        // increment iBits needed to be able to address N buckets
        if (this.IsPowerOfTwo(newBucketInd)) {
            this.iBits = this.iBits + 1;
        }

        // if number of buckets has been doubled reset split index to 0
        if(this.startingBuckets * 2 === this.newBucketInd) {
            this.startingBuckets = this.startingBuckets * 2;
            this.splitInd = 0;
        }

        // rehash entries in the old bucket split them with the new bucket
        let retainedEntries = {}
        for (let property in oldBucket) {
            let hashInt = this.getHash(property)
            let rehashedId = this.getBucketInd(hashInt, this.iBits);

            if(rehashedId != oldSplitInd) {
                newBucket[property] = oldBucket[property]
            }
            else {
                retainedEntries[property] = oldBucket[property]
            }
        }
        this.hashTable[oldSplitInd.toString()] = retainedEntries;
        this.hashTable[newBucketInd.toString()] = newBucket;

    }


example bucket text file: bucket 22.txt The initial line is the header containing keys located in the file each subsequent line is the key value pair in JSON format.


429,475,481,526,534,556,561,584,603,647,648,775,915,927,960
{"429":"cat429"}
{"475":"cat475"}
{"481":"cat481"}
{"526":"cat526"}
{"534":"cat534"}
{"556":"cat556"}
{"561":"cat561"}
{"584":"cat584"}
{"603":"cat603"}
{"647":"cat647"}
{"648":"cat648"}
{"775":"cat775"}
{"915":"cat915"}
{"927":"cat927"}
{"960":"cat960"}