Chris Pollett > Students > Radha

    Print View

    [Bio]

    [Blog]

    [CS297 Proposal]

    [Deliverable1_indexCompression]

    [Deliverable2]

    [Deliverable3]

    [Deliverable4]

    [CS297_report]

    [CS298 Proposal]

    [CS298_Progress_Report]

    [CS298_presentation]

    [CS298_Report]

    [CS298_Project_SourceCode-ZIP]

                          

























Compressing Posting List using Simple-9 algorithm:

Description: This algorithm works by checking the Δ-values in a posting sequence and tries to squeeze as many of them as possible into a 32 bit machine word. In this 32 bit, 4 bits are reserved for a selector, which tells how many Δ-values of equal size have been inserted in the remaining 28 bits. There are nine different ways of dividing then into chunks of equal size.
Example:
Input Word: "The" 1624 1650 1876 1972 2356
Δ-values : 1624 25 225 95 383 [Δ-value: [1650 -1624 -1] = 25, ...]
The above indexes can be saved as 1624 and 25 together as two 14-bits each; 225, 95, and 383 together as three 9-bits each, and one unused bit at the end.

I have implemented Simple-9 algorithm in C. I took the inputted word and it's index positions and used linked list to save the word and its corresponding Δ-values.
Encoding: Since we can insert a maximum of 28 different Δ-values (if they are 1 bit each) at a time, I considered taking a max of 28 values at a time using a pointer to the linked list and tried to find its appropriate selector value and converts them into a 32 bit chunk and then moving the pointer to the corresponding value and repeated the process till the end of the list. Again I have used linked lists to save these 32 bit chunks.
Decoding: I have taken the encoded linked list of 32 bit chunks, found the selector by reading the first 4 bits and used that selector value to split the remaining 28 bits into equal sizes and converted it back to the original index positions.
The source code can be viewed
here

Output Image

Output Image