CS298 Proposal
Processing Posting Lists Using OpenCL
Radha Kotipalli (sowmyu94@yahoo.com)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Sami Khuri, Dr. Thomas Austin
Abstract:
The objective of this project is to create a GPU-based, parallel reader of posting lists for Yioop.
Yioop is an open-source search engine designed and developed in PHP by Dr. Chris Pollett. We
will use OpenCL to program the GPU, and we will then run experiments comparing the performance of our GPU
posting-list handler with the existing mechanism in Yioop.
Search engines use an inverted index to look up, for a given term, all the documents that contain
that term. This list of documents is called the term's posting list. Posting lists are typically
stored in a compressed binary format.
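The proposal does not describe Yioop's exact on-disk layout. As a general illustration only (an assumption, not Yioop's format), many compressed indexes store the differences between consecutive document IDs ("d-gaps") instead of the IDs themselves, because gaps in a sorted list are small and need few bits. The functions `to_gaps` and `from_gaps` below are hypothetical names for a minimal C sketch of that transform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only, not Yioop's format): turn a sorted
   posting list of docIDs into d-gaps. The first entry is kept as-is; each
   later entry becomes its distance from the previous docID. Small gaps need
   few bits, which integer compressors exploit. */
void to_gaps(const uint32_t *doc_ids, uint32_t *gaps, size_t n) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        /* Posting lists are assumed to be strictly increasing. */
        assert(i == 0 || doc_ids[i] > prev);
        gaps[i] = doc_ids[i] - prev;
        prev = doc_ids[i];
    }
}

/* Hypothetical inverse: a running (prefix) sum of the gaps restores the docIDs. */
void from_gaps(const uint32_t *gaps, uint32_t *doc_ids, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += gaps[i];
        doc_ids[i] = sum;
    }
}
```

For example, the posting list 3, 7, 8, 20, 21, 150 becomes the gaps 3, 4, 1, 12, 1, 129, and `from_gaps` turns those gaps back into the original list.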
OpenCL (Open Computing Language) is a framework for writing parallel programs that run on
heterogeneous processors such as CPUs and GPUs. GPUs have hundreds or thousands of processing units,
compared with one to eight cores on a typical CPU. OpenCL provides an effective way to program GPUs
and to transfer data between a system's CPU and its GPU. Because many posting-list operations can be
done in parallel, implementing the posting-list algorithms in OpenCL is likely to improve the
performance of the search engine.
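One reason posting-list decoding can suit a GPU: if a list is split into independently compressed blocks (an assumption about the layout, not something the proposal specifies), each block can be handled by its own work-item. The only coordination needed is knowing where each block's output begins, which is an exclusive prefix sum of the per-block posting counts. A minimal sequential C sketch of that offset computation follows; `exclusive_scan` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given how many postings each independently compressed
   block decodes to, compute where each block's output begins. This is an
   exclusive prefix sum; with these offsets, block b could be decoded by its
   own work-item and written starting at out[offsets[b]] without coordinating
   with the other blocks. Returns the total number of postings. */
size_t exclusive_scan(const uint32_t *counts, size_t *offsets, size_t nblocks) {
    size_t running = 0;
    for (size_t b = 0; b < nblocks; b++) {
        offsets[b] = running;                   /* block b starts here */
        assert(running + counts[b] >= running); /* no size_t overflow */
        running += counts[b];
    }
    return running;
}
```

This loop only defines the expected result; on a GPU the scan itself would normally also be computed in parallel.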
CS297 Results
- Studied different index compression algorithms and wrote a C program implementing a Huffman encoder/decoder.
- Wrote a C program to encode/decode posting lists using the Simple-9 compression algorithm.
- Installed the necessary OpenCL C++ bindings for Microsoft Visual Studio and wrote an OpenCL program that computes the square of each element of an array.
- Installed PHP extensions for Windows and wrote a simple "hello world" PHP extension.
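Simple-9 packs as many small integers as possible into each 32-bit word: a 4-bit selector chooses one of nine ways to split the remaining 28 data bits. The C sketch below is an illustration, not the CS297 code. It assumes a selector numbering, a bit layout (selector in the top 4 bits, first value in the lowest bits), and a greedy densest-mode-first encoder, and `s9_encode`/`s9_decode` are hypothetical names. Yioop's Modified-9 variant, the target of Deliverable 1, is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The nine Simple-9 modes: how many values share the 28 data bits, and how
   many bits each value gets. Assumed convention: selector = index into this
   table, stored in the top 4 bits; value i occupies bits [i*bits, (i+1)*bits). */
static const struct { unsigned count, bits; } S9_MODES[9] = {
    {28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5}, {4, 7}, {3, 9}, {2, 14}, {1, 28}
};

/* Can the next S9_MODES[sel].count values (out of `left`) use mode `sel`? */
static int s9_fits(const uint32_t *v, size_t left, unsigned sel) {
    unsigned count = S9_MODES[sel].count, bits = S9_MODES[sel].bits;
    if (left < count)
        return 0; /* this sketch never pads a partly filled word */
    for (unsigned i = 0; i < count; i++)
        if (v[i] >> bits)
            return 0; /* value needs more than `bits` bits */
    return 1;
}

/* Greedily pack n values into 32-bit words, trying the densest mode first.
   `out` needs room for n words. Returns the number of words written, or
   SIZE_MAX if some value is >= 2^28 and cannot be represented. */
size_t s9_encode(const uint32_t *in, size_t n, uint32_t *out) {
    size_t pos = 0, nwords = 0;
    while (pos < n) {
        unsigned sel = 0;
        while (sel < 9 && !s9_fits(in + pos, n - pos, sel))
            sel++;
        if (sel == 9)
            return SIZE_MAX;
        unsigned count = S9_MODES[sel].count, bits = S9_MODES[sel].bits;
        assert(count * bits <= 28); /* data bits never overlap the selector */
        uint32_t word = (uint32_t)sel << 28;
        for (unsigned i = 0; i < count; i++)
            word |= in[pos + i] << (i * bits);
        out[nwords++] = word;
        pos += count;
    }
    return nwords;
}

/* Decode nwords words into `out` (room for up to 28 * nwords values).
   Returns the number of values decoded, or SIZE_MAX on an unused selector. */
size_t s9_decode(const uint32_t *words, size_t nwords, uint32_t *out) {
    size_t n = 0;
    for (size_t w = 0; w < nwords; w++) {
        unsigned sel = words[w] >> 28;
        if (sel >= 9)
            return SIZE_MAX; /* selectors 9-15 are unused: corrupt input */
        unsigned count = S9_MODES[sel].count, bits = S9_MODES[sel].bits;
        uint32_t mask = (UINT32_C(1) << bits) - 1;
        for (unsigned i = 0; i < count; i++)
            out[n++] = (words[w] >> (i * bits)) & mask;
    }
    return n;
}
```

For example, 28 values that are all 0 or 1 fit in a single word, while a value of 100000 needs a word of its own. Because every word carries its own selector, words can be decoded independently once each word's output position is known. That is the property a GPU version could exploit.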
Proposed Schedule
Week | Dates | Task
Week 1 | Aug 25 - Aug 30 | CS298 proposal
Week 2 | Aug 31 - Sep 6 | Get the latest version of Yioop and understand the existing code
Week 3 | Sep 7 - Sep 13 | Deliverable 1: C program to encode/decode posting lists using the Modified-9 compression algorithm (Yioop's existing algorithm)
Week 4 | Sep 14 - Sep 20 | Finish Deliverable 1
Week 5 | Sep 21 - Sep 27 | Deliverable 2: Convert the above C program into an OpenCL program
Week 6 | Sep 28 - Oct 4 | Continue working on Deliverable 2
Week 7 | Oct 5 - Oct 11 | Finish Deliverable 2
Week 8 | Oct 12 - Oct 18 | Deliverable 3: Write PHP extensions for the above OpenCL program
Weeks 9-10 | Oct 19 - Nov 1 | Continue working on Deliverable 3
Week 11 | Nov 2 - Nov 8 | Finish Deliverable 3
Weeks 12-13 | Nov 9 - Nov 22 | Deliverable 4: Conduct performance comparison tests between the GPU posting-list handler and the existing Yioop system
Week 14 | Nov 23 - Nov 29 | Start working on the CS298 report
Week 15 | Nov 30 - Dec 6 | Finish the CS298 report and submit it to the advisor and committee members
Week 16 | Dec 7 - Dec 13 | Defense
Key Deliverables:
- Software
  - Write a C program to encode/decode posting lists using the Modified-9 compression algorithm (Yioop's existing algorithm)
  - Convert the above C program into an OpenCL program
  - Write PHP extensions for the above OpenCL program
  - Conduct performance comparison tests between the GPU posting-list handler and the existing Yioop system
- Report
  - CS298 report
  - Documentation of the project code and test results
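The performance comparison in Deliverable 4 needs repeatable timings. The proposal does not fix a methodology. One common approach, sketched here under that assumption, is to run each decoder once untimed as a warm-up and then keep the fastest of several timed runs. `now_seconds`, `time_best_of` and the stand-in workload `sum_job` are hypothetical names. A fair CPU-versus-GPU comparison should also decide explicitly whether host-to-device transfer time is included in the timed work.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock seconds using C11 timespec_get (MSVC 2015+ and glibc provide it).
   It is not guaranteed to be monotonic; on POSIX, clock_gettime with
   CLOCK_MONOTONIC is a steadier alternative. */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run work(ctx) once untimed as a warm-up (caches, lazy initialisation), then
   `reps` timed runs, and return the fastest in seconds. The minimum is less
   sensitive to interference from other processes than the mean. */
double time_best_of(void (*work)(void *ctx), void *ctx, int reps) {
    assert(reps > 0);
    work(ctx);
    double best = 0.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        work(ctx);
        double elapsed = now_seconds() - t0;
        if (elapsed < 0.0)
            elapsed = 0.0; /* wall clock was adjusted backwards */
        if (r == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical stand-in workload: sum an array. A real test would decode a
   posting list here, either on the CPU or through the OpenCL path. */
struct sum_job { const unsigned *data; size_t n; unsigned long long result; };

static void sum_job_run(void *ctx) {
    struct sum_job *job = ctx;
    unsigned long long s = 0;
    for (size_t i = 0; i < job->n; i++)
        s += job->data[i];
    job->result = s;
}
```

The same harness can time both the existing Yioop path and the GPU handler as long as each is wrapped in a `void (*)(void *)` callback.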
Innovations and Challenges
- Working with novel index compression schemes
- Writing PHP extensions that call the OpenCL code
- Performance testing to measure relative improvements may require dedicated hardware