Chris Pollett > Students >
Sandhya

Deliverable-1

The following are submitted:

  • A PHP program, trie_eng.php, that builds a trie of English dictionary words and outputs it in gzip format
  • The file of English dictionary words used by the above program

Details

The aim was to construct a data structure of English dictionary words that can be used to autocomplete words as a user types in Yioop. A trie [1] is well suited to this, because all words that share a prefix are stored under a single node.
The PHP program:

  • Creates a Trie in which words are stored using multi-level php arrays
  • JSON-encodes the Trie and outputs a gzip-compressed version of it
  • Eliminates words with fewer than 3 letters, stop words, and words that contain non-ASCII characters
  • Produces a final gzip file of around 250 KB, which is a reasonable size to send over the network and load while using Yioop [2]
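
The steps in the list above can be sketched as follows. This is a minimal illustration in Python, not the PHP of the submitted program; the nested-dictionary layout, the "$" end-of-word marker, the small stop-word set, and the function names are all assumptions made for the example.

```python
import gzip
import json

# Hypothetical stop-word list; the list used by the real program is not shown.
STOP_WORDS = {"the", "and", "for", "that", "with"}
END = "$"  # assumed marker key meaning "a word ends at this node"


def keep_word(word):
    """Apply the three filters: at least 3 letters, not a stop word, ASCII only."""
    return len(word) >= 3 and word not in STOP_WORDS and word.isascii()


def build_trie(words):
    """Store each kept word as a path of one-character keys in nested dicts."""
    trie = {}
    for raw in words:
        word = raw.strip()
        if not keep_word(word):
            continue
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = 1
    return trie


def encode_trie(trie):
    """JSON-encode the trie and gzip the result, returning bytes."""
    text = json.dumps(trie, separators=(",", ":"))
    return gzip.compress(text.encode("ascii"))


words = ["cat", "car", "cart", "at", "the", "café", "dog"]
trie = build_trie(words)
blob = encode_trie(trie)
```

In this sample, "at" is dropped for length, "the" as a stop word, and "café" for its non-ASCII character, so only "cat", "car", "cart", and "dog" end up in the trie.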

This file will be loaded whenever a user accesses the Yioop website and will be processed further using JavaScript on the client machine.
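
To illustrate the kind of processing the client performs once the trie is loaded, the following sketch looks up all words that start with a typed prefix. It is written in Python for illustration only, whereas the deliverable does this step in JavaScript in the browser; the trie layout, the "$" end-of-word marker, and the function name suggest are assumptions.

```python
END = "$"  # assumed marker key meaning "a word ends at this node"


def suggest(trie, prefix, limit=5):
    """Return up to `limit` words stored in the trie that start with `prefix`."""
    # Walk down the trie one character at a time to reach the prefix node.
    node = trie
    for ch in prefix:
        if ch == END or ch not in node:
            return []
        node = node[ch]

    # Depth-first walk below the prefix node, collecting complete words.
    results = []
    stack = [(node, prefix)]
    while stack and len(results) < limit:
        current, word = stack.pop()
        if END in current:
            results.append(word)
        # Push children in reverse-sorted order so they are popped alphabetically.
        for ch in sorted((k for k in current if k != END), reverse=True):
            stack.append((current[ch], word + ch))
    return results


# Small hand-written trie holding "car", "cart", "cat", and "dog".
trie = {
    "c": {"a": {"r": {"$": 1, "t": {"$": 1}}, "t": {"$": 1}}},
    "d": {"o": {"g": {"$": 1}}},
}
print(suggest(trie, "ca"))
```

The cost of reaching the prefix node depends only on the length of the prefix, not on the number of words in the dictionary, which is what makes the structure attractive for suggestions on every keystroke.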

To run the program, enter the following on the command line:
php trie-eng.php dic_file_name

Code and dictionary file

Timing tests -

Experiments were conducted to measure the page load time when the Trie is loaded along with the website. Load times were monitored using the Firefox Web Console.
Loading the uncompressed JSON Trie, which is 2.5 MB in size, takes around 2.5 seconds.

Firefox Web console screen shot
HTTP gzip compression was then enabled in the Apache web server by adding the following statements to httpd.conf:
<IfModule deflate_module>
SetOutputFilter DEFLATE
</IfModule>


With this setting, the server gzips the 2.5 MB JSON-encoded Trie during transfer and it loads in around 400 ms, which is far less than the time taken to load the uncompressed Trie.

Firefox Web console screen shot
The already-zipped file, which is about 250 KB, loads in 35 ms, as shown below.

Firefox Web console screen shot
The third option is to compress the Trie into a file with a .gz extension and set the HTTP Accept-Encoding to gzip,deflate. With this option, the browser expects a compressed file and decompresses it on the fly. For this, I activated the following options in httpd.conf:

# AddEncoding allows you to have certain browsers uncompress
# information on the fly. Note: Not all browsers support this.
AddEncoding x-compress .Z
AddEncoding x-gzip .gz .tgz


With this configuration, the Trie loads in just 3 ms, and the browser automatically decompresses it and makes the JSON Trie available for autosuggest.
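
The decompress-then-parse step that the browser performs can be imitated offline. The sketch below, in Python using only the standard library, gzips a JSON document and then decompresses and parses it, much as a client would after receiving a response whose Content-Encoding indicates gzip. The sample trie and the function names pack and unpack are hypothetical, and no HTTP request is made.

```python
import gzip
import json


def pack(trie):
    """Return the JSON bytes of the trie and their gzip-compressed form."""
    raw = json.dumps(trie, separators=(",", ":")).encode("ascii")
    return raw, gzip.compress(raw)


def unpack(body):
    """Decompress a gzip body and parse the JSON it contains."""
    return json.loads(gzip.decompress(body).decode("ascii"))


# Tiny hand-written trie holding "car" and "cat"; "$" marks the end of a word.
sample = {"c": {"a": {"r": {"$": 1}, "t": {"$": 1}}}}
raw, packed = pack(sample)
restored = unpack(packed)
```

The restored object is identical to the original, so the autosuggest code can work with it exactly as if the uncompressed JSON had been sent.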

Firefox Web console screen shot
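
The four load times reported above can be put side by side with a little arithmetic. The short Python sketch below uses only the figures quoted in this section; the dictionary keys are labels chosen for the example.

```python
# Load times quoted in this section, in milliseconds.
timings_ms = {
    "uncompressed": 2500,  # 2.5 MB JSON Trie, no compression
    "deflate": 400,        # gzipped by Apache during transfer
    "pre_zipped": 35,      # already-zipped file of about 250 KB
    "add_encoding": 3,     # .gz file served with AddEncoding enabled
}

baseline = timings_ms["uncompressed"]
speedups = {name: baseline / ms for name, ms in timings_ms.items()}

for name, ms in timings_ms.items():
    print(f"{name}: {ms} ms, {speedups[name]:.1f}x relative to uncompressed")
```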

After conducting these experiments, it was concluded that:

  • A compressed Trie with a .gz extension will be made available on the Yioop server
  • httpd.conf will be modified to handle gzip-compressed files
  • The browser will decompress the data, which will then be used for autosuggest

References
[1] http://en.wikipedia.org/wiki/Trie
[2] http://developer.yahoo.com/performance/rules.html