Chris Pollett > Students > Snigdha

Deliverable 3 - Naive Bayes Classifier in PHP and Language Setting in Yioop

The aim of this deliverable was to learn PHP and to set the language of crawled Java and Python source code files. The deliverable has two parts. The first part is a PHP program that implements a Naive Bayes classifier to recognize Java and Python source code. The second part modifies the text processor in Yioop to set the language to Java or Python based on the extension of the crawled source code file.

The first part of the deliverable helped me learn the basic concepts of PHP. The output of the Naive Bayes classifier written in PHP was in line with the results obtained from the Java implementation in deliverable 2.
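As a rough illustration of the technique, here is a minimal multinomial Naive Bayes sketch with add-one smoothing. It is written in Python rather than PHP and is not the code from the deliverable. The training snippets, the tokenizer, and the names `TRAINING`, `tokenize`, `train`, and `classify` are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data; the deliverable's real training set is not shown here.
TRAINING = {
    "java": [
        "public class Main { public static void main(String[] args) { System.out.println(1); } }",
        "private int count; public void setCount(int count) { this.count = count; }",
    ],
    "python": [
        "def main(): print(1)",
        "import sys\nfor line in sys.stdin: print(line.strip())",
    ],
}


def tokenize(text):
    """Split text into runs of letters/underscores and single punctuation marks (digits are dropped)."""
    return re.findall(r"[A-Za-z_]+|[^\sA-Za-z_0-9]", text)


def train(samples):
    """Count tokens and documents per label and collect the shared vocabulary."""
    token_counts = {label: Counter() for label in samples}
    doc_counts = {label: len(docs) for label, docs in samples.items()}
    for label, docs in samples.items():
        for doc in docs:
            token_counts[label].update(tokenize(doc))
    vocabulary = set()
    for counts in token_counts.values():
        vocabulary.update(counts)
    return token_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior under add-one smoothing."""
    token_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in token_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        denominator = sum(token_counts[label].values()) + len(vocabulary)
        for token in tokenize(text):
            # Smoothed log likelihood; unseen tokens get a count of 1.
            score += math.log((token_counts[label][token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("public void run() { System.out.println(x); }", model))  # prints "java"
print(classify("def run(): print(x)", model))  # prints "python"
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen token from zeroing out a class.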

To achieve the goals of the second part of the deliverable, locale tags were created for Java and Python, and the chargram value was set to 3 in tokenizer.php for both locale tags. As a result, the contents of crawled pages are chunked into trigrams when the language is Java or Python. The text processor in Yioop was modified so that if the extension of the crawled file is .java the language is set to Java, and if the extension is .py the language is set to Python. Config.php was also modified to transfer control to the text processor in Yioop when Java or Python source code files are crawled. In Apache, the MIME type for a .java file is text/x-java-source and the MIME type for a .py file is text/plain. In lighttpd, the MIME type for a Java source code file is text/plain and the MIME type for a Python source code file is text/x-python.
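The extension check and trigram chunking described above can be sketched as follows. This is a Python illustration only, not Yioop's PHP code. The language tag strings, the names `EXTENSION_LANGUAGE`, `SOURCE_MIME_TYPES`, `detect_language`, `char_ngrams`, and `process`, and the choice to slide the trigram window over the whole string are assumptions for this example.

```python
import os

# Extension-to-language mapping from the text above; the tag strings are assumed.
EXTENSION_LANGUAGE = {".java": "java", ".py": "python"}

# MIME types that Apache and lighttpd report for these files, per the text above.
SOURCE_MIME_TYPES = {"text/x-java-source", "text/x-python", "text/plain"}

# Character n-gram length, matching the chargram value of 3.
CHARGRAM_LENGTH = 3


def detect_language(path, default=None):
    """Pick a language from the file extension, or return the default."""
    extension = os.path.splitext(path)[1]
    return EXTENSION_LANGUAGE.get(extension, default)


def char_ngrams(text, n=CHARGRAM_LENGTH):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def process(path, mime_type, content):
    """Return (language, trigrams) for source files, or (None, []) otherwise."""
    language = detect_language(path)
    if mime_type not in SOURCE_MIME_TYPES or language is None:
        return None, []
    return language, char_ngrams(content)


print(process("tool.py", "text/plain", "def f"))
# prints ('python', ['def', 'ef ', 'f f'])
```

Because text/plain is served for .py files by Apache and for .java files by lighttpd, the MIME type alone cannot identify the language, which is why the sketch falls back on the extension.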

[Naive Bayes Classifier Program in PHP - Zip]