CS297 Proposal

Adding a source code searching capability to Yioop!

Snigdha Parvatneni (snigdha.parvatneni@gmail.com)

Advisor: Dr. Chris Pollett

Description:

The objective of this project is to develop a source code search feature for Java and Python and incorporate it into Yioop. This feature will let users paste a code snippet into the search bar and search source code directly. One approach is to index code using character n-grams. Ignoring white space when forming character n-grams works well for languages such as Java, but in some languages white space is significant. For white-space-sensitive languages such as Python, our approach will therefore be modified. In addition, experiments will be done to find the value of n that gives the best search results for each language.
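The following is a minimal Python sketch of the character n-gram idea. It assumes two hypothetical normalization rules:

* For Java-like languages, all white space is dropped.
* For Python-like languages, line breaks and leading indentation are preserved, and any other run of spaces or tabs inside a line is collapsed to one space.

The names `normalize` and `char_ngrams` are illustrative only. Yioop itself is written in PHP, and the exact white-space handling is what the project sets out to determine.

```python
import re


def normalize(code, whitespace_sensitive):
    """Hypothetical pre-processing applied before n-gram extraction."""
    if not whitespace_sensitive:
        # Java-like languages: white space carries no meaning, so drop all of it.
        return re.sub(r"\s+", "", code)
    # Python-like languages: keep line breaks and leading indentation,
    # but collapse other runs of spaces/tabs within a line to one space.
    lines = []
    for line in code.splitlines():
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        body = re.sub(r"[ \t]+", " ", stripped.rstrip())
        lines.append(indent + body)
    return "\n".join(lines)


def char_ngrams(code, n=3, whitespace_sensitive=False):
    """Return the overlapping character n-grams of a normalized snippet."""
    text = normalize(code, whitespace_sensitive)
    return [text[i : i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("int x = 1;"))  # ['int', 'ntx', 'tx=', 'x=1', '=1;']
```

With n = 3, `int x = 1;` and `int  x  =  1;` yield the same trigrams. In contrast, two Python snippets that differ only in indentation do not.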

Schedule:

Week 1 (Jan 29 - Feb 5): Discuss the various aspects of the project in depth with the advisor.
Week 2 (Feb 6 - Feb 12): Install Yioop and understand how it works.
Weeks 3-4 (Feb 13 - Feb 26): Deliverable 1: Study how the Sourcerer and Google Code Search engines perform code search.
Weeks 5-7 (Feb 27 - Mar 19): Deliverable 2: Write a classifier program that recognizes whether a given code snippet is Java or Python.
Weeks 8-10 (Mar 20 - Apr 9): Deliverable 3: Write a PHP program for a Naive Bayes classifier, and set up the languages for Java and Python source code files so that the contents of each file are chunked into trigrams.
Weeks 11-13 (Apr 10 - Apr 30): Deliverable 4: Write a PHP program that reproduces the functionality of Git clone using cURL requests.
Weeks 14-15 (May 1 - May 14): Deliverable 5: Write the CS297 report.

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Study how the existing code search engines, Sourcerer and Google Code Search, perform code search.

2. Write a classifier program that recognizes whether a given code snippet is Java or Python.

3. Write a PHP program for a Naive Bayes classifier, and set up the languages for Java and Python source code files so that the contents of each file are chunked into trigrams.

4. Write a PHP program to reproduce the functionality of Git clone using cURL requests.

5. Project write-up for CS297.
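Deliverables 2 and 3 call for a Naive Bayes classifier that distinguishes Java snippets from Python snippets. The sketch below is written in Python rather than the PHP the project targets. It is a standard multinomial Naive Bayes over character trigrams with add-one (Laplace) smoothing. The class name, the smoothing choice, and the toy training snippets are all assumptions made for illustration, not project data.

```python
import math
from collections import Counter


def trigrams(text):
    """Overlapping character trigrams; white space is kept as-is."""
    return [text[i : i + 3] for i in range(len(text) - 2)]


class TrigramNaiveBayes:
    """Multinomial Naive Bayes over character trigrams (hypothetical helper)."""

    def __init__(self):
        self.doc_counts = Counter()  # number of training snippets per label
        self.gram_counts = {}        # label -> Counter of trigram occurrences
        self.vocab = set()           # every trigram seen during training

    def train(self, snippet, label):
        grams = trigrams(snippet)
        self.doc_counts[label] += 1
        self.gram_counts.setdefault(label, Counter()).update(grams)
        self.vocab.update(grams)

    def classify(self, snippet):
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        total_docs = sum(self.doc_counts.values())
        # Guard against division by zero if no training snippet had a trigram.
        vocab_size = max(len(self.vocab), 1)
        best_label, best_score = None, -math.inf
        for label, counts in self.gram_counts.items():
            # log P(label) + sum of log P(trigram | label), add-one smoothed
            score = math.log(self.doc_counts[label] / total_docs)
            total_grams = sum(counts.values())
            for gram in trigrams(snippet):
                score += math.log((counts[gram] + 1) / (total_grams + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training snippets, invented for illustration only.
nb = TrigramNaiveBayes()
nb.train("public static void main(String[] args) {", "java")
nb.train("System.out.println(x);", "java")
nb.train("def main():\n    print(x)", "python")
nb.train("import sys\nfor x in range(10):", "python")

print(nb.classify("public void run() {"))                # prints java
print(nb.classify("for i in range(5):\n    print(i)"))  # prints python
```

Working in log space avoids floating-point underflow when a snippet contains many trigrams. Add-one smoothing keeps a trigram that was never seen for a label from driving that label's probability to zero.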
