CS297 Proposal
Adding a source code searching capability to Yioop!
Snigdha Parvatneni (snigdha.parvatneni@gmail.com)
Advisor: Dr. Chris Pollett

Description: The objective of this project is to develop a source code search feature for Java and Python and incorporate it into Yioop. This feature will let users paste a code snippet into the search bar and search source code directly. One approach is to use character n-grams. Ignoring white space in character n-grams works well for some languages, such as Java, but in other languages white space matters. Our approach will be modified for white-space-sensitive languages such as Python. In addition, experiments will be done to find the values of n that give the best search results for each language.

Schedule:
Deliverables: The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. Study how the existing code search engines (the Sourcerer code search engine and the Google code search engine) perform code search.
2. Write a classifier program to recognize whether a given code snippet is Java or Python.
3. Write a PHP program for a Naive Bayes classifier, and set up the Java and Python language handling so that the contents of source code files are chunked into trigrams.
4. Write a PHP program to reproduce the functionality of Git clone using cURL requests.
5. Project write-up for CS297.

References:
[1] Stefan Büttcher, Charles L. A. Clarke and Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines. The MIT Press. 2010.
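As a rough illustration of the character n-gram idea in the description and the classifier in deliverables 2 and 3, the sketch below chunks a snippet into trigrams and trains a small Naive Bayes classifier on them. It is written in Python for brevity, whereas the project itself plans a PHP implementation. The names char_ngrams and NgramNaiveBayes, the add-one smoothing, and the two tiny training samples are assumptions made for this example, not part of Yioop.

```python
import math
from collections import Counter


def char_ngrams(code, n=3, keep_whitespace=False):
    """Return the list of character n-grams of a code snippet.

    keep_whitespace=False drops all white space first (reasonable for
    Java); keep_whitespace=True leaves it in (for Python, where
    indentation carries meaning).
    """
    text = code if keep_whitespace else "".join(code.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}       # label -> Counter of n-gram frequencies
        self.docs = Counter()  # label -> number of training snippets
        self.vocab = set()     # all n-grams seen during training

    def _grams(self, code):
        # The language is unknown at this stage, so white space is kept
        # for every snippet (an assumption of this sketch).
        return char_ngrams(code, self.n, keep_whitespace=True)

    def train(self, label, code):
        grams = self._grams(code)
        self.counts.setdefault(label, Counter()).update(grams)
        self.docs[label] += 1
        self.vocab.update(grams)

    def classify(self, code):
        grams = self._grams(code)
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(self.docs[label] / total_docs)
            denom = sum(counter.values()) + len(self.vocab)
            for gram in grams:
                score += math.log((counter[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny made-up training samples, for illustration only.
JAVA_SAMPLE = ('public class Hello { public static void main(String[] args) '
               '{ System.out.println("hi"); } }')
PYTHON_SAMPLE = ('def hello():\n    print("hi")\n\n'
                 'if __name__ == "__main__":\n    hello()\n')

classifier = NgramNaiveBayes(n=3)
classifier.train("java", JAVA_SAMPLE)
classifier.train("python", PYTHON_SAMPLE)

if __name__ == "__main__":
    print(char_ngrams("int x;", 3))
    print(char_ngrams("int x;", 3, keep_whitespace=True))
    print(classifier.classify("public static int add(int a, int b) { return a + b; }"))
    print(classifier.classify("def add(a, b):\n    return a + b\n"))
```

Because n is a parameter of both the chunker and the classifier, the same sketch could be rerun with different values of n to compare results per language, in the spirit of the experiments mentioned in the description.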