CS297 Proposal
Adaptive Clustering in Search Engines
Kuldeep Dhole (dkuldeep11@gmail.com)
Advisor: Dr. Chris Pollett
Description:
A search engine can present search result pages grouped into categories that match users' interests. Search result pages can be grouped according to the categories predicted by a query classification algorithm. A classifier attaches labels to pages/documents, normally after learning from labeled training data. We will be using unsupervised learning, where the given data does not have any labels associated with it; in this setting, a clustering technique produces the labels for documents. A crawler can then use these labels to determine what to crawl. (A web crawler is used to update web content or to build indexes of other sites' web content.) We are using Yioop, an open source search engine developed by Dr. Pollett. Yioop comes with a crawler, but the crawler does not provide control over classification and clustering. We aim to add an adaptive clustering technique to the crawler, using unsupervised learning, so that clustering can scale out.
Schedule:
Deliverables:
The full project will be done when CS298 is completed. The following will be done by the end of CS297:
1. Build a Naive Bayes classifier for a given set of documents.
2. Implement hierarchical agglomerative clustering for cluster analysis.
3. Study the existing classifier and clustering code in Yioop.
4. Get the recipe plug-in to scale out clusters by ingredients.
5. CS297 Report
References:
1. Hierarchical Clustering Algorithms: http://cgm.cs.mcgill.ca/~soss/cs644/projects/siourbas/sect5.html
2. Stefan Buettcher, Charles L. A. Clarke, and Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines.
3. Yioop Documentation: http://www.seekquarry.com/?c=main&p=documentation
4. Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. http://aima.cs.berkeley.edu/
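Deliverable 1 calls for a Naive Bayes classifier over a set of documents. As an illustration only, not the project's implementation, a multinomial Naive Bayes classifier with add-one (Laplace) smoothing can be sketched in Python as below. The class name `NaiveBayesClassifier`, the lowercase whitespace tokenizer, and the toy "savory"/"dessert" recipe labels are hypothetical assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes over bag-of-words documents.

    Uses add-one (Laplace) smoothing and assumes naive whitespace tokenization.
    """

    def __init__(self):
        self.class_doc_counts = Counter()        # number of training docs per label
        self.word_counts = defaultdict(Counter)  # term frequencies per label
        self.total_words = Counter()             # total token count per label
        self.vocabulary = set()                  # all distinct training tokens

    def train(self, documents, labels):
        for text, label in zip(documents, labels):
            tokens = text.lower().split()
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.total_words[label] += len(tokens)
            self.vocabulary.update(tokens)

    def predict(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # log P(label) + sum over tokens of log P(token | label), smoothed
            score = math.log(doc_count / n_docs)
            denom = self.total_words[label] + vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    nb = NaiveBayesClassifier()
    nb.train(
        ["chicken garlic onion roast", "flour sugar butter bake",
         "beef onion garlic stew", "sugar cream egg bake"],
        ["savory", "dessert", "savory", "dessert"],
    )
    print(nb.predict("garlic chicken stew"))  # savory
```

Working in log space avoids floating-point underflow when many small per-word probabilities are multiplied, and smoothing keeps a single unseen word from forcing a label's probability to zero.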
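Deliverable 2 calls for hierarchical agglomerative clustering, which derives groups from unlabeled documents, as described above for unsupervised learning. A minimal Python sketch follows, offered as an assumption-laden illustration rather than the project's method: it assumes whitespace tokenization, term-frequency vectors, cosine similarity, and average linkage (single or complete linkage are equally valid choices), and the function names `cosine` and `hac` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hac(documents, num_clusters):
    """Hypothetical average-linkage agglomerative clustering.

    Starts with each document in its own cluster and repeatedly merges the
    pair of clusters with the highest average pairwise cosine similarity,
    stopping when num_clusters clusters remain. Returns clusters as sorted
    lists of document indices.
    """
    vectors = [Counter(doc.lower().split()) for doc in documents]
    clusters = [[i] for i in range(len(documents))]
    while len(clusters) > num_clusters:
        best_pair, best_sim = None, -1.0
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sims = [cosine(vectors[i], vectors[j])
                        for i in clusters[x] for j in clusters[y]]
                avg = sum(sims) / len(sims)
                if avg > best_sim:
                    best_pair, best_sim = (x, y), avg
        x, y = best_pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected by the deletion
    return clusters
```

For example, with ingredient lists as documents, `hac(["chicken garlic onion", "garlic onion beef", "flour sugar butter", "sugar butter cream"], 2)` returns `[[0, 1], [2, 3]]`, grouping the two savory lists and the two baking lists. This version recomputes similarities on every merge, so it only suits small inputs; scaling it out would need a cached similarity matrix or a distributed approach.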