CS298 Proposal
Adaptive Clustering in Search Engines
Kuldeep Dhole (kuldeep.dhole@sjsu.edu)
Advisor: Dr. Chris Pollett
Committee Members: Dr. Sami Khuri, Dr. Robert Chun
Abstract:
A search engine can present its result pages in categories that reflect users' interests. Result pages can be grouped according to the categories predicted by a query classification algorithm. Yioop, an open source search engine developed by Dr. Pollett, currently displays search results without organizing them into clusters. We will use a hierarchical clustering algorithm to present the search results in organized groups. While a web crawler continuously crawls web pages and builds indexes, the clustering algorithm will continuously absorb the newly indexed data and update the overall clustering. Because crawling and indexing run in a distributed environment, the clustering algorithm must be incremental and able to scale out. We aim to add an adaptive clustering technique to the crawler, based on unsupervised learning, so that clustering can scale out and search results can be delivered according to the most relevant clusters.
CS297 Results
Proposed Schedule
Key Deliverables:
Innovations and Challenges
References:
Hierarchical Clustering. Costa Siourbas. http://cgm.cs.mcgill.ca/~soss/cs644/projects/siourbas/, 1999.
Information Retrieval: Implementing and Evaluating Search Engines. Buettcher, Clarke and Cormack. The MIT Press, 2010.
Artificial Intelligence: A Modern Approach. Stuart Russell and Peter Norvig. Prentice Hall, 1st edition, 1995.