Chris Pollett > Students > Shailesh

CS298-Incorporating WordNet in an Information Retrieval System

04/22/2014 Project meeting

  • Created the first patch and ran into an issue with Mantis while uploading the patch file.
  • Reviewed the generated patch; will rename some variables and remove stray blank spaces.
  • Decided to leave the WordNet feature off by default, because the user has to define the WordNet directory before it can be used while searching.
  • Decided to add the WordNet option to Page Options because all the search result features are present on that page.

04/15/2014 Project meeting

  • Examined the results for the data set in which Part of Speech tagging was used at crawl time, after crawling 100,000 pages.
  • Compared the area under the recall vs. precision curve between the crawl with Part of Speech tagging and the crawl without Part of Speech tagging.
  • The comparison showed that the area under the curve was larger for the data set crawled without Part of Speech tagging.
  • Decided to continue further examination on the data set crawled without Part of Speech tagging.
  • After examining two data sets, we got very good results. Reordering the results using BM25 ranking seems to work well without affecting throughput time.
  • Decided to examine one more data set: the dmoz data set, which contains Open Directory web page descriptions (RDF format).
  • Discussed implementing an on/off switch for the WordNet feature in Yioop!.
  • Decided to add a feature to edit the installed WordNet directory path.
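
The curve comparison above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation code used in the project: the function names, the document ids, and the choice of the trapezoidal rule starting from the point (recall 0, precision 1) are all assumptions.

```python
def precision_recall_points(ranked, relevant):
    """Return (recall, precision) after each rank position in `ranked`."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points


def area_under_curve(points):
    """Trapezoidal area under a recall vs. precision curve."""
    area = 0.0
    # Assumed starting point of the curve: recall 0, precision 1.
    prev_recall, prev_precision = 0.0, 1.0
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


# Hypothetical result lists for one query from two crawls.
relevant = {"a", "c"}
crawl_one = ["a", "b", "c", "d"]
crawl_two = ["a", "c", "b", "d"]
for name, ranked in (("crawl one", crawl_one), ("crawl two", crawl_two)):
    print(name, area_under_curve(precision_recall_points(ranked, relevant)))
```

A larger area means relevant pages tend to appear earlier in the ranking, which is the sense in which one crawl "was more" than the other.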

04/08/2014 Project meeting

  • Decided to move to experiment phase
  • Initially decided to use the cs.sjsu.edu web pages and the Wikipedia data set as data sets.
  • Discussed how to display WordNet results on the Yioop! page
  • Decided to compare the results in 3 scenarios
    • 1. Crawl with Part of Speech tagging
    • 2. Crawl without Part of Speech tagging

04/01/2014 Project meeting

  • Integrated the BM25 (WordNet) score into the search result rank measure
  • Discussed how to combine the BM25 (WordNet) score with the final ranking mechanism
  • After sorting search results by BM25 (WordNet) score, the search results are more meaningful.
  • Decided to run experiments on the query expansion improvements added to Yioop and look for useful findings from P@10, recall, and BM25 score.
  • Decided to show the top two similar words on the search results page
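
As a rough illustration of reordering results by a BM25 score that also counts WordNet-similar words, here is a minimal sketch. It uses a common textbook form of Okapi BM25 (with the usual defaults k1 = 1.2 and b = 0.75) over tokenized documents held in memory; it is not the Yioop implementation, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rerank(query_terms, similar_words, docs):
    """Return document indexes sorted by BM25 score of the expanded query."""
    expanded = list(query_terms) + list(similar_words)
    scores = bm25_scores(expanded, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    ["car", "repair", "shop"],
    ["automobile", "repair", "guide"],
    ["cooking", "recipes", "book"],
]
print(rerank(["car"], ["automobile"], docs))
```

With the similar word included, the document that only mentions "automobile" receives a nonzero score and moves ahead of the unrelated one.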

03/25/2014 Project meeting

  • No Meeting Due to Spring Break

03/18/2014 Project meeting

  • Found the two most relevant synonyms for a given query. WordNet gave the correct output as per the requirement.
  • Discussed performance and tried to make it a little faster. (Current processing time: 1 sec)
  • Discussed how we can use these two synonyms efficiently in Yioop for query expansion.
  • Discussed the idea of using BM25 for relevance ranking; will experiment with it on the first 10 pages to see if it works better.
  • Discussed another possibility: directly replacing words in the query with similar words, fetching the search results, and seeing whether that works better than the BM25 idea.

03/11/2014 Project meeting

  • Did a code walkthrough of the integrated code
  • Found the values for index_name and lang and fixed the issue
  • Discussed the method used from the IndexManager class and found that it works correctly. The corpus is limited, so the method returns 0 results.
  • Decided to use the WordNet database to create a large corpus of data
  • Once we get nonzero values from the numDocTerms method, we will store them in an array, and the top 2 queries will be selected for the permutation process.

03/04/2014 Project meeting

  • Did a code walkthrough and tried to make the code more efficient.
  • Decided to replace query words with similar words obtained from WordNet
  • Get the document count for each resulting query
  • Select the top two queries
  • Decided to perform query expansion on those top two queries.
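
The steps above can be sketched as follows. This is a simplified stand-in, not the project code: the document count is computed by scanning a small in-memory list rather than by querying the Yioop index, queries with a zero count are dropped (an assumption), and all names and sample data are hypothetical.

```python
from itertools import product


def candidate_queries(query_terms, similar_words):
    """Build every query obtained by swapping terms for similar words."""
    options = [[term] + similar_words.get(term, []) for term in query_terms]
    return [" ".join(choice) for choice in product(*options)]


def count_matching_docs(query, docs):
    """Count documents containing every term of `query` (stand-in for an index lookup)."""
    terms = set(query.split())
    return sum(1 for d in docs if terms <= set(d.split()))


def top_two_queries(query_terms, similar_words, docs):
    """Pick the two rewritten queries that match the most documents."""
    original = " ".join(query_terms)
    candidates = [q for q in candidate_queries(query_terms, similar_words) if q != original]
    counted = [(count_matching_docs(q, docs), q) for q in candidates]
    nonzero = [(count, q) for count, q in counted if count > 0]
    nonzero.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in nonzero[:2]]


docs = ["fast automobile repair", "quick car repair", "quick car wash", "fast car repair"]
similar = {"fast": ["quick"], "car": ["automobile"]}
print(top_two_queries(["fast", "car"], similar, docs))
```

The two selected queries are the ones that would then be used for query expansion.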

02/25/2014 Project meeting

  • Evaluated the time efficiency of the Part of Speech tagging code, and it was quite impressive
  • Decided not to change the Part of Speech tagging code
  • Discussed the new features added to Yioop in the recent build
  • Had a fruitful discussion on the new MVC architecture used in Yioop
  • Decided to implement the WordNet code in Yioop, which will help us expand the query

02/18/2014 Project meeting

  • Integrated the Part of Speech tagging code into Yioop
  • Cross-verified the output of the Part of Speech tagging code (implemented only for the crawler part)
  • Discussed different ways to apply the code to the input query given by the user.
  • Did a walkthrough of Part of Speech tagging
  • Decided to look at the time efficiency of the code in the next meeting

02/11/2014 Project meeting

  • Finalized the project proposal
  • Made some necessary changes to the proposal and uploaded it to the web site
  • Tried to figure out the future path for CS298 deliverables

11/19/2013 Project meeting

  • Learned the file structure of the Yioop project
  • Tried to find the cause of the crawl failure on the local system
  • Found the name of the file in which we have to make changes
  • Found a way to see the changes made in Yioop files (Page Options)
11/12/2013 Project meeting

  • Showed execution of the modified version of the WordNet code using the intersection function
  • Compared the output of the developed code with hand-written output
  • Downloaded Yioop from SeekQuarry.com and installed it on the local system

10/29/2013 Project meeting

  • Showed the execution of the WordNet code using the intersection ranking algorithm
  • Compared the output of the executed code with hand-written output
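
One plausible reading of intersection ranking, assumed here rather than taken from the project code, is to rank the candidate WordNet senses of a word by how many words their glosses share with the surrounding sentence. The sketch below shows that idea; the function names, the sentence, and the glosses are made up for illustration.

```python
def intersection_score(context_words, gloss):
    """Number of distinct words shared by the context and a sense gloss."""
    return len(set(context_words) & set(gloss.lower().split()))


def rank_senses(context, glosses):
    """Order glosses from most to least overlap with the context sentence."""
    context_words = context.lower().split()
    return sorted(
        glosses,
        key=lambda gloss: intersection_score(context_words, gloss),
        reverse=True,
    )


glosses = [
    "sloping land beside a body of water",
    "a financial institution that accepts money for deposit",
]
print(rank_senses("deposit money in the bank account", glosses)[0])
```

Because the score is a plain set intersection, it is easy to check by hand against a small example, which matches how the output was verified in the meeting.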

10/22/2013 Project meeting

  • Showed the execution of the WordNet code using the cosine ranking algorithm
  • Compared the output of the executed code with hand-written output

10/15/2013 Project meeting

  • Executed the modified code of the parser function.
  • Tried to reduce the variety of Part of Speech tags to noun, verb, adverb, and adjective
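
Reducing a fine-grained tag set to four classes can be done with a small lookup. The sketch below assumes Penn Treebank style tags (NN, NNS, VBD, JJ, RB, and so on), which the notes do not confirm; the mapping table and function names are hypothetical.

```python
# Assumed mapping from the first two letters of a Penn Treebank style tag.
COARSE_BY_PREFIX = {
    "NN": "noun",
    "VB": "verb",
    "RB": "adverb",
    "JJ": "adjective",
}


def coarse_tag(tag):
    """Return the coarse class for a tag, or None if it is not one of the four."""
    return COARSE_BY_PREFIX.get(tag[:2].upper())


def collapse(tagged_words):
    """Keep only nouns, verbs, adverbs, and adjectives, with coarse tags."""
    collapsed = []
    for word, tag in tagged_words:
        coarse = coarse_tag(tag)
        if coarse is not None:
            collapsed.append((word, coarse))
    return collapsed


tagged = [("the", "DT"), ("quick", "JJ"), ("foxes", "NNS"), ("ran", "VBD"), ("quickly", "RB")]
print(collapse(tagged))
```

The four classes correspond to the four parts of speech that WordNet organizes its entries by, which is why the other tags can be dropped before the lookup.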

10/08/2013 Project meeting

  • Troubleshot an issue found in the WordNet parser function and tried to get proper output as per the goal and requirement
  • Made some minor changes to the regular expression used to parse the WordNet output file.
  • Made sure the path ahead for coding is neat and clear

10/01/2013 Project meeting

  • Discussed parsing techniques for the WordNet output to extract nouns, verbs, and adjectives according to the output of Part of Speech tagging
  • Tried to figure out how a regular expression will work in parsing the output
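
A regular expression based parser for this kind of output might look like the sketch below. The input layout is an assumption: a heading line of the form "Overview of <part of speech> <word>" followed by numbered sense lines of the form "N. words -- (gloss)". The sample text is illustrative only, not captured WordNet output, and the pattern and function names are hypothetical.

```python
import re

# Assumed heading line, e.g. "Overview of noun bank".
HEADING = re.compile(r"^Overview of (noun|verb|adj|adv) (\S+)")
# Assumed sense line, e.g. "1. (25) word, other word -- (gloss text)".
SENSE = re.compile(r"^\d+\.\s*(?:\(\d+\)\s*)?(.+?)\s+--\s+\((.*)\)\s*$")


def parse_senses(text):
    """Group (words, gloss) pairs by the part of speech in the preceding heading."""
    result = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        heading = HEADING.match(line)
        if heading:
            current = heading.group(1)
            result.setdefault(current, [])
            continue
        sense = SENSE.match(line)
        if sense and current is not None:
            words = [w.strip() for w in sense.group(1).split(",")]
            result[current].append((words, sense.group(2)))
    return result


SAMPLE = """Overview of noun bank

1. (25) depository financial institution, bank -- (a financial institution that accepts deposits)
2. bank -- (sloping land beside a body of water)

Overview of verb bank

1. bank -- (tip laterally)
"""
for pos, senses in parse_senses(SAMPLE).items():
    print(pos, senses)
```

Keying the result by part of speech makes it straightforward to keep only the senses whose part of speech agrees with the tagger's output for the query word.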

09/24/2013 Project meeting

  • Discussed how the Part of Speech tagging code fits into our feature and how efficient it will be for our implementation
  • Discussed how the output of the Part of Speech tagging code will work together with the WordNet output

09/17/2013 Project meeting

  • Looked at different methods for measuring the similarity of sentences
  • Discussed their complexity and the running time required for execution
  • Finally decided to go with the cosine similarity ranking algorithm
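
For reference, cosine similarity between two sentences can be computed from their word count vectors as the dot product divided by the product of the vector lengths. The sketch below uses raw term counts and whitespace tokenization, both of which are simplifying assumptions; the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between the word count vectors of two sentences."""
    counts_a = Counter(sentence_a.lower().split())
    counts_b = Counter(sentence_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    length_a = math.sqrt(sum(c * c for c in counts_a.values()))
    length_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if length_a == 0 or length_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (length_a * length_b)


print(cosine_similarity("open the bank account", "the bank of the river"))
```

The cost is linear in the number of distinct words in the two sentences, which keeps the running time low when many candidate sentences have to be ranked.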

09/10/2013 Project meeting

  • Presented how WordNet works from the command prompt and tried to figure out how to use its results in our project
  • Discussed the idea of query expansion using WordNet results and tried to come up with a workaround and an algorithm

09/03/2013 Project meeting

  • Worked on the project proposal
  • Finalized the project proposal by discussing the deliverables and the project summary, and uploaded it to the website
  • Installed WordNet on a Windows system
  • Tried to examine how WordNet works, using its database files to obtain results