CS298 Proposal

Extending Yioop with Geographical Location Local Search

Vijaya Sinha (vijya.sinha@gmail.com)

Advisor: Dr. Chris Pollett

Committee Members: Prof. Soon Tee Teoh, Hai Su


When searching the internet, it is often useful to get results based on our current location. For example, we might want such results when we search for eating joints, car service centers, or hospitals. Current open source search engines, such as those based on Nutch, do not provide this facility. Commercial engines like Google and Yahoo do, so it would be useful to incorporate it into an open source alternative.

Complete raw data dumps for determining the geolocation of IP addresses are available from hostip.info. Similarly, openstreetmap.org allows complete Earth downloads of street map data. The goal of my project is to use these public data sources to extend Yioop so that it returns location-based search results. The hostip.info data set will be used to geolocate the IP address from which a search query originates. This geolocation will then be used to select the information relevant to that geographical location and to return local search results. We will extend the archive iterator from Yioop so that it can be used to crawl the planet.osm archives. Once we have the crawled data, an algorithm that ranks search results by distance will be used to place the results nearest to the geolocation of the search query first. The last part of the project is plotting this ranked data on a map by rendering the abstract data onto tiles to produce a concrete map.
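As a rough illustration of the IP geolocation step, the sketch below looks up an address in a small in-memory table of network ranges. The table contents, the coordinates, and the function name geolocate are hypothetical placeholders chosen for this example; the real hostip.info dump has its own schema, which this sketch does not attempt to reproduce.

```python
import ipaddress

# Hypothetical sample rows: (network, latitude, longitude).
# These are reserved documentation ranges, not real hostip.info data.
SAMPLE_RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), 37.3382, -121.8863),
    (ipaddress.ip_network("198.51.100.0/24"), 40.7128, -74.0060),
]


def geolocate(ip_string, ranges=SAMPLE_RANGES):
    """Return (lat, lon) for the first range containing the address, else None."""
    address = ipaddress.ip_address(ip_string)
    for network, lat, lon in ranges:
        if address in network:
            return (lat, lon)
    return None


if __name__ == "__main__":
    print(geolocate("192.0.2.17"))
    print(geolocate("203.0.113.5"))
```

A linear scan is enough for a sketch; a full data set would more plausibly be queried through a database index on the address range.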

CS297 Results

  • Performed a simple web crawl using Yioop to understand its inner workings.
  • Created a simple app to return the geolocation for a given IP address. This deliverable helped me understand the database schema of the hostip database.
  • Created an app to extract records matching a keyword from an XML-formatted planet.osm file.
  • CS297 Report.
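The keyword extraction from a planet.osm file can be sketched roughly as follows. This is not the CS297 app itself, only a minimal illustration that assumes the standard OSM XML layout of node elements with lat and lon attributes and nested tag elements carrying k and v attributes. The function name find_nodes and the sample data are made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Tiny hypothetical extract in OSM XML form (not real map data).
SAMPLE = """<osm version="0.6">
  <node id="1" lat="37.3352" lon="-121.8811">
    <tag k="name" v="Campus Hospital"/>
    <tag k="amenity" v="hospital"/>
  </node>
  <node id="2" lat="37.3300" lon="-121.8900">
    <tag k="name" v="Corner Cafe"/>
  </node>
</osm>"""


def find_nodes(osm_file, keyword):
    """Yield (name, lat, lon) for each node with a tag value containing keyword."""
    keyword = keyword.lower()
    # iterparse streams the file, so the whole document is never built at once.
    for _event, elem in ET.iterparse(osm_file, events=("end",)):
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v", "") for t in elem.findall("tag")}
            if any(keyword in value.lower() for value in tags.values()):
                yield tags.get("name", ""), float(elem.get("lat")), float(elem.get("lon"))
        if elem.tag in ("node", "way", "relation"):
            elem.clear()  # drop the children and attributes of finished elements


if __name__ == "__main__":
    for record in find_nodes(io.BytesIO(SAMPLE.encode("utf-8")), "hospital"):
        print(record)
```

Streaming matters here because a full planet.osm file is far too large to load into memory as a single tree.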

Proposed Schedule

Week 1 (August 23-August 30): CS298 Proposal.
Week 2 (August 30-September 6): Experiment with the various rendering software packages listed on the OpenStreetMap wiki.
Week 3 (September 6-September 13): Deliverable 1 Due: Plot the geolocation of the IP address from the search query on the map using rendering software.
Week 4 (September 13-September 20): Work on archive iterators for planet.osm files.
Week 5 (September 20-September 27): Deliverable 2 Due: Use the archive iterator to get the location from planet.osm based on the geolocation of the search query.
Weeks 6-7 (September 27-October 11): From the search results for a given location, determine the results nearest to the user.
Week 8 (October 11-October 18): Devise and work on an algorithm to rank search results according to distance from the geolocation of the user.
Week 9 (October 18-October 25): Deliverable 3 Due: Rank search results using the algorithm devised.
Week 10 (October 25-November 1): Work on plotting the results obtained on the map.
Weeks 11-13 (November 1-November 22): Deliverable 4 Due: Plot the final results on the map.
Week 14 (November 22-November 29): Start working on the report.
Week 15 (November 29-December 6): Finalize the report.
Week 16 (December 6-December 13): Defense.

Key Deliverables:

  • Software
    • Deliverable 1: Plot the geolocation of the IP address from the search query on the map using rendering software.
    • Deliverable 2: Use the archive iterator to get the location from planet.osm based on the geolocation of the search query.
    • Deliverable 3: Devise an algorithm that ranks search results by their distance from the user performing the search.
    • Deliverable 4: Plot the final results on the map.
  • Report
    • CS 298 Report
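For the map-plotting deliverables (1 and 4), one step is deciding which map tile a given coordinate falls on. The sketch below uses the tile numbering convention documented on the OpenStreetMap wiki (Web Mercator, 2^zoom tiles per axis). It assumes the chosen rendering software follows that convention, which the proposal does not yet fix; the function name deg2tile is illustrative.

```python
import math


def deg2tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) tile indices containing a latitude/longitude at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge values (lon = 180, latitudes near the poles) stay in range.
    x = max(0, min(n - 1, x))
    y = max(0, min(n - 1, y))
    return x, y


if __name__ == "__main__":
    print(deg2tile(37.3352, -121.8811, 12))
```

A ranked result's tile indices could then be used to fetch or render only the tiles that are actually needed for display.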

Innovations and Challenges

  • Deliverable 2 is challenging as it requires a successful crawl of the planet.osm archives, so that the crawled data can be used to retrieve local geographical results.
  • Deliverable 3 is challenging as it requires extensive research to come up with an effective algorithm for ranking search results.
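As a baseline for Deliverable 3, results could simply be sorted by great-circle distance from the user. The sketch below uses the haversine formula for that purpose. It is a starting point under that assumption, not the ranking algorithm the project will devise, and the function names and sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def rank_by_distance(user, results):
    """Sort (name, lat, lon) tuples by distance from the user's (lat, lon)."""
    return sorted(results, key=lambda r: haversine_km(user[0], user[1], r[1], r[2]))


if __name__ == "__main__":
    user = (37.3352, -121.8811)  # hypothetical user location
    results = [("Far Clinic", 37.80, -122.27), ("Near Clinic", 37.34, -121.89)]
    for name, lat, lon in rank_by_distance(user, results):
        print(name, round(haversine_km(user[0], user[1], lat, lon), 1), "km")
```

A real ranking would likely combine distance with the relevance score of each result rather than using distance alone.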


[Wiki2011] Planet.osm. OpenStreetMap Wiki. Retrieved April 6, 2011 from http://wiki.openstreetmap.org/wiki/Planet.osm

[WISE2000] Yates, J.D. and Xiaofang Zhou. Searching the web using the map. Proceedings of the First International Conference on Web Information Systems Engineering. 19-June 2000.

[DEXA2010] Stefan Dlugolinsky, Michal Laclavik, and Ladislav Hluchy (2010). Towards a search system for the web exploiting spatial data of a web document.