CS297 Proposal

Robust Cache System for Yioop

Rushikesh Padia (padiarushi3012@gmail.com)

Advisor: Dr. Chris Pollett

Description:

Yioop is an open-source search engine that allows users to create indexes of websites. It uses distributed crawlers to crawl the open internet and index web pages, and it also lets users add their own websites. Currently, Yioop uses a simple query caching mechanism that relies on an expiration time to invalidate stale cache entries. This mechanism can be replaced with state-of-the-art approaches to improve speed and precision. The goal of this project is to implement a cache management system that improves the search results of the Yioop search engine. The cache system will be responsible for the efficient storage and retrieval of search query results.
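As a rough illustration of the expiration-based approach described above, here is a minimal Python sketch. It is not Yioop's code (Yioop itself is written in PHP); the class name `ExpiringQueryCache`, the `ttl_seconds` parameter, and the injectable `clock` are all assumptions made for this example.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ExpiringQueryCache:
    """Hypothetical query-result cache that drops entries older than ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        # Maps a query string to (time stored, cached results).
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def put(self, query: str, results: List[str]) -> None:
        """Store results for a query, stamped with the current time."""
        self._entries[query] = (self.clock(), results)

    def get(self, query: str) -> Optional[List[str]]:
        """Return cached results, or None on a miss or a stale entry."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            # The entry has expired: invalidate it and report a miss.
            del self._entries[query]
            return None
        return results
```

A scheme like this never looks at how often or how recently a query is asked, which is the kind of information the replacement policies studied in this project are meant to exploit.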

Schedule:

Week 1 (Aug 23 - Aug 30): Finalize project topic and decide on deliverables.
Week 2 (Aug 30 - Sep 6): Understand web search engines and Yioop. (Read paper [1] - "Scalability Challenges in Web Search Engines")
Week 3 (Sep 6 - Sep 13): Start on Deliverable 1, adding a media job. (Read the Yioop docs and the resource https://www.seekquarry.com/p/Ranking)
Week 4 (Sep 13 - Sep 20): Complete Deliverable 1.
Week 5 (Sep 20 - Sep 27): Research web search engine cache management systems. (Read paper [2] - "Cost-Aware Strategies for Query Result Caching in Web Search Engines")
Week 6 (Sep 27 - Oct 4): Research and finalize 3 cache replacement algorithms.
Week 7 (Oct 4 - Oct 11): Start working on Deliverable 2, implementing the MLDC algorithm. (Read paper [3] - "A machine learning approach for result caching in web search engines")
Week 8 (Oct 11 - Oct 18): Continue working on Deliverable 2.
Week 9 (Oct 18 - Oct 25): Complete Deliverable 2.
Week 10 (Oct 25 - Nov 1): Start working on Deliverable 3, implementing the STDC algorithm. (Read paper [4] - "Topical result caching in web search engines")
Week 11 (Nov 1 - Nov 8): Continue working on Deliverable 3.
Week 12 (Nov 8 - Nov 15): Complete Deliverable 3.
Week 13 (Nov 15 - Nov 22): Start working on Deliverable 4, implementing the PESOS algorithm. (Read paper [5] - "Exploiting temporal changes in query submission behavior for improving the search engine result cache performance")
Week 14 (Nov 22 - Nov 29): Continue working on Deliverable 4.
Week 15 (Nov 29 - Dec 6): Complete Deliverable 4.
Week 16 (Dec 6 - Dec 13): Work on final report.

Deliverables:

The full project will be done when CS298 is completed. The following will be done by the end of CS297:

1. Understand Yioop and add a new media job that runs queries to populate the caches

2. Implement MLDC algorithm [3]

3. Implement STDC algorithm [4]

4. Implement SSDC algorithm [5]

5. CS 297 Report.

References:

[1] B. Cambazoglu and R. Baeza-Yates, "Scalability Challenges in Web Search Engines," in Synthesis Lectures on Information Concepts, Retrieval, and Services, vol. 7, 2011, pp. 27-50. doi: 10.1007/978-3-642-20946-8_2.

[2] R. Ozcan, I. S. Altingovde, and A. Ulusoy, "Cost-Aware Strategies for Query Result Caching in Web Search Engines," ACM Trans. Web, vol. 5, no. 2, May 2011, doi: 10.1145/1961659.1961663.

[3] T. Kucukyilmaz, B. B. Cambazoglu, C. Aykanat, and R. Baeza-Yates, "A machine learning approach for result caching in web search engines," Information Processing & Management, vol. 53, no. 4, pp. 834-850, 2017, doi: 10.1016/j.ipm.2017.02.006.

[4] I. Mele, N. Tonellotto, O. Frieder, and R. Perego, "Topical result caching in web search engines," Information Processing & Management, vol. 57, no. 3, p. 102193, 2020, doi: 10.1016/j.ipm.2019.102193.

[5] T. Kucukyilmaz, "Exploiting temporal changes in query submission behavior for improving the search engine result cache performance," Information Processing & Management, vol. 58, no. 3, p. 102533, 2021, doi: 10.1016/j.ipm.2021.102533.

[6] H. Ma, O. Tao, C. Zhao, P. Li, and L. Wang, "Impact of replacement policies on static-dynamic query results cache in web search engines," in 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 2017, pp. 137-139. doi: 10.1109/ISI.2017.8004890.

[7] R. Solar, V. Gil-Costa, and M. Marin, "Evaluation of Static/Dynamic Cache for Similarity Search Engines," in SOFSEM 2016: Theory and Practice of Computer Science, 2016, pp. 615-627.

[8] R. Blanco, E. Bortnikov, F. Junqueira, R. Lempel, L. Telloli, and H. Zaragoza, "Caching Search Engine Results over Incremental Indices," in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010, pp. 82-89. doi: 10.1145/1835449.1835466.

[9] T. Trinh, D. Wu, and J. Z. Huang, "C3C: A New Static Content-Based Three-Level Web Cache," IEEE Access, vol. 7, pp. 11796-11808, 2019, doi: 10.1109/ACCESS.2019.2892761.