CS297 Proposal
Robust Cache System for Yioop
Rushikesh Padia (padiarushi3012@gmail.com)
Advisor: Dr. Chris Pollett
Description:
Yioop is an open-source search engine that allows users to create indexes of websites. It uses distributed crawlers to crawl the open internet and index web pages.
It also allows users to add their own websites. Currently, Yioop uses a simple query caching mechanism that relies on an expiration time to invalidate stale cache entries.
This mechanism can be replaced with state-of-the-art caching approaches to improve speed and precision.
The goal of the project is to implement a cache management system for improving search results of the Yioop search engine.
The cache system will be responsible for the efficient storage and retrieval of search query results.
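To illustrate the expiration-based caching described above, the following is a minimal Python sketch under stated assumptions, not Yioop's actual implementation. The class name ExpiringQueryCache, the ttl_seconds parameter, and the injectable clock are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ExpiringQueryCache:
    """Hypothetical query-result cache that treats old entries as stale."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so that time can be controlled in tests
        # Maps a query string to (time stored, result list).
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def put(self, query: str, results: List[str]) -> None:
        """Store results for a query together with the current time."""
        self._entries[query] = (self.clock(), results)

    def get(self, query: str) -> Optional[List[str]]:
        """Return cached results, or None if the entry is missing or expired."""
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            # The entry has outlived its expiration time, so drop it.
            del self._entries[query]
            return None
        return results
```

A policy like this evicts an entry only because of its age, regardless of how often the query is issued or how costly it is to recompute. That limitation is what the replacement algorithms studied in this project aim to address.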
Schedule:
Week 1: Aug 23 - Aug 30 | Finalize project topic and decide on deliverables |
Week 2: Aug 30 - Sep 6 | Understand web search engines and Yioop (read paper [1], "Scalability Challenges in Web Search Engines") |
Week 3: Sep 6 - Sep 13 | Start working on Deliverable 1, adding a media job (read the Yioop documentation and https://www.seekquarry.com/p/Ranking) |
Week 4: Sep 13 - Sep 20 | Complete Deliverable 1 |
Week 5: Sep 20 - Sep 27 | Research web search engine cache management systems (read paper [2], "Cost-Aware Strategies for Query Result Caching in Web Search Engines") |
Week 6: Sep 27 - Oct 4 | Research and finalize 3 cache replacement algorithms |
Week 7: Oct 4 - Oct 11 | Start working on Deliverable 2, implementing the MLDC algorithm (read paper [3], "A machine learning approach for result caching in web search engines") |
Week 8: Oct 11 - Oct 18 | Continue working with Deliverable 2 |
Week 9: Oct 18 - Oct 25 | Complete Deliverable 2 |
Week 10: Oct 25 - Nov 1 | Start working on Deliverable 3, implementing the STD algorithm (read paper [4], "Topical result caching in web search engines") |
Week 11: Nov 1 - Nov 8 | Continue working with Deliverable 3 |
Week 12: Nov 8 - Nov 15 | Complete Deliverable 3 |
Week 13: Nov 15 - Nov 22 | Start working on Deliverable 4, implementing the PESOS algorithm (read paper [5], "Exploiting temporal changes in query submission behavior for improving the search engine result cache performance") |
Week 14: Nov 22 - Nov 29 | Continue working with Deliverable 4 |
Week 15: Nov 29 - Dec 6 | Complete Deliverable 4 |
Week 16: Dec 6 - Dec 13 | Work on final report |
Deliverables:
The full project will be done when CS298 is completed. The following will
be done by the end of CS297:
1. Understand Yioop and add a new media job that runs queries to populate the caches
2. Implement MLDC algorithm [3]
3. Implement STDC algorithm [4]
4. Implement SSDC algorithm [5]
5. CS297 report
References:
[1] B. Cambazoglu and R. Baeza-Yates, "Scalability Challenges in Web Search Engines," in Synthesis Lectures on Information Concepts, Retrieval, and Services, vol. 7, 2011, pp. 27-50. doi: 10.1007/978-3-642-20946-8_2.
[2] R. Ozcan, I. S. Altingovde, and A. Ulusoy, "Cost-Aware Strategies for Query Result Caching in Web Search Engines," ACM Trans. Web, vol. 5, no. 2, May 2011, doi: 10.1145/1961659.1961663.
[3] T. Kucukyilmaz, B. B. Cambazoglu, C. Aykanat, and R. Baeza-Yates, "A machine learning approach for result caching in web search engines," Information Processing & Management, vol. 53, no. 4, pp. 834-850, 2017, doi: 10.1016/j.ipm.2017.02.006.
[4] I. Mele, N. Tonellotto, O. Frieder, and R. Perego, "Topical result caching in web search engines," Information Processing & Management, vol. 57, no. 3, p. 102193, 2020, doi: 10.1016/j.ipm.2019.102193.
[5] T. Kucukyilmaz, "Exploiting temporal changes in query submission behavior for improving the search engine result cache performance," Information Processing & Management, vol. 58, no. 3, p. 102533, 2021, doi: 10.1016/j.ipm.2021.102533.
[6] H. Ma, O. Tao, C. Zhao, P. Li, and L. Wang, "Impact of replacement policies on static-dynamic query results cache in web search engines," in 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), 2017, pp. 137-139. doi: 10.1109/ISI.2017.8004890.
[7] R. Solar, V. Gil-Costa, and M. Marin, "Evaluation of Static/Dynamic Cache for Similarity Search Engines," in SOFSEM 2016: Theory and Practice of Computer Science, 2016, pp. 615-627.
[8] R. Blanco, E. Bortnikov, F. Junqueira, R. Lempel, L. Telloli, and H. Zaragoza, "Caching Search Engine Results over Incremental Indices," in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010, pp. 82-89. doi: 10.1145/1835449.1835466.
[9] T. Trinh, D. Wu, and J. Z. Huang, "C3C: A New Static Content-Based Three-Level Web Cache," IEEE Access, vol. 7, pp. 11796-11808, 2019, doi: 10.1109/ACCESS.2019.2892761.