Chris Pollett > Students >
Padia

    ( Print View)

    [Bio]

    [Blog]

    [CS297 Proposal]

    [CS 297 Report PDF]

    [CS298 Proposal]

    [CS298 Report PDF]

    [CS298 Oral Defence Slides PDF]

    [Deliverable 1: CacheRefresh MediaJob]

    [Deliverable 2: Implement MLDC Algorithm]

    [Deliverable 3: Implement STDC Algorithm]

    [Deliverable 4: Implement SSDC Algorithm]

    [Understanding Yioop PDF]

    [Scalability Challenges PDF]

    [Cache Aware strategies PDF]

    [ML Based Cache Algorithm PDF]

    [Static Topic Dynamic Cache PDF]

    [Static Semi-Static Dynamic Cache PDF]

    [Query Statistics]

Deliverable 4
Implement Static Semi-Static Dynamic Cache

Purpose
The goal of this deliverable is to implement Static-Semi-Static-Dynamic Cache (SSDC) "Exploiting temporal changes in query submission behavior for improving the search engine result cache performance"

Description

The previous paper STDC used dynamic cache to capture the temporal locality in cache while this approach uses semi-static cache to capture the temporal locality. The key observation the paper relied on was that daytime queries have higher submission rate during day while nighttime queries have higher submission rate during night.

To capture the temporal query frequency variations, new attributes called normalized temporal query frequency (NT-Freq) attributes were added to the queries. NT-Freq values are hourly submission rate of the query i.e., total frequency of the query during a period divided by number of hours in the period. NT-Freq attribute for daytime hours is calculated as total frequency of query between 07:00 to 19:00 divided by 12. Similarly, nighttime NT-Freq is calculated as frequency during other time of the day divided by 12. All time NT-Freq is calculated as total frequency in period of 24 hours divided by 24.

SSDC utilizes semi-static cache along with traditional SD cache. The semi-static cache contains daytime popular and nighttime popular queries corresponding to the time of the day. Daytime and nighttime popular queries are calculated using training data. For this deliverable, training was performed using 1.5 million queries and is tested on other 1.5 million queries with cache size of 300. The cache size has been increased as SSDC approach is designed for large caches only. (Code: SemiStaticTraining, Cache Simulator Zip)


Results
Result of SSDC algorithm
Cache SizeConfigurationHit Rate
300SDC(80-20)56.65%
300SSDC(10-70-20)55.76%
300SSDC(20-60-20)55.95%
300SSDC(40-40-20)56.21%
300SSDC(70-10-20)56.56%
300SSDC(0-80-20)55.46%

Conclusion
The hit rate of the algorithm is 1.2% lower than the SDC algorithm. It less than the performance of STDC and has additional constraint of having large static cache. The algorithm is easy implement and has a considerable performance but has a very little scope for fine tuning as most of the parameters are fixed. Hence, it might not be considered in the next semester.