Tuesday, October 28, 2014

Session 3, Paper 2: Characterizing Load Imbalance in Real-World Networked Caches

Authors: Qi Huang (Cornell University), Helga Gudmundsdottir, Ymir Vigfusson (Reykjavik University), Daniel Freedman (Technion), Ken Birman, Robbert van Renesse (Cornell University)

Link to paper Characterizing Load Imbalance in Real-World Networked Caches
Modern web services nowadays heavily rely on the in-memory caches to reduce the request latency for users and reduce the load on the back-end storage servers. To avoid overloading a single cache server, data is partitioned into multiple segments (“shard”) and stored across different cache servers. Related objects tend to be mapped into the same shard; and ideally, each cache server should see similar request rates. However, in the real-world workload, we often observe the load imbalance among the cache servers. Potential causes of load imbalance include: hashing schemes, skewed access popularity, etc. Using real workload of Facebook’s TAO cache system, this paper tries to answer the following questions:
  • What is the state of load imbalance?
  • What contributes to load imbalance?
  • How effective are current techniques?
  • How to improve?


The authors find that significant load imbalance observed in TAO. Possible causes:
  • Content popularity (ranking top 100 shards, and top 20 objects per server)
  • Hot objects: (Interestingly) very hot hot object alone are not a major cause of load imbalance of cache server.
  • Hot shards: Popular shards have much more traffics than the hot object


Next, the presenter evaluates the current techniques used to mitigate the load imbalance (consistent-hashing, hot-content replication) and find room for improvement. 3 approaches for consistent hashing: consistent hashing for in-memory networked caches implemented in libketama, TAO’s hashing, and baseline “perfect hashing”. 3 approaches for replication: None (no replication), TAO’s replication, optimal replication. The combination of these (hashing,replication) are evaluated using Facebook’s TAO workload. Metric used: max/avg, where max,avg are volume of requests of most loaded/average loaded servers). Some key results:
  • standard hashing (implemented in libketama): 53% higher load on the most loaded server compared to the average. Only achieve 41% higher load when implemented with optimal replication.
  • TAO’s hashing in general performs better than standard hashing, but still worse than when optimal hashing or replication is implemented in TAO
  • The streaming algorithm outperforms other replication approaches, but still has room for further improvement.

Future work:
  • Characterize how hashing and replication affect load imbalance?
  • Can streaming algorithm replicate content before its popularity surges?
  • Can we predict the popularity spikes and prevent hotspots.