Thursday, December 12, 2013

Low latency via redundancy

Presenter: Ashish Vulimiri
Authors: Ashish Vulimiri (UIUC), Brighten Godfrey (UIUC), Radhika Mittal (UC Berkeley), Justine Sherry (UC Berkeley), Sylvia Ratnasamy (UC Berkeley), Scott Shenker (UC Berkeley and ICSI)

The authors explore the use of redundancy to lower latency. The idea is to send multiple copies of each request to different servers and use the first response that arrives; this can tame tail latency by routing around failed or slow servers.
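As a rough illustration of "send multiple requests and use the first response", here is a minimal Python sketch. It is not the authors' code: `fake_server`, `first_response`, and `demo` are hypothetical names, and the replicas are simulated with sleeps instead of real network calls.

```python
import asyncio


async def fake_server(name: str, delay: float) -> str:
    """Hypothetical stand-in for a replica: replies with its name after `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def first_response(requests):
    """Issue all requests concurrently and return whichever result arrives first.

    Assumes the first request to finish succeeded; a real client would also
    need to handle the case where the fastest reply is an error.
    """
    tasks = [asyncio.ensure_future(r) for r in requests]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # The slower copies are no longer needed, so cancel them.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


def demo() -> str:
    # Two copies of the same request: one to a slow replica, one to a fast one.
    return asyncio.run(
        first_response([fake_server("slow", 0.2), fake_server("fast", 0.01)])
    )
```

Calling `demo()` returns the reply from the faster simulated replica. Note that cancelling the slower copy only saves client-side work here; the extra load on the servers is exactly the cost the talk is concerned with.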

They perform case studies:

1) DNS. Experiments on PlanetLab. Clients issue multiple copies of each DNS query, each to a different server. 99th percentile response time improves by 0.44 seconds when contacting just 2(?) servers. The improvement is larger in the tail than in the mean.

2) Distributed K-V store. Experiments on EC2 and Emulab. Sending 2 copies of each request improves the mean response time by 1.5x and the 99th percentile by 2.2x on Emulab. The improvement is even bigger on EC2!

3) Memcached on Emulab: redundancy didn't help because variance in response time was very small to begin with.

Next the authors step back and propose a model for distributed applications to help decide whether or not redundancy will help a particular system. (You need to balance two factors: is the decrease in variability worth the increase in load?)
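To get a feel for that trade-off, here is a toy calculation. It is an illustration under textbook assumptions, not the model from the talk. It assumes each server is an M/M/1 queue with service rate `mu`, that sending `copies` copies of every request multiplies each server's load by `copies`, and that the copies' response times are independent. Under those assumptions each copy's response time is exponential with rate `mu * (1 - copies * rho)`, and the fastest of the copies has mean `1 / (copies * mu * (1 - copies * rho))`. The function names are hypothetical.

```python
def mean_latency(rho: float, copies: int = 1, mu: float = 1.0) -> float:
    """Mean response time in the toy model, given base load `rho` per server."""
    if copies < 1:
        raise ValueError("need at least one copy")
    load = copies * rho  # every server now sees `copies` times the base load
    if not 0 <= load < 1:
        raise ValueError("per-server load must stay below 1")
    per_copy_rate = mu * (1.0 - load)  # M/M/1 response time is exponential with this rate
    # The minimum of `copies` independent exponentials has `copies` times the rate.
    return 1.0 / (copies * per_copy_rate)


def redundancy_helps(rho: float, copies: int = 2, mu: float = 1.0) -> bool:
    """True if sending `copies` copies lowers the mean latency versus one copy."""
    return mean_latency(rho, copies, mu) < mean_latency(rho, 1, mu)
```

In this toy model, two copies help at a base load of 20% (`redundancy_helps(0.2)` is true) but hurt at 40% (`redundancy_helps(0.4)` is false); the crossover sits at a load of 1/3. Real systems will not match these assumptions, so the numbers only illustrate why load matters.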

Another interesting result: in data centers where switches can support QoS, marking redundant packets as low priority completely eliminates the negative effects of the increased load.
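For readers wondering what "marking redundant packets as low priority" might look like from an application, here is a hedged sketch using the standard `IP_TOS` socket option to set a DSCP value on a UDP socket. The choice of CS1 (DSCP 8), which has historically been used for lower-effort traffic, is an assumption, as are the helper names; the talk summary does not say which marking the authors used, and whether switches honor the mark depends entirely on how the network is configured.

```python
import socket

# Assumption: CS1 (DSCP 8) as the "low priority" class; the right value is site-specific.
DSCP_CS1 = 8


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def make_low_priority_udp_socket(dscp: int = DSCP_CS1) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

A client could send the original request on an ordinary socket and the redundant copy on a socket created with `make_low_priority_udp_socket()`, so that the copy yields to primary traffic wherever the network enforces the marking.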


Q: Redundancy only helps if the system is over-provisioned/under-utilized. What real-world systems are over-provisioned?
A: Many data center applications, e.g., K-V stores, see bursty workloads. During bursts, you can throttle back the redundancy.

Q: You've used redundancy for DNS --- could you use it for web browsing?
A: It's hard to say without a root cause analysis of latency in web browsing. If the latency is caused by full buffers in the network, this would only work if you had, e.g., geographically distributed web servers.

Q: What if you get different answers? What if response quality is more important than latency?
A: Yes, we're working on quantifying this. This could certainly be the case for DNS. Informally, what we've seen suggests that this is only a problem for a small number of web sites.

Q: How could the client decide automatically how many copies of a request to send?
A: We don't have an answer at this point. If the server could tell the client its current load, that could help. It's difficult to do with only client-side measurements --- it's an interesting problem.

Q: Are you concerned about duplicate requests increasing system variability?
A: It depends very much on the specifics of the system --- I don't think there's a general answer, at least not one I could give you at this point.