Thursday, August 20, 2015

R2C2: A Network Stack for Rack-scale Computers

This paper was presented in the "Congestion Control and Transport Protocols" session at Sigcomm 2015, London. The authors of the paper are: Paolo Costa (Microsoft Research), Hitesh Ballani (Microsoft Research), Kaveh Razavi (VU Amsterdam), and Ian Kash (Microsoft Research).

Paolo Costa presented the paper. The links to the paper and public review are here: Paper, Public review

Paolo started by introducing rack-scale computers and explaining why the existing tree-based network topology is insufficient for a rack with 1000 compute entities. Rack-scale computing requires a distributed switching fabric; however, this introduces multiple issues for routing and congestion control:

1. Per-flow routing: how to select the best route, which can be non-minimal, favoring high throughput over latency?
2. How to choose the best routing for different workloads?
3. How to monitor ECN and RTT across multiple paths, given that the number of available paths is now far larger than in standard topologies?

Paolo then presented a solution that advocates source control: each node broadcasts its workloads and routes to the other servers, and each one then adjusts its traffic rate accordingly. Paolo spotted the "I do not believe it!!" look on the faces in the audience, and proceeded to outline a few optimizations that help compute the source rate and routing per flow on the order of milliseconds. They are:

1. Decoupling route and rate computation: route computation happens at a coarser time scale.
2. Rate is calculated per flow as opposed to per sub-flow; this could lead to under-utilization of the network.
3. Allow headroom to handle the bursty nature of traffic.
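To make the idea of source-side rate computation more concrete, here is a minimal sketch of how a node that knows every flow and its route (thanks to the broadcasts) could compute per-flow rates locally. It assumes a max-min fair allocation computed by water-filling and a fixed headroom fraction per link; the function name `max_min_rates`, the `headroom` parameter, and the link and flow names are illustrative assumptions, not taken from the talk.

```python
def max_min_rates(flows, capacity, headroom=0.1):
    """Water-filling sketch of max-min fair per-flow rates.

    flows:    dict mapping flow id -> list of link ids on its route
              (assumed: every flow crosses at least one link).
    capacity: dict mapping link id -> link capacity.
    headroom: fraction of each link left unallocated to absorb bursts
              (hypothetical parameter for this sketch).
    """
    # Reserve the headroom on every link before allocating anything.
    remaining = {link: cap * (1.0 - headroom) for link, cap in capacity.items()}
    active = {f: set(links) for f, links in flows.items()}
    rates = {}

    while active:
        # Count the not-yet-frozen flows crossing each link.
        counts = {}
        for links in active.values():
            for link in links:
                counts[link] = counts.get(link, 0) + 1

        # The bottleneck is the link offering the smallest equal share.
        bottleneck = min(counts, key=lambda l: remaining[l] / counts[l])
        share = remaining[bottleneck] / counts[bottleneck]

        # Freeze every flow crossing the bottleneck at that share and
        # subtract its rate from all links on its route.
        for f in [f for f, links in active.items() if bottleneck in links]:
            rates[f] = share
            for link in active[f]:
                remaining[link] -= share
            del active[f]

    return rates


if __name__ == "__main__":
    flows = {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]}
    capacity = {"A": 10.0, "B": 20.0}
    print(max_min_rates(flows, capacity, headroom=0.1))
```

Because every node runs the same computation on the same broadcast state, each sender can derive its own rate without per-path feedback such as ECN or RTT. Note that each flow gets a single rate here, which mirrors the per-flow (rather than per-sub-flow) simplification in the list above.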

Paolo concluded the talk by discussing the overhead of the "broadcast messages", which for small flows could theoretically be as high as 26%, but for typical data center traffic is expected to be around 3%.
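
As a rough illustration of why the broadcast overhead is relatively larger for small flows, the sketch below uses a simplified back-of-the-envelope model: each flow triggers a fixed number of broadcast events, and each event delivers one small message to every other node. The model and all parameter values are assumptions made for illustration; they are not the parameters behind the 26% and 3% figures quoted in the talk.

```python
def broadcast_overhead(flow_bytes, num_nodes, msg_bytes,
                       events_per_flow=2, avg_hops=1.0):
    """Ratio of broadcast bytes to data bytes for one flow.

    Simplified, hypothetical model:
      - each flow causes `events_per_flow` broadcasts (e.g. start and end),
      - each broadcast delivers one `msg_bytes` message to every other node,
      - the flow's data crosses `avg_hops` links on average.
    """
    broadcast_bytes = events_per_flow * (num_nodes - 1) * msg_bytes
    data_bytes = flow_bytes * avg_hops
    return broadcast_bytes / data_bytes


if __name__ == "__main__":
    # Illustrative values only: the fixed broadcast cost is amortized
    # over more data as the flow grows, so the ratio shrinks.
    for size in (10_000, 100_000, 1_000_000):
        ratio = broadcast_overhead(size, num_nodes=512, msg_bytes=16)
        print(f"{size:>9} bytes: {ratio:.2%}")
```

The broadcast cost per flow is fixed in this model, so the overhead falls in inverse proportion to flow size, which is consistent with the gap between the worst-case and typical figures mentioned above.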