Saturday, November 23, 2013

HotNets '13: How to Improve Your Network Performance by Asking Your Provider for Worse Service

Presented by: Radhika Mittal, UC Berkeley

This paper makes two main observations to motivate its design. First, network providers typically over-provision their networks for resiliency. Second, TCP's congestion control is cautious and ramps up slowly in slow start. These two behaviors interact adversely: in an Internet where most flows never leave slow start, capacity goes wasted.

To address this, the authors step back and look at the goals of congestion control: to fill the pipe for high throughput, and to do no harm to other flows. Traditionally, a single mechanism juggles these two conflicting goals. The key insight behind their system, RC3, is to decouple them: run regular TCP at high priority, and fill up the remaining pipe capacity at lower priorities. They call this worse quality of service (WQoS) and argue that the priority-queuing mechanisms already present in switches can be re-purposed for this task, by providing several levels of worse quality of service.
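As a rough illustration of the kind of mechanism being re-purposed (not RC3's actual implementation): on Linux, a per-socket priority can be set with `SO_PRIORITY`, which qdiscs such as `prio`/`pfifo_fast` map into separate bands, so higher band numbers receive "worse" service. The constants and mapping here are Linux-specific.

```python
import socket

# Sketch: mark a socket's traffic as lower priority so a priority
# qdisc services it only when higher-priority queues are empty.
# (Linux-specific; illustrative only, not RC3's implementation.)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
if hasattr(socket, "SO_PRIORITY"):  # defined on Linux builds
    # Priority values 0-6 can be set without CAP_NET_ADMIN.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, 1)
    level = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY)
sock.close()
```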

Next, they develop an idealized mathematical model to analyze the performance gains of RC3 relative to vanilla TCP as a function of flow size. Broadly: for flow sizes below the initial window, RC3 gives no gains; for flow sizes between the initial window and the bandwidth-delay product, RC3's gains increase monotonically; beyond the bandwidth-delay product, the gains start falling off. They also observe that the maximum possible gain grows with the bandwidth-delay product, making RC3 future-proof.
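The paper's exact model is not reproduced in these notes, but the shape of the gains can be sketched with a back-of-the-envelope calculation. All parameters here (initial window of 10 packets, BDP of 1000 packets) are illustrative assumptions, not the paper's numbers:

```python
import math

def tcp_fct(n, iw=10, bdp=1000):
    """Idealized flow-completion time, in RTTs, for slow-start TCP:
    the window doubles each RTT from iw, with no losses; once the
    flow exceeds a BDP it becomes bandwidth-limited."""
    rounds = math.ceil(math.log2(n / iw + 1))
    return rounds + max(0, n - bdp) / bdp

def rc3_fct(n, iw=10, bdp=1000):
    """Idealized FCT for RC3: up to a BDP's worth of packets go out
    in the first RTT via the low-priority loop; anything beyond
    that is bandwidth-limited, as with TCP."""
    return 1 + max(0, n - bdp) / bdp

for n in [5, 100, 1000, 10000]:
    print(n, tcp_fct(n) / rc3_fct(n))
# 5 pkts: 1.0x (below IW, no gain); 1000 pkts (~BDP): 7.0x (peak);
# 10000 pkts: 1.9x (gain falls off beyond the BDP)
```

Even this toy model reproduces the qualitative shape described above: no gain below the initial window, gains peaking near the BDP, then tapering off as both schemes become bandwidth-limited.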

RC3's design has two control loops: the standard TCP control loop, and RC3's own control loop. RC3 minimizes overlap between packets sent by the two loops by transmitting the flow's packets in reverse order in RC3's control loop, and in the standard forward order in TCP's control loop. To make this scheme feasible with a fixed number of priority levels, RC3 transmits (say) 4 packets at priority level 1, 40 at priority level 2, 400 at level 3, and so on, until it crosses over into the standard loop's transmissions. Furthermore, to ensure flow-level fairness, every flow gets to transmit exactly the same number of packets at each priority level, ensuring that long flows can't squeeze out the shorter ones. Loss recovery is left to TCP's existing mechanism, but SACK is required so that packets received through RC3's control loop can be selectively acknowledged.
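The level-assignment rule above can be sketched as follows. This is a toy illustration: the base of 4 packets and the factor-of-10 growth follow the example in the talk, and the sketch assigns the whole flow to low-priority levels rather than modeling the crossover with TCP's forward-order transmissions:

```python
def rc3_priority_levels(flow_pkts, base=4, growth=10):
    """Return (level, packet_count) pairs for RC3's low-priority loop.

    Level k carries base * growth**(k-1) packets (4, 40, 400, ...),
    taken from the end of the flow, until the flow is covered.
    """
    assignment = []
    remaining = flow_pkts
    level = 1
    while remaining > 0:
        chunk = min(base * growth ** (level - 1), remaining)
        assignment.append((level, chunk))
        remaining -= chunk
        level += 1
    return assignment

print(rc3_priority_levels(500))
# [(1, 4), (2, 40), (3, 400), (4, 56)]
```

Because every flow uses the same per-level quotas, a short flow's level-1 and level-2 packets contend only with other flows' equally small level-1 and level-2 allotments, which is what prevents long flows from squeezing out short ones.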

Their evaluations show that the measured performance gains track the theoretical model well. The authors also compare RC3 against an increased initial window, RCP, and traditional QoS, and find that RC3 improves flow completion times in all cases.

Q&A

1. How is this different from pFabric?

Ans: pFabric focused its design largely on the data-center context. Our gains, on the other hand, are much more pronounced in wide-area settings with high bandwidth-delay products, and much smaller in data centers, where bandwidth-delay products are low.

2. Does the application set priorities for the packets?

Ans: No, the OS does this in the kernel. It happens automatically and it ensures that the longer flows do not starve the shorter ones.

3. Floyd and Allman proposed a similar mechanism a while back. They had trouble implementing this using the scheduler mechanisms in Linux.

Ans: We implemented it in Linux and it worked fine for us.

4. What about per-packet delay? Since RC3 sends the initial window + 4 + 40 + ... packets in one burst, doesn't it lead to large per-packet delays for other flows that are not interested in flow completion times, but in individual per-packet delays?

Ans: The flows interested in per-packet delay would be sent at the same priority as standard TCP. Their packets would therefore preempt the packets RC3 sends at lower priority levels, and would compete only with packets sent by the standard TCP control loop, just as they do today.

5. Did you explore any congestion control for the RC3 loop itself, rather than sending all the packets at each priority level in one burst?

Ans: No. We stuck with this idea because it was simple and worked.

6. Let's say two users want to start a connection simultaneously. Does RC3 give any gains to either user over standard TCP?

Ans: RC3 doesn't do any worse than TCP in the worst case. But if there is enough spare capacity for both users to fill the pipe quickly, RC3 will allow both flows to complete faster than standard TCP.