Wednesday, August 23, 2017

Paper 3: Re-architecting datacenter networks and stacks for low latency and high performance

Paper 3: "Re-architecting datacenter networks and stacks for low latency and high performance" - authors are Mark Handley (University College London), Costin Raiciu, Alexandru Agache, and Andrei Voinescu (University Politehnica of Bucharest), and Andrew Moore, Gianni Antichi, and Marcin Wójcik (University of Cambridge)

The paper is presented by Prof. Mark Handley (University College London). This paper has received a SIGCOMM 2017 Best Paper Award.

The talk starts by describing three goals in DC networks:
  1. low latency between hosts
  2. receiver prioritisation
  3. predictable high throughput
Achieving all three simultaneously is challenging, however. The authors present NDP, a DC protocol architecture that achieves both low latency and high throughput.

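One of the key ideas in this paper is packet trimming: when a switch queue fills up, the switch cuts off the payload of a packet and forwards only its header instead of dropping the whole packet. This idea was previously proposed as Cut Payload (CP) in the following paper (worth checking out):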
P. Cheng, F. Ren, R. Shu, and C. Lin. Catch the whole lot in an action: Rapid precise packet loss notification in data centers. In Proc. Usenix NSDI, 2014.

In NDP, the authors address the shortcomings of CP with the following changes:

  1. An NDP switch maintains separate queues for data packets and higher priority trimmed header packets. This provides low latency similar to lossless Ethernet, without the collateral damage caused by pausing.
  2. An NDP switch performs weighted round robin between the high and low priority queues. This can eliminate congestion collapse.
  3. When the low priority queue is full, an NDP switch decides whether to trim the newly arrived packet or the data packet at the end of the low priority queue. This breaks up phase effects.
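
The three switch changes above can be sketched as a toy model of a single output port. This is only an illustration, not the authors' implementation: the class and field names, the 64-byte header size, the 10:1 header-to-data weight, and the 50/50 coin flip for choosing which packet to trim are all assumptions made for the example.

```python
import random
from collections import deque
from dataclasses import dataclass

HEADER_BYTES = 64  # assumed size of a packet once its payload is trimmed


@dataclass
class Packet:
    flow: str
    seq: int
    size: int              # bytes on the wire
    trimmed: bool = False  # True once the payload has been cut off


class NdpPort:
    """Toy model of one NDP switch output port (illustrative only)."""

    def __init__(self, data_capacity=8, header_weight=10, rng=None):
        self.data_q = deque()    # low priority: full data packets
        self.header_q = deque()  # high priority: trimmed headers
        self.data_capacity = data_capacity
        self.header_weight = header_weight  # headers served per data packet
        self.headers_in_a_row = 0
        self.rng = rng or random.Random()

    def _trim(self, pkt):
        # Cut the payload and move the header to the high priority queue.
        pkt.size = HEADER_BYTES
        pkt.trimmed = True
        self.header_q.append(pkt)

    def enqueue(self, pkt):
        if pkt.trimmed:
            self.header_q.append(pkt)
        elif len(self.data_q) < self.data_capacity:
            self.data_q.append(pkt)
        elif self.rng.random() < 0.5:
            # Data queue full: trim the new arrival...
            self._trim(pkt)
        else:
            # ...or trim the packet at the tail and queue the arrival instead.
            self._trim(self.data_q.pop())
            self.data_q.append(pkt)

    def dequeue(self):
        # Weighted round robin: headers go first, but after `header_weight`
        # consecutive headers one waiting data packet is served.
        prefer_header = bool(self.header_q) and (
            not self.data_q or self.headers_in_a_row < self.header_weight
        )
        if prefer_header:
            self.headers_in_a_row += 1
            return self.header_q.popleft()
        if self.data_q:
            self.headers_in_a_row = 0
            return self.data_q.popleft()
        return None
```

Enqueuing 12 full-size packets into a port with `data_capacity=8` leaves 8 data packets and 4 trimmed headers queued, so no packet disappears silently: every arrival reaches the receiver at least as a header.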

In addition to the switch queuing algorithm, NDP proposes per-packet multipath forwarding and a novel transport protocol.
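
Per-packet multipath forwarding can be illustrated with a small sketch in which the sender picks a path for every packet. The sketch assumes the sender walks through a random permutation of the available paths and reshuffles once each path has been used; `PathSprayer` and its methods are hypothetical names for this example, and the actual path selection in NDP may differ.

```python
import random


class PathSprayer:
    """Pick a path per packet by cycling through random permutations."""

    def __init__(self, n_paths, rng=None):
        self.paths = list(range(n_paths))
        self.rng = rng or random.Random()
        self.order = []  # paths not yet used in the current round

    def next_path(self):
        if not self.order:
            # Start a new round: every path is used once, in random order.
            self.order = self.paths[:]
            self.rng.shuffle(self.order)
        return self.order.pop()


sprayer = PathSprayer(n_paths=4, rng=random.Random(7))
picks = [sprayer.next_path() for _ in range(8)]
```

Compared with choosing a path independently at random for each packet, a permutation spreads any window of packets evenly across the paths, which keeps short-term load balanced.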

Implementation: the authors implemented NDP in Linux hosts with DPDK, in a software switch, in a NetFPGA-based hardware switch, and in P4.


Results:
  • NDP provides very low flow completion times.
  • NDP provides much better isolation between different workloads than mechanisms such as DCQCN (that rely on lossless Ethernet for low delay).
  • Incast: While other solutions have also tackled this problem well (see figure below), the important part is that NDP solves the incast problem using only 8 packet buffers.

You can check out the great animations that were presented during the talk here.

The talk was followed by a lively Q&A, with questions on fairness (NDP ensures fairness), network topology (NDP requires a Clos topology) and failure awareness (NDP handles link failures by maintaining scores for links).
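
The idea of keeping scores for links can be sketched as a per-path scoreboard at the sender. This is a guess at the general shape rather than the mechanism described in the talk: the class name, the penalty limit of 3, and the rule that an acknowledgement reduces the score by one are all assumptions.

```python
class PathScoreboard:
    """Hypothetical per-path penalty scores kept by a sender."""

    def __init__(self, n_paths, penalty_limit=3):
        self.scores = [0] * n_paths
        self.penalty_limit = penalty_limit  # assumed exclusion threshold

    def report_loss(self, path):
        # A loss or trimmed packet on this path raises its penalty score.
        self.scores[path] += 1

    def report_ack(self, path):
        # A successful delivery slowly rehabilitates the path.
        self.scores[path] = max(0, self.scores[path] - 1)

    def usable_paths(self):
        good = [p for p, s in enumerate(self.scores) if s < self.penalty_limit]
        # Never return an empty set: fall back to all paths if every one is bad.
        return good or list(range(len(self.scores)))
```

A path that repeatedly loses packets is dropped from the set used for spraying, and it becomes eligible again once its score falls back below the limit.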