Thursday, August 20, 2015

Congestion Control for Large-Scale RDMA Deployments

Yibo Zhu (Microsoft and U.C. Santa Barbara), Haggai Eran (Mellanox), Daniel Firestone (Microsoft), Chuanxiong Guo (Microsoft), Marina Lipshteyn (Microsoft), Yehonatan Liron (Mellanox), Jitendra Padhye (Microsoft), Shachar Raindel (Mellanox), Mohamad Haj Yahia (Mellanox), Ming Zhang (Microsoft).

Paper:
http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p523.pdf

Public review:
http://conferences.sigcomm.org/sigcomm/2015/pdf/reviews/204pr.pdf

A datacenter network should provide ultra-low latency, high throughput, and low CPU overhead. The current TCP/IP stack is too heavyweight for this. Remote Direct Memory Access (RDMA) is an alternative networking paradigm in which network interface cards (NICs) transfer data directly to and from memory using read/send commands that bypass the OS. Priority-based Flow Control (PFC) is used to avoid buffer overflow at switches, but it can lead to poor performance because a pause applies to a whole priority class rather than to the individual flows causing the congestion.
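To make the PFC behavior concrete, here is a minimal sketch of threshold-based pausing for one priority class. The class name, the `xoff_kb`/`xon_kb` parameters, and the numbers in the usage note are illustrative assumptions, not values from the talk.

```python
class PfcPriorityQueue:
    """One ingress queue for a single priority class (hypothetical model)."""

    def __init__(self, xoff_kb, xon_kb):
        # Assumed thresholds: pause at xoff_kb, resume at xon_kb.
        if xon_kb >= xoff_kb:
            raise ValueError("xon_kb must be below xoff_kb")
        self.xoff_kb = xoff_kb
        self.xon_kb = xon_kb
        self.paused = False

    def update(self, occupancy_kb):
        """Return True while the upstream sender should stay paused."""
        if not self.paused and occupancy_kb >= self.xoff_kb:
            # Queue is close to overflowing: pause the whole priority class.
            self.paused = True
        elif self.paused and occupancy_kb <= self.xon_kb:
            # Queue has drained enough: let the upstream sender resume.
            self.paused = False
        return self.paused
```

For example, with `PfcPriorityQueue(xoff_kb=100, xon_kb=50)`, an occupancy of 120 KB pauses the upstream port, and it stays paused until occupancy falls to 50 KB or below. Every flow in that priority class is stopped in the meantime, whether or not it contributed to the congestion.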

The authors developed a congestion control algorithm for RDMA systems called Datacenter QCN (DCQCN). It is a rate-based, end-to-end congestion control scheme that keeps switch buffer occupancy low so that PFC is rarely triggered. Switches mark packets with ECN based on the queue length at each switch, per priority. DCQCN keeps PFC in place and combines ECN marking with rate-based control implemented in hardware. The authors present a fluid model to justify their parameter selection.
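The ECN marking and rate-based control described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes a RED-style marking curve at the switch and a multiplicative rate cut at the sender when a congestion notification arrives. All names and default values (`k_min`, `k_max`, `p_max`, `g`, the 40 Gbps line rate) are placeholders, and the paper's separate timers and additive increase stages are collapsed into a single `on_quiet_period` step.

```python
def ecn_mark_probability(queue_kb, k_min=5.0, k_max=200.0, p_max=0.01):
    """Assumed RED-style curve: 0 up to k_min, linear to p_max at k_max, then 1."""
    if queue_kb <= k_min:
        return 0.0
    if queue_kb > k_max:
        return 1.0
    return p_max * (queue_kb - k_min) / (k_max - k_min)


class SenderRate:
    """Rate state kept by the sending NIC for one flow (hypothetical model)."""

    def __init__(self, line_rate_gbps=40.0, g=1.0 / 256):
        self.current = line_rate_gbps  # rate the flow is sending at now
        self.target = line_rate_gbps   # rate to recover toward
        self.alpha = 1.0               # running estimate of congestion level
        self.g = g                     # gain of the alpha moving average

    def on_congestion_notification(self):
        # Remember the pre-cut rate, then cut in proportion to alpha.
        self.target = self.current
        self.current *= 1.0 - self.alpha / 2.0
        self.alpha = (1.0 - self.g) * self.alpha + self.g

    def on_quiet_period(self):
        # No notification for a while: decay alpha and move halfway
        # back toward the target rate.
        self.alpha *= 1.0 - self.g
        self.current = (self.current + self.target) / 2.0
```

Because the sender adjusts a rate directly rather than a window, each notification maps to a rate change without waiting for acknowledgements to clock out packets.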

DCQCN provides higher throughput and lower instantaneous queue length than running without DCQCN. During the presentation, the author showed some comparison results between DCQCN and DCTCP, which were not in the paper but are important. The author claimed that DCQCN offers finer-grained control than DCTCP because it is rate-based rather than window-based.

Q&A
Q1: What in your design is specific?
A: It is deployed in a data center network.

Q2: Did you compare with QCN?
A: Only in the L2 domain. The NIC ASIC needs to be modified.

Q3: Google says ECN performs poorly and uses TIMELY instead. Any comments on that?
A: TIMELY does not guarantee that it will not trigger PFC.