Wednesday, December 11, 2013

CoNEXT'13: Per-packet load balanced, Low-Latency Routing for Clos-based Data Center Networks

Speaker: Jiaxin Cao
Authors: Rui Xia, Pengkun Yang, Chuanxiong Guo, Guohan Lu, Lihua Yuan, Yixin Zheng, Haitao Wu, Yongqiang Xiong, Dave Maltz

This paper introduces DRB (Digit-Reversal Bouncing) for load balancing and low-latency communication in data centers.
Background on Clos-based data center networks (DCNs):
·         Clos-based topologies include fat-tree and VL2
·         Routing: equal-cost multipath (ECMP)
·         Low network utilization
·         High tail latency
Network latency was measured for:
·         Busy servers
·         Light servers
·         All servers
Results show that loaded servers do not contribute to the tail latency.
So where does the tail latency come from?
·         How to achieve full bandwidth utilization
·         How to minimize delay
·         Not new, but has not been used
·         Achieves 100% utilization
·         Achieves small queuing delay
How to achieve 100% utilization:

  • Spread the traffic from one server to another across all possible uplinks at every layer

Fat-tree satisfies the conditions needed for DRB.
Some existing solutions use random bouncing (RB) or round-robin bouncing (RRB).
Instead we use DRB:

  • For the same server pair (i, j), DRB chooses a different spine switch to bounce successive packets through.
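
As a rough illustration of the "digit-reversal" idea, the sketch below generates a bouncing order by reversing the digits of a packet counter. It is not taken from the talk: the function names, the `start` parameter, and the spine numbering are assumptions. In particular it assumes every layer has the same number of uplinks (`base`) and that the most significant digit of a spine index selects the uplink at the first hop. A plain round-robin order is included only for comparison.

```python
from itertools import count, islice


def digit_reverse(value: int, base: int, num_digits: int) -> int:
    """Reverse the base-`base` digits of `value`, padded to `num_digits` digits."""
    reversed_value = 0
    for _ in range(num_digits):
        value, digit = divmod(value, base)      # peel off the lowest digit
        reversed_value = reversed_value * base + digit
    return reversed_value


def drb_spine_sequence(base: int, num_digits: int, start: int = 0):
    """Yield spine indices in digit-reversal order (hypothetical numbering).

    `start` is an assumed per-pair starting offset for the packet counter.
    """
    total = base ** num_digits                  # number of spine switches
    for step in count(start):
        yield digit_reverse(step % total, base, num_digits)


def rrb_spine_sequence(base: int, num_digits: int, start: int = 0):
    """Yield spine indices in plain round-robin order, for comparison."""
    total = base ** num_digits
    for step in count(start):
        yield step % total


if __name__ == "__main__":
    # 2 uplinks per layer, 3 layers of choice: 8 spine switches.
    print(list(islice(drb_spine_sequence(2, 3), 8)))
    print(list(islice(rrb_spine_sequence(2, 3), 8)))
```

Under the numbering assumed here, consecutive packets in the digit-reversal order change the most significant digit first, so they alternate between first-hop uplinks, whereas round-robin keeps several consecutive packets on the same first-hop uplink.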

We present a queuing latency model to show why DRB performs better

  • Results show that DRB and RRB achieve bounded queue length when the load reaches 100%.

  • But the queue length for RRB is larger.

One issue is that DRB cannot be applied directly to VL2.
The solution is to virtually split each spine switch into multiple virtual spine switches.
Simulations were run for all three bouncing schemes (RB, RRB, DRB) and for ECMP.
The simulation results show that DRB improves on RB, RRB and ECMP in all measurements (throughput, queue length, re-sequencing delay).
Re-sequencing delay is the time a packet stays in the re-sequencing buffer.
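
To make the definition concrete, here is a minimal sketch of how re-sequencing delay could be computed from an arrival trace. It is an illustration, not the authors' implementation: the function name and the trace format are hypothetical, and it assumes sequence numbers start at 0, no packet is lost, and a packet is released as soon as all earlier packets have arrived.

```python
def resequencing_delays(arrivals):
    """Return {seq: time spent in the re-sequencing buffer}.

    `arrivals` is a list of (arrival_time, seq) tuples in arrival order.
    Assumes sequence numbers start at 0 and no packet is lost.
    """
    buffered = {}    # seq -> arrival time of packets still waiting
    delays = {}
    next_seq = 0     # next sequence number to hand to the application
    for now, seq in arrivals:
        buffered[seq] = now
        # Release every packet that is now in order.
        while next_seq in buffered:
            delays[next_seq] = now - buffered.pop(next_seq)
            next_seq += 1
    return delays


if __name__ == "__main__":
    # Packet 1 overtakes packet 0 and waits one time unit for it.
    print(resequencing_delays([(0, 1), (1, 0), (2, 2)]))
```

Packets that arrive in order have zero re-sequencing delay in this model; only packets that overtake an earlier one wait in the buffer.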
They also implemented it on a testbed.
The DRB queue length is as low as 2-3 packets.

Q: My biggest concern about this type of work is that you assume a completely symmetric topology. If the bandwidth of one port goes down for some reason, or diverse types of servers are used, is it reasonable to assume a completely symmetric topology?
A: In reality switches are different and ports have different bandwidths. A change in bandwidth is a process problem, and we disable such ports. The purpose of DRB is to get rid of congestion. If congestion happens at the link level because of diverse hardware on one link, we handle it with a congestion control mechanism.

Q: You can apply DRB to Clos because you can find multiple paths to the destination. Suppose there is an arbitrary topology in which you can also find multiple paths. Can you still apply DRB to an arbitrary topology like Jellyfish?
A: DRB cannot be applied to an arbitrary topology. The topology is part of our assumptions.
Q: I missed the first question!!!
Q: Is the 10 ms constant, or is it based on the running flow?
A: It is constant. The timeout value is usually more than 10 ms, so it is a reasonable choice.