Thursday, December 15, 2016

CoNEXT 2016 - Session 7: Datacenter 2

Paper: Sunflow: Efficient Optical Circuit Scheduling for Coflows
Authors: Xin Sunny Huang (Rice University), Xiaoye Steven Sun (Rice University), and T. S. Eugene Ng (Rice University)

On the one hand, Optical Circuit Switching (OCS) has many advantages, such as energy efficiency, cost efficiency, and future-proof capabilities. On the other hand, it has worse traffic performance, especially for small data transfers. This worse performance stems from the need to set up a circuit before data can be sent. In scenarios with large data, its performance may come closer to the one achieved by Packet Switching (PS). The paper seeks to answer the following question: can OCS be as good as packet switching for coflows?
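To make the setup penalty concrete, here is a minimal sketch of the arithmetic. It assumes that both technologies offer the same link bandwidth and that the only OCS overhead is a fixed circuit setup delay. The function names and the numbers are illustrative and are not taken from the paper.

```python
def ps_time(size_bits, bandwidth_bps):
    """Transfer time over packet switching (no setup delay assumed)."""
    return size_bits / bandwidth_bps


def ocs_time(size_bits, bandwidth_bps, setup_s):
    """Transfer time over a circuit: fixed setup delay plus transmission."""
    return setup_s + size_bits / bandwidth_bps


def slowdown(size_bits, bandwidth_bps, setup_s):
    """How many times slower OCS is than PS for one transfer."""
    return ocs_time(size_bits, bandwidth_bps, setup_s) / ps_time(size_bits, bandwidth_bps)


if __name__ == "__main__":
    # Hypothetical values: 10 Gbps link, 10 ms circuit setup delay.
    for size in (1e6, 1e9, 1e12):
        print(f"{size:.0e} bits -> slowdown {slowdown(size, 1e10, 0.01):.4f}x")
```

With these assumed values, a 1 Mb transfer is about 101 times slower over OCS, while a 1 Tb transfer is only about 1.0001 times slower, which is why the gap closes for large data.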

Sunflow uses a not-all-stop switch model, in which only the circuits being reconfigured are interrupted (instead of an all-stop model, in which every circuit stops during reconfiguration), to allow more flexible scheduling of coflows. The proposed approach is a greedy heuristic whose result is within 2x of the optimal; in practice, it is within 1.03x of the optimal. For intra-coflow scheduling, Sunflow does not allow subflows to preempt each other, while for inter-coflow scheduling it has a flexible preemption policy.
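The idea of non-preemptive greedy scheduling under a not-all-stop model can be sketched as follows. This is a toy illustration and not Sunflow's actual algorithm: the longest-first ordering, the function name `greedy_circuit_schedule`, and the representation of a coflow as a dictionary of per-circuit transmission times are all assumptions made for this example.

```python
def greedy_circuit_schedule(demand, setup):
    """Schedule the subflows of one coflow on an optical circuit switch.

    demand: {(src_port, dst_port): transmission_time}
    setup:  circuit setup delay, paid only by the two ports involved
            (not-all-stop: other circuits keep running meanwhile).
    Returns (schedule, coflow_completion_time).
    """
    out_free = {}  # time at which each sender port becomes free
    in_free = {}   # time at which each receiver port becomes free
    schedule = []
    # Assumed ordering: longest subflow first, ties broken by port names.
    ordered = sorted(demand.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, dst), duration in ordered:
        # A circuit needs both of its ports to be idle.
        start = max(out_free.get(src, 0.0), in_free.get(dst, 0.0))
        # No preemption: the circuit stays up until the subflow is done.
        end = start + setup + duration
        out_free[src] = end
        in_free[dst] = end
        schedule.append((src, dst, start, end))
    cct = max((end for _, _, _, end in schedule), default=0.0)
    return schedule, cct


if __name__ == "__main__":
    coflow = {("a", "x"): 4.0, ("a", "y"): 2.0, ("b", "x"): 3.0}
    plan, cct = greedy_circuit_schedule(coflow, setup=1.0)
    for entry in plan:
        print(entry)
    print("coflow completion time:", cct)
```

In this small example the receiver port `x` must carry two subflows and pay two setup delays, so no schedule can finish before time 9, and the greedy order reaches exactly that.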

The evaluation was performed through simulations using traces from a Facebook cluster. The results showed that Sunflow is more efficient than Solstice and achieves performance close to packet switching when compared to Varys. In summary, Sunflow achieves the benefits of OCS and good traffic performance for coflows.

Questions:

Q: Is the 2x bound relative to the optimal for intra-coflow scheduling? What is your intuition about the worst case?

A: Yes. For small coflows the performance drops, but the difference compared to PS is still small.

Q: Evaluation was made through simulations. What are your thoughts about deploying and evaluating in real scenarios?

A: There are other papers that address this question. Our contribution is on the algorithm.

Q: What is your opinion about the drawbacks of OCS?

A: Since OCS brings many additional benefits, the drawbacks are tolerable.
________________________________________________________________________
Paper: ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY
Authors: Yibo Zhu (Microsoft Research), Monia Ghobadi (Microsoft Research), Vishal Misra (Columbia University), and Jitendra Padhye (Microsoft Research)

Datacenter applications demand high bandwidth, low latency, and low CPU overhead. The TCP stack, however, is too heavyweight. RDMA has been employed in the real world to offload the transport to the NIC. Recently, DCQCN and TIMELY were proposed to improve congestion control for RDMA. This paper aims to answer the following question: which of the two is better for congestion control?

While DCQCN achieves the desired properties (fairness, fast convergence, stability, high link utilization, and reduction in flow completion time), TIMELY does not. The reason is related to the use of the derivative of latency to change the rate. To deal with this situation, this paper proposes a quick patch for TIMELY. The patched version of TIMELY changes the rates based on absolute delay and on the derivative of delay.
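The difference between reacting only to the derivative of delay and reacting to both the absolute delay and its derivative can be illustrated with a small sketch. This is not the update rule of TIMELY or of the patched version from the paper: the function `update_rate`, its parameters (`step`, `beta`, `weight`, `target_delay`, `min_rtt`), and the way the two signals are blended are assumptions chosen only to show the idea.

```python
def update_rate(rate, delay, prev_delay, target_delay, min_rtt,
                step, beta, weight):
    """Toy delay-based rate update (hypothetical, for illustration only).

    weight = 1.0 -> react only to the delay gradient
    weight < 1.0 -> also react to how far the delay is above the target
    Assumes 0 < beta < 1 so the rate never becomes negative.
    """
    gradient = (delay - prev_delay) / min_rtt        # normalized derivative
    error = (delay - target_delay) / target_delay    # normalized absolute term
    signal = weight * gradient + (1.0 - weight) * error
    if signal <= 0:
        return rate + step                            # additive increase
    return rate * (1.0 - beta * min(signal, 1.0))     # multiplicative decrease


if __name__ == "__main__":
    # Delay is flat at twice the target: the queue is long but not growing.
    common = dict(rate=10.0, delay=100.0, prev_delay=100.0,
                  target_delay=50.0, min_rtt=10.0, step=1.0, beta=0.5)
    print("gradient only:     ", update_rate(weight=1.0, **common))
    print("gradient + absolute:", update_rate(weight=0.5, **common))
```

In this toy setting, a sender that looks only at the gradient keeps increasing its rate when the delay is high but constant, so the queue length is not pinned to any particular value. Adding the absolute term makes the same sender back off.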

Comparing the three solutions, DCQCN obtained better results than both versions of TIMELY. The patched version of TIMELY outperformed the original one. As the workload increases, the advantage of DCQCN over the other approaches grows. The main reason for the performance difference is the mechanism used to detect congestion. While DCQCN uses ECN, TIMELY is based on end-to-end delays. The conclusion is that ECN is better because with delay-based approaches it is not possible to have a fixed queue length and fairness simultaneously. In summary, ECN is probably a better signal than end-to-end delay. However, end-to-end delay does not need support from the switches.

Questions:

Q: Have you evaluated the fairness in scenarios where there are a variety of end-to-end delays?

A: No. The focus of the work is on datacenter, where the queueing delay is dominant.

Q: Have you considered the existence of multiple bottlenecks per flow?

A: Not yet.

Q: Have you tested with different flow sizes?

A: Not yet.
________________________________________________________________________
Paper: Composite-Path Switching
Authors: Shay Vargaftik (Technion), Katherine Barabash (IBM Research), Yaniv Ben-Itzhak (IBM Research), Ofer Biran (IBM Research), Isaac Keslassy (Technion), Dean Lorenz (IBM Research), and Ariel Orda (Technion)

Datacenters require lower latencies and higher bandwidths. How can these demands be met? First, it is necessary to understand the current scenario. Today, we have lots of racks with small demands and several racks with lots of traffic. DCN traffic patterns can be classified as many-to-many, one-to-one, one-to-many, and many-to-one. The challenge is how to deal with the last two. This paper proposes a Hybrid Switching model, which combines Electrical Packet Switching (EPS) and OCS.

In order to avoid performance degradation, one-to-many traffic should use OCS on the sender side and EPS on the receiver side. In the many-to-one case, the composition should be EPS for the senders and OCS for the receiver. Two main challenges should be tackled: how to represent composite paths and what to serve using composite paths. The first one is handled with an augmented demand matrix. The second one is handled by recognizing the different paths and scheduling them properly. The scheduling works in steps. First, a reduction process is applied to the demands and to the switch parameters. Second, the new demands and the hybrid-switching parameters are passed to the scheduler. Finally, the interpretation process schedules the flows. In summary, the paper proposes an approach that can accommodate more traffic patterns without increasing the scheduling complexity.
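One way to picture an augmented demand matrix is sketched below. The construction is hypothetical and is not the one defined in the paper: it assumes that a single extra column collects one-to-many demand (sender over OCS, receivers over EPS), that a single extra row collects many-to-one demand (senders over EPS, receiver over OCS), and that a simple fan-out or fan-in threshold decides what moves to a composite path. The name `augment_demand` and the threshold are invented for this example.

```python
def augment_demand(demand, fan_threshold):
    """Build an (n+1) x (n+1) matrix from an n x n rack-to-rack demand matrix.

    aug[i][n]: one-to-many demand of sender i moved to a composite path
    aug[n][j]: many-to-one demand of receiver j moved to a composite path
    Everything else stays in its original cell.
    """
    n = len(demand)
    aug = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            aug[i][j] = float(demand[i][j])

    # One-to-many: a sender talking to many receivers.
    for i in range(n):
        fan_out = sum(1 for j in range(n) if aug[i][j] > 0)
        if fan_out >= fan_threshold:
            aug[i][n] = sum(aug[i][:n])
            for j in range(n):
                aug[i][j] = 0.0

    # Many-to-one: a receiver fed by many senders (demand not already moved).
    for j in range(n):
        fan_in = sum(1 for i in range(n) if aug[i][j] > 0)
        if fan_in >= fan_threshold:
            aug[n][j] = sum(aug[i][j] for i in range(n))
            for i in range(n):
                aug[i][j] = 0.0
    return aug


if __name__ == "__main__":
    demand = [[0, 1, 1],
              [0, 0, 5],
              [0, 0, 0]]
    for row in augment_demand(demand, fan_threshold=2):
        print(row)
```

The point of such a representation is that the augmented matrix has the same shape as an ordinary demand matrix, so it could be handed to an existing hybrid-switching scheduler. Note that this toy version discards the per-receiver breakdown of the aggregated demand, which a real interpretation step would need to keep.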

Questions:

Q: It appears that the OCS and EPS will share the same path. Thus, the OCS capacity will be limited by the EPS. What is your position?

A: I do not see this limitation. There are extra high bandwidth ports to avoid this.

Q: How recent are your traces?

A: They are from 2014, maybe 2015.

Q: Does the larger share of the traffic originate from many-to-many communication? Is it possible to use the composite path for this traffic?

A: Yes. But we do not want to overload the composite path.