Wednesday, August 24, 2016

Scheduling Mix-flows in Commodity Datacenters with Karuna
Presenter: Li Chen
Co-authors: Kai Chen, Wei Bai, Mohammad Alizadeh

The key problem the authors address is how to schedule a mixture of deadline and non-deadline flows so that most deadline-sensitive flows meet their deadlines while flow completion times for non-deadline flows stay low. The authors argue that current solutions tend to optimize for one type of flow at the expense of the other. For example, SJF (shortest job first) improves flow completion times for non-deadline flows but increases the deadline miss rate for deadline-sensitive flows when those flows are large. At the other end of the spectrum, EDF (earliest deadline first) improves the deadline meet rate but, because of its strict prioritization, increases flow completion times for non-deadline flows.

The key insight of the paper is to let deadline-sensitive flows just meet their deadlines rather than finish very early, and to distribute the leftover bandwidth among the non-deadline flows. The implementation uses a multi-level priority queue: deadline-sensitive flows are placed in the highest-priority queue, and non-deadline flows are placed in the lower-priority queues, with SJF as the priority scheme. For non-deadline flows of unknown size, they use PIAS to decide which priority queue each flow goes in.
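To make the idea concrete, here is a minimal Python sketch of the two pieces described above. It is not the authors' implementation. It assumes that "just meeting the deadline" means sending at remaining size divided by time left, and that non-deadline flows are mapped to queues by comparing a byte count against fixed thresholds (flow size when known, bytes already sent otherwise, in the spirit of PIAS). The names `Flow`, `just_in_time_rate`, `leftover_bandwidth`, `priority_queue`, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flow:
    name: str
    remaining_bytes: float
    time_to_deadline_s: Optional[float] = None  # None marks a non-deadline flow
    size_known: bool = True
    bytes_sent: float = 0.0


def just_in_time_rate(flow: Flow) -> float:
    """Lowest rate (bytes/s) at which a deadline flow still finishes on time.

    Assumption: rate = remaining bytes / time left until the deadline.
    """
    if flow.time_to_deadline_s is None or flow.time_to_deadline_s <= 0:
        raise ValueError("flow has no deadline or the deadline has passed")
    return flow.remaining_bytes / flow.time_to_deadline_s


def leftover_bandwidth(capacity_bytes_per_s: float, flows: List[Flow]) -> float:
    """Bandwidth left for non-deadline flows after deadline flows take their minimum."""
    reserved = sum(
        just_in_time_rate(f) for f in flows if f.time_to_deadline_s is not None
    )
    return max(0.0, capacity_bytes_per_s - reserved)


def priority_queue(flow: Flow, thresholds: List[float]) -> int:
    """Queue index for a flow; 0 is the highest priority.

    Deadline flows go to queue 0. Non-deadline flows use queues 1..len(thresholds)+1:
    known-size flows are ranked by size (SJF-like), unknown-size flows by bytes
    already sent, so they are demoted as they send more (PIAS-like).
    `thresholds` must be sorted in ascending order.
    """
    if flow.time_to_deadline_s is not None:
        return 0
    key = flow.remaining_bytes if flow.size_known else flow.bytes_sent
    for level, limit in enumerate(thresholds, start=1):
        if key <= limit:
            return level
    return len(thresholds) + 1


if __name__ == "__main__":
    flows = [
        Flow("d1", 1_000_000, time_to_deadline_s=0.5),
        Flow("n1", 50_000),
        Flow("n2", 0, size_known=False, bytes_sent=500_000),
    ]
    print(leftover_bandwidth(125_000_000, flows))
    for f in flows:
        print(f.name, priority_queue(f, [100_000, 1_000_000]))
```

In this sketch a deadline flow never takes more than it needs, so any headroom on the link shows up in `leftover_bandwidth` and can go to the non-deadline queues.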

The evaluation was done using a small testbed of 16 servers and large-scale ns-3 simulations. The results show that, at heavy load, the system reduces flow completion times for non-deadline flows by up to 47.78% compared to pFabric while keeping deadline miss rates low (<5.8%).

Q: How long do the large-scale simulations take? Is your simulation code open source?
A: They take a long time. The kernel module is open source; we will make the simulation code open source sometime in the future.

Q: Why not just use WFQ?
A: The demand for deadline traffic is dynamic, so we use dynamic WFQ.

Q: Do you differentiate between small deadlines and large deadlines?
A: We use whatever deadline values the application gives us, and do not make any specific differentiation.

Q: Did you measure avg flow completion time for deadline flows? Is it worse than traditional solutions?
A: We trade off avg flow completion time of deadline flows in exchange for smaller tail latency of non-deadline flows.

Q: Will your algorithm work for fat-tree topology?
A: It should work. Queue estimation might be an issue, but we can change the constant parameter in the queue estimation part of the algorithm to accommodate topology changes.

Q: Since you are pushing very close to the deadlines for the deadline sensitive flows, do you take into account bursty deadline traffic or transient failures to make sure that even in these situations we do not miss deadlines?
A: Karuna is by design not optimal for deadline flows. We make a trade-off to account for non-deadline flows.