Friday, November 22, 2013

HotNets'13: Towards Minimal-Delay Deadline-Driven Data Center TCP

Li Chen, Shuihai Hu, Kai Chen (HKUST), Haitao Wu (Microsoft Research Asia), Danny H.K. Tsang (HKUST).

Paper: http://conferences.sigcomm.org/hotnets/2013/papers/hotnets-final92.pdf

Data center workloads have flows with diverse deadlines, as recognized by earlier work such as D3, D2TCP, PDQ, and pFabric.  Some existing approaches are ad hoc: for instance, DCTCP maintains shallow queues, which only indirectly affects flow completion times.  Others require intrusive hardware changes (e.g., D3 and pFabric).  This paper presents MCP, an end-to-end congestion control algorithm that determines the "right" rates to meet flow deadlines while remaining readily deployable.

The key idea in the paper is to formulate the problem explicitly as a stochastic network optimization problem and to derive an end-to-end scheme using a standard technique called the "drift-plus-penalty method."  See the paper for more details.
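For readers unfamiliar with drift-plus-penalty, here is a minimal Python sketch of the generic technique on a toy single-queue problem.  It is not MCP's formulation: the queue model, the (service, cost) action set, and the names drift_plus_penalty_step and simulate are all assumptions made for illustration.  In each time slot, the controller picks the action that minimizes V * cost - queue * service, where V trades a lower average cost against a larger backlog.

```python
def drift_plus_penalty_step(queue, arrival, actions, V):
    """One slot of a toy drift-plus-penalty controller (illustrative only).

    actions is a list of (service, cost) pairs. The chosen action minimizes
    V * cost - queue * service, i.e. the penalty weighted by V plus the part
    of the Lyapunov drift bound that depends on the action.
    """
    service, cost = min(actions, key=lambda a: V * a[1] - queue * a[0])
    # Standard queue update: serve what we can, then add new arrivals.
    next_queue = max(queue - service, 0.0) + arrival
    return next_queue, service, cost


def simulate(arrivals, actions, V):
    """Return (average cost, average backlog) over the given arrival trace."""
    queue, total_cost, total_backlog = 0.0, 0.0, 0.0
    for arrival in arrivals:
        queue, _, cost = drift_plus_penalty_step(queue, arrival, actions, V)
        total_cost += cost
        total_backlog += queue
    n = len(arrivals)
    return total_cost / n, total_backlog / n


# Hypothetical action set: idle, serve 1 unit at cost 1, serve 2 units at cost 4.
ACTIONS = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
ARRIVALS = [1.0] * 1000  # one unit arrives every slot

cost_small_v, backlog_small_v = simulate(ARRIVALS, ACTIONS, V=1.0)
cost_large_v, backlog_large_v = simulate(ARRIVALS, ACTIONS, V=10.0)
```

In this toy, the larger V lets the queue build up further before the controller starts serving, so the average cost drops slightly while the average backlog grows.  That is the usual cost-versus-backlog trade-off of the method.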

What I think is interesting about the paper is that the authors formulated the problem explicitly and derived the end-to-end window update algorithm to achieve the optimal rates needed to meet as many flow deadlines as possible.

Q: You are adapting both the rate over time and the starting rate.  Have you quantified the benefit of each modification?  How much of the benefit is just due to the right starting rate?
A: It has to start at the expected rate, or the dynamics would take a long time to converge.  If flows just stick to their starting rate, the network will be unstable.
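
As a rough illustration of what starting at an "expected rate" could mean, the sketch below assumes the expected rate is the remaining flow size divided by the time left until the deadline, and converts it into an initial window for a given round-trip time.  This definition, the function names, and the 1460-byte segment size are assumptions for illustration, not details from the talk.

```python
import math


def expected_rate_bps(remaining_bytes, seconds_to_deadline):
    """Rate (bits/s) needed to finish exactly at the deadline (assumed definition)."""
    if seconds_to_deadline <= 0:
        raise ValueError("deadline has already passed")
    return remaining_bytes * 8 / seconds_to_deadline


def initial_window_segments(remaining_bytes, seconds_to_deadline,
                            rtt_seconds, mss_bytes=1460):
    """Window (in segments) that sustains the expected rate over one RTT."""
    rate_bps = expected_rate_bps(remaining_bytes, seconds_to_deadline)
    window_bytes = rate_bps / 8 * rtt_seconds  # bytes in flight per RTT
    return max(1, math.ceil(window_bytes / mss_bytes))


# Hypothetical flow: 1 MB to send, 10 ms to the deadline, 100 us RTT.
rate = expected_rate_bps(1_000_000, 0.010)
window = initial_window_segments(1_000_000, 0.010, 0.0001)
```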

Q: Is the objective to maximize the number of deadlines met?
A: No, it is to minimize per-packet delays.

Q: Recent works look at flow-level metrics (mean FCT, etc.).  What impact do these metrics have on application performance (e.g. MapReduce)?
A: Most work in this area is on flow-level performance, so we used the same.  For specific applications, I think there is room for improvement.

Q: In your graphs, what is "optimal"?  Why is it different from the line labeled "throughput"?
A: Optimal is computed centrally using per-hop information.  It is hard for a stochastic program to always operate at the optimal point -- we just proved convergence to optimality in the stochastic sense.