The modern datacenter network is increasingly an interconnect for distributed compute workloads supporting clustered applications such as web services, big data analytics, HPC, and monitoring systems. The user-perceived performance of these applications depends in large part on the latency of network transfers, or flows. Hence, traditional communication metrics such as throughput and fairness are not particularly relevant in this environment; it is more important to orchestrate the network resource allocation to finish flows quickly in order to meet overall computing objectives.
This paper presents pFabric, a new datacenter transport design that provides near theoretically optimal flow completion times (even at the 99th percentile for short flows). pFabric delivers this performance with a very simple design based on a key conceptual insight: datacenter transport should decouple flow scheduling from rate control. For flow scheduling, packets carry a single priority number set independently by each flow; switches have very small buffers and implement a very simple priority-based scheduling/dropping mechanism. Rate control is correspondingly simple: flows start at line rate and throttle back only under high and persistent packet loss. We provide theoretical intuition and show via extensive simulations that the combination of these two simple mechanisms is sufficient to provide near-optimal performance.
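To make the switch-side mechanism concrete, the following is a minimal sketch (not the paper's implementation) of a pFabric-style switch port: a tiny buffer that, on overflow, drops the packet with the worst priority and, on dequeue, transmits the packet with the best priority. The class name `PFabricPort`, the buffer size, and the convention that a lower number means higher priority (e.g., remaining flow size) are illustrative assumptions.

```python
# Hypothetical sketch of pFabric-style priority scheduling/dropping at a
# switch port. Lower priority number = more urgent (e.g. remaining flow
# size), so short flows are served first.

class PFabricPort:
    def __init__(self, capacity=24):          # pFabric assumes very small buffers
        self.capacity = capacity
        self.buffer = []                      # list of (priority, payload)

    def enqueue(self, priority, payload):
        self.buffer.append((priority, payload))
        if len(self.buffer) > self.capacity:
            # Dropping: evict the packet with the worst (largest) priority.
            worst = max(range(len(self.buffer)),
                        key=lambda i: self.buffer[i][0])
            self.buffer.pop(worst)

    def dequeue(self):
        if not self.buffer:
            return None
        # Scheduling: transmit the packet with the best (smallest) priority.
        best = min(range(len(self.buffer)),
                   key=lambda i: self.buffer[i][0])
        return self.buffer.pop(best)

port = PFabricPort(capacity=2)
port.enqueue(10, "a")   # packet from a long flow (low urgency)
port.enqueue(3, "b")    # packet from a short flow (high urgency)
port.enqueue(7, "c")    # overflow: the priority-10 packet is dropped
print(port.dequeue())   # the priority-3 packet goes out first: (3, 'b')
```

Because scheduling and dropping depend only on the priority numbers carried in packet headers, the switch needs no per-flow state, which is what allows rate control to be reduced to the simple start-at-line-rate policy described above.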