Friday, August 25, 2017

Section 9 - Realities, Paper 4: Carousel: Scalable Traffic Shaping at End Hosts

Presenter: Ahmed Saeed

Authors: Ahmed Saeed (GIT), Nandita Dukkipati (Google, Inc.), Vytautas Valancius (Google, Inc.), Vinh The Lam (Google, Inc.), Carlo Contavalli (Google, Inc.), Amin Vahdat (Google, Inc.)

Network bandwidth is a constrained resource that is expensive to overprovision, especially across the WAN, so accurate shaping to a target rate is increasingly important to efficient network operation.
Ahmed presents traffic shaping, which has traditionally been implemented in network switches and routers. The problem is that shaping in middleboxes is not an easy option inside a datacenter: it is expensive in buffering and latency, and middleboxes lack the state needed to enforce the right rate. Shaping in the middle of the network also does not help when bottlenecks are at the network edges. New traffic shapers that can handle tens of thousands of flows and rates are therefore needed.

Ahmed et al. propose Carousel, an improvement on existing kernel-based traffic shaping mechanisms. The main idea is to replace many queues with a single low-overhead queue. Carousel scales to tens of thousands of flows and traffic classes, and supports complex bandwidth-allocation mechanisms for both WAN and datacenter communications. He shows the Carousel architecture.
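To make the single-queue idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the single queue is a timing wheel (an array of time buckets), that each packet is stamped with a release time derived from its flow's rate, and that the host periodically drains every bucket that has come due. The names TimingWheelShaper, granularity_us, and num_slots are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class TimingWheelShaper:
    """Illustrative single time-indexed queue shared by all flows.

    Assumption: instead of one queue per flow or rate, every packet gets a
    release timestamp and is placed in the bucket covering that timestamp.
    """

    def __init__(self, granularity_us=10, num_slots=1000):
        self.granularity_us = granularity_us      # time covered by one bucket
        self.num_slots = num_slots                # horizon = slots * granularity
        self.slots = [deque() for _ in range(num_slots)]
        self.current_tick = 0                     # oldest bucket not yet drained
        self.next_free_us = defaultdict(float)    # per-flow earliest send time

    def enqueue(self, flow_id, size_bytes, rate_bytes_per_us, now_us):
        """Stamp the packet according to its flow's rate and bucket it."""
        send_at = max(now_us, self.next_free_us[flow_id])
        # The next packet of this flow may leave only after this one has
        # "used up" its share of time at the flow's rate.
        self.next_free_us[flow_id] = send_at + size_bytes / rate_bytes_per_us
        tick = int(send_at // self.granularity_us)
        # Clamp into the window the wheel can represent right now.
        last_tick = self.current_tick + self.num_slots - 1
        tick = min(max(tick, self.current_tick), last_tick)
        self.slots[tick % self.num_slots].append((flow_id, size_bytes))
        return send_at

    def dequeue_due(self, now_us):
        """Release every packet whose bucket is at or before the current time."""
        now_tick = int(now_us // self.granularity_us)
        released = []
        while self.current_tick <= now_tick:
            slot = self.slots[self.current_tick % self.num_slots]
            while slot:
                released.append(slot.popleft())
            self.current_tick += 1
        return released
```

In this sketch, adding a flow costs one dictionary entry rather than a new queue, and both enqueue and per-packet release are constant-time operations. The bucket width and the number of buckets trade pacing precision against how far into the future a packet can be scheduled.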
Ahmed presents their evaluation setup:
1. Carousel is deployed within a software NIC.
2. It is evaluated on YouTube servers, comparing Carousel against FQ/pacing.
3. Each server handles up to 50k sessions concurrently.
Carousel saves up to 8.2% of overall CPU utilization (5.9 cores on a 72-core machine).
Carousel also improves software NIC utilization by 12% by increasing the size of the batches of packets enqueued in the software NIC.

Carousel allows network operators, for the first time, to shape tens of thousands of flows individually. Carousel's advantages make a strong case for providing single-queue shaping and backpressure in kernel, userspace stacks, hypervisors, and hardware.

Paper is here.

Q&A section:

Q1: I care about the data structure you are using. There seems to be a trade-off, where you get either the granularity of pacing or the number of packets. Have you quantified, for example, how many packets you have, the granularity, and the processing? That is very important for products.
A1: We did consider all the parameters, and through our evaluation we found that with a bucket size of 8000 we can support other traffic for YouTube.

Q2: I am wondering if your algorithm supports flexible scheduling.
A2: That is a good problem, and one we are currently working on. We are designing a new algorithm to support arbitrary scheduling in software.

Q3: Why should we have software rate limiters as opposed to hardware rate limiters?
A3: I think one advantage of this work is that it decouples the allocation of memory, whether in software or hardware, from the number of rate limiters you can support. So this work is a little different from comparing software and hardware, and I don't think software vs. hardware is the right framing for it.