Tuesday, August 18, 2015

Session 1, Paper 1: "BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing"

Authors: Alok Kumar (Google), Sushant Jain (Google), Uday Naik (Google), Anand Raghuraman (Premise Data Corp), Nikhil Kasinadhuni (Google), Enrique Cauich Zermeno (Google), C. Stephen Gunn (Google), Jing Ai (Google), Björn Carlin (Google), Mihai Amarandei-Stavila (Google), Mathieu Robin (Google), Aspi Siganporia (Google), Stephen Stuart (Google), Amin Vahdat (Google)

Presented by: Alok Kumar from Google

Paper: http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p1.pdf

Public Review: http://conferences.sigcomm.org/sigcomm/2015/pdf/reviews/175pr.pdf


The BwE system has been deployed inside Google for the last 5 years.
Google's internal network traffic is enormous: if Google were considered an ISP, it would be the second-largest one in the world!
B4 is Google's internal WAN. It spans multiple continents (North America, Europe, Asia).

Such a giant-scale network has many inefficiencies in bandwidth allocation.
The goal of the project: a centralized bandwidth allocation algorithm at the scale of Google's WAN
(allowing for flexible allocation policies, with enforcement at the hosts) that minimizes these inefficiencies.

System before BwE: thousands of competing users were classified into a few classes, with no differentiation within a class.

Differentiation was coarse (e.g., Search vs. Gmail) rather than user by user.
There was no good solution for non-critical applications such as backups, where latency is irrelevant but high throughput is necessary.

What problem does it solve?
* Visibility into users
* Sharing of WAN bandwidth based on configured policies, so users can specify (and buy) their requirements

System Architecture: the global enforcer takes policies and a network model as input and computes the allocation, which it sends to several cluster enforcers; each cluster enforcer sends its share to multiple job enforcers, which in turn send theirs to the host enforcers.
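To make the top-down flow concrete, here is a minimal sketch of an allocation being handed down an enforcer tree. It assumes, purely for illustration, that each level splits what it received among its children max-min fairly by their aggregate demand. The real BwE enforcers work with policies and bandwidth functions, not raw demands. The names `Node`, `waterfill`, and `enforce` are hypothetical and do not come from the paper.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical enforcer tree (global, cluster, job, or host)."""
    name: str
    demand: float = 0.0  # only meaningful at leaves (hosts), in Gbps
    children: list[Node] = field(default_factory=list)

    def total_demand(self) -> float:
        if not self.children:
            return self.demand
        return sum(c.total_demand() for c in self.children)


def waterfill(capacity: float, demands: list[float]) -> list[float]:
    """Max-min fair split of `capacity`: nobody gets more than it asks for,
    and leftover capacity is shared equally by the still-unsatisfied children."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active:
        share = remaining / len(active)
        smallest = active[0]
        if demands[smallest] <= share:
            # Smallest demand fits under an equal share: satisfy it fully.
            alloc[smallest] = demands[smallest]
            remaining -= demands[smallest]
            active.pop(0)
        else:
            # Everyone left wants more than an equal share: split evenly.
            for i in active:
                alloc[i] = share
            break
    return alloc


def enforce(node: Node, allocation: float, out: dict[str, float]) -> None:
    """Hand an allocation down one level at a time until it reaches the hosts."""
    if not node.children:
        out[node.name] = allocation
        return
    shares = waterfill(allocation, [c.total_demand() for c in node.children])
    for child, share in zip(node.children, shares):
        enforce(child, share, out)


root = Node("global", children=[
    Node("cluster-a", children=[Node("job-a", children=[
        Node("a-h1", demand=3.0), Node("a-h2", demand=5.0)])]),
    Node("cluster-b", children=[Node("job-b", children=[
        Node("b-h1", demand=10.0)])]),
])
hosts: dict[str, float] = {}
enforce(root, 12.0, hosts)
print(hosts)  # {'a-h1': 3.0, 'a-h2': 3.0, 'b-h1': 6.0}
```

Each enforcer only needs its own allocation and its children's demands, which is what lets the hierarchy fan out to many hosts without a single component tracking every host.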

Policies have the following form:
Guaranteed bandwidth (with weight) + best effort (with weight)
For example, Gmail: 10 Gbps guaranteed + 20 Gbps best effort with w=2 + a further 50 Gbps best effort with w=1
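One way to read the Gmail example is as a piecewise-linear function from a shared "fair share" level to the bandwidth Gmail receives. Under this reading, the guaranteed 10 Gbps comes first, and each best-effort tier then grows at `weight` Gbps per unit of fair share until it reaches its cap. This is a sketch under that assumption, not necessarily how BwE encodes policies internally. `GMAIL_TIERS` and `bandwidth_for_fair_share` are hypothetical names.

```python
# Hypothetical encoding of the Gmail example: a guaranteed amount in Gbps,
# then best-effort tiers given as (cap in Gbps, weight).
GMAIL_GUARANTEED = 10.0
GMAIL_TIERS = [(20.0, 2.0), (50.0, 1.0)]


def bandwidth_for_fair_share(fair_share: float, guaranteed: float,
                             tiers: list[tuple[float, float]]) -> float:
    """Map a fair-share level to Gbps: the guaranteed part comes first, then each
    best-effort tier grows at `weight` Gbps per unit of fair share until its cap."""
    bw = guaranteed
    remaining = fair_share
    for cap, weight in tiers:
        span = cap / weight  # fair share needed to exhaust this tier
        used = min(remaining, span)
        bw += used * weight
        remaining -= used
        if remaining <= 0:
            break
    return bw


for s in [0, 5, 10, 20, 60, 100]:
    # second column: 10.0, 20.0, 30.0, 40.0, 80.0, 80.0
    print(s, bandwidth_for_fair_share(s, GMAIL_TIERS and GMAIL_GUARANTEED, GMAIL_TIERS))
```

With w=2, Gmail's first best-effort tier fills twice as fast as a w=1 tier at the same fair share. Beyond fair share 60 the function is flat, so Gmail never asks for more than 10 + 20 + 50 = 80 Gbps.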


Algorithm: path selection (traffic engineering) and bandwidth allocation are optimized independently rather than jointly. This is suboptimal but scales better.

Phases:
(1) Traffic Engineering (TE): runs less frequently (so that things scale) and determines the paths, which are the input to MPFA
(2) MultiPath Fair Allocation (MPFA): handles arbitrarily complex networks, where flow groups can take multiple paths and the network can have multiple bottlenecks
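The core idea behind fair allocation over a network with several bottlenecks can be illustrated with weighted max-min fairness computed by progressive filling. A common fair share is raised until some link saturates or some flow group is satisfied; those flow groups are frozen, and the process repeats. This sketch is deliberately simpler than MPFA as described in the talk. It assumes each flow group uses a single, fixed path (as if handed over by TE) and has a fixed weight and demand instead of a full bandwidth function. `weighted_max_min` and the link and flow-group names are hypothetical.

```python
EPS = 1e-9


def weighted_max_min(capacity: dict[str, float],
                     flows: dict[str, tuple[float, float, list[str]]]) -> dict[str, float]:
    """Progressive filling: raise a common fair share until a link saturates or a
    flow group hits its demand, freeze the affected flow groups, repeat.
    `flows` maps name -> (weight, demand in Gbps, links on its single path)."""
    alloc = {f: 0.0 for f in flows}
    residual = dict(capacity)
    frozen: set[str] = set()
    while len(frozen) < len(flows):
        active = [f for f in flows if f not in frozen]
        # Fair-share increment until some active flow group is fully satisfied...
        step = min((flows[f][1] - alloc[f]) / flows[f][0] for f in active)
        # ...or until some link runs out of residual capacity.
        for link, cap in residual.items():
            w = sum(flows[f][0] for f in active if link in flows[f][2])
            if w > 0:
                step = min(step, cap / w)
        # Every active flow group grows in proportion to its weight.
        for f in active:
            inc = flows[f][0] * step
            alloc[f] += inc
            for link in flows[f][2]:
                residual[link] -= inc
        # Freeze flow groups that are satisfied or cross a saturated link.
        for f in active:
            _, demand, links = flows[f]
            if alloc[f] >= demand - EPS or any(residual[lk] <= EPS for lk in links):
                frozen.add(f)
    return alloc


links = {"A": 10.0, "B": 10.0}
groups = {
    "f1": (2.0, float("inf"), ["A"]),
    "f2": (1.0, float("inf"), ["A", "B"]),
    "f3": (1.0, float("inf"), ["B"]),
}
result = weighted_max_min(links, groups)
print({k: round(v, 3) for k, v in result.items()})  # {'f1': 6.667, 'f2': 3.333, 'f3': 6.667}
```

Link A is the bottleneck for f1 and f2, which split it 2:1 by weight. f3 then takes whatever f2 leaves on link B. This kind of multi-bottleneck behavior is exactly what a per-link scheme cannot get right.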

Failure Handling: Redundancy at each layer

Future Work: deadline-based scheduling, joint BwE-TE optimization

Conclusion: BwE
* is a single place for specifying bandwidth policies
* enables efficient use of network resources


Questions/Answers:
[Unfortunately very incomplete due to acoustic problems]

Q1: Do applications need to specify requirements?
A: [hard to understand acoustically] We have both; services buy bandwidth.

Q2: Do you oversubscribe to get good utilization?
A: At certain levels.

Q3: ...
A: We smooth policies over time (5 minutes).

Q4: You assume that all hosts offer roughly the same amount of traffic; how much imbalance is there in practice?
A: ...

Q5: What about delay? Do you take it into account?
A: TE does, to some extent (it tries to assign shortest paths).