Tuesday, August 18, 2015

Session 1, Paper 1: "BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing"

Authors: Alok Kumar (Google), Sushant Jain (Google), Uday Naik (Google), Anand Raghuraman (Premise Data Corp), Nikhil Kasinadhuni (Google), Enrique Cauich Zermeno (Google), C. Stephen Gunn (Google), Jing Ai (Google), Björn Carlin (Google), Mihai Amarandei-Stavila (Google), Mathieu Robin (Google), Aspi Siganporia (Google), Stephen Stuart (Google), Amin Vahdat (Google)

Presented by: Alok Kumar from Google

Paper: http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p1.pdf

Public Review: http://conferences.sigcomm.org/sigcomm/2015/pdf/reviews/175pr.pdf


The BwE system has been deployed inside Google for the last 5 years.
Google's internal network traffic is enormous: if Google were considered an ISP, it would be the 2nd biggest in the world!
B4 is Google's internal WAN. It spans multiple continents (North America, Europe, Asia).

Such a giant-scale network has many inefficiencies in bandwidth allocation.
The goal of the project: a centralized bandwidth allocation algorithm at the scale of Google's WAN
(allowing for flexible allocation policies; enforcement at hosts), to minimize inefficiencies.

System before BwE: Thousands of competing users classified into a few classes; no differentiation

Search vs Gmail, instead of user by user.
No good solution for non-critical applications such as backups, where latency is irrelevant but high throughput is necessary.

What problem does it solve?
Visibility into users
Sharing of WAN bandwidth based on configured policies -> users can specify/buy requirements

System Architecture: Global enforcer (takes policies and network model as input) computes allocations -> sends them to several cluster enforcers -> each sends to multiple job enforcers -> each sends to host enforcers
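To make the enforcer hierarchy concrete, here is a minimal sketch of an allocation flowing down a global -> cluster -> job -> host tree. The notes do not say how each level splits its allocation among its children; the demand-proportional split below is purely an assumption for illustration, and the names Enforcer and distribute are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Enforcer:
    """One node in the hierarchy (global, cluster, job, or host enforcer)."""
    name: str
    demand: float = 0.0  # Gbps; only meaningful for leaves (hosts)
    children: list = field(default_factory=list)

    def total_demand(self):
        # A leaf reports its own demand; an inner node aggregates its children.
        if not self.children:
            return self.demand
        return sum(child.total_demand() for child in self.children)


def distribute(node, allocation, out):
    """Push an allocation down the tree.

    ASSUMPTION: each level splits in proportion to reported demand and never
    hands a child more than it asked for. The real system may split differently.
    """
    out[node.name] = allocation
    total = node.total_demand()
    for child in node.children:
        share = allocation * child.total_demand() / total if total > 0 else 0.0
        share = min(share, child.total_demand())
        distribute(child, share, out)
    return out


# Hypothetical topology: two clusters, two jobs, three hosts.
tree = Enforcer("global", children=[
    Enforcer("cluster-A", children=[
        Enforcer("job-1", children=[
            Enforcer("host-1", demand=4.0),
            Enforcer("host-2", demand=2.0),
        ]),
    ]),
    Enforcer("cluster-B", children=[
        Enforcer("job-2", children=[Enforcer("host-3", demand=6.0)]),
    ]),
])

result = distribute(tree, 6.0, {})
for name, gbps in result.items():
    print(f"{name}: {gbps:.1f} Gbps")
```

In this sketch the hosts ask for 12 Gbps in total but only 6 Gbps is available, so every level passes down half of what its subtree requested.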

Policies have the following form:
Guaranteed Bandwidth (with weight) + Best Effort (with weight)
For example: Gmail: 10 Gbps guaranteed + 20 Gbps best effort with w=2 and 50 Gbps with w=1
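The notes do not spell out how the tiers of such a policy combine. One plausible reading, sketched below, is that the guaranteed part is always granted and each best-effort tier is then filled at a rate proportional to its weight as a dimensionless "fair share" value grows. The dictionary layout and the function bandwidth_at are hypothetical names chosen for this illustration.

```python
# Hypothetical encoding of the Gmail example from the talk.
gmail_policy = {
    "guaranteed_gbps": 10.0,
    "best_effort": [
        {"gbps": 20.0, "weight": 2.0},
        {"gbps": 50.0, "weight": 1.0},
    ],
}


def bandwidth_at(policy, fair_share):
    """Bandwidth (Gbps) granted at a given fair share level.

    ASSUMPTION: tiers are filled in order, and a tier with weight w grows
    by w Gbps per unit of fair share until the tier is full.
    """
    granted = policy["guaranteed_gbps"]
    remaining = fair_share
    for tier in policy["best_effort"]:
        if remaining <= 0:
            break
        share_to_fill = tier["gbps"] / tier["weight"]  # fair share needed to fill this tier
        used = min(remaining, share_to_fill)
        granted += used * tier["weight"]
        remaining -= used
    return granted


for fs in (0, 5, 10, 20, 60):
    print(f"fair share {fs:>2}: {bandwidth_at(gmail_policy, fs):.0f} Gbps")
```

Under this reading the policy never drops below 10 Gbps and tops out at 80 Gbps; a higher weight simply means a tier fills up faster than the same tier would for a service with a lower weight.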


Algorithm: path selection (traffic engineering) & bandwidth allocation are optimized independently, rather than doing joint optimization. This is suboptimal, but scales better.

Phases:
(1) Traffic Engineering (TE): runs less frequently (so things scale); determines paths -> input to MPFA
(2) MultiPath Fair Allocation (MPFA): can handle arbitrarily complex networks; flowgroups can take multiple paths; the network can have bottlenecks
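MPFA itself was not described in enough detail in the talk to reproduce here. As a point of reference, the sketch below shows the much simpler textbook building block: weighted max-min fair sharing (water-filling) on a single bottleneck link. It is not the paper's algorithm, and it ignores multiple paths entirely; the flowgroup names, demands, and weights are made up.

```python
def weighted_max_min(capacity, demands, weights):
    """Weighted max-min fair allocation on ONE link (not MPFA).

    Repeatedly offer each active flowgroup its weight-proportional share of
    the remaining capacity. Flowgroups whose demand fits within that share
    are capped at their demand, and the leftover is shared among the rest.
    Weights must be positive.
    """
    alloc = {name: 0.0 for name in demands}
    active = set(demands)
    remaining = float(capacity)
    while active:
        total_weight = sum(weights[n] for n in active)
        share = {n: remaining * weights[n] / total_weight for n in active}
        satisfied = {n for n in active if demands[n] <= share[n]}
        if not satisfied:
            # Everyone left is bottlenecked: hand out the proportional shares.
            for n in active:
                alloc[n] = share[n]
            break
        for n in satisfied:
            alloc[n] = demands[n]
            remaining -= demands[n]
        active -= satisfied
    return alloc


# Hypothetical flowgroups competing for an 11 Gbps link.
demands = {"a": 2.0, "b": 8.0, "c": 8.0}
weights = {"a": 1.0, "b": 1.0, "c": 2.0}
allocation = weighted_max_min(11.0, demands, weights)
print(allocation)
```

Here flowgroup a gets its full 2 Gbps, and the remaining 9 Gbps is split 1:2 between b and c. A multipath allocator has to solve the same kind of problem across many links and paths at once, which is where the hard part lies.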

Failure Handling: Redundancy at each layer

Future Work: Deadline based scheduling, Joint BwE-TE optimization

Conclusion: BwE
* is a single place for specifying bandwidth policies
* enables efficient use of network resources


Questions/Answers:
[Unfortunately very incomplete due to acoustic problems]

Q1: Do applications need to specify requirements?
A: [hard to understand acoustically] We have both; services buy bandwidth

Q2: Do you oversubscribe to get good utilization?
A: On certain levels

Q3: ...
A: Smooth policies over time (5 minutes)

Q4: The assumption is that all hosts offer roughly the same amount of traffic; how much imbalance is there in practice?
A: ...

Q5: What about delay? Do you take it into account?
A: TE does somewhat (tries to assign shortest paths)

