The authors present Edge Fabric, an SDN-based system that works around the capacity constraints on peering paths at a content provider's points of presence (PoPs). The problem stems from the limitations of the long-standing BGP protocol: despite many changes to BGP policies over the years, BGP remains unaware of capacity and performance, which hurts when large volumes of consolidated traffic cross the flattened topology. However, BGP is fundamental and will not be replaced in the near future, so the authors work around these limitations rather than replacing BGP.
Three contributions in the paper:
1. The paper explicates the challenges that popular application providers face in managing their egress traffic, in terms of network connectivity and traffic characteristics.
2. They propose optimized routing of egress traffic and discuss their operational experience over four years.
3. They instrument their system to measure performance along alternate paths—not just its best choice—to every prefix.
To reduce latency, Facebook deploys dozens of PoPs globally. The authors observe the relative egress traffic volume of several PoPs with respect to the global load. They find that demand exceeds capacity (for 50% of circuits, demand is 1.19 times capacity). They also examine the number of BGP prefixes relative to each PoP's traffic.
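The demand-versus-capacity observation is a per-circuit ratio. The sketch below shows the arithmetic only; the circuit names and Gbps figures are hypothetical and chosen for illustration, not taken from the paper.

```python
from statistics import median

# Hypothetical (peak demand, capacity) per circuit in Gbps. Illustrative numbers only.
circuits = {
    "circuit-a": (80.0, 100.0),
    "circuit-b": (119.0, 100.0),
    "circuit-c": (150.0, 100.0),
}


def demand_ratio(demand_gbps, capacity_gbps):
    """Return demand as a multiple of capacity (values above 1.0 mean overload)."""
    return demand_gbps / capacity_gbps


ratios = {name: demand_ratio(d, c) for name, (d, c) in circuits.items()}

# Circuits where BGP alone would assign more traffic than the link can carry.
overloaded = sorted(name for name, r in ratios.items() if r > 1.0)

# The median ratio corresponds to a "50% of circuits" style statement.
median_ratio = median(ratios.values())
```

A ratio above 1.0 on a circuit is what motivates moving some of its traffic elsewhere.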
Routes are preferred in this order: private peers (tens) > IXP peers (hundreds) > transit providers (two or more).
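The preference order can be pictured as a sort key over candidate routes. This is a minimal sketch, not the actual BGP policy: the `Route` type, the peer-type labels, the next-hop names, and the AS-path-length tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Lower number means more preferred. Order follows the notes:
# private peers > IXP peers > transit providers.
PEER_PREFERENCE = {"private": 0, "ixp": 1, "transit": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    peer_type: str    # one of the PEER_PREFERENCE keys (hypothetical labels)
    as_path_len: int  # tie-break within a peer type (an assumption)
    next_hop: str     # hypothetical circuit or peer name


def rank_routes(routes):
    """Sort routes from most to least preferred."""
    return sorted(routes, key=lambda r: (PEER_PREFERENCE[r.peer_type], r.as_path_len))


routes = [
    Route("203.0.113.0/24", "transit", 2, "transit-1"),
    Route("203.0.113.0/24", "ixp", 1, "ixp-peer-7"),
    Route("203.0.113.0/24", "private", 1, "pni-3"),
]

ranked = rank_routes(routes)
best, alternates = ranked[0], ranked[1:]
```

Here `best` is the route through the private peer, and `alternates` holds the remaining routes in preference order, which are the candidates for detours.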
Design priorities: operational simplicity and ease of deployment. The controller overrides BGP's decisions.
Approach: routers select routes using BGP. Edge Fabric selects the ideal routes by considering both the BGP-selected routes and the alternate routes.
Two overrides: 1. move traffic for a set of end users, 2. move a class of end-user traffic.
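The override idea can be sketched as a greedy loop: start from BGP's preferred assignment, then detour prefixes away from circuits projected to exceed a utilization threshold. This is a simplified illustration and not the authors' algorithm; the function name, the 0.95 threshold, the largest-prefix-first order, and all input values are assumptions.

```python
def plan_overrides(demand, preferred, alternates, capacity, threshold=0.95):
    """Greedy sketch of controller overrides.

    demand:     prefix -> projected traffic in Gbps
    preferred:  prefix -> circuit chosen by BGP
    alternates: prefix -> list of other circuits, most preferred first
    capacity:   circuit -> capacity in Gbps
    threshold:  target utilization ceiling (0.95 is an assumed value)
    """
    # Projected load if every prefix follows BGP's preferred route.
    load = {circuit: 0.0 for circuit in capacity}
    for prefix, gbps in demand.items():
        load[preferred[prefix]] += gbps

    overrides = {}
    # Consider the largest prefixes first (an assumption, to keep override count low).
    for prefix in sorted(demand, key=demand.get, reverse=True):
        src = preferred[prefix]
        if load[src] <= threshold * capacity[src]:
            continue  # preferred circuit is fine, leave BGP's decision alone
        for alt in alternates.get(prefix, []):
            if load[alt] + demand[prefix] <= threshold * capacity[alt]:
                load[src] -= demand[prefix]
                load[alt] += demand[prefix]
                overrides[prefix] = alt
                break
    return overrides, load


# Hypothetical inputs: three prefixes all prefer one private interconnect.
demand = {"198.51.100.0/24": 60.0, "203.0.113.0/24": 50.0, "192.0.2.0/24": 10.0}
preferred = {p: "pni-1" for p in demand}
alternates = {p: ["transit-1"] for p in demand}
capacity = {"pni-1": 100.0, "transit-1": 200.0}

overrides, load = plan_overrides(demand, preferred, alternates, capacity)
```

With these inputs, the 120 Gbps projected on `pni-1` exceeds the threshold, so the largest prefix is detoured to `transit-1` and the other two stay on their BGP-preferred route.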
Key questions the authors aim to answer: does the system prevent circuit congestion and packet drops? Can it keep utilization at the threshold?
They use a combination of host-based routing and edge-based routing.
They compare Edge Fabric with Espresso:
Edge Fabric: operational simplicity, ease of deployment, marks packet priority
Espresso: maximum flexibility, cost savings, selects packet route
1. What other kinds of BGP limitations would come up if one wanted to redo all these changes?
2. You mentioned congestion. What exactly caused the congestion? Can't we just add capacity?
3. How does the system exercise external control over traffic: per PoP or per user?