Tuesday, August 18, 2015

Session 3.1 - Paper 1: Inside the Social Network's (Datacenter) Network

Inside the Social Network's (Datacenter) Network


Authors: Arjun Roy (University of California, San Diego), Hongyi Zeng (Facebook), Jasmeet Bagga (Facebook), George M. Porter (University of California, San Diego), Alex C. Snoeren (University of California, San Diego)

Presenter: Arjun Roy


Summary:

This paper characterizes production traffic in Facebook's datacenters and shows how it differs from the workloads reported in the existing literature. Prior observations are based mainly on web search services, which do not capture all types of cloud services. That literature reports that datacenter traffic is rack-local and predictable over small timescales. The work is motivated by the fact that knowledge of the traffic workload makes it possible to design more efficient datacenter deployments.

Facebook's architecture consists of multiple sites, with multiple datacenters per site. Within a datacenter, servers are grouped into clusters in a fat-tree topology: servers connect to a top-of-rack switch, top-of-rack switches connect to cluster switches, and cluster switches are connected by Fat Cat aggregation switches. Each server has a single role (cache, web, Hadoop, or MultiFeed), and a rack typically contains servers of only one type. The workload data comes from two sources: Fbflow, a Facebook-wide monitoring system that provides long-term traffic storage at low resolution, and per-host packet-header traces collected through port mirroring, which cover short periods at higher resolution.

The traffic analysis yields the following observations:
  1. Although traffic in Hadoop deployments is consistent with the literature, other traffic is neither rack-local nor all-to-all (front-end clusters show sparse rack locality and heavy inter-rack traffic).
  2. Despite this inter-rack traffic, most links carry less than 10% load. Heavy hitters are generally instantaneous rather than persistent. Sub-second traffic is unpredictable and ephemeral.
  3. Traffic is bipartite because servers of the same type are colocated.
  4. Traffic is stable across all cache servers over longer timescales (on the order of one second). Traffic distributions show an even spread across the different levels (rack, cluster, datacenter, and so on). Hence traffic is stable both over time and per destination.
These observations provide a chance to reduce inter-rack bandwidth.
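
To make the locality observations concrete, here is a minimal sketch of how flow records could be bucketed by the narrowest level of the hierarchy that contains both endpoints. The `Host` record, the scope labels, and the `(src, dst, bytes)` flow format are assumptions made for illustration; they are not the schema of Fbflow or of the paper's traces.

```python
from collections import Counter
from typing import NamedTuple


class Host(NamedTuple):
    # Hypothetical placement record; field names are illustrative only.
    datacenter: str
    cluster: str
    rack: str


def locality(src: Host, dst: Host) -> str:
    """Return the narrowest scope that contains both endpoints."""
    if src.datacenter != dst.datacenter:
        return "inter-datacenter"
    if src.cluster != dst.cluster:
        return "intra-datacenter"
    if src.rack != dst.rack:
        return "intra-cluster"
    return "intra-rack"


def locality_breakdown(flows):
    """flows: iterable of (src, dst, nbytes). Returns the byte fraction per scope."""
    totals = Counter()
    for src, dst, nbytes in flows:
        totals[locality(src, dst)] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {scope: nbytes / grand_total for scope, nbytes in totals.items()}


# Made-up example: a web server talking mostly to a cache in another rack.
web = Host("dc1", "frontend1", "rack1")
cache = Host("dc1", "frontend1", "rack2")
hadoop = Host("dc1", "hadoop1", "rack9")
print(locality_breakdown([(web, cache, 700), (web, web, 100), (web, hadoop, 200)]))
```

With racks holding a single server type, a web-to-cache flow can never be intra-rack in this model, which is one way to see why front-end traffic shows so little rack locality.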

Q&A:

Q: Any insights on why the heavy hitters didn't last long? Is it due to congestion, or to something the applications on the end hosts are doing?
A: Heavy hitters are ephemeral because of load balancing. If heavy hitters were persistent, that would mean the load balancer is not doing its job. The load balancer ensures that the cache server that took the heavy hit does not receive further requests for a significant amount of time, so that load is balanced properly.
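
One way to quantify "ephemeral" is sketched below. It assumes a heavy-hitter definition that these notes do not state: the smallest set of flows (or hosts, or racks) that together account for at least half of the bytes in an interval. The function names and the 50% threshold are illustrative choices, not the authors' code.

```python
def heavy_hitters(volumes, share=0.5):
    """Smallest set of keys covering at least `share` of the bytes in one interval.

    volumes: dict mapping a flow/host/rack key to bytes sent in the interval.
    """
    total = sum(volumes.values())
    if total == 0:
        return set()
    chosen, covered = set(), 0
    # Take the largest contributors first until the target share is covered.
    for key, nbytes in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        chosen.add(key)
        covered += nbytes
    return chosen


def persistence(intervals, share=0.5):
    """Fraction of each interval's heavy hitters that are still heavy in the next one."""
    sets = [heavy_hitters(v, share) for v in intervals]
    return [len(a & b) / len(a) if a else 0.0 for a, b in zip(sets, sets[1:])]


# Made-up per-interval byte counts for three keys.
intervals = [
    {"a": 60, "b": 30, "c": 10},
    {"a": 10, "b": 70, "c": 20},
    {"b": 50, "c": 50},
]
print(persistence(intervals))  # [0.0, 1.0]
```

Values near zero across consecutive short intervals would correspond to the ephemeral behavior described in the answer, while values near one would indicate persistent heavy hitters.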

Q: Did coordination between servers of the same service type exhibit locality similar to intra-rack locality?
A: We did look at locality. Since services are colocated at the rack level, looking at the rack level is exactly the same as looking at the type of service.

Q: Instead of blind colocation of servers of same type, why not mix servers together?
A: Locality patterns would not work out well if servers of different types were colocated.

Q: Are aggregation and core switches as volatile with respect to heavy hitters as end hosts are?
A: This study did not cover this.

Q: Are incast traffic issues observed in the FB datacenters similar to observations in DCTCP?
A: Studying incast traffic requires microsecond-scale observations, which were not part of this study.