Thursday, August 24, 2017

Session 7, paper 3: A Tale of Two Topologies: Exploring Convertible Data Center Network Architectures with Flat-tree

Summary

Convertible data center networks, a concept introduced in this paper, are able to adjust their topology in software. There are three motivations for convertible networks:
  1. There is a tension between ease of implementation and performance. For example, Clos topologies are easy to implement and manage (scalable, modular, central wiring), but for certain traffic patterns a random graph performs better: it has a lower average path length and richer bandwidth than other topologies with the same number of switches and links (see the sketch after this list).
  2. New data center architecture proposals have tapered off. However, no single topology fits all workloads well.
  3. The technology for configurable data center networks exists (Helios, c-Through, Quartz, etc.), but it isn't scalable yet as it relies on centralized devices. 
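The Clos-versus-random comparison can be illustrated with a rough sketch (not from the paper): build the switch-level graph of a k=8 fat-tree (a Clos topology) and a random regular graph with the same number of switches and roughly the same number of links, then compare average shortest path lengths. The helper name `fat_tree_switch_graph` and the use of networkx are illustrative assumptions.

```python
# A rough sketch of the comparison behind the claim above: a Clos (fat-tree)
# switch graph versus a random regular graph built from the same switch count
# and roughly the same link count, compared by average shortest path length.
import networkx as nx

def fat_tree_switch_graph(k=8):
    """Switch-level graph of a k-ary fat-tree: core, aggregation, edge tiers."""
    g = nx.Graph()
    core = [f"core{i}" for i in range((k // 2) ** 2)]
    for p in range(k):
        aggs = [f"agg{p}_{i}" for i in range(k // 2)]
        edges = [f"edge{p}_{i}" for i in range(k // 2)]
        # Full bipartite wiring between aggregation and edge switches in a pod.
        g.add_edges_from((a, e) for a in aggs for e in edges)
        # Aggregation switch i connects to the i-th group of k/2 core switches.
        for i, a in enumerate(aggs):
            g.add_edges_from((a, core[i * (k // 2) + j]) for j in range(k // 2))
    return g

clos = fat_tree_switch_graph(k=8)
n, m = clos.number_of_nodes(), clos.number_of_edges()
d = (2 * m) // n          # floor of the average degree, so slightly fewer links
seed = 0
while True:               # retry until the sampled regular graph is connected
    rand = nx.random_regular_graph(d, n, seed=seed)
    if nx.is_connected(rand):
        break
    seed += 1

print("Clos   avg path length:", round(nx.average_shortest_path_length(clos), 2))
print("Random avg path length:", round(nx.average_shortest_path_length(rand), 2))
```

Even with slightly fewer links, the random graph comes out with a noticeably lower average path length at this toy scale, which is the intuition behind motivation 1.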
The paper proposes flat-tree, a network design that can convert between Clos and random graphs for subsets of the network by flattening the Clos tree structure at different scales. They accomplish this by adding wiring within pods, by wiring pods to core switches in a special pattern, and by connecting pods to adjacent pods. The design relies on low-cost converter switches, which connect all tiers of the data center in reconfigurable ways. Because these converter switches are small and distributed throughout the network rather than centralized, the design remains scalable. The control plane consists of k-shortest-paths routing and MPTCP; they develop a custom addressing scheme to reduce the path explosion induced by k-shortest-paths routing.
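A minimal sketch of the k-shortest-paths piece of the control plane (this is not the authors' implementation; the six-switch topology and the `k_shortest_paths` helper are hypothetical, and networkx's Yen-style path generator stands in for whatever routing machinery flat-tree actually uses):

```python
# Minimal sketch of k-shortest-paths route selection over a switch graph.
from itertools import islice
import networkx as nx

def k_shortest_paths(graph, src, dst, k=4):
    """Return up to k loop-free paths from src to dst, shortest first."""
    return list(islice(nx.shortest_simple_paths(graph, src, dst), k))

# Hypothetical six-switch topology for illustration only.
g = nx.Graph([("s0", "s1"), ("s1", "s2"), ("s0", "s3"),
              ("s3", "s2"), ("s1", "s3"), ("s2", "s4"), ("s3", "s4")])

# MPTCP subflows could then be spread across these candidate paths.
for path in k_shortest_paths(g, "s0", "s4", k=4):
    print(path)
```

With many candidate paths per source-destination pair, the number of forwarding entries grows quickly, which is the path explosion the custom addressing scheme is meant to curb.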

They evaluate their work with both simulations and experiments on a testbed. The simulations use real Facebook data center traffic traces that vary in locality. As the locality changes, different topologies perform best, showing that convertibility matters. Their testbed implementation shows higher bandwidth and lower data read delay than a static Clos network.


Q&A

Question: Work from several decades ago looked at multi-stage networks (2x2 network switches) for HPC -- what can we learn from that work? 
Answer: We can leverage those topologies and their nice properties for data centers, but data center workloads and workload dynamics are very different (fast-changing for data centers, slow-changing for HPC networks).

Question: Regarding the practicality of building converter switches, how do you build one that can convert among parallel fibers?
Answer: Converter switches are "future-proof"; nothing prevents them from working in that context.

Question: You can reconfigure the network to approximate three different topology modes; have you thought about what you can do to match other topologies better?
Answer: This is a prototype; future work can and should explore how to match the network flexibility to the workloads with finer granularity. 

Question: Have you done failure analysis, e.g., what happens if a link goes down? For example, after converting to a semi-random graph, how many ECMP paths can you get?
Answer: This is orthogonal to our design, as our work doesn't focus on failure recovery. Zones configured as random graphs have the same properties as random graphs, and likewise for Clos.

Question: At reconfiguration time, what happens to routing? Is the network operational during this period?
Answer: The topology won't change very often, only when the workload changes, and each change affects only a small portion of the network at a time.