Wednesday, August 24, 2016

SIGCOMM 2016 - Session 6 (Verification) - Fast Control Plane Analysis Using an Abstract Representation


Aaron Gember-Jacobson -- University of Wisconsin, Madison
Rajaay Viswanathan --  University of Wisconsin, Madison
Ratul Mahajan -- Microsoft Research

Presenter - Raajay Viswanathan

The paper focuses on performing fast analysis for legacy/traditional control planes that are still used in many networks. Configuring network devices is a hard job and configuration errors are common. Operators need to deal with multiple routing protocols, routing process priorities, route exchanges, traffic selectivity etc. Configuration errors lead to policy violations, that in turn could end up exposing security vulnerabilities. More often than not, misconfigurations lead to incorrect forwarding behavior.

To add to this, some violations may show up only under failure. Rajaay goes on to illustrate this with a 3-router example with OSPF that demonstrates some ACP bypass scenarios to state this point. There is also listed a case of how a Microsoft misconfig impacted Azure a great deal. Raajay also presents the state of the art approaches such as Header Space Analysis (HSA), Veriflow and Batfish and the problems that plague them. HSA and Veriflow verify data plane consistency, whereas Batfish is time consuming in its efforts to simulate low level protocol messages.

The challenge that exists is how to speed-up verification under failure. The authors accomplish this by graph analysis and a collation of digraphs that they build from the network configuration. These graphs encode the network's forwarding behavior.

Abstract Representations of the Control Plane or ARCs are the digraphs that are modeled and represent the control plane. The requirements for creating ARCs and the actual process of creating them were presented with explanations as to what the vertices and edges are. There is one digraph per src-dat per subnet pair with the vertices being the hosts and routing processes. The edges are inter and intra device links. The edge weights are normalized using a scaling technique such that the shortest path in the graph denotes the shortest path in the network between two points.

It was demonstrated that ARCs can be used in a variety of ways. 3 main cases presented were

  1. Always blocked
  2. k-reachability (how many failures will it take to break the network)
  3. Path-equivalency
The evaluation was performed with respect to these three cases and it was fund that most of the time consumed was in parsing the network configurations. The times ranged from less than 500 ms for always blocked to less than a second for k-reachability. It was argued that path equivalency took more time due to having to compute edge weights and rescale for this case (almost up to a 100s), but those number pale in comparison to what Batfish takes right now (numbers were extrapolated for 3-link failures).

Raajay also says that 96% of networks fall into a high-fidelity category for ARC creation. He also postulates that they cannot prove path-equivalence for the remaining 4% (due to inability to generate edge weights), but blockability and reachability can still be verified.

Q : What would happen to a network with a lot of filters (where it could lead to explosion) ?
A : The running time of ARCs depends on the number of traffic classes you have, if you have many route filters, you will have generate a ton of traffic classes. But they did not observe that many filters in the configs that they evaluated.

Q : Is there enough information to map or tie k-reachability failures back to the network?
A : Yes, since the digraphs mirror the network in its entirety and a shortest path in the graph is equal to the shortest path in the network, a reachability failure can be tied back to the network.