Updated on 2013-04-10. This is a report of the presentation done by Ahmed Khurshid on 2013-04-03. Paper co-authors are Ahmed Khurshid, Xuan Zou, Wenxuan Zhou, Matthew Caesar, and P. Brighten Godfrey, University of Illinois at Urbana-Champaign.
Tons of devices, running different protocols, from various types and vendors. Intricate relationship between devices, with several operators configuring them. This is just a glimpse on the existing challenges that Ahmed Khurshid enumerated for networking debugging. These challenges make it difficult for one to test every possible scenario. As a consequence, bugs may go hidden and affect production networks in a variety of ways (e.g., degrading their performance and/or making them more prone to attacks).
Ahmed enumerated some existing network debugging techniques, e.g., traffic flow monitoring and configuration verification. With regard to configuration verification, he emphasized a crucial problem: the only input taken is configuration scripts. Everything else (control-plane state, data-plane state, and network behavior) is predicted. To bridge this gap, Ahmed introduced a novel approach: data-plane verification. It considers data-plane state as input for the verification process, thus making it less dependent on predictions, and closer to actual network behavior. He also emphasized on running this verification task in real time (in contrast to existing approaches such as FlowChecker, Anteater, and Header Space Analysis), as the network evolves.
This approach brings us to VeriFlow, a tool for checking network-wide invariants in real time. VeriFlow introduces a layer between the controller and devices in a Software-Defined Network (SDN), thus enabling the verification of rules before they are deployed in the network. In summary, VeriFlow operates by intercepting new rules, and building the equivalence classes associated to them -- a set of packets that are affected by the rule. For each equivalence class computed, VeriFlow generates individual forwarding graphs, which model the forwarding behavior of the packets in the equivalence class through the network. Finally, VeriFlow runs custom queries to find problems. VeriFlow has a set of built-in queries to verify certain invariants. However, it also exposes an API for writing custom invariant checkers. During his talk, Ahmed discussed extensively on the computation of equivalence classes (which uses multi-dimensional prefix tries), always highlighting the aspects that make it an efficient and quick process. From the set of experiments carried out, one of the main take-away was that VeriFlow checked 97.8% of the updates from a real BGP trace within 1 millisecond. Some updates took longer, however, because of the larger number of equivalence classes affected by new rules.
The question & answer session of this paper had some quite good activity! Sanjay Rao (Purdue University) asked about the coverage of error detection -- the types of errors VeriFlow cannot detect. Ahmed argued that if the problem is visible at the data plane, it can be detected. Rao continued by asking about errors that span to multiple devices. Ahmed replied that yes, these type of errors can be captured as well, since VeriFlow has a global view of the network. Yan Chen (Northwestern University) asked if VeriFlow is applicable to inter-domain routing. Ahmed replied that yes, VeriFlow can be used in this context as long as the controller is able to receive reports about changes. He also mentioned that the accuracy of the detection depends on receiving updates in the network change in real time. Masoud Moshref (University of Southern California) asked whether one can reorder rules in order to improve VeriFlow's performance. Ahmed answered they have not looked at this option yet, but they wish to investigate it. He also said that as VeriFlow targets real-time processing of the stream of rules coming from the controller, it may not have the liberty to reorder those. Takeru Inoue (Japan Science and Technology Agency) asked about VeriFlow's memory requirements for verification. Ahmed replied it is expensive; for a BGP experiment shown in the evaluation section, VeriFlow took 6 GB of memory.