Tuesday, October 28, 2014

A Highly Available Software Defined Fabric

Presenter: Aditya Akella

Authors: Aditya Akella (University of Wisconsin-Madison), Arvind Krishnamurthy (University of Washington)

This work is about providing high availability in SDNs. Aditya starts by arguing that SDNs today cannot guarantee high availability. The reasons include distributed consensus protocols that run in isolation from the data network, as well as the controller-controller and controller-switch mechanisms. He then discusses two strawman solutions for guaranteeing high availability, using a running example. Strawman1 consists of reliable flooding plus controller replication, and Strawman2 consists of partitioned consensus with reliable flooding. Neither strawman can guarantee high availability.

He then proposes a solution consisting of Strawman2 and a global distributed snapshot protocol. The distributed snapshot protocol tries to build tight consensus while ensuring consistency.

Aditya notes that this is just one solution and that they haven't explored the tradeoffs associated with it. There could be tradeoffs in terms of performance and complexity.

Q: The mechanisms that you are proposing (reliable flooding, global distributed snapshot + partitioned consensus) seem to be sufficient, but are they necessary?
A: This is one possible solution, and we haven't explored the tradeoffs associated with it. There may be other proposals that provide better performance and complexity tradeoffs.

Q: You have assumed that your network is implementing end-to-end routing. What about other management applications? Would you require other mechanisms to ensure high availability?
A: Yes, it's possible that these applications may require different mechanisms to provide high availability.