Tuesday, October 28, 2014

Tolerating SDN Application Failures with LegoSDN

Authors: Balakrishnan Chandrasekaran, Theophilus Benson (Duke University)

Bugs in software are endemic. In SDN applications, bugs cause serious problems (e.g., cascading crashes triggered by a single application), which hinders the adoption of SDN. The paper explores the problem of tolerating SDN application failures, which leads to rethinking the SDN architecture. The paper's position is that availability should not be sacrificed: SDN networks should be resilient to application failures. Specifically, the paper presents LegoSDN, a tool that provides abstractions for isolating SDN apps from the controller and for isolating SDN apps from the network.
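
To make the first abstraction (isolating SDN apps from the controller) concrete, here is a minimal Python sketch. It is not LegoSDN's implementation: the names Sandbox, Controller, buggy_app and rule_installer are hypothetical, and a plain try/except inside one process stands in for the separate containers mentioned in the Q&A. It only illustrates the idea that one app's crash is contained and the other apps still see the event.

```python
from typing import Callable, Dict, List

Event = dict
Handler = Callable[[Event], List[str]]


class Sandbox:
    """Hypothetical wrapper that contains the failure of a single app."""

    def __init__(self, name: str, handler: Handler) -> None:
        self.name = name
        self.handler = handler
        self.crashes = 0

    def deliver(self, event: Event) -> List[str]:
        # Assumption: an app "crash" is modelled as an uncaught exception.
        try:
            return self.handler(event)
        except Exception:
            self.crashes += 1
            return []  # the crashed app contributes no rules for this event


class Controller:
    """Toy controller that dispatches every event to every registered app."""

    def __init__(self) -> None:
        self.apps: Dict[str, Sandbox] = {}

    def register(self, name: str, handler: Handler) -> None:
        self.apps[name] = Sandbox(name, handler)

    def dispatch(self, event: Event) -> Dict[str, List[str]]:
        # Each app runs behind its own sandbox, so a failure cannot cascade.
        return {name: box.deliver(event) for name, box in self.apps.items()}


def buggy_app(event: Event) -> List[str]:
    raise RuntimeError("unhandled packet type")


def rule_installer(event: Event) -> List[str]:
    return [f"install rule for {event['dst']}"]
```

Registering both apps and dispatching one event leaves the controller running: the buggy app's sandbox records a crash, while the healthy app still returns its rule.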

Q (Bruce Davie from VMware): What are more examples of failures tolerated by LegoSDN?
A: A crash of an application in LegoSDN (which is built on top of Floodlight) does not cascade.

Q: What about applications that share state?
A: LegoSDN handles that. LegoSDN applications still run in separate containers.

Q (Ying Zhang from Ericsson): Consider an SDN app store. LegoSDN tackles availability, but what about malicious apps?
A: The focus of LegoSDN for now is on crashes, but one may build a solution for malicious apps on top of LegoSDN.

Q: The solution is not restricted to SDN. Will LegoSDN generalize to other settings?
A: The authors are interested in exploring other domains.

Q (from USC): Is the state shared on the controller itself, or is it shared across the whole network? Will LegoSDN handle each case?
A: LegoSDN considers the latter.

Q: Rollback requires domain-specific knowledge.
A: The LegoSDN abstraction provides a way to do rollback with domain knowledge.
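
One way to picture rollback with domain knowledge is an undo log over flow-table changes, where the "domain knowledge" is knowing the inverse of each operation. The Python sketch below is only an illustration under that assumption, not the mechanism from the paper: FlowTable, Transaction and run_app are hypothetical names, and the flow table is a plain dictionary from match strings to action strings.

```python
from typing import Callable, Dict, List, Optional, Tuple


class FlowTable:
    """Toy flow table: maps a match string to an action string."""

    def __init__(self) -> None:
        self.rules: Dict[str, str] = {}


class Transaction:
    """Hypothetical undo log that records how to reverse each change."""

    def __init__(self, table: FlowTable) -> None:
        self.table = table
        # Each entry is (match, previous action or None if no rule existed).
        self.undo_log: List[Tuple[str, Optional[str]]] = []

    def set_rule(self, match: str, action: str) -> None:
        self.undo_log.append((match, self.table.rules.get(match)))
        self.table.rules[match] = action

    def delete_rule(self, match: str) -> None:
        self.undo_log.append((match, self.table.rules.get(match)))
        self.table.rules.pop(match, None)

    def rollback(self) -> None:
        # Undo in reverse order so earlier values are restored last.
        while self.undo_log:
            match, previous = self.undo_log.pop()
            if previous is None:
                self.table.rules.pop(match, None)
            else:
                self.table.rules[match] = previous


def run_app(table: FlowTable, app: Callable[[Transaction], None]) -> bool:
    """Run one app invocation; undo its changes if it crashes."""
    txn = Transaction(table)
    try:
        app(txn)
        return True
    except Exception:
        txn.rollback()
        return False
```

If an app modifies one rule, adds another, and then raises an exception, run_app returns False and the table is back to its state before the app ran. This sketch tracks a single app only, so it does not address the cross-app case.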

Q: On rollback, what about the case where rolling back one app affects others?
A: Still working towards a solution.