TY - GEN
T1 - Isolating and Tolerating SDN Application Failures with LegoSDN
AU - Chandrasekaran, Balakrishnan
AU - Tschaen, Brendan
AU - Benson, Theophilus
PY - 2016
Y1 - 2016
N2 - Despite software-defined networking's proven benefits, there remains a significant reluctance in adopting it. Among the issues that hamper SDN's adoption, two issues stand out: reliability and fault tolerance. At the heart of these issues is a set of fate-sharing relationships: the first between the SDN control applications and controllers, wherein the crash of the former induces a crash of the latter, thereby affecting the controller's availability; and, the second between the SDN-Apps and the network, wherein the failure of the former violates network safety, e.g., network-loops, or network availability, e.g., black holes. In this paper, we argue for a redesign of the controller architecture centering around a set of abstractions to eliminate these fate-sharing relationships and thus improve the controller's availability. We present a prototype implementation of a framework, called LegoSDN, that embodies our abstractions, and we demonstrate the benefits of our abstractions by evaluating LegoSDN on an emulated network with five real SDN-Apps. Our evaluations show that LegoSDN can recover failed SDN-Apps 3x faster than controller reboots while simultaneously preventing policy violations.
AB - Despite software-defined networking's proven benefits, there remains a significant reluctance in adopting it. Among the issues that hamper SDN's adoption, two issues stand out: reliability and fault tolerance. At the heart of these issues is a set of fate-sharing relationships: the first between the SDN control applications and controllers, wherein the crash of the former induces a crash of the latter, thereby affecting the controller's availability; and, the second between the SDN-Apps and the network, wherein the failure of the former violates network safety, e.g., network-loops, or network availability, e.g., black holes. In this paper, we argue for a redesign of the controller architecture centering around a set of abstractions to eliminate these fate-sharing relationships and thus improve the controller's availability. We present a prototype implementation of a framework, called LegoSDN, that embodies our abstractions, and we demonstrate the benefits of our abstractions by evaluating LegoSDN on an emulated network with five real SDN-Apps. Our evaluations show that LegoSDN can recover failed SDN-Apps 3x faster than controller reboots while simultaneously preventing policy violations.
U2 - 10.1145/2890955.2890965
DO - 10.1145/2890955.2890965
M3 - Conference contribution
SN - 9781450342117
T3 - SOSR '16
BT - Proceedings of the Symposium on SDN Research
PB - Association for Computing Machinery
CY - New York, NY, USA
ER -