Wednesday, April 3, 2013

SDN confusion

I found the Monsanto et al. paper (Composing Software Defined Networks) fascinating, in part because it forced me to continue confronting how little I "get" about software-defined networking. I'm grateful to Aditya Akella for his very readable public summary for non-experts -- this is a great innovation that I hope NSDI continues and other conferences adopt.

Two things about this paper stuck out for me:
  1. The suggestion that the authors have outgrown the OpenFlow interface, and in order to implement Pyretic they needed a controller-maintained mapping layer on top of OpenFlow:
    Ideally, OpenFlow switches would support our extended packet model directly, but they do not. Our run-time system is responsible for bridging the gap between the abstract model and the OpenFlow-supported packets that traverse the network. It does so by generating a unique identifier that corresponds to a unique set of non-OpenFlow-compliant portions of the packet (i.e., all virtual fields and everything but the top of the stack in an OpenFlow-compliant field). This identifier is stored in spare bits in the packet. [Any source of spare bits (e.g., MPLS labels) could be used. Our current implementation uses the VLAN field.] Our run-time system manages a table that stores the mapping between unique ids and extended data.
    To this non-expert, that raises a bunch of questions. To what extent does OpenFlow draw the boundary between hardware- and software-defined networking in the right place, to enable this kind of high-end research into elegant abstractions? If every configuration of virtual fields is a totally separate flow as far as the switch is concerned, what are the performance implications of requiring the switch to query the controller and maintain an entry for each separate "flow"? The VLAN field is only 12 bits -- surely this constrains the complexity of the flows or the rules that can exist on the network? The state could be enlarged another 20 bits by adding in the MPLS label field, but is it realistic to have 2^32 flow table entries in the switch anyway?
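To make the mapping concrete, here is a minimal sketch (my own toy model, not Pyretic's implementation) of the kind of controller-side table the quoted passage describes: each distinct combination of virtual fields gets a compact identifier that must fit in the 12-bit VLAN field, so the table can hold at most 4096 entries before it runs out of encodings.

```python
# Hypothetical sketch of a controller-maintained mapping from virtual
# packet fields to VLAN IDs, as described in the quoted passage above.
# Class and method names are mine, not Pyretic's.

class VirtualFieldTable:
    MAX_IDS = 2 ** 12  # the VLAN ID field is only 12 bits wide

    def __init__(self):
        self._to_id = {}    # frozenset of (field, value) pairs -> VLAN id
        self._from_id = {}  # VLAN id -> dict of virtual fields

    def encode(self, virtual_fields):
        """Map a dict of virtual fields to a VLAN id, allocating if new."""
        key = frozenset(virtual_fields.items())
        if key not in self._to_id:
            if len(self._to_id) >= self.MAX_IDS:
                raise RuntimeError("12-bit VLAN space exhausted")
            new_id = len(self._to_id)
            self._to_id[key] = new_id
            self._from_id[new_id] = dict(virtual_fields)
        return self._to_id[key]

    def decode(self, vlan_id):
        """Recover the virtual fields for a packet re-entering the controller."""
        return self._from_id[vlan_id]
```

The `RuntimeError` branch is exactly the exhaustion scenario I am asking about: what happens when the set of distinct virtual-field combinations in flight exceeds 4096?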

  2. The paper highlights an "interesting twist" (sec. 5.2) that "the correct processing order of load balancer and firewall turns out to be direction-dependent." In other words, incoming packets need to be firewalled and then load-balanced, while packets headed in the opposite direction need the same steps to be un-applied in the opposite order. The example application solves this with a conditional check:
    fwlb = if_(from_client, afw >> alb, alb >> afw) 
    But naively, shouldn't it be "obvious" that if we compose a load-balanced service inside a firewalled subnetwork, the order that transformations are applied for incoming packets must be reversed when those same transformations are removed from outgoing packets? This is the case any time we invert the composition of two functions! Given that the authors here are proposing a new and powerful system of high-level abstractions, why should the programmer have to worry about this detail explicitly?

    I think the issue may be that this paper is not about the composability of networks (whose gateways may apply transformations and enforce rules), but about the composability of packet-processing policies. But I am not, frankly, very sure I really got it. Considerable subtleties may abound as the SDN folks continue searching for the right abstractions to elegantly control the behavior of large networks.
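The inversion intuition from point 2 can be sketched with plain Python functions (a toy model of my own, not Pyretic code): undoing g(f(x)) requires applying the inverses in the reverse order, since (g . f)^-1 = f^-1 . g^-1.

```python
# Toy stand-ins for the firewall (afw) and load balancer (alb) policies;
# the names and packet representation are mine, not the paper's.

def firewall_tag(pkt):       # afw on incoming packets: mark as inspected
    return {**pkt, "fw": True}

def firewall_untag(pkt):     # inverse of firewall_tag
    return {k: v for k, v in pkt.items() if k != "fw"}

def lb_rewrite(pkt):         # alb: rewrite public address to a backend
    return {**pkt, "dst": "backend1"} if pkt.get("dst") == "public" else pkt

def lb_restore(pkt):         # inverse of lb_rewrite
    return {**pkt, "dst": "public"} if pkt.get("dst") == "backend1" else pkt

incoming = {"dst": "public"}
transformed = lb_rewrite(firewall_tag(incoming))    # afw >> alb
restored = firewall_untag(lb_restore(transformed))  # inverses, reversed order
assert restored == incoming
```

Applying the inverses in the original order (firewall first) would try to strip the firewall tag before the backend address had been restored, which is the directional asymmetry the paper's `if_` conditional encodes by hand.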

5 comments:

  1. Hi, I'm one of the authors of the paper, and have a comment on your question about how the OpenFlow switches do not support the full range of (virtual) header fields in Pyretic's abstract packet model. A "southbound" API like OpenFlow must be designed (at least in part) for hardware-efficient implementation, whereas a "northbound" API like the Pyretic language can be designed for higher-level abstractions. So, it is natural that OpenFlow won't natively support all of the abstractions in Pyretic. Fortunately, a run-time system can map the virtual header fields to a common space of real header fields (like VLAN tags or MPLS labels) "under the hood", without exposing the tedious book-keeping to the programmer. That's not to say that some changes to OpenFlow wouldn't be helpful. But, even if OpenFlow went further in supporting our abstractions, we're likely to continue to see a gap between the northbound and southbound APIs, due to their different design goals.

    Jen Rexford

  2. Thank you very much for your reply! My confusion is this: if the full set of the virtual header fields is mapped by a lookup table to a 12-bit header field as the paper describes, how likely would a set of policies be to fill up that 12-bit space when real flows transit the network?

    Does this just depend on the complexity of the overall virtual topology at startup, or could an ensemble of evil flows also cause the space to be exhausted over time (or even if not exhausted, for the switch to require an ungainly number of flow-table entries)?

    For example, let's say that one of the composed components is a network-address translator, and one of the virtual header fields represents a UDP port number. Is the whole network now limited to 2^12 NAT forwarding entries? I assume the answer is no, but I don't understand why.

    1. Hi Keith, Josh Reich, another of the authors here. I'll try to clear up some of the confusion (we can iterate until it's all clear :-) Conceptually, Pyretic can offer a very, very large number of virtual field values. However, to do this (using the best strategy we've come up with), we need to send almost all packets containing such values through the controller (if it's unclear how this might work, let me know). Clearly, this would be unacceptably inefficient. So your observation is correct - Pyretic can only efficiently support a limited number of extended field values.

      The precise number Pyretic can support depends on three things:
      1) The hardware available and the program being run on the network - both of which determine how many spare hardware-matchable header bits will be available.
      2) What type of optimizations we might apply (e.g., we may be able to use the same VLAN tag to represent a different extended value on each different link)
      3) If there are extended values that are rarely (or never) sent across a link, we can handle these in the controller, leaving VLAN tags to be used only for extended values used frequently.

      All that said, in your particular example, we'd actually be fine, since OpenFlow provides support for matching on UDP/TCP ports and thus there's no need for extended field values.

      -Josh

    2. I agree with Josh's points, and would just add that we can reuse the VLAN/MPLS encodings at different hops. For many virtual header fields, we don't need their physical representations to be globally unique. This is similar to how MPLS can use the full range of MPLS label values at each hop, since MPLS routers can do label swapping.
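As a toy model (my own construction, not Pyretic's code), per-hop label swapping amounts to each switch keeping its own local table mapping an incoming label to an outgoing label and port, so a 12-bit value only needs to be unique per link rather than network-wide:

```python
# Each switch has its own swap table: incoming label -> (outgoing label, port).
# The same label value (here, 5) can safely mean different things on the
# links into s1 and out of s2, because every hop rewrites it.

swap_tables = {
    "s1": {5: (7, "port2")},  # label 5 arriving at s1 leaves as label 7
    "s2": {7: (5, "port9")},  # s2 reuses label 5 on its own outgoing link
}

def forward(switch, label):
    """Look up the per-switch rewrite for an incoming label."""
    out_label, out_port = swap_tables[switch][label]
    return out_label, out_port
```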

      In any case, all of this argues for having a run-time system take care of these gory details, since they are tedious and important (and hard) to get right. Note also that, in some cases, the virtual header fields don't even need to be carried in the physical network (e.g., if they involve hops between virtual switches that map to the same physical switch).

      -- Jen

  3. Hi Keith, Josh again. I wanted to respond to your well-made point (2) "But naively, shouldn't it be "obvious" that if we compose a load-balanced service inside a firewalled subnetwork, the order that transformations are applied for incoming packets must be reversed when those same transformations are removed from outgoing packets? This is the case any time we invert the composition of two functions! Given that the authors here are proposing a new and powerful system of high-level abstractions, why should the programmer have to worry about this detail explicitly?".

    So, in many circumstances you are dead right - it's fairly obvious what should happen, and this isn't something the programmer should have to worry about (or at least not much). As you've observed, there's actually a fairly elegant way of approaching problems like this by working with the inverses of policy functions. See here for a draft which neither fit in, nor was quite ready for, the NSDI paper: http://www.cs.princeton.edu/~jreich/drafts/reich_SDN_policy_inversion.pdf

    That said, as you'll see in that writeup, there are circumstances where it isn't completely clear what the programmer really wants. When we invert - should we only reverse the topmost policies? If so, what happens if we have a nested sequential composition? If not, where does it stop? We might hit a policy that it doesn't make sense to invert. For example consider a stateful firewall that allows all traffic going in one direction, but only return traffic from hosts to which some outgoing traffic was recently sent. If we invert that policy we will end up with a broken firewall that lets traffic in from both sides! Not good. So the programmer does need some tools to both specify when policies should be reversed/inverted and which policies should be protected from such operations. (The writeup should be a bit clearer than this quick note).
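To see why, here is a toy stateful firewall (my own construction, not taken from the writeup): it allows all outbound traffic but admits inbound traffic only from hosts to which something was recently sent. "Inverting" it by swapping the two directions would apply the allow-everything rule to inbound packets, which is exactly the broken, wide-open firewall described above.

```python
# Toy stateful firewall: allow all outbound traffic, and inbound traffic
# only when it is return traffic for a recent outbound connection.

seen_outbound = set()

def fw_outbound(src, dst):
    seen_outbound.add(dst)  # remember which hosts we have contacted
    return True             # all outgoing traffic is allowed

def fw_inbound(src, dst):
    return src in seen_outbound  # only return traffic is allowed

# A naive inversion that swaps the two directions would apply
# fw_outbound's "allow everything" rule to inbound packets, so the
# policy must be protected from inversion rather than mechanically
# reversed.
```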

    -Josh
