Tuesday, August 13, 2013

SIGCOMM2013: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN

Authors: Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, Mark Horowitz
Presenter: Pat Bosshart

Background and motivation:

Fixed-function switches allow multiple levels of matching, but the match stages are fixed and not flexible, for example in terms of memory allocation.

Key problem: Lack of flexibility in switches. The rise of SDN makes this need for flexibility more pressing. In particular, SDNs require
- flexible actions
- flexible header fields
- multiple stages of match-action
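
To make the three requirements above concrete, here is a minimal software sketch of multi-stage match-action processing. It is only an illustration and not the hardware design from the talk: the field names, table entries, and helper functions (`set_field`, `decrement`, `process`) are all hypothetical.

```python
# Illustrative software model only; all names and table entries are hypothetical.

def set_field(name, value):
    """Return an action that writes `value` into header field `name`."""
    def action(headers):
        headers[name] = value
    return action

def decrement(name):
    """Return an action that decrements header field `name` by one."""
    def action(headers):
        headers[name] -= 1
    return action

# Each stage is (field to match on, {match value: [actions to apply]}).
# Changing the fields, the actions, or the number of stages is just a data change.
PIPELINE = [
    ("eth_dst", {"aa:bb:cc:00:00:01": [set_field("vrf", 7)]}),
    ("ip_dst", {"10.0.0.5": [set_field("out_port", 3), decrement("ttl")]}),
]

def process(headers, pipeline=PIPELINE):
    """Run the headers through every stage in order; a table miss is a no-op."""
    for field, table in pipeline:
        for action in table.get(headers.get(field), []):
            action(headers)
    return headers

packet = {"eth_dst": "aa:bb:cc:00:00:01", "ip_dst": "10.0.0.5", "ttl": 64}
result = process(dict(packet))  # copy so the original packet is left untouched
```

In a fixed-function switch the equivalent of `PIPELINE` is baked into the silicon, which is the inflexibility the talk is addressing.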

Existing solutions for flexibility:
- Software
- Network Processors
But all these approaches are slow and expensive!

Goals of the paper:

  • How to design a flexible switch chip? Provide an architecture for RMT (Reconfigurable Match Tables).
  • Establish usability and use cases.
  • Provide estimates of the feasibility of flexible switch chips, e.g., cost.

Using a parse graph allows arbitrary header fields, and it can be used with a programmable parser. This means that supporting a new header requires changing the parser configuration instead of building new hardware.
- The table graph is reconfigurable as well.
- Sometimes a change needs updates to both the parse graph and the table graph.
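
A minimal sketch of the parse-graph idea, assuming a simplified encoding of my own (not the one from the talk): each node lists a fixed header length, where the "next header" selector lives, and which node each selector value leads to. The header lengths assume no IPv4 or TCP options.

```python
# Hypothetical parse-graph encoding:
# node -> (header length in bytes,
#          (offset, size) of the next-header selector or None,
#          {selector value: next node})
PARSE_GRAPH = {
    "ethernet": (14, (12, 2), {0x0800: "ipv4", 0x8100: "vlan"}),
    "vlan": (4, (2, 2), {0x0800: "ipv4"}),
    "ipv4": (20, (9, 1), {6: "tcp", 17: "udp"}),  # assumes no IPv4 options
    "tcp": (20, None, {}),                        # assumes no TCP options
    "udp": (8, None, {}),
}

def parse(packet: bytes, graph=PARSE_GRAPH, start="ethernet"):
    """Walk the parse graph and return [(header name, byte offset), ...]."""
    found, node, offset = [], start, 0
    while node is not None:
        length, selector, transitions = graph[node]
        if offset + length > len(packet):
            break  # truncated packet: stop parsing here
        found.append((node, offset))
        next_node = None
        if selector is not None:
            sel_off, sel_size = selector
            raw = packet[offset + sel_off : offset + sel_off + sel_size]
            next_node = transitions.get(int.from_bytes(raw, "big"))
        offset += length
        node = next_node
    return found

# Supporting a new header is a data change, not a code (or hardware) change:
# here UDP destination port 4789 leads to an 8-byte VXLAN header.
EXTENDED_GRAPH = dict(PARSE_GRAPH)
EXTENDED_GRAPH["udp"] = (8, (2, 2), {4789: "vxlan"})
EXTENDED_GRAPH["vxlan"] = (8, None, {})
```

The `parse` function never changes; only the graph passed to it does, which is the property a programmable parser is meant to provide.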
The implementation consists of a physical pipeline onto which the logical stages are mapped. However, these flexible chips need general-purpose CPUs. The authors address the memory-to-CPU bottleneck by
- replicating CPUs
- having more stages
The reasoning is that the resulting higher cost is acceptable.
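
One way to picture the logical-to-physical mapping is a greedy allocator that walks the logical tables in order and fills physical stages with memory blocks. This is only a sketch under my own assumptions: the function `map_tables`, the block counts, and the greedy policy are hypothetical, and table dependencies are ignored.

```python
def map_tables(logical_tables, num_stages, blocks_per_stage):
    """Greedily place ordered logical tables onto physical stages.

    logical_tables: list of (name, memory blocks needed), in pipeline order.
    Returns {stage index: [(name, blocks placed in that stage), ...]}.
    A large table may spill across consecutive stages, and one stage may
    hold pieces of several small tables.
    """
    placement = {s: [] for s in range(num_stages)}
    stage, free = 0, blocks_per_stage
    for name, needed in logical_tables:
        while needed > 0:
            if stage >= num_stages:
                raise ValueError(f"table {name!r} does not fit in the pipeline")
            if free == 0:
                # Current stage is full: move on to the next physical stage.
                stage, free = stage + 1, blocks_per_stage
                continue
            used = min(needed, free)
            placement[stage].append((name, used))
            needed -= used
            free -= used
    return placement

# Hypothetical example: three logical tables on a 3-stage, 4-blocks-per-stage pipeline.
layout = map_tables([("l2", 3), ("acl", 2), ("l3", 6)], num_stages=3, blocks_per_stage=4)
```

In this example the `acl` table is split across stages 0 and 1, and `l3` across stages 1 and 2, so the logical table graph does not need to match the physical stage boundaries.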

Switch design:

64 x 10 Gb/s ports
Programmable parser
1 GHz pipeline
32 match/action stages
Huge TCAM: 64K TCAM words
224 action processors per stage
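
A quick sanity check on these numbers, as a sketch. It assumes the 64 ports run at 10 Gb/s each, that the pipeline handles one packet header per clock cycle, and that the worst case is back-to-back minimum-size Ethernet frames (64 bytes plus 20 bytes of preamble and inter-frame gap); those assumptions are mine, not stated in the notes.

```python
PORTS = 64
PORT_RATE_BPS = 10 * 10**9          # assumption: 10 Gb/s per port
PIPELINE_HZ = 10**9                 # 1 GHz; assumes one packet header per cycle
STAGES = 32
ACTION_PROCESSORS_PER_STAGE = 224

# Assumption: minimum-size Ethernet frame (64 B) plus preamble and
# inter-frame gap (20 B) occupies 84 bytes on the wire.
MIN_WIRE_BYTES = 64 + 20

aggregate_bps = PORTS * PORT_RATE_BPS                   # 640 Gb/s
worst_case_pps = aggregate_bps / (MIN_WIRE_BYTES * 8)   # roughly 952 million packets/s
total_action_processors = STAGES * ACTION_PROCESSORS_PER_STAGE  # 7168

# The pipeline keeps up if it can accept at least worst_case_pps headers per second.
pipeline_keeps_up = worst_case_pps <= PIPELINE_HZ
```

Under these assumptions the worst-case packet rate sits just below one packet per nanosecond, which is consistent with a 1 GHz pipeline clock.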

In comparison with conventional switches, the authors make the following observations:
- Many functions are the same as in conventional switches, e.g., I/O, buffering, and the CPU
- The extra functions can be made optional, e.g., statistics

Cost comparison with fixed function switches:
- Memory cost is higher (8% extra)
- Total cost is 14% higher

Conclusions:
- Use the RMT switch model to make flexible chips
- Bring processing to the wires (more CPUs per stage)
- Pipelining brings processing close to the memories
- An extra cost of about 15% is low, given the benefits of the higher flexibility

Q: What next?
A: This is a research project at TI; it is not clear when a product is likely to be released.

Q: Most multiprocessors use general-purpose chips. How does that compare to this solution model?
A: Very different, as each ALU processes only one word. This is much simpler, since the only requirement is to perform "match-actions".

Q: Is there compiler software for this?
A: Not available right now; it could be interesting future work.

Q: Is there a use case that requires modifying the chip size?
A: The speaker was away from the microphone, so couldn't quite hear this properly.