Friday, April 5, 2013

A Report on NSDI'13: Wire Speed Name Lookup: A GPU-based Approach

This is a summary of the presentation by Bin Liu and the question and answer section that followed. If I have misattributed a question, let me know in the comments and I will fix.

Summary

Name lookup is an important problem, both in the context of networking and in domains outside of networking. This paper focuses on a particular use case of name lookup: routing in a content-centric networking.  Content-centric networking uses hierarchical names (e.g., /com/google/maps) of arbitrary length rather than fixed-length addresses to route packets. In order to be useful, content-centric networking needs a lookup mechanism that supports longest prefix matching, has high throughput (saturate a 100 Gbps Ethernet link), and low latency (100 microseconds per lookup).

Satisfying these constraints requires a carefully designed data structure that can work efficiently with GPU hardware. Some straightforward approaches won't work: character tries and state transition tables require too much memory.  Aligned transition arrays (ATAs) can greatly compact the information of a transition table, but they don't support incremental update and are still inefficient (each character must be looked up separately). To address both of these concerns, the authors implement multi-ATAs, which can use multiple characters in each transition and support incremental update.

There are some challenges to implementing multi-ATAs on a hybrid CPU-GPU platform: the PCI-e bus and GPU processor are limiting factors in achieving both high-throughput and low latency. Two techniques are used to improve performance. First, pipelining lookup and data transfer improves PCI-e bus and GPU processor utilization. Second, interweaving the memory layout of the input names in GPU memory reduces the memory accesses of threads, improving performance.

The implementation of the multi-ATA data structure performs well on the CPU-GPU platform. Using the multi-ATA requires two orders of magnitude less space than a baseline state transition table. The techniques to efficiently use the GPU allow up to ~70 million searches per second, an order of magnitude more than a state transition table. Furthermore, these techniques can saturate a 100 Gbps link with latencies of less than 100 microseconds per search.

Question and answer

Dong Zhou, Carnegie Mellon University

  DZ: In your evaluation, you worked only on a local machine, but you didn't actually transfer data onto a NIC. Is this a fair comparison with other systems?

  BL: Yes, we only work within the context of a single machine. We're looking at using the NIC in future work.

  DZ: Using the NIC will take additional CPU cycles, will the competing CPU used affect your results?

  BL: This is something we need to address, but we think that other hardware acceleration could help.

Srimat Chakradhar, NEC Labs

  SC: Right now you are keeping the entire state table on the GPU, will this be possible as state tables get larger?

  BL: Currently, we have a 10 million entry table on the GPU, which uses about 1/3 of the GPU’s external memory for a single GPU processor chip, actually we have two processor chips on the GTX590 board. We estimate that we could keep a 60 million entry table on this kind of GPU. Newer GPUs have more space, and we think
  that we could keep a 120 million entry table on the new hardware.

Michael Freedman, Princeton University

  MF: In the multi-ATA table it looks like collisions map into a second table. This seems to imply that lookups will require multiple accesses, is that the case?

  BL: No, we have a proof in the paper.

  MF: Did you consider comparing with other hashing algorithms (e.g., cuckoo hashing)?

  BL: No, this is a much more simple lookup algorithm

Gun Sirer, Cornell Univerity

  GS: It seems like I should be mining BitCoins with my GPUs. Right now, this CCN routing lacks security (e.g., self-verifying names). Lack of security as a first-class primitive plagues DNS today. How will your system address security challenges?

  BL: We haven't considered that specifically, but it's definitely the future work.