Tuesday, August 18, 2015

Poptrie: A Compressed Trie with Population Count for Fast and Scalable Software IP Routing Table Lookup

Authors: Hirochika Asai (The University of Tokyo), Yasuhiro Ohara (NTT Communications Corporation)

Paper (pdf)
Public review (pdf)

With the rapidly increasing size of routing tables, we need an efficient approach for IP routing table lookups. In this paper, the authors Poptrie, an IP routing lookup algorithm. The name was chosen because of how their algorithm leverages the popcount instruction and their use of the multiway trie. The authors were able to achieve a lookup rate of 148.8 Mlps (about 6.7 ns per lookup).

Poptrie uses a number of extensions to improve performance and reduce the memory footprint, such as compressing the leaf bit-vector, route aggregation,  and direct pointing (like DXR). Using the leafvec extension, they were able to reduce the memory footprint to less than 1/3. Adding the route aggregation reduces the tree depth. Direct pointing extracts the nodes that corresponds to the most significant s bits, which specifies how many bits should be used as the index to the lookup array.

The authors tested Poptrie on both random and real traffic and compared against other algorithms. In their tests, D18R performed well on random traffic, but did poorly on on real traffic. Conversely, SAIL did poorly on random traffic, but performs well on real traffic. In contrast to both, Poptrie performed well on both random and real traffic patterns. The authors also pointed out that Poptrie is well suited for the future as the number of IPv6 prefixes increases, since it does well on datasets with longer prefixes.

Q&A

Q: Compared to work in the past, the most impressive thing here seems to be that you managed to compress the intermediate nodes and pointers. It seems like the big win is the deletion of all the duplicate space. Yes?
A: Yes. The direct pointing is significant for this. We couldn't see any big differences at shorter prefixes, but we could see very big differences at the deeper prefixes.

Q: Are there any other places such as firewall rules or encryption lookups that would also benefit that would benefit from these kind techniques for deletion of duplication.
A: Yes, I think so. Right now, we are focusing on IP routing table, but could extend this in future work to apply to access control lists, firewalls or string matching.

Q: You're using the pop count instruction in a general purpose processor, it's a standard instruction. You're benefiting a lot from that instruction. Normally, if you need high speed lookups you'd probably use ASICs or FPGAs perhaps. So, couldn't you put more things in the hardware?
A: Yes, this can implemented in hardware, but we focus on the software. Because with the advancement of hypervisors and virtual machine technologies, routing is often implemented in a hypervisor. In that case, we are not using dedicated hardware. This mechanism could be implemented in hardware but our focus is on software.

Q: I wonder if you had tried using memory prefetching to reduce the number of cycles for lookups? It should be able to reduce the number of cycles that are caused by cache misses.
A: The cache misses are the main factor that affects performance. Comparing 5th and 95th percentiles, we can see there are big fluctuations. It's very fast if there are no cache misses, but if there are cache misses we see bad performance. We looked into trying to quantify the number of CPU cache misses, but the kind of CPU we used does not report that. We want to look into that in the future. So instead we focused on measuring CPU cycles. We did try prefetching, but it doesn't help. We only have 6-7 nanoseconds. Since the prefetching requires more CPU cycles and time, it does not work well.