Wednesday, August 23, 2017

Session 3, Paper 4: Constant Time Updates in Hierarchical Heavy Hitters (Basat, Einziger, Friedman, Luizelli, Waisbard)

DDoS attacks are increasing, so an efficient way to identify and act upon them is needed. Previous work on Hierarchical Heavy Hitters (HHH) proposes algorithms whose update complexity grows with the hierarchy's size: for each packet, the approach looks at the source IP and updates counters for all of its prefixes. Such a solution is not ideal, as hierarchy sizes will keep increasing with the adoption of IPv6. Other emerging trends, such as NFV, further motivate the need for fast software-based measurement algorithms.

In light of this, Basat proposes to update only a single, randomly chosen prefix per packet. This reduces the run time, since only one counter is touched regardless of the hierarchy size. An additional speedup is gained via sampling: each incoming packet is ignored with some probability, and only the remaining packets have a prefix counted. The idea is that if enough packets arrive from attacking networks, they will be sampled often enough for accurate detection.
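The per-packet update can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prefix hierarchy, the sampling probability, and the plain `Counter` standing in for a bounded heavy-hitter structure are all illustrative choices.

```python
import random
from collections import Counter

# Illustrative byte-granularity IPv4 hierarchy; the paper's hierarchy may differ.
HIERARCHY = [0, 8, 16, 24, 32]
SAMPLE_PROB = 0.5  # illustrative sampling probability, not a value from the paper

counters = Counter()  # stands in for a bounded heavy-hitter data structure

def prefix(ip, length):
    """Mask a 32-bit IPv4 address down to its first `length` bits."""
    mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF if length else 0
    return (ip & mask, length)

def update(ip):
    # Sampling speedup: ignore the packet with some probability.
    if random.random() > SAMPLE_PROB:
        return
    # Constant-time update: touch one uniformly chosen prefix level,
    # instead of all levels as in previous HHH algorithms.
    level = random.choice(HIERARCHY)
    counters[prefix(ip, level)] += 1
```

The key contrast with prior HHH schemes is that `update` does O(1) work per packet, independent of how many levels `HIERARCHY` contains.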

The system was implemented in Open vSwitch (OVS) and the results showed that after enough packets there are no false negatives, no counting errors, and only a few false positives.

Accuracy improves with the number of packets. Convergence is reached after ~32M packets, or ~128M packets when sampling is used. After this many packets, the proposed solution performs on par with previous approaches in terms of accuracy, while achieving up to a 62x speedup. Its performance is close to that of vanilla OVS, and it provides a 250% throughput improvement over other approaches.

Q: How do you randomize between different hierarchy levels? Is uniform at random the best option?
A: We need the same accuracy for every hierarchy level. We thought about non-uniform randomization, but could not justify pursuing this direction, as we could not find any data to convince us that it is needed. Perhaps this is a good direction for future work.

Q: What happens if you add sampling to previously proposed algorithms? Does the throughput performance improve?
A: It is true that most of the throughput improvement is achieved thanks to sampling. However, previous approaches also suffer from non-constant worst-case update times.