Wednesday, August 19, 2015

Session 4 (Middleboxes) -- Paper 2: BlindBox: Deep Packet Inspection over Encrypted Traffic

Authors: Justine Sherry (UC Berkeley), Chang Lan (UC Berkeley), Raluca Ada Popa (ETH and UC Berkeley), Sylvia Ratnasamy (UC Berkeley)
Presenter: Justine Sherry 


Public Review:

Middleboxes are used to perform deep packet inspection (DPI) to scan and filter traffic. However, much network communication today is encrypted, which means middleboxes can no longer perform their usual DPI. This work presents BlindBox, which performs DPI directly over encrypted traffic.

The speaker explained that they focus mostly on intrusion prevention systems, and motivated the problem by pointing out that such systems monitor connections and decide which ones should be blocked. This is important functionality for both clients and networks, but it conflicts with encryption. The goal is therefore to combine privacy (encryption) with functionality (middleboxes): DPI without decryption.

The speaker then discussed the threat model and explained that one solution would be for vendors to release their detection lists, but vendors are unwilling to do so because those lists are their secret sauce and what sets them apart from their competitors.

An overview of BlindBox HTTPS was then presented, starting with the handshake, which is largely based on known techniques. Encrypted data passes through BlindBox, and the middlebox learns the encrypted keywords. One downside is that the middlebox could then mount frequency analysis attacks. To counter this, the authors drew on the searchable encryption literature. The speaker explained that randomized searchable encryption helps with this problem, but at the cost of lower speed compared to deterministic approaches. To get the best of both worlds, every token is encrypted and then used as the salt for the next encryption.
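The frequency-analysis problem and the salting fix can be illustrated with a short Python sketch. It is not BlindBox's actual construction: HMAC-SHA256 stands in for the block cipher, and rather than following the exact salt chaining described above, the salt here is a starting value `salt0` plus a per-token occurrence counter, which is one simple way to make the salt change between encryptions. All names (`prf`, `salted_encrypt`, `Detector`, `salt0`) are hypothetical, and the middlebox is simply handed the per-rule keys that a real system would have to obtain during setup.

```python
import hashlib
import hmac
from collections import Counter
from typing import Optional


def prf(key: bytes, data: bytes) -> bytes:
    """Keyed PRF; HMAC-SHA256 is a stand-in for a block cipher (assumption)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def deterministic_encrypt(key: bytes, tokens: list[bytes]) -> list[bytes]:
    """Same token -> same ciphertext, so an observer can count repeats."""
    return [prf(key, t) for t in tokens]


def salted_encrypt(key: bytes, salt0: int, tokens: list[bytes]) -> list[bytes]:
    """Hypothetical salted scheme: the salt for a token is salt0 plus the number
    of earlier occurrences of that token, so repeated tokens look unrelated."""
    seen = Counter()
    out = []
    for t in tokens:
        token_key = prf(key, t)  # per-token key derived from the session key
        salt = (salt0 + seen[t]).to_bytes(8, "big")
        out.append(prf(token_key, salt))
        seen[t] += 1
    return out


class Detector:
    """Middlebox side. It holds prf(key, rule) for each rule keyword; a real
    system would obtain these during setup without learning the session key."""

    def __init__(self, rule_keys: dict[bytes, bytes], salt0: int):
        self.rule_keys = rule_keys
        self.salt0 = salt0
        self.counts = {r: 0 for r in rule_keys}
        # Map the next expected ciphertext of each rule back to that rule.
        self.expected = {self._next(r): r for r in rule_keys}

    def _next(self, rule: bytes) -> bytes:
        salt = (self.salt0 + self.counts[rule]).to_bytes(8, "big")
        return prf(self.rule_keys[rule], salt)

    def feed(self, ciphertext: bytes) -> Optional[bytes]:
        """Return the matched rule, or None; non-matching tokens reveal nothing."""
        rule = self.expected.pop(ciphertext, None)
        if rule is not None:
            self.counts[rule] += 1  # advance this rule to its next salt
            self.expected[self._next(rule)] = rule
        return rule


if __name__ == "__main__":
    key = b"session-key"
    tokens = [b"GET", b"attack", b"GET", b"attack"]
    print(len(set(deterministic_encrypt(key, tokens))))  # 2: repeats are visible
    print(len(set(salted_encrypt(key, 7, tokens))))      # 4: repeats are hidden
    det = Detector({b"attack": prf(key, b"attack")}, salt0=7)
    print([det.feed(c) for c in salted_encrypt(key, 7, tokens)])
    # [None, b'attack', None, b'attack']
```

With deterministic encryption the two `GET`s (and the two `attack`s) encrypt identically, which is exactly what frequency analysis exploits. With salting all four ciphertexts differ, yet the detector still recognizes each occurrence of a rule keyword because it can precompute the next expected ciphertext for that rule.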

The speaker noted that the talk focused on exact matches and encouraged the audience to read the paper for details on more complex matching, such as regular expressions.

The talk concluded with evaluation results along two axes: functionality and performance. In terms of functionality, BlindBox supports 100% of what middleboxes do for exact matching. For more complex matching, such as regexes, it has to fall back to a weaker security model to reach that 100%. In terms of performance, it is comparable to currently deployed IDSs, but there is a setup cost that is reasonable only if connections are long-lived and persistent.
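Why a one-time setup cost is acceptable for long-lived connections can be made concrete with a simple amortization model. This is our own back-of-the-envelope illustration, not a formula from the talk: assume a connection carries B bits, setup takes T_s seconds, and the middlebox then inspects traffic at a steady rate of R bits per second. The effective throughput over the whole connection is then:

```latex
R_{\text{eff}} = \frac{B}{T_s + B/R} = \frac{R}{1 + R\,T_s/B}
\;\xrightarrow{\;B \to \infty\;}\; R
```

The setup cost becomes negligible once B is much larger than R·T_s, that is, once the connection carries far more data than could have been inspected during the setup time.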

Q: I don't believe IDSs work! And usually what they do is not just substring matching. What we really care about is malware. How do you apply this in an environment where malware is polymorphic and encrypts itself? I am not convinced this is going to work!
A: I actually don't agree with that. IDSs are used all the time.
Q: Yes, but they don't work! It's like 99% false positives! (Let me take a look at the paper.)

Q: Having the marker at different locations in the tree doesn't really prevent frequency attacks?

A: Yeah, once there is actually a match, we want the middlebox to learn that! So of course you can do frequency analysis over matched data, but not over innocent data.

Q: If Alice is the attacker, why would she talk to the middlebox?
A: Bob is going to determine what his security level is.

Q: About the performance: ~190 Mbit/s. That's actually orders of magnitude slower than commercial boxes.
A: That's single-core.
Q: Oh, OK. So what's the performance bottleneck?
A: The handshake! That's what's killing us, and it needs to be revisited and improved.

Q: What do you do about behavioral IDSs?
A: Usually IDSs do exact matching first, and if there is a match, they run regexes and more. Once you have an exact match, you have the ability (probable cause) to decrypt.
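
The staged structure described in this answer, exact matching first and heavier analysis only after a hit, can be sketched as below. This is a generic illustration of signature-based IDS pipelines, not BlindBox code; the rule names, keywords, and regexes are made up for the example.

```python
import re

# Hypothetical rules: an exact-match keyword (cheap prefilter) plus a regex
# that is evaluated only when the keyword is present.
RULES = [
    {"name": "php-eval", "keyword": "eval(",
     "regex": re.compile(r"eval\(\s*base64_decode\(")},
    {"name": "dir-traversal", "keyword": "../",
     "regex": re.compile(r"(\.\./){3,}")},
]


def inspect(payload: str) -> list[str]:
    """Two-stage check: exact keyword match first, regex only on candidates."""
    alerts = []
    for rule in RULES:
        if rule["keyword"] not in payload:  # stage 1: exact match
            continue
        # Stage 2 runs only after a keyword hit. Per the speaker, an exact
        # match is what gives the middlebox probable cause to decrypt.
        if rule["regex"].search(payload):
            alerts.append(rule["name"])
    return alerts


if __name__ == "__main__":
    print(inspect("GET /../../../etc/passwd HTTP/1.1"))  # ['dir-traversal']
    print(inspect("eval(1)"))                            # [] (keyword hit, regex miss)
```

The design choice is that the expensive regex stage touches only the small fraction of traffic that already matched a keyword.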

Q: What prevents middleboxes from adding more keywords to mount a frequency attack?

A: Those keywords should be signed by trusted parties (e.g., McAfee).

Q: Follow-up on behavioral IDSs: it's not clear how your system works with them.
A: Actually, much of that is about connection analysis and ... which is unencrypted!

Q: Your approach seems very English-centric. How does it work if you're in China?
A: I would actually just use a VPN instead of HTTPS ;) We can tokenize binary data.
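
The answer suggests that tokenization need not rely on words at all. A minimal sketch of one byte-oriented approach, sliding fixed-width windows over the raw payload, is below; it works the same for English text, UTF-8 Chinese, or binary shellcode. The 8-byte width and the names `window_tokens` and `may_contain` are our assumptions, not details from the talk.

```python
def window_tokens(data: bytes, width: int = 8) -> list[bytes]:
    """Every `width`-byte sliding window of the payload (no notion of words)."""
    if len(data) < width:
        return [data] if data else []
    return [data[i:i + width] for i in range(len(data) - width + 1)]


def may_contain(payload: bytes, keyword: bytes, width: int = 8) -> bool:
    """Necessary (not sufficient) condition for an exact match: every window
    of the keyword appears among the payload's windows."""
    if len(keyword) < width:
        raise ValueError("this sketch assumes keywords at least `width` bytes long")
    return set(window_tokens(keyword, width)) <= set(window_tokens(payload, width))


if __name__ == "__main__":
    keyword = "恶意软件".encode("utf-8")           # 12 bytes of UTF-8
    print(len(window_tokens(keyword)))              # 5
    print(may_contain(b"xx" + keyword + b"yy", keyword))  # True
    print(window_tokens(b"\x90" * 8 + b"\xcc"))     # two 8-byte windows over binary data
```

Because each window is just a byte string, the same tokens could be fed into a keyword-encryption step regardless of the payload's language or encoding.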

Q: You said IDS operators don't want to reveal their secrets, so you need them in-network. Is that really true?
A: There are many reasons why you still need in-network analysis even if you could run it on your own machine (e.g., if your machine is compromised, the in-network analysis can still save you).