Session 11: Security, Privacy, and Censorship - Paper 2
Authors: Maria Konte (Georgia Institute of Technology), Roberto Perdisci (University of Georgia), Nick Feamster (Princeton University)
Cyber-criminals protect their network resources by hosting their services inside malicious ASes, referred to as bulletproof hosting ASes. Current defenses against these malicious ASes rely on traffic monitoring, using AS reputation systems such as BGP Ranking. The problem with these approaches is that they need many vantage points, produce many false positives, are hard to use, and detect the attack too late to prevent it. The authors’ approach is to monitor routes and use a machine learning algorithm to identify these malicious ASes. It is the first attempt to derive AS reputation from routing monitoring, using public data exclusively.
The system consists of two phases: a training phase and an operational phase. The training phase uses confirmed cases of malicious and legitimate ASes as ground truth and extracts features based on the authors’ domain knowledge. Example features include Rewiring Changes, BGP Routing Dynamics, and IP Space Fragmentation, which are explained in the following paragraphs. Using these metrics, the operational phase then generates an AS reputation report. The feature families are:
Rewiring Changes/Link Connectivity: malicious ASes tend to change connectivity more aggressively than legitimate ASes. To measure this, the authors take snapshots of connectivity. For example, the measurement proceeds as follows:
- Monitor the last x snapshots
- Collect all providers
- Measure the fraction of snapshots with each provider
The link connectivity is then represented by three features derived from this distribution of fractions.
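The measurement steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes do not say which three features are taken from the distribution, so the minimum, median, and maximum used here are a stand-in, and the function names and AS numbers (drawn from the documentation range) are hypothetical.

```python
from collections import Counter
from statistics import median


def provider_fractions(snapshots, x):
    """Fraction of the last x snapshots in which each provider appears."""
    window = snapshots[-x:] if x > 0 else []
    if not window:
        return {}
    counts = Counter()
    for providers in window:
        counts.update(set(providers))  # count a provider once per snapshot
    return {p: n / len(window) for p, n in counts.items()}


def summarize(fractions):
    """Reduce the distribution to three numbers.

    ASSUMPTION: min/median/max is an illustrative choice, not
    necessarily the summary the authors use.
    """
    values = sorted(fractions.values())
    if not values:
        return (0.0, 0.0, 0.0)
    return (values[0], median(values), values[-1])


# Hypothetical upstream providers of one AS over four snapshots.
snapshots = [
    {"AS64500", "AS64501"},
    {"AS64500"},
    {"AS64500", "AS64502"},
    {"AS64503"},
]
fractions = provider_fractions(snapshots, x=4)
features = summarize(fractions)
print(features)  # (0.25, 0.25, 0.75)
```

An AS that keeps the same providers yields fractions near 1.0, while one that rewires often yields many providers with small fractions, which is the contrast the feature is meant to capture.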
BGP Routing Dynamics: Malicious ASes’ routing dynamics are driven by illicit operations. In contrast, legitimate ASes’ dynamics are driven by policy changes and traffic engineering decisions.
Fragmentation and churn of advertised prefixes: Malicious ASes rotate their advertised prefixes, e.g., to evade blacklisting, and they advertise a large number of non-contiguous prefixes.
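One way to make these two notions concrete is sketched below. It is not the authors’ definition: here fragmentation is taken to be the number of non-contiguous address blocks an AS advertises, and churn is taken to be the Jaccard distance between the prefix sets of two consecutive snapshots. Both definitions, the function names, and the example prefixes are assumptions made for illustration.

```python
import ipaddress


def contiguous_blocks(prefixes):
    """Count non-contiguous address blocks among the advertised prefixes.

    ASSUMPTION: all prefixes share one IP version; adjacent or
    overlapping prefixes are merged into a single block.
    """
    nets = sorted(
        (ipaddress.ip_network(p) for p in prefixes),
        key=lambda n: int(n.network_address),
    )
    blocks = 0
    current_end = None
    for net in nets:
        start = int(net.network_address)
        end = int(net.broadcast_address)
        if current_end is None or start > current_end + 1:
            blocks += 1  # a gap in address space starts a new block
            current_end = end
        else:
            current_end = max(current_end, end)
    return blocks


def churn(previous, current):
    """Jaccard distance between two sets of advertised prefixes."""
    previous, current = set(previous), set(current)
    union = previous | current
    if not union:
        return 0.0
    return 1 - len(previous & current) / len(union)


# Hypothetical advertisements from one AS at two points in time.
before = ["10.0.0.0/24", "10.0.1.0/24", "192.0.2.0/24"]
after = ["10.0.1.0/24", "198.51.100.0/24", "203.0.113.0/24"]

print(contiguous_blocks(before))  # 2
print(contiguous_blocks(after))   # 3
print(churn(before, after))       # 0.8
```

Under these definitions, an AS that rotates through scattered prefixes scores high on both measures, while a stable AS with one large allocation scores a single block and zero churn.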
Using features like those above, the authors train the classifiers and evaluate them with cross-validation. The classifier achieves a 93% true positive rate with a 5% false positive rate. They also investigate which features are important by including and excluding each feature family separately and observing the change in performance. The results show that the connectivity features are the most important in terms of true positive rate. Fragmentation and churn of advertised prefixes are less important than connectivity, but help to lower false positives.
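The evaluation described above rests on two pieces of bookkeeping: computing true and false positive rates from predictions, and enumerating the feature subsets for the include/exclude experiment. The sketch below shows both in plain Python. The labels, predictions, and family names are made up for illustration, and the classifier itself is left out because the notes do not specify it.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate); 1 = malicious."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def ablation_plan(families):
    """List the feature subsets to train on: each family alone,
    and all families except that one."""
    plan = {}
    for name in families:
        plan[f"only {name}"] = [name]
        plan[f"without {name}"] = [f for f in families if f != name]
    return plan


# Hypothetical ground truth and predictions for seven ASes.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
tpr, fpr = rates(y_true, y_pred)

# Family names are illustrative labels for the three groups above.
plan = ablation_plan(["rewiring", "bgp_dynamics", "fragmentation"])
for label, subset in plan.items():
    print(label, subset)
```

Training one model per entry in the plan and comparing the resulting rates against the full-feature model indicates how much each family contributes.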
Q: How does the algorithm adapt as people become aware of the work? There are security concerns, such as attackers tricking the algorithm.
A: The malicious behaviors themselves are very hard to change.
Follow-up: What if the criminals stop exhibiting these behaviors in order to avoid detection? (Taken offline.)
Q: Have you looked into the patterns of these hosting ASes? Which providers appear more frequently, and which are legitimate providers?
A: No, it is more complex than examining a single provider. Yes, some providers tend to have more malicious ASes, but what is more likely happening is that a legitimate provider is hosting malicious ASes without knowing it.
Q: Say an AS is malicious and I don’t want my traffic routed through it. BGP gives no guarantee that the path we analyze is the path the traffic is going to take, so what can I do?
A: We didn’t try to answer that question. What we were trying to do is see how connectivity and BGP updates can be used to detect malicious ASes.
Q: Do you have, or intend to provide, a one-paragraph recommendation for policy makers (the ISP community), so they can use these intuitions?
A: That's a good question, we haven’t examined that path. There are similar questions like how to configure reliable AS protections.