Friday, August 25, 2017

SIGCOMM'17 - Session 6 (DC Traffic) - DRILL: Micro Load Balancing for Low-latency Data Center Networks

Presented by Soudeh Ghorbani

Most works on load balancing in data centers need global information. DRILL proposed a solution in the reverse direction: load balancing based on local decisions at each switch. DRILL performs per-packet load balancing according to the queue occupancy. But sending packets to the shortest queue may result in suboptimal performance globally, so DRILL introduces randomness. Specifically, it leverages the “power of two choices”: each engine compares queue lengths of two random ports plus the previously least-loaded port, and sends the packet to the least loaded of these.

In DRILL, the forwarding decisions for each packet is independent of other packets in the flow, so there might be excessive packet reordering and TCP will suffer. But surprisingly, DRILL experiences very minimal reordering. This is because the queue length has little variance and packets experience similar queueing delay. Although the packets take very different paths, the delays of packets along those paths differs little. So, packet reordering is uncommon in DRILL.

Another challenge is asymmetric network, which may have an asymmetric topology by design or be caused by link failures. In this case, the load balancers that split individual flows among available paths may waste bandwidth. The solution is to decompose the network into symmetric components. DRILL decomposes the graph by assigning scores of paths by a hash map. Paths with the same score are symmetric. DRILL iteratively assigns the scores and picks the set of symmetric paths.

Q: Have you compared with prior works doing round robin load balancing on virtual symmetric topologies?
A: Yes, we did some comparisons using randomized algorithm over two samples. But we miss round robin is because they don’t work well on asymmetric topologies. Even for symmetric topologies, we have some benefits over round robin. Because of the two mechanisms in DRILL, we expect even more advantage on asymmetric topologies.

Q: We did micro load balancing and we found it does not optimize downlink from the core, because there’s no alternative paths. It’s no better than per-packet random by our experience.
A: If the topology is symmetric, you will expect the same load balancing from all switches. The queuing happens at the first and the last hop, so we have very little queuing from the spine to the leaves.

SIGCOMM'17 - Session 10 (Peering) - Detecting Peering Infrastructure Outages in the Wild

Presenter: Vasileios Giotsas
Co-Authors: Christoph Dietzel, Georgios Smaragdakis, Anja Feldmann, Arthur Berger, Emile Aben

The link to the paper.

In the recent years the Internet infrastructure has changed a lot unexpectedly, one of them is known as "flattening" of the Internet,   e.g., inter-domain traffic flows directly between edge networks  (IXPs, CFs), and bypassing transit providers (ISPs). As peering thousands or hundreds of cloud datacenters become commonplace, IXPs and CFs gain their popularity. On the other hand, little effort is paid to know about outages and the influence of the outages. Thus this paper addresses this problem and proposed a system to provide near real-time outage detection close to the level of buildings, and report the experience of running it for a few years.

In order to detect the outage (automatically, timely and at the building level), localize the outage (able to distinguish cascading failure effects from outage sources), and track outage (as well as determine the duration, shifts in the routing paths and geographic spread), the authors proposed the Kepler. With a initialization phase parsing the BGP routes, Kepler can detect and determine the outage by sensing outage signals and investigating the signals. To help pinpoint the outages precisely, it is required to compute high-resolution co-location maps from the communities to decorrelate the behaviors of affected ASes.

In the evaluation and results, authors showed that Kepler can do a very good job on de-noising BGP routing activity and providing strong outage signals. It can also detect peering infrastructure outages in the wild, and measure the impact of such outages. 
This result is significant because i) most of today's outage can't be detected automatically (unless operators raise questions in mailing list), and ii) monitoring or measurement of these outage is not achievable. These problems or questions can be answered here.

To conclude, Kepler use passive BGP monitoring to timely and accurately detect the infrastructure- level outages, and provide hard evidence on peering infrastructure outages thus can later improve accountability, transparency and resilience strategies.


Q1: On the map of world you show the community, it seems biased in European countries. Can you comment on it?
A1: Yes the majority of the community is in Europe or America. The truth is that it is biased when we collect the dataset. And the fact that there are more IXP in Europe

Q2: In the number of accuracy and precision, seem like the number of the outages.
A2: The information is from the mailing list, social media post and from operators, and tech reports

Q3: Any comments on the possibilities causing the outages?
A3: Mostly the causes are fiber cuts. And it actually makes the root cause diagnosis very efficient.

Q4: On your dataset, how can you decide it is maintenance or failure?
A4: In the case of maintenance, it happened in a accessive way. Every time it affect or large part, it  can be this case. Of cause sometimes it is not easy to distinguish

SIGCOMM'17 - Session 11 (Routing) - Paper 3: The Impact of Router Outages on the AS-level Internet


this paper uses large-scale measurement to understand the dependence of the AS-level Internet on individual routers. 149,560 routers are surveyed across the Internet for 2.5 years. 59,175 of the surveyed routers (40%) experience at least one reboot. This study identified specific routers that were single points of failure for the prefixes they advertised. A novel active probing technique is designed to identify router restarts. Networks are associated with routers using trace data, and then global BGP activities are correlated with router restarts. To infer single point of failures, outages are correlated with route withdraws by testing whether the windows overlap. The results show that only 4.0% of routers were correlated with complete withdrawals involving 3,396 prefixes, where the routers appeared in traceroute paths towards the prefixes.

Q: what you identified is the percentage of routers as single points of failure. You sort of hide 6 percent of them in customers' edge, presumably customers are harming themselves by having a single point of failure. For the other 40%, how much does Internet get taken out, i.e., ISP edge, device serving multiple customers? So the fact isn't necessarily that 4% of Internet can be taken out by this singe point of failure, but in fact some larger fraction is disabled. Do you have any sense of that?

A: that is not the case that there is 40% outside of the customer edge. I think it is another 20% or something hidden in customer, but you stay with small, maybe around 10%, outside of the customer network, and you can imagine that if you are providing a tunnel service, the providers will say one of the cases will be sorted out.

Session 11, Paper 1: SWIFT: Predictive Fast Reroute (Thomas Holterbach, Stefano Vissicchio, Alberto Dainotti, Laurent Vanbever)

Presented by: Thomas Holterbach

2.5 minutes: the worst case convergence time of a router after a link failure. After a failure the router needs to learn about the failure and then update it's forwarding entries. It can take minutes for the router to scan its routing table. There are existing solutions for when the link is between the router's AS and a neighboring AS. However, no current solution works when there are remote link failures. Furthermore, remote outages are numerous: they analyzed bursts of withdrawals from RouteViews and RIPE RIS and found more than 3000 bursts in November of 2016.

SWIFT can reduce the convergence time of BGP from 13s to 2s, fast reroute around failures in 40ms, and is currently deployable on today's routers. SWIFT trades accuracy, now using region granularity, for speed.

SWIFT monitors BGP streams, uses an algorithm to detect link failure probability, and after detecting a link failure, predicts future withdrawals. It uses two metrics: withdrawals share: fraction of withdrawals crossing a link and path sharing: the proportion of prefixes withdrawn with a path on a link. Their product product is the fit score: this value gives the likelihood of the failure of a specific link.

To be fast SWIFT can't wait for withdrawals, so the algorithm is run early in a withdrawal burst. Furthermore, outages can affect multiple links, so SWIFT returns a set of links all with a high fit score.

Their experimental results show that SWIFT never makes an extremely inaccurate prediction while is correctly infers the failure of a majority of links.

The challenges for fast reroute are that any set of prefixes can be affected, failures can affect backup paths, and they want to be fast. Thus, SWIFT uses a hierarchical forwarding table to make the convergence fast. Entries in the table hold a primary next-hop, the AS-path the prefix is using, and the backup next-hop. SWIFT uses this to reroute the affected traffic to unaffected backup paths.
Evaluation shows that SWIFT's fast reroute requires less than 100 forwarding updates or about 40ms to effectively reroute after a failure.

Furthermore, it is deployable over most existing routers has many currently have support for hierarchical forwarding tables, but for routers that don't have that support, we can still deploy SWIFT using an extra SDN switch attached to the router.

Thus, SWIFT speeds up the learning of remote link failures, efficiently reroutes around those failures, and is deployable on existing routers.

Q: In the graph with false and true positives, what is the ground truth?
A: The ground truth comes from after rerouting if later those paths are withdrawn. We also evaluated with a simulator to have the ground truth.

Q: What are the main factors behind the long convergence time?
A: Each router has a decision process to go through, things like scanning through it's routing table. This process could explain why it is slow.

Section 9 - Realities, Paper 4: Carousel: Scalable Traffic Shaping at End Hosts

Presenter: Ahmed Saeed

Authors: Ahmed Saeed(GIT), Nandita Dukkipati(Google, Inc.), Vytautas Valancius(Google, Inc.), Vinh The Lam(Google, Inc.), Carlo Contavalli (Google, Inc.),  Amin Vahdat(Google, Inc.)

Network bandwidth is a constrained resource that is expensive to overprovision especially across the WAN. Accurate shaping to a target rate is increasingly important to efficient network operation.
Ahmed present traffic shaping, and traditionally, network switches and routers have implemented it. The problem is that shaping in middleboxes is not an easy option inside a datacenter. It is expensive in buffering and latency, and middleboxes lack the necessary state to enforce the right rate. Shaping in the middle does not help when bottlenecks are at network edges. So new traffic shapers that can handle tens of thousands of flows and rates are needed.

Ahmed , et cl. propose Carousel which is an improvement on existing, kernel-based traffic shaping mechanism. The main idea is to replace many queues with a single low-overhead queue. Carousel scales to tens of thousands of flows and traffic classes, and supports complex bandwidth-allocation mechanisms for both WAN and data-center communications. He shows the Carousel architecture.
Ahmed present their evaluation setup:
1.Carousel deployed within a Software INC.
2.Evaluated on Youtube servers comparing Carousel and FQ/Pacing.
3.Each server handles up to 50k sessions concurrently.
Carousel can save up to 8.2% of overall CPU utilization(5.9 cores on a 72 core machine).
Carousel improves even Software NIC utilization by 12% by increasing size of batches of packets enqueue in the software NIC.

Carousel allows networks operators for the first time to shape tens of thousands of flows individually. Carousel advantages make a strong case for providing single-queue shaping and backpressure in kernel, userspace stacks, hypervisors, and hardware.

Paper is here.

Q&A section:

Q1: I care about the data structure you are using, seems to be a trade-off, where you either have the granularity of pacing, or the number of packets. Have you quantified for example how many packets you have, the granularity and processing, that is very important for the products?
A1: We did consider all the parameters and through our evaluation, we found that with the bucket size 8000, we can support other traffic for Youtube.

Q2: I am wondering if your algorithm supports flexible scheduling.
A2: That is a good problem we are currently working on. We are designing a new algorithm to support arbitrary scheduling of the software.

Q3: Why should we have software rate limiters as opposed to hardware rate limiters?
A3: I think one advantage of this work is to decouple the allocators of memory whether in software or hardware from the number of rate limiters you can support. So I think this work actually is a little bit different from trying to look at software and hardware. So I don't think software vs hardware is proper for this work.

Section 9 - Realities, Paper 3: VROOM: Accelerating the Mobile Web with Server-Aided Dependency Resolution

Presenter: Vaspol Ruamviboonsuk
Authors: Vaspol Ruamviboonsuk(UMichigan), Ravi Netravali(MIT),  Muhammed Uluyol(UMichigan), Harsha V. Madhyastha(UMichigan)

The page loads on mobile devices remain disappointingly slow, which frustrates users and hurts the revenue of website providers. Recent research have found that dependencies between the resources on any web page are a key reason for slow page loads. Even though there are some approaches that have been taken to address the impact of these dependencies on web performance, they have fundamental drawbacks. For instance, Proxy Based Solution: Client must trust HTTPS content pushed by proxy; Proxy needs access to user's cookies for all domains.

Vaspol et cl. presented VRoom, and three challenges to approach.
1. How can web servers discover dependencies?
Combine offline and online + Defer to third parties
2. How do web servers inform clients of discovered dependencies?
HTTP/2 Push + Dependency Hints
3. How should clients use input from servers?
Vaspol presented VROOM scheduler in action.

Vaspol et al. evaluate VROOM from two perspectives. 1) Accuracy of dependency discovery, 2) Improvement in client perceived performance. VROOM fully utilizes CPU/Network, decouples dependency discovery from parsing and execution, decreases median page load time by 5s for popular sites.

The paper is here.

Q&A section:

Q1: Have you considered the optimal cases for web searching, such as you don't need to fetch the content dynamically, but instead all the contents are available in the web server and static Html and images are returned from the server to client, how does your approach fit in this scenario?

A1: We didn't do that exact comparison and that would be interesting to do for sure. We did something similar where all the contents are retrieved at the beginning of the page load from the web server, which turns out there is no improvement.

Q2: If the webpage is a simple shell with a Javascript fetching all other contents dynamically, the server needs to analyze the javascript with a lot of information from the users such as cookies. If the server does not require much information from the client, then extra information may be delivered to the client which turns to be not useful. There is a clear trade-off, how do you think about this?
A2: Considering the case where all the contents are dynamically generated such as Facebook, Twitter, we cannot analyze the dependency that is one limitation of our current work.

Q3: This is kind of prefetching, and there is one issue with the client cache. Have you considered this case and the influence of client cache?
A3: You know it's quite hard to correctly model the caching, and we actually did per page cache evaluation, and you can find some measurement results in our paper.

Session 11, Paper 2: Bootstrapping evolvability for inter-domain routing with D-BGP (Raja R. Sambasivan, David Tran-Lam, Aditya Akella, Peter Steenkiste)

Presented by: Raja Sambasivan

Raja proposed the question: What evolvability features are needed in any inter-domain routing protocol?

BGP is todays inter-domain routing protocol; however it: cannot limit ingress traffic, it has a high convergence time, has no QoS guarantees, it uses only one best path, and ASes can be spoofed. We have many great new protocols; however, unfortunately, very few have actually been deployed. The problem is that BGP is rigid in that it requires neighbors to use the same protocol and thus it is very difficult to evolve. The rigidity of BGP results in isolated islands, subsets of ASes that can use a different routing protocol within each other, that have to use BGP outside of those islands so that they can cross the "gulf" of ASes not using the same protocol. This reduces the benefits of new protocols and reduce the incentives to operate with those new protocols. One current solution is to use tunnels, established out of band, however they interfere and increase other ASes' cost, creating push-back against changes.

The authors analyzed 14 recently proposed changes to BGP and found that only two properties are necessary to allow a protocol to evolve. The first is pass-through support, to allow ASes to disseminate information across gulfs, disseminate the routes in-band, and enable discovery of other islands. The second property is a multi-protocol data structure that informs islands about what they use and provides a common framework to express end-to-end paths.

They then designed Darwin's BGP, D-BGP, which includes these two features and find that it isn't very far from BGP and that BGP already has pass-through support. D-BGP uses "path vectors" which contain the AS # and the Island ID. Within islands it uses a set of ASes to prevent other ASes from discounting end-to-end, within island paths.

They evaluate D-BGPs potential for accelerating the benefits of new protocols. They used a 1000 node simulation and randomly choose which ASes are using the different protocol SCION. And they found that D-BGP always allows extra paths to be used over BGP.

Q: Can we use D-BGP to still allow for the deployment of new functionality in the data plane?
A: Yes, those systems that we analyzed allow for different data plane functionality, they just modify the control information.

Q: Can you speak to the correctness of the protocol you ended up with? You have many ASes using many different protocols, will it ever converge?
A: I can't comment on that specifically. BGP already provides tons of flexibility and with the modifications of D-BGP it's not clear whether it makes it worse or not.

Q: When looking at what we need did you come away with any lessons about how we can move forward given that we have BGP now?
A: I would propose the question of how would you incrementally deploy D-BGP? The key-fixed point is that the path vector field is correct when it is being deployed.

SIGCOMM'17 , Session 9 (Realities), Paper 2: Who is Fiddling with Prices?

Authors: Costas Iordanou (Universidad Carlos III de Madrid, Telefonica Research), Claudio Soriente
(Telefonica Research), Michael Sirivianos (Cyprus University of Technology), Nikolaos Laoutaris (Data Transparency Lab)


[Link to the paper]

Costas  Iordanou et al.  build and deploy the Price $heriff for preventing price discrimination in e-commerce.
What does $heriff do?
$heriff is a highly distributed system for detecting various types of online.
A first-of-its-kind transparency software that allows one to see the prices as seen by others.  

How does $heriff do it?
The seven main components of $hariff, and the flow of messages during a single price check request.

Why is $heriff interesting?
Had to solve some difficult technical challenges:
  • Build P2P proxy network
  • Prevent user profile pollution (browser and server side)
  • Protect user privacy
  • Perform universal price extraction
  • Automate currency detection
Technical challenges
Why hybrid network of proxies?
Infrastructure proxy clients
+ Diverse predefined geo-locations
+ Easy to setup and control
+ No real users involved
- No price variation based on personal data can be observed
Peer proxy browsers
+ Diverse real user profiles
+ Price variations based on personal data
-Unpredictable availability and geo-location
-Browser side profile pollution
-Server side profile pollution

  1. Price variation across countries
  • 76 domains out of 1994
  • Price variation up to 600%
  1. Price variation within the same country
  • 7 out 76 domains (3 repeatable)
  • Price variation up to 7%
  1. No price discrimination based on personal data detected yet

There was a discussion after the talk.

Q1: Many companies have concerns about distributed denial of service attack as we developing application. I'm curious to know what you do to mitigate to potential for damaging sites significantly as you attempt to deploy it (unless you are catching intensely)?
A1: Actually we don’t need to allow a lot of requests at the same time. We just need small amount to be able to capture the differences. The system is minimizing the number of requests that we send to a specific domain. 

Q2: Did you find a way to gain the system to get the cheapest price in the web?
A2: We didn’t studied the system for this purpose. We mostly focused on research aspect of this tool.

Q3: The largest retails in China like Alibaba,, … display the price in images and captures to prevent the competitor websites get the price. Do you have any solution for that?  
A3: There are some websites that they put prices in images instead of text and it is not hard to use image detection techniques to get the actual price. It is trivial.

Q4: Airlines are doing this for years and based on your browsing history when you stay longer to optimize your flight, at some point they raise the price to scare you to buy the ticket immediately. However, if you clear your cookies before buying the ticket, the price will return to the normal price. I’m wondering if your system is prepared for changing the browser history when you are running the experiments?    
A4: We didn’t think of it but the way that we protect the users now is by using double carriers.
We can pre-train some profiles towards the end and then allow the users to switch the profile to get the better price.

Q4: how about clearing cookies?
A4: You don't need to clear the cookies because you have the browsers' APIs, you can switch the cookies.