Friday, August 21, 2015


This concludes Layer 9's coverage of SIGCOMM 2015. Thank you all for tuning in, and thanks to the contributors for their timely and accurate scribing! We understand that archived videos of the sessions will be available in the future. For now, slides from the topic preview sessions have been posted:

P4: Programming Protocol-Independent Packet Processors

Pat Bosshart, Glen Gibb, Martin Izzard and Dan Talayco (Barefoot Networks), Dan Daly (Intel), Nick McKeown (Stanford University), Jennifer Rexford, Cole Schlesinger and David Walker (Princeton University), Amin Vahdat (Google), George Varghese (Microsoft Research)

Paper link

Presenter: Changhoon Kim (Barefoot Networks)

With the advent of Software-Defined Networking (SDN) and the OpenFlow standard, programmers can program the control plane. An OpenFlow-enabled switch recognizes a predetermined set of header fields and processes packets using a small set of predefined actions. However, protocols evolve rapidly (e.g., encapsulation protocols for network virtualization, and OpenFlow itself, from 1.0 to 1.5), and a fixed approach such as OpenFlow cannot express how packets should be processed to best meet the needs of control applications.

Fortunately, protocol-independent switch ASICs are now available at a few terabits per second, so programmers can program the data plane as well as the control plane. To use this flexible data plane, the presenter introduced P4, a high-level language for Programming Protocol-Independent Packet Processors.

P4 is designed with three goals:
1. Protocol independence: Network devices should not be tied to any specific network protocols.
2. Target independence: Programmers should be able to describe packet-processing functionality independently of the specifics of the underlying hardware.
3. Reconfigurability in the field: Programmers should be able to change the way switches process packets once they are deployed.

With P4, the programmer specifies how the forwarding plane processes packets, and a compiler transforms the imperative program into a table dependency graph that can be mapped to many specific target switches, including optimized hardware implementations.
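To make the "table dependency graph" idea concrete, here is a minimal Python sketch (not P4 syntax, and not the P4 compiler's actual algorithm). It assumes a simplified rule: a table must sit in a later pipeline stage than an earlier table whose actions write a header field that the later table matches on or also writes. The `Table` class, field names, and table names are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Table:
    """Hypothetical match-action table: fields it matches on and fields its actions write."""
    name: str
    matches: set = field(default_factory=set)
    modifies: set = field(default_factory=set)


def dependency_graph(tables):
    """Map each table to the set of earlier tables (in program order) it depends on.

    Assumed rule: a later table depends on an earlier one if it matches on,
    or also writes, a header field that the earlier table's actions write.
    """
    deps = {t.name: set() for t in tables}
    for i, earlier in enumerate(tables):
        for later in tables[i + 1:]:
            if earlier.modifies & (later.matches | later.modifies):
                deps[later.name].add(earlier.name)
    return deps


def assign_stages(tables):
    """Place each table in the earliest pipeline stage after all of its dependencies."""
    deps = dependency_graph(tables)
    stage = {}
    for name in TopologicalSorter(deps).static_order():
        stage[name] = 1 + max((stage[d] for d in deps[name]), default=-1)
    return stage


pipeline = [
    Table("ipv4_lpm", matches={"ipv4.dst"}, modifies={"egress_port", "eth.dst"}),
    Table("acl", matches={"ipv4.src"}, modifies={"drop_flag"}),
    Table("smac_rewrite", matches={"egress_port"}, modifies={"eth.src"}),
]
print(assign_stages(pipeline))
```

Here `ipv4_lpm` and `acl` are independent and can share stage 0, while `smac_rewrite` matches on `egress_port`, which `ipv4_lpm` writes, so it lands in stage 1. A real compiler must also respect per-stage memory and table limits, which this sketch ignores.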

In the talk, the presenter highlighted that P4 will accelerate innovation:

- Areas of innovation
* Reducing feature set
* Network monitoring, analysis, and diagnostics
* Tunnel-splicing gateways
* Load balancing
* Attack detection and mitigation
* Host-stack offloading

- Research opportunities
* Novel network and protocol designs
* Development tools
* Network verification, test, and simulation

Q1: At the beginning of the talk, the slides said that P4 makes it easy to debug problems in switches. However, current hardware switches already have many solutions, like testing and feedback loops. What is different?
A1: In the talk, "bugs" means software bugs that come from incorrect data plane descriptions, such as ambiguity. With P4, compilers can help here: if the programmer specifies an incorrect data plane description, the compiler reports error messages. So it will reduce the number of errors.

Q2: What is the difference between OpenDataPlane and P4?
A2: I am not familiar with it; we can talk more later.

Q3 : What is the difference between OpenFlow and P4?
A3: OpenFlow can be seen as one of the possible P4 applications, i.e., P4 can be used to implement OpenFlow.

Q4: Actions such as changing header fields can be applied in the middle of the data plane pipeline. In this case, does parsing the packet only once, at the first step before it enters the pipeline, cause problems?
A4: I did not mention this in the presentation, but P4 can specify deparsing logic to handle this case.

Q5 : After submitting a paper, did you add more new features?
A5: There is more functionality. Since the paper was only 6 pages, we did not describe all of it. The P4 specification lists all the features.

A Primer on IPv4 Scarcity

Philipp Richter (TU Berlin/ICSI) , Mark Allman (ICSI) , Randy Bush (Internet Initiative Japan) , Vern Paxson (UC Berkeley/ICSI)

IPv4 scarcity has become reality: today only about 2% of IPv4 addresses remain available. The authors tried to answer these questions:
- What is IPv4 scarcity?
- What factors affect it?
- What challenges does it pose?
- How are we dealing with it?
To answer these questions, the authors first outline the evolution of address space management and address space use patterns, identifying the key factors behind the scarcity. Finally, they characterize the possible solution space for overcoming it.

The authors started with the evolution of IPv4 address management and checked the degree to which allocation reflects actual use. They divide the history of IPv4 address management into three phases. The first is the "Early Registration" phase, in which address blocks were allocated quite informally, causing heavy internal fragmentation and waste of address space. The second is the "Needs-Based Provision" phase, in which the modern framework of Regional Internet Registries (RIRs) was established. The primary goal of this phase was conservation of address space: to obtain address space, requesters had to justify their need for it. The last is the "Depletion and Exhaustion" phase, in which stricter allocation policies were applied because only the last, small remaining blocks were left.

Over these three phases, the address space has become almost fully exhausted. However, according to Figure 4 ("Allocated and routed address blocks"), large amounts of address space are not routed. More specifically, address ranges assigned before the RIRs existed show poor utilization, whereas RIR-allocated ranges show higher utilization, which suggests that the RIR allocation policies are effective.
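The "allocated vs. routed" comparison boils down to asking what fraction of an allocated block is covered by announced prefixes. A minimal Python sketch of that computation, assuming the routed prefixes for the block are already known (the prefixes below are placeholders, not data from the paper):

```python
import ipaddress


def routed_fraction(allocated, routed_prefixes):
    """Fraction of an allocated block's addresses covered by routed prefixes."""
    block = ipaddress.ip_network(allocated)
    nets = (ipaddress.ip_network(p) for p in routed_prefixes)
    inside = [n for n in nets if n.version == block.version and n.subnet_of(block)]
    # collapse_addresses merges overlapping/adjacent prefixes, so nothing is double-counted
    covered = sum(n.num_addresses for n in ipaddress.collapse_addresses(inside))
    return covered / block.num_addresses


# Placeholder data: a /8 with overlapping announcements plus one unrelated prefix.
print(routed_fraction("10.0.0.0/8",
                      ["10.0.0.0/9", "10.0.0.0/10", "10.128.0.0/10", "192.168.0.0/16"]))  # 0.75
```

Collapsing the prefixes first matters because more-specific announcements (here the `/10` inside the `/9`) would otherwise be counted twice.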

The scarcity of IPv4 addresses leads people to think of IP addresses as a (virtual) resource. For the last 30 years, IP addresses were treated as "free", but now they are valuable and have become goods exchanged on markets. IPv4 address space transfers now happen both via RIR transfer policies and outside the RIRs, and network operators have already started buying and selling address blocks. However, defining exactly what is and what is not an address transfer is not straightforward. The authors therefore note that IP block transfers need more careful study and further research.

Finally, the authors consider three possible solutions to IPv4 scarcity. The first is to make more use of the IPv6 address space. Given the number of possible IPv6 addresses, this is the ultimate, natural solution. However, IPv6 is not compatible with IPv4, so complex transition mechanisms between the two are required. The second is to multiplex the current IPv4 address space using address sharing techniques such as Carrier-Grade NAT (CGN). The third is to use the current IPv4 address space more efficiently. According to Figure 4, about a third of all Internet address blocks are not routed, and thus not in (at least public) use. Making more efficient use of address space will require adapting address management policies, guidelines, and technologies, including the difficult (both technically and politically) problem of reassigning already-allocated address blocks.
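The address-sharing option trades ports for addresses. A common CGN approach is to give each subscriber a fixed block of source ports on a shared public address; the sketch below shows the arithmetic. The assumptions (ports below 1024 are never shared, one fixed-size block per subscriber) and the block size are illustrative only, not taken from the paper.

```python
# Assumptions for illustration: ports below 1024 are never shared and every
# subscriber receives one fixed-size block of source ports on one public address.
FIRST_PORT = 1024
LAST_PORT = 65535


def subscribers_per_ip(block_size):
    """How many subscribers can share one public IPv4 address."""
    return (LAST_PORT - FIRST_PORT + 1) // block_size


def port_block(subscriber_index, public_ips, block_size):
    """Return (public_ip, first_port, last_port) assigned to a subscriber."""
    per_ip = subscribers_per_ip(block_size)
    ip_index, slot = divmod(subscriber_index, per_ip)
    if ip_index >= len(public_ips):
        raise ValueError("public address pool exhausted")
    start = FIRST_PORT + slot * block_size
    return public_ips[ip_index], start, start + block_size - 1


pool = ["198.51.100.1", "198.51.100.2"]  # documentation addresses (RFC 5737)
print(subscribers_per_ip(1024))          # 63
print(port_block(63, pool, 1024))        # ('198.51.100.2', 1024, 2047)
```

With 1024-port blocks, one public address serves 63 subscribers, which shows both the savings and the cost: each subscriber is capped at 1024 concurrent outbound connections.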

Q1. Can you show the graph on slide 23 in units of IP address blocks?
A1. It is in the backup slides.

Q2. Is the same situation possible in IPv6?
A2. It is possible, but for IPv6 there is no informal allocation and waste of IP addresses like in the first phase of IPv4.

Q3. Regarding the IP block transfer prices shown on the slide, do you have evidence about the prices paid for IP address blocks?
A3. It is hard to find prices, since little data has been made public.

Q4. There are many unrouted IP addresses. How did you determine whether they are used as private IP addresses or as protected IP addresses? How much do these account for?
A4. True. Some companies or governments use them for this purpose.
I cannot know the exact number. However, I assume it is relatively small.

Thursday, August 20, 2015

ASwatch: An AS Reputation System to Expose Bulletproof Hosting ASes

Session 11: Security, Privacy, and Censorship - Paper 2
Authors: Maria Konte (Georgia Institute of Technology), Roberto Perdisci (University of Georgia), Nick Feamster (Princeton University)


Public Review:

Cyber-criminals protect their network resources by hosting their services inside malicious ASes, referred to as bulletproof hosting ASes. Current defenses against these malicious ASes rely on traffic monitoring using AS reputation systems like BGP Ranking. The problem with these approaches is that they need many vantage points, have high false positive rates, are hard to use, and come too late to prevent attacks. The authors' approach is to monitor routing behavior and use machine learning to identify these malicious ASes. It is the first attempt to derive AS reputation from routing monitoring, using exclusively public data.

The system consists of two phases: a training phase and an operational phase. It uses confirmed cases of malicious and legitimate ASes as ground truth for training, and extracts features based on domain knowledge. Example features include rewiring changes, BGP routing dynamics, and IP space fragmentation. Using these features, the operational phase generates an AS reputation report. The feature families are:

Rewiring Changes/Link Connectivity: malicious ASes tend to change connectivity more aggressively than legitimate ASes. To measure this, the authors take snapshots of connectivity. For example, the measurement proceeds as follows:
 - Monitor the last x snapshots
 - Collect all providers
 - Measure the fraction of snapshots in which each provider appears
The link connectivity is then represented by three features derived from this distribution.
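A minimal Python sketch of the connectivity measurement described above. The summary does not say which three features are taken from the distribution, so the choice of minimum, median, and maximum here is a hypothetical stand-in; the AS numbers are from the private-use range.

```python
from collections import Counter
from statistics import median


def provider_presence(snapshots, x):
    """Fraction of the last x snapshots in which each provider appears."""
    recent = snapshots[-x:]
    counts = Counter(p for snap in recent for p in set(snap))
    return {p: c / len(recent) for p, c in counts.items()}


def connectivity_features(snapshots, x):
    """Summarize the presence distribution with three numbers (hypothetical choice)."""
    fractions = sorted(provider_presence(snapshots, x).values())
    return fractions[0], median(fractions), fractions[-1]


stable = [{64512, 64513}] * 3                      # same providers in every snapshot
rewiring = [{64512}, {64513}, {64514}, {64512}]    # provider set keeps changing
print(connectivity_features(stable, 3))    # (1.0, 1.0, 1.0)
print(connectivity_features(rewiring, 4))  # (0.25, 0.25, 0.5)
```

A legitimate AS with stable upstreams keeps every provider in nearly every snapshot, while an AS that keeps rewiring produces many providers with low presence fractions.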

BGP Routing Dynamics: the routing dynamics of malicious ASes are driven by illicit operations, whereas those of legitimate ASes are driven by policy changes and traffic engineering decisions.

Fragmentation and churn of advertised prefixes: malicious ASes rotate their advertised prefixes (e.g., to evade blacklisting), and they advertise a large number of non-contiguous prefixes.
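Both properties can be quantified from public BGP data. The sketch below uses two hypothetical metrics, not necessarily the paper's exact feature definitions: fragmentation as the number of non-contiguous blocks left after merging adjacent or overlapping prefixes, and churn as one minus the Jaccard similarity of consecutive snapshots of advertised prefixes.

```python
import ipaddress


def fragmentation(prefixes):
    """Number of non-contiguous address blocks after merging adjacent/overlapping prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return len(list(ipaddress.collapse_addresses(nets)))


def churn(prev, cur):
    """1 - Jaccard similarity between two snapshots of advertised prefixes."""
    prev, cur = set(prev), set(cur)
    union = prev | cur
    return 0.0 if not union else 1 - len(prev & cur) / len(union)


print(fragmentation(["198.51.100.0/25", "198.51.100.128/25"]))  # 1 (merges into a /24)
print(fragmentation(["198.51.100.0/24", "203.0.113.0/24"]))     # 2
```

An AS announcing many small, scattered prefixes scores high on fragmentation, and one that rotates them between snapshots scores high on churn.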

Using features like these, they train classifiers and evaluate them with cross-validation, achieving a 93% true positive rate at a 5% false positive rate. They also investigate which features are important by including or excluding each feature family separately and observing the change in performance. The connectivity features are the most important in terms of true positive rate. Fragmentation and churn of advertised prefixes are less important than connectivity, but help lower false positives.

Q: How does the algorithm adapt as people become aware of the work? There are security problems, such as attackers tricking the algorithm.
A: The malicious behaviors themselves are very hard to change.
Follow-up: What if the criminals avoid these behaviors to evade detection? (taken offline)

Q: Have you looked into the patterns of these hosting ASes? Which providers appear more frequently, and which are legitimate providers?
A: No, it is more complex than examining a single provider. Yes, some providers tend to have more malicious ASes; but what is more likely happening is that a legitimate provider may be hosting malicious ASes without knowing it.

Q: Say an AS is malicious and I don't want my traffic routed through it. BGP gives no guarantee that the path we analyze is the path traffic will take, so what can I do?
A: We didn't try to answer that question. What we were trying to do is see how connectivity and BGP updates can be used to detect malicious ASes.

Q: Do you have, or intend to provide, a one-paragraph recommendation to policy makers (the ISP community), so they can use these insights?
A: That's a good question; we haven't explored that path. There are similar questions, such as how to configure reliable AS protections.

Alibi Routing

Session 11: Security, Privacy and Censorship - Paper 1
Authors: Dave Levin (University of Maryland), Youndo Lee (University of Maryland), Luke Valenta (University of Pennsylvania), Zhihao Li (University of Maryland), Victoria Lai (University of Maryland), Cristian Lumezanu (NEC Labs), Neil Spring (University of Maryland), Bobby Bhattacharjee (University of Maryland)
Public Review:

Users have no control over the routing path and lack undeniable proof of where their packets traveled. This is problematic because packets might go through a censoring country, causing collateral damage. Existing approaches like HTTPS and anonymity systems can help hide information, but they are still subject to censorship to some extent. The authors propose avoiding these geographic regions in the first place and providing proof of the avoidance, which they call "provable avoidance routing".

The problem seems intractable because you are trying to prove that something did not happen without enumerating everything that could have happened. The paper uses a very simple logic: if event A happening implies that event X did not happen, then, combined with proof that A actually happened, we have proof that X did not happen. Here A forms an "alibi". Applied to routing, X is traversing a forbidden region and A is traversing a relay node. The key idea is to choose a relay far enough from the forbidden region that the latency difference between the two cases is significant.

Using these two mutually exclusive events, the authors present "alibi routing", a peer-to-peer overlay routing system. The algorithm works in the following steps:
1. Users choose forbidden regions.
    These regions are user-specified, arbitrary sets of polygons (defined over lat/lon).
2. Users compute target regions (where alibis might be):
    Exclude locations where alibis cannot exist.
    Segment the world into a grid.
    Include a grid point if the shortest latency (min RTT) to reach the destination via an alibi at that point alone is much smaller than via the alibi plus any node in the forbidden region.
3. Alibi Routing recursively searches for peers within the target regions.
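The grid-point test in step 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: both the alibi path and the detours through the forbidden region are estimated with a speed-of-light lower bound (about 200 km/ms in fiber), the forbidden region is approximated by a few sample points instead of polygons, and `delta` plays the role of the safety margin discussed in the Q&A.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # assumption: signals travel at roughly 2/3 c in fiber


def great_circle_km(p, q):
    """Haversine distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def min_latency_ms(*waypoints):
    """Speed-of-light lower bound on one-way latency along a path of waypoints."""
    legs = zip(waypoints, waypoints[1:])
    return sum(great_circle_km(a, b) for a, b in legs) / FIBER_KM_PER_MS


def is_target_point(src, dst, alibi, forbidden_points, delta=0.5):
    """Keep a grid point only if going via it is clearly faster than any detour
    that also touches the forbidden region (before or after the alibi)."""
    via_alibi = min_latency_ms(src, alibi, dst)
    via_forbidden = min(
        min(min_latency_ms(src, f, alibi, dst), min_latency_ms(src, alibi, f, dst))
        for f in forbidden_points
    )
    return (1 + delta) * via_alibi < via_forbidden


london, paris, brussels, beijing = (51.5, -0.1), (48.9, 2.3), (50.8, 4.4), (39.9, 116.4)
print(is_target_point(london, paris, brussels, [beijing]))   # True
print(is_target_point(london, paris, brussels, [brussels]))  # False
```

A larger `delta` demands a bigger latency gap and shrinks the target region, which matches the success-versus-performance tradeoff mentioned in the first question below.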

The authors evaluate their work both in simulation (20k nodes) and with an implementation deployed on PlanetLab (425 nodes).
The results show that:
- Most source-destination pairs can provably avoid the forbidden regions.
- Failure typically arises when the target region is too small or non-existent.
- Failure is likely when the source or destination is very close to the forbidden region.
- Alibi routing incurs little latency overhead.
- More results are in the paper.

The conclusion is that provable avoidance is possible, safely and efficiently. Alibi routing finds potential alibis successfully and at low cost.

The data and code are publicly available at

Q: Given the variability in distances, wouldn't that cause a correspondingly high rate of false positives? Is that acceptable?
A: This is what the delta parameter affects. The idea is that you can specify a smaller value, so that, hopefully, the target region still meets the minimum requirement to succeed. This is the tradeoff of success versus performance.

Q: The motivation is censorship, but many of the people affected are inside the forbidden area, and only a very small percentage of traffic from Europe crosses China. Is that a fair evaluation?
A: We didn't evaluate the overhead, and we aren't sure how that could be made fair. But even though the data is US-centric, the majority of the time it still succeeds.

Q: What if attackers increase the delay?
A: We don't know for sure. But attackers cannot trick us by decreasing the delay.

Adaptive Congestion Control for Unpredictable Cellular Networks

Yasir Zaki (New York University Abu Dhabi), Thomas Pötsch (University of Bremen), Jay Chen (New York University Abu Dhabi), Lakshminarayanan Subramanian (New York University), Carmelita Görg (University of Bremen)


Public review:


It is well known that cellular networks have throughput performance problems. One reason is the mismatch between TCP's assumptions and cellular network behavior.

The authors present Verus, an adaptive congestion control protocol that mitigates the mismatch between TCP and unpredictable cellular links. Verus is an end-to-end congestion control protocol that uses delay measurements to react quickly to capacity changes in cellular networks without explicitly attempting to predict the cellular channel dynamics. Verus learns a delay profile that captures the relationship between end-to-end packet delay and outstanding window size over short epochs.
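The delay-profile idea can be illustrated with a toy Python sketch. This is a simplification, not Verus's actual algorithm (which interpolates a profile curve and uses its own update rules): the profile averages observed delay per window size, each epoch nudges a target delay down when delay is rising and up otherwise, and the sender then picks the largest window whose profiled delay fits the target. The step sizes and lookup rule are hypothetical.

```python
from collections import defaultdict


class DelayProfile:
    """Toy delay profile: average observed delay per outstanding-window size."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, window, delay_ms):
        self._sum[window] += delay_ms
        self._count[window] += 1

    def delay_for(self, window):
        return self._sum[window] / self._count[window]

    def window_for_delay(self, target_ms):
        """Largest observed window whose average delay stays within target_ms
        (falls back to the smallest observed window)."""
        ok = [w for w in self._count if self.delay_for(w) <= target_ms]
        return max(ok) if ok else min(self._count)


def next_target_delay(target_ms, prev_delay_ms, cur_delay_ms, step_ms=2.0, floor_ms=1.0):
    """Per-epoch update (hypothetical step sizes): back off when delay rises, probe up otherwise."""
    if cur_delay_ms > prev_delay_ms:
        return max(floor_ms, target_ms - step_ms)
    return target_ms + step_ms


profile = DelayProfile()
for window, delay in [(10, 20.0), (20, 30.0), (40, 60.0)]:
    profile.record(window, delay)
target = next_target_delay(30.0, prev_delay_ms=25.0, cur_delay_ms=20.0)  # delay fell: 32.0
print(profile.window_for_delay(target))  # 20
```

The point of the profile is that the sender never predicts the channel; it only maps "how much delay am I willing to tolerate this epoch" to "how many packets can be in flight".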

The authors compare Verus with TCP Cubic and TCP Vegas. Verus utilizes the bandwidth better, and TCP Delay and Vegas have higher delay. Compared with Sprout, the state-of-the-art technique, Verus provides nearly 30% higher throughput, but with higher delay.


Q: For very low LTE latency, what would the gain over TCP be?
A: We did some analysis with commercial LTE deployments, and we do not think there would be much difference. You would still be able to get the delay feedback and adapt.

Q: You could blast a UDP flow and mess up TCP fairness.
A: Verus was evaluated against other TCPs on a tail drop link.

Q: Why did you implement your protocol over UDP?
A: Convenience; it is not in the kernel.

Q: What if you implement it in kernel space?
A: We might get better improvements since we could get the signals faster.

Q: What if the flows are short?
A: If the flows are short, they do not live beyond the slow start and perform similarly to TCP. The target was to look at longer-lived connections.

Q: Did you look at the convergence time?
A: No, we did not.

Session 11 Paper 4: Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests

Authors: Sam Burnett (Georgia Tech), Nick Feamster (Princeton)

Presenter: Sam Burnett

Link To Paper
Link To Public Review

Recently there have been studies that focus on Internet censorship. One essential roadblock when looking into such global events is the lack of high-quality data (what, when, and how content is being censored). In this talk, the authors present Encore, a web-based censorship measurement platform that leverages cross-origin requests. Encore achieves global visibility without installing any vantage points.

Current censorship measurement relies largely on anecdotal evidence. Existing measurement approaches require installing vantage points, and the problem is scalability: when censorship happens across different regions and cultures, there are barriers to installing such infrastructure that are difficult to overcome.

Social scientists really need this data, so to help them, Encore convinces webmasters to install a snippet on their pages, which reports measurements to a central collection server.

The diagram above shows how Encore works. When a recruited site is loaded by a client within the censoring country, it generates a cross-origin request and reports to the collection server whether the load was successful or not.

Because most browsers do not allow cross-origin data reads, Encore is designed so that it only needs to know whether access to the resource was possible, which can be determined without reading the data. Examples of resources Encore can load include iframes, images, and stylesheets. Furthermore, the approach is browser-independent.

Data collection with Encore was done with the help of 17 webmasters, and 9 months of data were collected. To validate the results, the authors also developed a testbed for controlled experiments. Most of the actual measurements targeted popular websites that are already loaded via cross-origin requests, for instance the Facebook "Like" button loaded on a third-party web page.
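Once the snippet's success/failure reports reach the collection server, they still have to be turned into censorship signals. A minimal sketch of one plausible aggregation, assuming each report is a `(country, url, loaded_ok)` tuple; the thresholds and the "mostly fails here, mostly loads elsewhere" rule are hypothetical, not the authors' inference method.

```python
from collections import defaultdict


def flag_suspected_filtering(reports, min_samples=10, max_success=0.2, min_elsewhere=0.8):
    """Flag (country, url) pairs that mostly fail locally but mostly load elsewhere.

    reports: iterable of (country, url, loaded_ok) tuples sent by the snippet.
    Thresholds are hypothetical placeholders.
    """
    local = defaultdict(lambda: [0, 0])    # (country, url) -> [successes, attempts]
    overall = defaultdict(lambda: [0, 0])  # url -> [successes, attempts]
    for country, url, ok in reports:
        for counts in (local[(country, url)], overall[url]):
            counts[0] += int(ok)
            counts[1] += 1

    flagged = []
    for (country, url), (succ, total) in local.items():
        other_succ = overall[url][0] - succ
        other_total = overall[url][1] - total
        if total < min_samples or other_total == 0:
            continue  # not enough local data, or no other region to compare against
        if succ / total <= max_success and other_succ / other_total >= min_elsewhere:
            flagged.append((country, url))
    return sorted(flagged)


reports = [("XX", "http://example.com/", False)] * 20 + [("YY", "http://example.com/", True)] * 20
print(flag_suspected_filtering(reports))  # [('XX', 'http://example.com/')]
```

Comparing against other regions helps separate filtering from a resource that is simply down everywhere.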

As for the nontechnical aspects, because Encore loads potentially harmful URLs and informed consent is not possible, more work is needed to protect users.

Q: What if the censoring countries decide to block the volunteering webmasters' sites? Wouldn't that be an economic loss for them?
A: Yes, that is correct; this relies on the good will of webmasters to install it on their sites.

Q: How is Encore different from existing lists of censored sites?
A: We essentially want to take these anecdotes and turn them into detailed data.

Q: You might be violating ISP-based residential laws; have you thought about that?
A: That is a very good point, and we have not looked into it.

Q: Do you think cross-origin requests are the right approach?
A: If a cross-domain request is causing someone harm, this definitely needs to be fixed. We are not the only ones doing this; there are people out there already doing it.

R2C2: A Network Stack for Rack-scale Computers

This paper was presented in the "Congestion Control and Transport Protocols" session at SIGCOMM 2015, London. The authors of the paper are Paolo Costa (Microsoft Research), Hitesh Ballani (Microsoft Research), Kaveh Razavi (VU Amsterdam), and Ian Kash (Microsoft Research).

Paolo Costa presented the paper. The links to the paper and public review are here: Paper, Public review

Paolo started by introducing rack-scale computers and explaining why existing tree-based network topologies are insufficient for a rack with 1000 compute entities. Rack-scale computing requires a distributed switching fabric; however, such a fabric introduces multiple routing and congestion control issues:

1. Per-flow routing: how to select the best route, which may be non-minimal, favoring high throughput over latency?
2. How to choose the best routing for different workloads?
3. How to monitor ECN and RTT across multiple paths, given that the number of path options is now far larger than in standard topologies?

Paolo then presented a solution based on source control: each node broadcasts its workload and routes to all other servers, and each server then adjusts its traffic rate accordingly. Spotting the "I do not believe it!!" look on the audience's faces, Paolo outlined a few optimizations that make it possible to compute source rates and per-flow routes on the order of milliseconds:

1. Decoupling route and rate computation: route computation happens at a coarser time scale.
2. Rates are calculated per flow as opposed to per sub-flow, which could lead to some underutilization of the network.
3. Leaving headroom to handle the bursty nature of traffic.
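Because every node learns about all flows through the broadcasts, each node can run the same rate computation locally. A standard way to do this is max-min fairness via progressive filling ("water-filling"). The sketch below is a generic version of that technique with a headroom factor, offered as an illustration and not as R2C2's exact algorithm; link names, capacities, and the headroom value are hypothetical.

```python
def max_min_rates(capacity, flows, headroom=0.1):
    """Per-flow max-min fair rates via progressive filling.

    capacity: dict link -> capacity (e.g. Gbps); flows: dict flow -> list of links
    on its path (each flow is assumed to use at least one link).
    A fraction `headroom` of every link is kept free for bursts.
    """
    remaining = {link: c * (1 - headroom) for link, c in capacity.items()}
    active = set(flows)
    rates = {}
    while active:
        # Fair share each link could give to the still-unfrozen flows crossing it.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for f in active if link in flows[f])
            if n:
                share[link] = cap / n
        bottleneck = min(share, key=share.get)
        rate = share[bottleneck]
        # Freeze every active flow through the bottleneck at that rate.
        for f in [f for f in active if bottleneck in flows[f]]:
            rates[f] = rate
            active.remove(f)
            for link in flows[f]:
                remaining[link] -= rate
    return rates


print(max_min_rates({"l1": 10.0, "l2": 4.0}, {"A": ["l1", "l2"], "B": ["l1"]}, headroom=0.0))
```

Flow A is limited by `l2` to 4, and B then takes the remaining 6 on `l1`. Computing one rate per flow (rather than per sub-flow) keeps this loop small, at the cost of possible underutilization noted above.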

Paolo concluded the talk by discussing the overhead of the "broadcast messages", which for small flows could theoretically be as high as 26%, but for typical data center traffic is expected to be around 3%.

Session 11 Paper 3: Herd: A Scalable, Traffic Analysis Resistant Anonymity Network for VoIP Systems

Authors: Stevens Le Blond (MPI-SWS), David Choffnes (Northeastern University), William Caldwell (MPI-SWS), Peter Druschel (MPI-SWS), Nicholas Merritt (MPI-SWS)

Presenter: Stevens Le Blond

Link To Paper
Link To Public Review

Many entities perform surveillance on Internet traffic, including VoIP calls. Current anonymity systems fall short either on privacy (Tor) or on scalability (DC-Nets), and do not ensure quality of service. In this presentation, the authors present Herd, a scalable anonymity system for VoIP calls that uses low-latency proxies while resisting these adversaries.

Say Hollande wants to call Snowden while evading GCHQ :)
If they use Tor, the call is vulnerable to a time-series or intersection attack when GCHQ has access to the core network. To verify this, the authors simulate an intersection attack and show that 98.3% are traceable.
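The intersection attack mentioned above is simple to state: every time the target's call is observed, the adversary records which users were active, and intersects those sets over time. A minimal Python sketch with made-up user names:

```python
def intersection_attack(observations):
    """Each observation is the set of users seen active while the target's call was active."""
    candidates = None
    for active in observations:
        candidates = set(active) if candidates is None else candidates & set(active)
    return candidates or set()


everyone = {"alice", "bob", "carol", "dave", "erin"}
without_chaff = [{"alice", "bob", "carol"}, {"bob", "carol", "dave"}, {"bob", "erin"}]
with_chaff = [everyone, everyone, everyone]  # chaffing clients look active all the time
print(intersection_attack(without_chaff))    # {'bob'}
print(len(intersection_attack(with_chaff)))  # 5
```

This is why the threat model below has clients always send chaff: if every client looks active at all times, the intersection never shrinks.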

In Herd, the authors introduce mixes, which act as routers (relays) between the caller and the callee for anonymization. These mixes are located in datacenters in different jurisdictions.
To initiate a call, each user picks a datacenter in a jurisdiction that it trusts, as it is unlikely that all clients will trust the same mix. To provide zone anonymity, mixes are further grouped into regions called "trust zones", which decouple the caller and callee connections.

The threat model:
- The adversary can sniff all traffic and can perform traffic analysis.
- The adversary does not have access to the complete Internet infrastructure.
- There are jurisdictions that are friendly or indifferent.
- Clients always use chaff and trust the mix.
- Simple onion routing is used for communication through the mixes.

To solve scalability issues, they introduce superpeers, which shift load from trusted to untrusted infrastructure. Clients connect to the superpeers (untrusted parties), which are in turn connected to the trust zones.

The authors evaluate anonymity and scalability against other systems such as Drac and Tor, using mobile and social network data (Twitter and Facebook) to simulate call patterns. Tor is susceptible to traffic analysis, whereas Drac and Herd perform well in that scenario. In terms of scalability, Herd scales out better than Drac (see the paper for numbers).

Some ongoing work includes:
- Formal security analysis.
- Deployment on Mac and Windows.
- Adding other functionality, such as video calls.

Q: How do you multiplex multiple clients' calls in a single mix?
A: You need several channels, which are selected by the client.
Follow-up: How does this choice happen?
A: The allocation is done by the mix, and then the clients establish chaffing channels.

Q: What are the legal requirements for operating a mix as a service? E.g., Germany has legal issues with operating mixes. Are you going to give us some good news?
A: You need to talk to the government about that.

Q: For me to be anonymous, there need to be a certain number of people. What if not enough people join?
A: The zone can require that you contribute to the calling ecosystem before you can make a call.

Congestion Control for Large-Scale RDMA Deployments

Yibo Zhu (Microsoft and U.C. Santa Barbara), Haggai Eran (Mellanox), Daniel Firestone (Microsoft), Chuanxiong Guo (Microsoft), Marina Lipshteyn (Microsoft), Yehonatan Liron (Mellanox), Jitendra Padhye (Microsoft), Shachar Raindel (Mellanox), Mohamad Haj Yahia (Mellanox), Ming Zhang (Microsoft).


Public review:

The goal of a datacenter network is to provide ultra-low latency, high throughput, and low CPU overhead. The current TCP/IP stack is too heavy. Remote Direct Memory Access (RDMA) is another networking paradigm, in which network interface cards (NICs) transfer data using read/send memory commands that bypass the OS. Priority-based Flow Control (PFC) is used to avoid buffer overflow at switches, but it can lead to poor performance.

The authors developed a congestion control algorithm for RDMA called Datacenter QCN (DCQCN). It provides end-to-end congestion control and manages switch buffers so as to avoid triggering PFC. Switches mark packets with ECN based on per-switch, per-priority queue length. DCQCN is a rate-based congestion control scheme: it keeps PFC and combines ECN with hardware rate-based control. The authors present a fluid model to justify their parameter selection.
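To illustrate what "ECN plus rate-based control" means, here is a simplified Python sketch of a DCQCN-style sender and a RED-like switch marking function, following the general shape of the update rules as we understand them: on a congestion notification the current rate is cut by a factor that depends on a congestion estimate alpha, alpha decays when no notifications arrive, and the rate recovers toward a remembered target. All parameter values are placeholders, and the real algorithm has additional increase phases and timers omitted here.

```python
from dataclasses import dataclass


@dataclass
class DcqcnSender:
    rate_gbps: float        # current sending rate
    target_gbps: float      # rate to recover toward
    alpha: float = 1.0      # congestion estimate
    g: float = 1 / 256      # placeholder gain for the alpha update
    rai_gbps: float = 0.04  # placeholder additive-increase step

    def on_cnp(self):
        """Congestion notification: remember the current rate, cut multiplicatively."""
        self.target_gbps = self.rate_gbps
        self.rate_gbps *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_alpha_timer(self):
        """No notification during the last alpha-update period: decay alpha."""
        self.alpha *= 1 - self.g

    def on_increase_event(self, fast_recovery=True):
        """Move halfway toward the target; outside fast recovery, raise the target first."""
        if not fast_recovery:
            self.target_gbps += self.rai_gbps
        self.rate_gbps = (self.target_gbps + self.rate_gbps) / 2


def ecn_mark_probability(queue_kb, kmin_kb=5.0, kmax_kb=200.0, pmax=0.01):
    """RED-like marking on egress queue length (placeholder thresholds)."""
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb >= kmax_kb:
        return 1.0
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


sender = DcqcnSender(rate_gbps=40.0, target_gbps=40.0)
sender.on_cnp()             # rate drops to 20.0
sender.on_increase_event()  # recovers halfway to 30.0
print(sender.rate_gbps)
```

Adjusting a rate directly, instead of a window, is what the presenter meant by DCQCN offering finer control than window-based DCTCP.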

DCQCN provides higher throughput and lower instantaneous queue length than running without DCQCN. During the presentation, the author showed comparison results between DCQCN and DCTCP that were not in the paper but are important. The author claimed DCQCN provides finer-grained control than DCTCP since it is rate-based instead of window-based.

Q1: What in your design is specific to this setting?
A: It is deployed in a data center network.

Q2: Did you compare with QCN?
A: Only in an L2 domain.
It needs modifications to the NIC ASIC.

Q3: Google says ECN is a poor signal and uses TIMELY instead. Any comments on that?
A: TIMELY does not guarantee that it will not trigger PFC.

Spatiotemporal Traffic Matrix Synthesis

Paul Tune and Matthew Roughan (University of Adelaide)

Paper, public review, and code.

Paul started by describing traffic matrices, which capture the amount of traffic between all pairs of nodes in a network. Spatiotemporal traffic matrices capture traffic as a function of time. Traffic matrices are useful for several applications, including generating network topologies, evaluating routing protocols, and guiding anomaly detection.

Paul explained that we usually require many traffic matrices, e.g., to compute confidence intervals. The traditional approach is to obtain data and fit a model. Unfortunately, data is hard to obtain (e.g., it is proprietary). Even when data is available, it is often outdated or biased (e.g., specific to one network). One way around the lack of data is a model that generates synthetic traffic matrices. Such models should be (i) simple (few parameters), (ii) efficient (fast synthesis), (iii) consistent, and (iv) realistic.

Given that building realistic models is hard, this work proposes controllability as an alternative, giving control over the inputs used to test algorithms. The goal of this work is to generate ensembles of traffic matrices. Even though we lack data, we do have some knowledge (e.g., we expect daily periodicity in traffic). This work builds traffic matrices that satisfy all the constraints in the model without making any implicit assumptions.

Paul showed how the proposed maximum entropy model can be used, and how it is general enough to capture previously proposed traffic matrix models. This (somewhat math-heavy) part ended with a demo of traffic generated from the synthetic traffic matrices, illustrating the impact of constraints.
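A well-known special case shows how maximum entropy "captures previous models": if the only constraints are each node's total ingress and egress traffic, the maximum-entropy traffic matrix is the gravity model, T_ij = R_i * C_j / total. The Python sketch below illustrates only that special case (not the paper's full spatiotemporal model) and checks that the gravity matrix has higher entropy than another matrix with the same row and column sums; the numbers are made up.

```python
import math


def gravity_tm(ingress, egress):
    """Gravity-model traffic matrix: T[i][j] = ingress[i] * egress[j] / total."""
    total = sum(ingress)
    if not math.isclose(total, sum(egress)):
        raise ValueError("ingress and egress totals must match")
    return [[r * c / total for c in egress] for r in ingress]


def entropy(tm):
    """Shannon entropy of the traffic matrix normalized to a distribution."""
    total = sum(sum(row) for row in tm)
    return -sum(x / total * math.log(x / total) for row in tm for x in row if x > 0)


tm = gravity_tm([30, 10], [20, 20])
other = [[20, 10], [0, 10]]  # same row and column sums, less "spread out"
print(tm)                            # [[15.0, 15.0], [5.0, 5.0]]
print(entropy(tm) > entropy(other))  # True
```

Adding further constraints (e.g., periodicity over time) changes the maximizer, which is what lets the framework move beyond the gravity model.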

Paul mentioned that matrix generation is fast. He also explained that convex constraints can be incorporated (and non-convex constraints too, if you search for the global optimum), as well as hard constraints. Paul noted that converting knowledge into constraints is not always obvious, and then summarized the work.


Walter Willinger: Traffic is subject to routing.  How do I incorporate it in this framework?

A: I believe it can be done.  You would have to know the correlation structure between the routing matrix and the traffic matrix.

Walter: Is it my job then?

A:  Yes.  One thing is that forming the constraints could be difficult in some cases.  This case could be a bit more involved.

Anja Feldmann: How does your work compare with previous work on traffic matrices?

A: Some previous work is about inferring traffic matrices. We focus on synthesis. There is some work on synthesizing traffic matrices, but it is not entirely similar.

Anja: Why do you need many traffic matrices?

A: Depends on the type of application you use them for.  If you are testing a router, you might want traffic matrices where you have one very high value and also uniform traffic matrices.  You want to have variance to cover different scenarios.

Q: Different applications may have different constraints.  I expect one would need to add constraints to traffic matrices to get meaningful results.  Is there a way to integrate latency requirements?

A: I have not looked at the application level yet; this work is at a lower level.  I believe it would be possible, but could be more involved.

Anja: Could the same framework be applied? Would it make a difference if you have routers or if you have servers and end-hosts?

A: It would not make a difference, but I am not sure whether there would be more specific details.

InterTubes: A Study of the US Long-haul Fiber-optic Infrastructure

Ramakrishnan Durairajan (University of Wisconsin - Madison), Paul Barford (University of Wisconsin - Madison and comScore, Inc.), Joel Sommers (Colgate University), Walter Willinger (NIKSUN, Inc.)

Paper, public review, and datasets through the PREDICT program.

Ram noted that the Internet combines many technologies, but that we seldom study its physical infrastructure. This work is about what the US long-haul fiber infrastructure looks like, how resilient it is, what risks it implies, and how we can improve it. Ram mentioned that no one has a complete view of the Internet at the physical level, and quoted Ted Stevens: "The Internet is just a series of tubes."

Ram explained the process used to build the map, which anyone can follow given effort and time: (i) build an initial map from geocoded ISP topology maps (11 ISPs), (ii) validate it with other sources, (iii) extend the initial map with non-geocoded ISP topology maps (9 ISPs), and (iv) infer shared conduits (using sources similar to those used in (ii)).  Maps are built using ArcGIS, a tool used by geographers.  The final result is a map of the US long-haul fiber infrastructure.

Ram showed a few examples of data mining and consistency checks, and discussed a few properties of the map (e.g., that it relates to road and railway meshes).  Ram then discussed the risk induced by infrastructure sharing: he mentioned that critical choke points shared by many (17+) ISPs exist, and that the physical connectivity lacks the diversity observed at higher layers.  Ram also showed that the majority of ISPs trade lower costs for decreased resilience (i.e., they share infrastructure).
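
A toy sketch of the sharing analysis, assuming a simplified input in which each ISP is mapped to the set of conduit IDs its links traverse. The ISP names, conduit IDs, and threshold are hypothetical, not InterTubes data; the point is only to show how counting ISPs per conduit surfaces the heavily shared choke points.

```python
from collections import defaultdict

# Hypothetical input: ISP -> set of conduit IDs its long-haul links use.
# These values are invented for illustration, not taken from the paper.
isp_conduits = {
    "ISP-A": {"SLC-DEN", "DEN-KC", "KC-CHI"},
    "ISP-B": {"SLC-DEN", "DEN-KC"},
    "ISP-C": {"SLC-DEN", "DEN-DAL"},
    "ISP-D": {"DEN-KC", "KC-CHI"},
}

def conduit_sharing(isp_conduits):
    """Map each conduit to the set of ISPs whose links run through it."""
    sharing = defaultdict(set)
    for isp, conduits in isp_conduits.items():
        for conduit in conduits:
            sharing[conduit].add(isp)
    return dict(sharing)

def choke_points(sharing, threshold):
    """Conduits shared by at least `threshold` ISPs, most-shared first."""
    hot = [(c, len(isps)) for c, isps in sharing.items() if len(isps) >= threshold]
    return sorted(hot, key=lambda item: (-item[1], item[0]))

sharing = conduit_sharing(isp_conduits)
print(choke_points(sharing, threshold=3))
```

A single cut in a conduit returned here would affect every ISP in its set, which is the kind of shared risk the talk highlighted.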

Ram also discussed approaches to improve physical connectivity.  For example, ISPs can reduce infrastructure sharing without increasing path length significantly.  Finally, Ram mentioned the implications of this work on policy-making.


David Clark: You mentioned that 18 ISPs were sharing a conduit, but that does not tell us how much harm a failure there causes.

A: It depends on the intradomain topology.  What we can do is associate a node with nearby population centers, and relate the impact of a cut to the population.

David: Microsoft and Google have networks that look larger than Level 3's.  Have you tried looking at their networks?

A: We have Level 3, but not the others.

Q: Why are ISPs and companies hiding this information that we can figure out easily?  What do you think about this opacity?  Your results are particular to the US.  What would happen if you looked at a different country?  Any comments on what is specific to the US?

A: Ram showed fiber maps for Africa and Estonia, mentioned that they share similarities, and said the same analysis could be done for other countries.  He guessed that the opacity might be related to public image.

Keith Winstein: When you validated your dataset, what percentage of the links were you able to validate?

A: More than 95%.  There were maps from several locations that we could not validate.

Q: What about connectivity for research networks (ESnet and I2)?  These networks might be more amenable to sharing information with you.  Are there strategies or differences for these networks?

A: We did not include this data at this time.

Nick Feamster: I guess you are aware of Sean Gorman's work.  Was there a change of direction in the trend to classify this information?  Did you see any difference between his analysis and this?

A: We are aware of Sean's work.  All of the data we use is publicly available.  We do not point to specific locations.

Nick: Do you have any information about FTTH and FTTN deployments?  It might be useful to join with this information.

A: Yes, there is information.

Anja Feldmann: You said that getting the information was easy?  Was it really?

A: (Laughs.) Lots of searching and lots of coffee.

Anja: What would be a good dataset for estimating traffic?

A: Previous research; we used the number of traceroutes going through conduits.

TIMELY: RTT-based Congestion Control for the Datacenter

This paper was presented in the "Congestion Control and Transport Protocols" session at SIGCOMM 2015, London. The authors of the paper are: Radhika Mittal (University of California, Berkeley), Vinh The Lam (Google, Inc.), Nandita Dukkipati (Google, Inc.), Emily Blem (Google, Inc.), Hassan Wassel (Google, Inc.), Monia Ghobadi (Microsoft), Amin Vahdat (Google, Inc.), Yaogong Wang (Google, Inc.), David Wetherall (Google, Inc.), David Zats (Google, Inc.).

Radhika Mittal presented the paper. Links to the paper and the public review: Paper, Public review.

The authors claim theirs is the first RTT-based congestion control scheme for data center networks (DCNs). Data centers typically require high throughput and low latency and are less tolerant of packet losses. Traditional loss-based transport protocols are not suitable for the data center environment; state-of-the-art data center transport protocols, such as DCTCP and other ECN-based schemes, rely on switches marking packets to indicate the onset of congestion.

Although RTT is a direct indicator of latency, it has not been useful as a congestion signal because of measurement noise, which becomes prominent at microsecond latency levels. The authors argue that this noise can be avoided by computing RTT from NIC timestamps; they show experimentally that there is a strong correlation between RTT measured at the NIC and queue length in the switches. The reader is referred to the paper for details on how timestamps are obtained from the NIC.

Using the computed RTT, the authors propose TIMELY, an RTT-gradient-based AIMD rate control algorithm that increases the sending rate by a constant if the gradient, d(RTT)/dt, is non-positive and decreases the rate multiplicatively if d(RTT)/dt > 0. They adopt a rate-based approach, as opposed to window-based control, because it fits well with widely available NIC support. To address the jittery nature of RTT-gradient measurements caused by traffic bursts, safeguards ensure that multiplicative decrease does not kick in when the absolute RTT is very low; a similar safeguard avoids rate increases at high RTT values.
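
A minimal sketch of the rate update described above, assuming RTT samples in microseconds and rates in Gbps. The parameter values, the EWMA smoothing of RTT differences, the normalization by a minimum RTT, and the cap on the gradient are assumptions of this sketch, not the paper's tuned settings.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TimelySketch:
    """Illustrative RTT-gradient rate controller in the spirit of TIMELY.

    All parameter values are hypothetical placeholders.
    """
    rate: float = 5.0        # current sending rate (Gbps)
    delta: float = 0.1       # additive increase step (Gbps)
    beta: float = 0.8        # multiplicative decrease factor
    alpha: float = 0.5       # EWMA weight for smoothing RTT differences
    t_low: float = 50.0      # below this RTT (us), never decrease
    t_high: float = 500.0    # above this RTT (us), never increase
    min_rtt: float = 20.0    # normalizes the gradient (us)
    max_rate: float = 10.0
    min_rate: float = 0.01
    prev_rtt: Optional[float] = None
    rtt_diff: float = 0.0

    def on_rtt_sample(self, new_rtt: float) -> float:
        if self.prev_rtt is None:
            # First sample: nothing to differentiate against yet.
            self.prev_rtt = new_rtt
            return self.rate
        # Smooth the RTT difference to damp jitter from traffic bursts.
        self.rtt_diff = (1 - self.alpha) * self.rtt_diff + self.alpha * (new_rtt - self.prev_rtt)
        self.prev_rtt = new_rtt
        gradient = self.rtt_diff / self.min_rtt  # normalized d(RTT)/dt

        if new_rtt < self.t_low:
            # Safeguard: absolute RTT is very low, so keep increasing.
            self.rate += self.delta
        elif new_rtt > self.t_high:
            # Safeguard: absolute RTT is very high, so decrease regardless of gradient.
            self.rate *= 1 - self.beta * (1 - self.t_high / new_rtt)
        elif gradient <= 0:
            self.rate += self.delta  # additive increase
        else:
            # Multiplicative decrease; the cap at 1.0 is an addition of this
            # sketch so a single sample cuts the rate by at most beta.
            self.rate *= 1 - self.beta * min(gradient, 1.0)
        self.rate = max(self.min_rate, min(self.rate, self.max_rate))
        return self.rate
```

Feeding a rising RTT series such as 100, 100, 160 first nudges the rate up by `delta` and then cuts it sharply once the smoothed gradient turns positive.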

The evaluation compares TIMELY with a kernel-stack implementation of DCTCP and with priority flow control (PFC). Small-scale experiments show that, for roughly the same throughput, TIMELY provides an order of magnitude lower RTT. Large-scale experiments (hundreds of machines in a Clos cluster) also show that TIMELY consistently supports low-latency requirements.

Q & A session

1) How do you account for one way congestion?
ACKs are prioritized to ensure that RTT is not affected by reverse-path congestion.

2) [Jitu Padhye, MSR] Would tighter correlation between ECN marking and queue length be observed if hardware pacing were used?

3) How do you deal with route flapping and multipath or ECMP?
Left for future work.

Efficient Coflow Scheduling Without Prior Knowledge

Session: Scheduling and resource management (2)
Authors: Mosharaf Chowdhury (UC Berkeley), Ion Stoica (UC Berkeley)
Public review:

Motivation: Based on a month-long trace of 320,000 Facebook jobs, it was found that on average about 25% of the runtime is spent in intermediate communication. As SSD-based jobs become more common, the network will become the bottleneck.

Flow-based solutions are classified as either per-flow fairness or flow-completion-time approaches. A coflow is a communication abstraction for data-parallel applications to express their performance goals:

  1. Minimize completion times
  2. Meet deadlines or
  3. Perform fair allocation.

Consider LAS (Least-Attained Service), which prioritizes the flow that has sent the least amount of data. Coflow-Aware LAS (CLAS) prioritizes the coflow that has sent the least total number of bytes. The challenges of CLAS, which are also shared by LAS, include:

  1. It can lead to starvation
  2. It is suboptimal for similar-size coflows, since it reduces to fair sharing.

Discretized coflow-aware LAS (D-CLAS) uses the following:

  1. Priority discretization (Change priority when total number of bytes sent exceeds predefined thresholds).
  2. Scheduling policies (FIFO within the same queue and prioritization across queues).
  3. Weighted sharing across queues (guarantees starvation avoidance).
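
A minimal sketch of the three D-CLAS ingredients listed above, assuming exponentially spaced byte thresholds, four queues, and hypothetical queue weights (all illustrative choices, not Aalo's exact configuration). Queue 0 has the highest priority, so under LAS ordering a coflow that has sent nothing starts there and is demoted as it crosses thresholds.

```python
from collections import deque

# Hypothetical thresholds: exponentially spaced queue boundaries in bytes.
FIRST_THRESHOLD = 10 * 2**20  # 10 MiB (assumption)
GROWTH = 10                   # each threshold is 10x the previous (assumption)
NUM_QUEUES = 4

def queue_index(bytes_sent: int) -> int:
    """Priority discretization: map total bytes sent to a queue (0 = highest)."""
    q, limit = 0, FIRST_THRESHOLD
    while q < NUM_QUEUES - 1 and bytes_sent >= limit:
        q += 1
        limit *= GROWTH
    return q

class DClasSketch:
    def __init__(self, weights):
        # weights[i] is queue i's bandwidth share; giving every non-empty
        # queue some share is what prevents starvation.
        self.weights = weights
        self.queues = [deque() for _ in range(NUM_QUEUES)]
        self.bytes_sent = {}

    def add_coflow(self, coflow_id):
        # A new coflow has sent 0 bytes, so it belongs in queue 0.
        self.bytes_sent[coflow_id] = 0
        self.queues[0].append(coflow_id)

    def record_progress(self, coflow_id, nbytes):
        old_q = queue_index(self.bytes_sent[coflow_id])
        self.bytes_sent[coflow_id] += nbytes
        new_q = queue_index(self.bytes_sent[coflow_id])
        if new_q != old_q:
            # Demoted coflows join the tail of their new queue (FIFO order).
            self.queues[old_q].remove(coflow_id)
            self.queues[new_q].append(coflow_id)

    def allocate(self, capacity):
        """Weighted sharing across non-empty queues; FIFO head gets its queue's share."""
        active = [i for i, q in enumerate(self.queues) if q]
        total_w = sum(self.weights[i] for i in active)
        return {self.queues[i][0]: capacity * self.weights[i] / total_w for i in active}

s = DClasSketch(weights=[8, 4, 2, 1])
for c in ["A", "B", "C"]:
    s.add_coflow(c)
s.record_progress("A", 50 * 2**20)  # A crosses 10 MiB and moves to queue 1
print(s.allocate(120.0))
```

Here B (head of queue 0) and A (head of queue 1) share the link 2:1, while C waits behind B in FIFO order.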

Aalo is a scheduler for D-CLAS. It has a coordinator and a number of workers. Aalo is non-blocking: when a new coflow arrives at an output port, its flows are put in the highest-priority queue (a new coflow has not sent any bytes yet) and scheduled immediately. There is no need to synchronize all flows of a coflow, as in Varys. Periodically, workers send information about their active coflows to the coordinator, which computes the total number of bytes sent by each coflow and relays this information back to the workers.
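
A minimal sketch of one coordination round, assuming each worker reports a dict of per-coflow bytes it has sent locally. The report format and the function names are hypothetical; the sketch only shows the aggregate-and-broadcast step and the resulting coflow-aware LAS order (fewest total bytes first).

```python
from collections import Counter

def coordinator_round(worker_reports):
    """Sum per-coflow byte counts from all workers into global totals.

    worker_reports: list of dicts {coflow_id: bytes sent locally}.
    The returned totals are what would be relayed back to every worker.
    """
    totals = Counter()
    for report in worker_reports:
        totals.update(report)  # Counter.update adds counts
    return dict(totals)

def clas_order(totals):
    """Coflow-aware LAS: coflows that sent the fewest total bytes go first."""
    return sorted(totals, key=lambda c: (totals[c], c))

reports = [
    {"A": 4_000_000, "B": 1_000},
    {"A": 7_000_000},
    {"B": 2_000, "C": 500},
]
totals = coordinator_round(reports)
print(totals, clas_order(totals))
```

Between rounds, workers keep scheduling with the last totals they received, which is what lets new coflows start immediately instead of waiting for coordination.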

Aalo is evaluated with a 3,000-machine trace-driven simulation matched against a 100-machine EC2 deployment. Results show that Aalo is on par with clairvoyant approaches on EC2. Aalo generally outperforms Varys in job completion time. With regard to scalability, the results show that the faster Aalo can coordinate, the better it performs.

In summary: Aalo efficiently schedules coflows without complete information. It makes coflows practical in the presence of failures and DAGs. There is improved performance over flow-based approaches. It provides a simple, non-blocking API. The code is open-sourced at

Q: Is there any benefit to fixing the priorities using a different approach from what you used in the paper?
A: Short answer is that is future work. Longer answer: You can do something as smart as that.

Q: I am trying to understand how to take these results in the context of Kay’s results from NSDI'15. What happens if I try your approach in Spark, and how will the results line up?
A: All of the results depend on the type of work.

Q: This work seems similar to pFabric. Could you comment on the difference between the two?
A: In this dimension it is about the coflows, which makes the problem more challenging.

Q: What about coflow dependencies?
A: That is future work. This is a great question.

Q: What was the coflow distribution? Were there some which were large and others which were small?
A: In general there are differences between the sizes of the coflows. I don’t have the details on how they were different.

Low Latency Geo-distributed Data Analytics

Session: Scheduling and resource management (2)
Authors: Qifan Pu (UC Berkeley/MSR), Ganesh Ananthanarayanan (MSR), Peter Bodik (MSR), Srikanth Kandula (MSR), Aditya Akella (UW Madison), Victor Bahl (MSR), Ion Stoica (UC Berkeley)
Public review:

Organizations have data centers distributed across the globe, which generate data such as performance counters and user activity logs. Right now, if an app wants access to this data, all of it is moved to a single data center and then analyzed. This is centralized data analysis. This paper discusses a method to run a single logical analytics cluster across all sites, which avoids the wasteful shipment of data across the globe.
Consider an example analytics job with some map and reduce tasks. These tasks cause a lot of traffic across the WAN, so some links might become bottlenecks. Typically one splits the tasks evenly across sites, and we can then calculate how long the data transfer might take. What if we build a system that is aware of the network? Then we can place more reduce tasks at sites with more WAN bandwidth.
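
A toy sketch of network-aware reduce placement, assuming a simplified model: site i holds intermediate data S_i, has uplink U_i and downlink D_i, and receives a fraction r_i of the reduce tasks. It then uploads (1 - r_i)·S_i and downloads r_i·(S - S_i), and the shuffle finishes when the slowest transfer does. The grid search stands in for a proper solver, and all numbers are hypothetical.

```python
import itertools

def transfer_time(fractions, data, up, down):
    """Shuffle time = slowest upload or download across all sites."""
    total = sum(data)
    worst = 0.0
    for r, s, u, d in zip(fractions, data, up, down):
        upload = (1 - r) * s / u          # data leaving site for remote reducers
        download = r * (total - s) / d    # data arriving for local reducers
        worst = max(worst, upload, download)
    return worst

def best_placement(data, up, down, steps=20):
    """Grid-search reduce fractions (in 1/steps increments) that sum to 1."""
    n = len(data)
    best = None
    for combo in itertools.product(range(steps + 1), repeat=n - 1):
        if sum(combo) > steps:
            continue
        fractions = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        t = transfer_time(fractions, data, up, down)
        if best is None or t < best[0]:
            best = (t, fractions)
    return best

# Hypothetical sites: the third has little data and a slow WAN link.
data, up, down = [100.0, 100.0, 20.0], [10.0, 10.0, 2.0], [10.0, 10.0, 2.0]
print(transfer_time([1/3, 1/3, 1/3], data, up, down))
print(best_placement(data, up, down))
```

With these numbers an even split makes the slow site download a third of everyone else's data, while the search assigns it only a small fraction of reducers and cuts the shuffle time several-fold.
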
Remember from the previous presentation that queries do not arrive at the same time as the generation of data. We present Iridium, which can jointly optimize data and task placement. For a single dataset, Iridium uses iterative heuristics for joint task-data placement as follows:
  1. Identify bottlenecks by solving task placement.
  2. Place the tasks to reduce network bottlenecks during query execution.
Iridium is evaluated with Spark 1.1.0 and HDFS 2.4.1.
Figure 6 from the paper shows that Iridium achieves a 4×-19× speedup over the centralized baseline and a 3×-4× speedup over the in-place baseline.

To sum up: we proposed Iridium, which allows analysis of logs spread across data centers. Iridium allows for a single logical analytics cluster across all sites. It incorporates WAN bandwidths and reduces response time over the baselines by 3×-19×.

Q: In your Intro, you say the normal behavior is to copy the data to a centralized location. Your results show that copying the data to a central location is worse than leaving the data in place. Are we stupid for doing that?
Comment from audience: People have generally thought that it was stupid to move data.

Q: How is the completion time related? Some reduce tasks cannot start before map jobs are finished.
A: In our environment reducers always run after the mappers finish.

SIGCOMM 2015 Best Paper Awards

Best Paper:

Central Control Over Distributed Routing
Stefano Vissicchio (UCLouvain), Olivier Tilmans (UCLouvain), Laurent Vanbever (ETH Zürich), Jennifer Rexford (Princeton University)

Best Student Paper:

Rollback Recovery for Middleboxes
Justine Sherry (UC Berkeley), Peter Gao (UC Berkeley), Soumya Basu (UC Berkeley), Aurojit Panda (UC Berkeley), Arvind Krishnamurthy (University of Washington), Christian Macciocco (Intel Research), Maziar Manesh (Intel Research), Joao Martins (NEC Labs), Sylvia Ratnasamy (UC Berkeley), Luigi Rizzo (University of Pisa), Scott Shenker (UC Berkeley and ICSI)

Community Feedback Session


SIGCOMM goes through a 3-year cycle of preference: North America, Europe, and "Wild Card"
    - 2016: Salvador, Brazil
    - 2017: North America
       - 2-page site proposals due September 15
       - Will consider applications from everywhere, but preference given to North America.

There's a general downward trend in membership; however, it's still the 7th-largest of 37 SIGs.

SIG is consistently profitable. Fund position is very strong.

Things that happen with that money:
   1. Financially sponsor conferences (CoNEXT, IMC, HotNets, ANCS, Sensys, e-Energy, ICN, SOSR). They cover the bill if there's a loss and make money if there's a profit. All SIG-sponsored conferences will have lower registration fees for the next several years.

   2. Give awards
        - SIG wide: SIGCOMM award, Test of Time, Rising Star, Doctoral Dissertation
        - SIGCOMM conference: Best paper, Best student paper, best paper in experience track

        - Other Award Winners in the SIG community:
                - Kimberly Claffy and Vern Paxson won the IEEE Internet Award
                - Albert Greenberg won the IEEE Koji Kobayashi Computers and Communications Award
                - Sylvia Ratnasamy won the Grace Hopper Award

   3. Give travel grants
          - 275k per year
               - 60k for SIGCOMM, 30k for CoNEXT, 15k for IMC and 10k for everyone else
          - 40k for Geodiversity awards to support attendance at SIGCOMM
          - 15k for PC member travel from under-represented regions

   4. CCR:
           4 issues in 2014
           Received 136 papers, technical papers (33)
           2 Best of CCR papers at SIGCOMM

          New Student mentoring column edited by Prof. Aditya Akella
          New Industrial board column will be edited by Dr. Renata Teixeira

          Will be entirely online by 2016 (feedback wanted on this decision; not final yet)

   5. Communication:
           Monthly newsletter
           Minutes of monthly EC meetings
           Selected highlights in Annual reports

   6. Making life better for members: Travel grants, reduced registration, Pay for excess page charges
       for best paper, Shadow TPC for CoNEXT, summer school support, support for national
       networking summits, encouraging industry-academia interaction

   7. Summer Schools:
          - Cyber-Security Summer School: Summer School on Information Security
          - TMA Summer School: Can read more online. Sponsored by NSA. Free for PhD Students

Activities of Industrial Liaison Board

   1. Industrial demo session at SIGCOMM: 11 demos this year

   2. Industry days:
       Collocation of SOSR and ONS
       Workshop on Research and Applications of Internet Measurements between IMC and IETF
       Planning of wireless day (Organizers: Sachin Katti and Ranveer Chandra)

   3. Experience Track at SIGCOMM

New Initiatives from Community Feedback:
 - Childcare at SIGCOMM
 - Ethics workshop/panel
 - Mentoring program
 - Technical background sessions
 - Video-recording of all SIG-sponsored conferences
 - Support for national networking workshops
 - Reducing volunteer burnout (MeetGreen)

Get involved in SIGCOMM
  - More than an annual conference!
  - Contribute to CCR
     - Editorials
     - Offer to review submissions
  - Share resources on, get involved in curriculum development
  - Propose workshops at conferences
  - Propose summer schools

Questions to think about for feedback:

How accessible is our software?

   - Many 'no reply from authors' responses. Some people say 'can't be released'. Little software is released, so papers are hard to reproduce.

Short presentations and posters?
  - 17-min presentation + 5-min Q&A + 2-hr poster
  - Allows better feedback to authors
  - SOSP does this already

Reflections on this year:
  - Experience Track
  - Facebook dinner. Conflicted with student dinner.

Announcement from Bruce: Thank you to everyone on the awards committee!

Question and Answer Session:

(Disclaimer: Things were captured as faithfully as I could. Each number corresponds to a topic that was talked about. Feel free to send me corrections!)


Comment about an all-electronic CCR.
   1- The time has come.
   2- One of the issues that has held us back: branding. Making sure that when someone looks at a paper, they know it's from SIGCOMM. You have to know that something is heavily edited and so on.


Not many people were invited to the student dinner. Very small number of senior members were invited.

Facebook dinner started at 8 or 9, after the dinner itself.

There have been previous issues where the SIG steps on itself, so it's presumptuous to ask other people not to step on us.

Experience track: Fantastic to invite that sort of material. Kind of concerned about separating that. There should be no separation between the experience track and the regular paper. (With regards to best paper awards and the like as well).

Response to the experience track comment from PC Chairs: The criteria for the Experience track and the original track were viewed as completely different, so it was unfair to ask someone to compare them; it's like apples and oranges.


The SIG has made the awards committee public for the awards, but the SIGCOMM award committee isn't public. Why this disparity? This year, the name will be confidential, but in future years, it might not be.


Write down evaluation criteria for what has to be in a SIGCOMM paper. Do you need an industry partner?

Answer from PC chairs: No, the criteria are online. There are links to guides to getting your paper into SIGCOMM.

How meaningful is it to have a best paper out of 4? (For the experience papers track) Maybe SIGCOMM's quality has been diluted?

Response: You can have 1 or 0 papers as best paper.


SIGCOMM experience-track papers won't make their software available to the community. It is tricky to make software available to the community, but it is troubling to see how many people don't make their software public. This is something that IMC has embraced in making datasets available to the community.


Lots of work to make software publicly releasable. Maybe SIG can hire an engineer that will help with this effort?


We are making progress on making software available- 5 or 6 papers this year have links in the paper body.


Can academics also write experience track papers or must you be from industry? I hope that academics and so on will submit as well.


Whoever claims that data should be made public, should also make the software available...

SIGPLAN has a special mark indicating that the software is available.


Didn't like the experience track papers. Here's one of the reasons: all but one of the papers were measurement papers. The other paper was an experience paper, but without many takeaways. Likes the experience track, but wants it to be more about experience, not about data.

Response: Experience is about data; the Akamai paper was awesome.


Person 1: Ethics track is really good. Chart a direction using the ethics track, and it's nice that SIG resources were spent making the ethics workshop. Thanks to the PC chairs, the process was pretty good.

Person 2: Put a process into place to deal with the ethics violations? Hope that we don't lose the institutional memory.

Person 3: Hopefully, we'll see more proposals and so on about how to deal with ethical issues.


Person 1: Maybe increase the conference to 4 days? There's not enough time to socialize and so on without missing papers.

Person 2: Not good to move discussion to poster session since it's not as public. (Referring to proposal from slides above)

Person 3: Reduce the number of papers, with many more breaks and so on. Fewer than 40 papers?

Person 4: No, the diversity of the field is increasing.

Person 5: Number of SIGCOMM quality papers may be higher than what we accept. Maybe have a way to accept more than the conference?

Vote for the VLDB model: strong show of support

Person 6: Kill CoNEXT if we move to the VLDB conference model since there's not very much of a discussion.

Person 7: Maybe make SIGCOMM a multi-track conference for part of the time?

Last point: There seems to be an appetite for changing the conventions.

Wednesday, August 19, 2015

Session 6: A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP

Authors: Xiaoqi Yin (CMU), Abhishek Jindal (CMU), Vyas Sekar (CMU), Bruno Sinopoli (CMU)

Presenter: Xiaoqi Yin

The authors presented their work to formalize the task of bitrate adaptation in Internet video clients, and presented Robust MPC, an improvement to existing MPC (model predictive control) systems.
Designing a bitrate controller is difficult because of the complexities of network performance such as the unreliable nature of internet performance, and the complex interactions with TCP. Open questions include the type of algorithm to use, how to balance QoE factors, and how to make it robust to various operating conditions.

QoE is modeled as a linear combination of factors: bitrate, bitrate changes, rebuffering time, and startup delay. This QoE model is used in the online controller. Offline QoE maximization is formalized as a mixed-integer linear programming problem. The talk also covered the limitations of previous approaches, which are either rate-based or buffer-based.
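
One way to write such a linear QoE over K video chunks, in our own notation: R_k is the bitrate of chunk k, q(·) maps bitrate to perceived quality, d_k(R_k) is the chunk size, C_k the throughput while downloading it, B_k the buffer level when its download starts, T_s the startup delay, and λ, μ, μ_s are weights. This is a sketch of the general shape, not a verbatim copy of the paper's formulation.

```latex
\mathrm{QoE}_1^K = \sum_{k=1}^{K} q(R_k)
  - \lambda \sum_{k=1}^{K-1} \left| q(R_{k+1}) - q(R_k) \right|
  - \mu \sum_{k=1}^{K} T^{\mathrm{rebuf}}_k
  - \mu_s T_s,
\qquad
T^{\mathrm{rebuf}}_k = \max\!\left(0,\ \frac{d_k(R_k)}{C_k} - B_k\right)
```

Raising λ favors smoother playback over chasing bandwidth, and raising μ makes the controller more averse to stalls; the weights are per-deployment choices.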

Traditional MPC operates using a predictive optimization and a horizon with a sliding window to smooth out control and is used widely in many distributed control problems. In each iteration, a Mixed Integer Linear Program (MILP) is solved to compute the predicted control sequences. Unfortunately this method is not robust or fast enough for bitrate adaption, especially within a client browser.
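
A minimal receding-horizon sketch, assuming a hypothetical bitrate ladder, 4-second chunks, q(R) = R, and a throughput prediction supplied by the caller. It enumerates every bitrate plan over a short horizon instead of solving an MILP, applies only the first decision, and, for the "robust" variant, plans against a throughput lower bound derived from an assumed maximum prediction error.

```python
import itertools

BITRATES = [300, 750, 1200, 1850, 2850, 4300]  # kbps, hypothetical ladder
CHUNK_SECONDS = 4.0

def plan_qoe(plan, buffer_s, prev_kbps, throughput_kbps, lam=1.0, mu=4300.0):
    """Simulate a candidate plan and score it with a linear QoE:
    bitrate - lam * |bitrate change| - mu * rebuffer seconds."""
    qoe, buf, prev = 0.0, buffer_s, prev_kbps
    for kbps in plan:
        download_s = kbps * CHUNK_SECONDS / throughput_kbps
        rebuffer = max(0.0, download_s - buf)
        buf = max(buf - download_s, 0.0) + CHUNK_SECONDS
        qoe += kbps - lam * abs(kbps - prev) - mu * rebuffer
        prev = kbps
    return qoe

def mpc_decide(buffer_s, prev_kbps, predicted_kbps, horizon=3, robust_error=0.0):
    """Enumerate all plans over the horizon; return only the first bitrate."""
    # Robust variant: plan against a pessimistic throughput estimate.
    throughput = predicted_kbps / (1.0 + robust_error)
    best = max(itertools.product(BITRATES, repeat=horizon),
               key=lambda plan: plan_qoe(plan, buffer_s, prev_kbps, throughput))
    return best[0]

print(mpc_decide(20.0, 2850, 10000.0))  # plenty of buffer and bandwidth
print(mpc_decide(0.0, 300, 500.0))      # empty buffer, poor bandwidth
```

Exhaustive enumeration grows as the ladder size raised to the horizon, which hints at why solving an optimization per chunk inside a browser player is too slow.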

The authors propose to solve the speed problem with their algorithm, FastMPC, which computes offline a lookup table of MILP solutions over the entire state space of model parameters. This table enables MPC control within the latency constraints of an online video player. They evaluated this by adding Robust MPC to dash.js (an existing web video player) along with a throughput predictor, and compared it against the state of the art: it improves on unmodified dash.js by 60% and on the existing state of the art by 15%. Another comparison showed 60% and 10% improvements over the original and the state of the art, respectively.
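
A minimal sketch of the lookup-table idea, assuming hypothetical grids for buffer level and predicted throughput and a small exhaustive-search solver standing in for the MILP. The expensive solve runs once per discretized state offline; at runtime the player only snaps its state to the grid and does a dictionary lookup.

```python
import bisect
import itertools

BITRATES = [300, 750, 1200, 1850, 2850, 4300]           # kbps, hypothetical
CHUNK_S = 4.0
BUFFER_BINS = [0, 2, 4, 8, 16, 30]                      # seconds, hypothetical grid
THROUGHPUT_BINS = [250, 500, 1000, 2000, 4000, 8000]    # kbps, hypothetical grid

def solve(buffer_s, prev, tput, horizon=2, lam=1.0, mu=4300.0):
    """Stand-in optimizer: exhaustive search over bitrate plans."""
    def score(plan):
        q, buf, last = 0.0, buffer_s, prev
        for r in plan:
            dl = r * CHUNK_S / tput
            q += r - lam * abs(r - last) - mu * max(0.0, dl - buf)
            buf, last = max(buf - dl, 0.0) + CHUNK_S, r
        return q
    return max(itertools.product(BITRATES, repeat=horizon), key=score)[0]

def build_table():
    """Offline: solve once per (buffer bin, previous bitrate, throughput bin)."""
    return {(b, p, t): solve(b, p, t)
            for b in BUFFER_BINS for p in BITRATES for t in THROUGHPUT_BINS}

def snap_down(grid, value):
    """Largest grid point <= value, or the smallest grid point if value is below the grid."""
    i = bisect.bisect_right(grid, value) - 1
    return grid[max(i, 0)]

def fast_decide(table, buffer_s, prev, predicted_tput):
    """Online: a dictionary lookup instead of solving an optimization."""
    key = (snap_down(BUFFER_BINS, buffer_s), prev, snap_down(THROUGHPUT_BINS, predicted_tput))
    return table[key]

table = build_table()
print(len(table), fast_decide(table, 9.3, 1200, 2500.0))
```

Snapping buffer and throughput down to the grid keeps lookups slightly conservative; a finer grid trades table size for decision accuracy.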

Q1. How far can you take the control theory approach: what would it look like when multiple users are competing with the same channel?
A.  Future work.

Q2. Why did you not do real experiments instead of the trace-driven evaluation in the paper?
A. They wanted to evaluate across different points in the QoE parameter space, and agreed that real experiments would greatly benefit the work.

Q4. Does the use of the lookup table make for an unscalable approach when calculating for different QoEs?
A. Since the lookup table is populated with the entire state space, this is not an issue.

Q5. Closer integration with lower-level congestion control: did you try different congestion control algorithms with your system?
A. The authors did not.

Q6. What does the table encode? Does it account for changes in screen size, etc?

A. Table only encapsulates bitrate.