Wednesday, October 29, 2014

Session 3, Paper 1: Reclaiming the Brain: Useful OpenFlow Functions in the Data Plane

Authors: Michael Borokhovich (Ben Gurion University, Israel), Liron Schiff (Tel Aviv University, Israel), Stefan Schmid (TU Berlin and T-Labs, Germany)

Link to paper: Reclaiming the Brain: Useful OpenFlow Functions in the Data Plane

SDN simplifies network management by providing a programmatic interface to a logically centralized controller, allowing the network to be split into a “dumb” data plane and a “smart” control plane. This, however, comes at a cost: fine-grained control of the data plane introduces computational overhead and latency in the control plane. This paper investigates which functionalities to add to the OpenFlow data plane (the “south”) to make it smarter, reducing interactions with the control plane and making the network more robust.
The approach in this paper relies on a simple template called “SmartSouth”, an in-band graph DFS traversal implemented with the match-action paradigm and the fast-failover mechanism. In general, monitoring and communication functions are added to the “south” to make it more robust, react proactively to link failures, and reduce interaction with the control plane. Specifically, the following functions are provided in the south: 
  • Topology snapshot: collects a current view of the network topology; fault-tolerant, no connectivity assumed; only a single connection to the controller is required 
  • Blackhole detection: detects connectivity loss, regardless of the causes (e.g., physical failure, configuration errors, unsupervised carrier network errors). Two implementations are proposed:
    • multiple DFS traversals, each with a different time-to-live (TTL), using binary search to find the point where the packet is lost; a sketch of this binary search appears after this list. Complexity: O(log n) traversals
    • smart in-band counter: the counter is read and updated during packet processing, and its value can be written to a packet header field; one smart counter is proactively installed per switch port; two DFS traversals are needed: the first traversal goes back and forth once on each newly visited link, and the second traversal detects the blackhole (the link whose counter value is 1).
  • Critical node detection: checks whether a node is critical for connectivity; a non-critical node may be removed for maintenance or energy conservation; cheaper than a snapshot; requires only one DFS traversal from the root. 
  • Anycast: supports specification of multiple unknown destinations; it is extendable to specify service chains; useful for finding an alternative path to the controller when a link fails. Complexity: one DFS traversal 
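A minimal sketch (in Python, not the paper's match-action implementation) of the TTL binary-search idea for blackhole detection. It assumes a probe(ttl) primitive that reports whether a DFS probe packet sent with the given TTL makes it back:

    def find_blackhole(n_hops, probe):
        """Binary search for the first hop at which a DFS probe is lost.

        probe(ttl) -> True if a probe sent with this TTL returns, else False.
        Assumes probes that stop short of the failure point always return.
        Returns the suspected blackhole hop using O(log n) probes.
        """
        lo, hi = 1, n_hops              # smallest / largest TTL to try
        while lo < hi:
            mid = (lo + hi) // 2
            if probe(mid):              # probe survived: blackhole is further along
                lo = mid + 1
            else:                       # probe lost: blackhole is at or before mid
                hi = mid
        return lo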
What’s good about the approach in this paper:
  • no new hardware or protocol features are required 
  • the approach keeps data-plane state formally verifiable 
  • some techniques are possibly extendable to other functions (e.g., using smart counter to infer network load). 
This work serves as a first step; more discussion on how to partition functionality between the data plane and the control plane is encouraged.

HotNets 2014: Infrastructure Mobility: A What-if Analysis


Scribe notes by Athina Markopoulou.

Infrastructure Mobility: A What-if Analysis

Today’s wireless access networks cannot keep up with demand. Today’s network infrastructure (APs, cell towers) is static while users may be mobile. The main idea proposed in this paper is to make the infrastructure itself mobile, in order to exploit diversity. There is a wide range of mobility options (tethering, on the order of feet; ceiling railings, on the order of meters; cell-tower drones, on the order of kilometers) and timescales to adapt. The authors add the disclaimer that they do not know the killer app yet and consider this bottom-up, enabling research. The speaker argued that the idea can be made practical through robots (“robotic WiFi”) and that it can provide compelling gains (in terms of SNR variation, throughput, and other metrics, as evidenced by experimental results for micro/mini/macro mobility) without actually moving the APs much. He compared the approach to overprovisioning and argued that mobility is complementary to density. He also envisioned that the monitoring and control of mobility should be coordinated and optimized by the cloud. He listed challenges including how to move the AP and how to coordinate this decision with other optimizations (e.g., channel selection, coding, etc.).

Q&A (during panel discussion, some of them addressed to all papers)

Q: If you make the AP mobile, things may break down (physically) more easily. How do you trade off higher throughput against a higher chance of failure?
A: These risks, as well as psychological discomfort, increase with the use of robots in our lives.

Q: Psychology aside, the question is about reliability.
A: We can optimize for different utility functions. So far, we optimized throughput, but we could include reliability in our objective function.

Q: What is your baseline for comparison? Could you get the same benefits by simply using MIMO?
A: We are currently using a single antenna. Mobility is complementary to MIMO.

Q: All 3 papers require help from participants. How reliable are these participants and how sensitive is the outcome to optimal choices?
A: The precise placement of AP is not critical, since there is a lot of diversity, and many positions of the AP are good enough.

Q: Rather than mobilizing the AP, we could move the antennas, etc.
A: Yes, but restricted mobility means less opportunity.

Q: You seem to need a lot of computation in real time. Where should this computation be done?
A: The computation bottleneck is the search space. It can be done at the local AP; in the case of multiple APs, it can happen in the cloud.

HotNets 2014: An AS for Us

Scribe notes by Athina Markopoulou.


PEERING: An AS for Us (presented by Ethan Katz-Bassett)
This work developed a testbed that allows researchers to configure an ISP (PEERING) and experiment with it. PEERING has its own AS number and IP address space, and it peers with real ISPs. In particular, PEERING routers peer with 6 universities and providers. Richer connectivity is provided via peers at AMS-IX (Amsterdam Internet Exchange) and Phoenix-IX. Researchers who configure this AS are only allowed to advertise prefixes that PEERING owns, not other people’s prefixes (so that it does not become a transit). The speaker then explained the use of the testbed through the motivating example of ARROW.

PEERING provides a sweet spot between realism (running things over the Internet) and control (by configuring this ISP). It can be used to enable running experiments for inter-domain routing research. The speaker concluded with a call to the community to use the testbed and propose new features.

Q&A (during panel discussion, some of them addressed to all papers)

Q: How do we know that we measure the Internet and not your infrastructure?
A: Right now, through documentation; we also plan to keep logs.

Q: Scalability issues?
A: It mainly depends on the number of prefixes we own.

Q: Have you thought about using IPv6?
A: We plan to look into that.

Q: Are ISPs concerned about their policies being inferred/exposed?
A: In our experience, there is no pushback from ISPs. They know their own policy but they don’t have the global picture, so they actually want more visibility.

HotNets 2014: Crowdsourcing Access Network Spectrum Allocation Using Smartphones

Scribe notes by Athina Markopoulou.

Crowdsourcing Access Network Spectrum Allocation Using Smartphones

(presented by Jinghao Shi)

The main idea proposed in this paper was to use a smartphone “within proximity” of the primary device (laptop or tablet) to collect measurements (channel utilization and WiFi scan results) without disrupting the primary device. The key participants in the PocketSniffer system are: the phone, the laptop, the PocketSniffer AP, the PocketSniffer server. Challenges that need to be addressed include the following:
  • Physical proximity: use phone next to laptop to collect measurements on the laptop’s behalf.
  • Incentives for the phone: one way is to use the user’s own phone to collect measurements for the user’s laptop. If Bob’s phone is used to help Alice’s laptop, credits can be offered to Bob, to use later in exchange for QoS.
  • Measurement efficiency: pick the phone to use based on criteria such as proximity and battery level (a toy scoring sketch appears after this list).
  • Measurement validation: the phone may provide wrong measurements (lazy, selfish).  Solution: trust the AP + cross validation.
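A toy sketch of the phone-selection step; the weights and scoring function are invented for illustration, as the paper does not specify one:

    def pick_phone(candidates, w_proximity=0.7, w_battery=0.3):
        """Pick a measurement phone from candidates near the laptop.

        candidates: list of dicts with 'proximity' (0..1, higher = closer)
        and 'battery' (0..1 fraction remaining). Weights are illustrative.
        """
        def score(phone):
            return w_proximity * phone["proximity"] + w_battery * phone["battery"]
        return max(candidates, key=score)

    # Example: prefer the closer phone unless its battery is nearly empty.
    print(pick_phone([{"proximity": 0.9, "battery": 0.1},
                      {"proximity": 0.6, "battery": 0.8}]))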
The author also talked about the bigger picture (global information, cooperation, use of game theory, WiFi-cellular interaction) and their implementation on Nexus 5 phones and a public testbed.

Q&A  (During Panel Discussion, some of them addressed to all papers):
Q: How do you decide which phone, within proximity of the laptop, to use?
A: We can have this information and pick the closest one.
Followup Q: Using the phone closest to the laptop may not be the best proxy for measurements, due to fading. Locations very close to each other (the exact distance depending on frequency/wavelength) may have very different signal strengths. Have you actually done measurements to validate that?
A: Yes, we pick the closest phone. Our measurements so far put the phone on top of the laptop. For the purpose of picking which channel to use, this is good enough.

Q: Where do you envision the computation to happen?
A: Where to implement the algorithm (to decide which phone to use for measurement and which WiFi channel to use) must take security and privacy concerns into account. In enterprise environments, there is a centralized controller.

Theia: (Simple and Cheap) Networking for Ultra-Dense Data Centers

Paper Title: Theia: (Simple and Cheap) Networking for Ultra-Dense Data Centers

Authors: Meg Walraed-Sullivan, Jitu Padhye (Microsoft Research), Dave Maltz (Microsoft)

Presenter: Meg Walraed-Sullivan

Paper Link: http://conferences.sigcomm.org/hotnets/2014/papers/hotnets-XIII-final152.pdf

     Data centers are expensive to build; packing more CPUs into a rack yields ultra-dense data centers (UDDCs), which pose a number of challenges: 1) power and cooling, 2) failure recovery, 3) tailoring applications, and 4) networking. This talk focuses on the networking problems raised by packing a huge number of CPUs into a rack. Theia suggests rethinking the ToR architecture used in many data centers, which does not scale to connecting thousands of CPUs: the star topology should be replaced with a fixed direct-connect topology. The upside is that it is far cheaper, significantly reduces the power requirement, and requires less physical space. However, this sacrifices full bisection bandwidth and constrains the flexibility of the topology. Theia suggests replacing switches with patch panels for connectivity among sub-racks and to the rest of the data center. Over-subscription is clearly unavoidable.
    A direct-connect topology is required that minimizes through traffic and supports a wide range of graph sizes. The current best fit is a circulant graph, but it might not be the best option. 
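For illustration, a minimal sketch of building a circulant-graph adjacency list for N sub-racks; the hop offsets shown are made up, not Theia's:

    def circulant_graph(n, offsets=(1, 2, 5)):
        """Adjacency sets of a circulant graph on n nodes.

        Each node i connects to i +/- d (mod n) for every offset d.
        The offsets are illustrative, not taken from the paper.
        """
        adj = {i: set() for i in range(n)}
        for i in range(n):
            for d in offsets:
                adj[i].add((i + d) % n)
                adj[i].add((i - d) % n)
        return adj

    # Example: 16 sub-racks, each wired directly to 6 neighbors.
    print(sorted(circulant_graph(16)[0]))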

Questions:

Q: What is the topology between the racks?
A: It is imaginary; we don't have such a DC yet.

Q: Different racks should be able to make different trade-offs. What do you think about having heterogeneous racks?
A: It's hard to do it, but we might consider it.

Q: At this scale, don't you think you need clever placement, as oversubscription is unavoidable?
A: We do need clever placement; we do it today. We will keep it in mind.


PIAS: Practical Information-Agnostic Flow Scheduling for Data Center Networks

Paper Title: PIAS: Practical Information-Agnostic Flow Scheduling for Data Center Networks

Authors: Wei Bai, Li Chen, Kai Chen (Hong Kong University of Science and Technology), Dongsu Han (KAIST), Chen Tian (HUST), Weicheng Sun (Hong Kong University of Science and Technology and SJTU)

Presenter: Li Chen

Paper Link: http://conferences.sigcomm.org/hotnets/2014/papers/hotnets-XIII-final91.pdf


      Existing data center network flow scheduling schemes minimize flow completion time (FCT); however, they assume prior knowledge of flow size information to approximate ideal preemptive shortest-job-first and require customized switch hardware. PIAS minimizes FCT without requiring prior knowledge of flow sizes by leveraging the multilevel feedback queue (MLFQ) that exists in many commodity switches. The goal is an information-agnostic approach that minimizes FCT and is readily deployable. Initially, a flow gets the highest priority and is demoted as it transfers more bytes. This is achieved by tagging packets, keeping per-flow state, and using the switches' priority queues.
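A rough sketch of the tagging idea: each flow starts at the highest priority and is demoted as its byte count crosses successive thresholds. The threshold values below are made up; PIAS derives them by solving an FCT-minimization problem.

    # Illustrative demotion thresholds in bytes (not the paper's optimized values).
    THRESHOLDS = [10_000, 100_000, 1_000_000]   # boundaries between priority queues
    HIGHEST_PRIORITY = 0                        # queue 0 is served first

    bytes_sent = {}   # per-flow byte counters kept at the sender

    def tag_priority(flow_id, packet_len):
        """Return the priority to stamp on this packet (e.g., in the DSCP field)."""
        sent = bytes_sent.get(flow_id, 0) + packet_len
        bytes_sent[flow_id] = sent
        priority = HIGHEST_PRIORITY
        for threshold in THRESHOLDS:
            if sent > threshold:
                priority += 1     # demote: more bytes sent => lower priority
        return priority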
      One challenge is how to choose the demotion thresholds; this is addressed by modeling it as a static FCT-minimization problem. A second challenge arises if there is a mismatch between the assumed and actual traffic distributions; this is mitigated by using Explicit Congestion Notification (ECN). 
     In summary, the system is practical and effective: it is information-agnostic, minimizes FCT, and is readily deployable.

Questions:
Q (Brighten Godfrey, UIUC): How do you handle the situation where different hardware has a different number of queues? Do you use the minimum?
A: Yes, currently we use the minimum.

Q (Brighten Godfrey, UIUC): Do you think putting storage at switches in the data center and using it in the way proposed in the "Revisiting Resource Pooling: The Case for In-Network Resource Sharing" paper would yield better flow completion times?
A: Yes, it is possible.

Revisiting Resource Pooling: The Case for In-Network Resource Sharing

Paper Title: Revisiting Resource Pooling: The Case for In-Network Resource Sharing

Authors: Ioannis Psaras, Lorenzo Saino, George Pavlou (University College London)

Presenter: Ioannis Psaras

Paper Link: http://conferences.sigcomm.org/hotnets/2014/papers/hotnets-XIII-final109.pdf

The resource pooling principle is leveraged to manage shared resources in networks; the main goals are to maintain stability and guarantee fairness. TCP effectively deals with uncertainty by suppressing demand and moving traffic only as fast as the path's slowest link. The approach taken in this paper is to push as much traffic as possible into the network; once a bottleneck is hit, traffic is stored temporarily in router caches and detoured accordingly. Note that the in-network storage (caches) is not used to hold the most popular content; instead, it temporarily stores incoming content. The assumptions are: 1) content items have names, and 2) clients issue network-layer requests for content. In this approach, clients regulate the traffic that is pushed into the network, instead of senders. Fairness and stability are achieved in three phases: 1) a push-data phase, 2) a cache-and-detour phase, and 3) a back-pressure phase. Evaluation shows that there is high availability of detours in real topologies.
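A toy sketch of the per-hop forwarding decision, simplifying the push / cache-and-detour / back-pressure phases; the object interfaces are invented for illustration:

    def forward(packet, primary_link, detour_links, cache):
        """Simplified per-hop decision in the push / cache-and-detour scheme.

        Links are objects with .congested (bool) and .send(pkt); cache is a
        bounded temporary store, not a popularity-based cache. Returns the
        action taken so a caller could trigger back-pressure upstream.
        """
        if not primary_link.congested:
            primary_link.send(packet)        # phase 1: push as much as possible
            return "pushed"
        for link in detour_links:
            if not link.congested:
                link.send(packet)            # phase 2a: detour around the bottleneck
                return "detoured"
        if cache.has_space():
            cache.put(packet)                # phase 2b: store temporarily
            return "cached"
        return "back-pressure"               # phase 3: ask upstream to slow down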

Questions:
Q: In the table that shows the available detour paths in real topologies, does 2-hop detour availability mean that there is no 1-hop detour but there is a 2-hop one?
A: Yes
Q (Brighten Godfrey, UIUC): Do you think putting storage at switches in a data center and using it in the way you suggest would yield better flow completion times?
A: Yes, it makes sense. The approach could fit a data center.

Tuesday, October 28, 2014

EONA: Experience-Oriented Network Architecture

Presenter: Junchen Jiang



Authors: Junchen Jiang, Vyas Sekar (CMU), Xi Liu (Conviva Inc.), Ion Stoica (UC Berkeley/Databricks Inc./Conviva Inc.), Hui Zhang (CMU/Conviva Inc.)

The speaker argues that Quality of Experience (QoE) matters to content providers (app-providers), as reflected by the painstaking efforts many app-providers take to optimize QoE. Similarly, infrastructure providers work to optimize the raw performance of their networks. Unfortunately, even though both the app-provider and the infrastructure provider are working to improve performance, these efforts often result in sub-optimal QoE.


This sub-optimality is the result of two interesting properties. First, improving raw performance metrics does not necessarily improve QoE. Second, disjoint efforts by the app-provider and the infrastructure-provider to improve QoE often produce sub-optimal results. More concretely, the sub-optimality arises because neither the app-provider nor the infrastructure-provider has both the right information and the right control knobs to improve QoE independently.

Fortunately, app-providers and infrastructure providers have started to form partnerships to exchange information. However, these partnerships and information exchanges happen in an ad-hoc manner. EONA aims to provide a principled and structured way for app-providers and infrastructure-providers to exchange information. EONA provides two interfaces. The first, A2I, lets the app-provider alert the infrastructure-provider of bad QoE along certain paths; the infrastructure-provider can then use this information to provide better paths to the application. The second, I2A, lets the infrastructure-provider alert the app-provider of routing changes, path information, and low-level statistics; this gives the app-provider visibility into the infrastructure and prevents contradictory actions. In tackling EONA, there are several open issues and research challenges that the authors plan to address: first, developing an appropriate business model, and second, developing incentives that are aware of privacy and security.
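As a purely hypothetical illustration of what messages on the two interfaces might carry (the field names below are invented; the paper does not fix a wire format):

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class A2IReport:
        """App-provider -> infrastructure-provider: QoE complaint about a path."""
        app_id: str
        path: List[str]                 # e.g., router/PoP identifiers
        qoe_metric: str                 # e.g., "rebuffer_ratio"
        value: float
        timestamp: float

    @dataclass
    class I2ANotification:
        """Infrastructure-provider -> app-provider: network-side changes."""
        event: str                      # e.g., "route_change", "maintenance"
        affected_prefixes: List[str]
        path_stats: Dict[str, float] = field(default_factory=dict)  # loss, RTT, ...
        timestamp: float = 0.0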

Q&A

Andrew Ferguson: Is government mandate the only way to roll this out?
Junchen Jiang: No, there is no need for government mandate, industries can be incentivized to roll these out using monetary incentives.


Dave Oran: Is the model pair-wise? Does it generalize to a mesh?
Junchen Jiang: Yes, it is pair-wise, but you can reduce the communication overhead by introducing a central content broker that aggregates and exchanges information.


Jeff Mogul: Have you talked to any ISPs about this?
Junchen Jiang: No. However, these collaborations are happening, but in an ad-hoc way, which is why we need a framework to analyze them.

Beehive: Towards a Simple Abstraction for Scalable Software-Defined Networking

Presenter: Yashar Ganjali
Authors: Soheil Hassas Yeganeh, Yashar Ganjali (University of Toronto)
Paper: http://conferences.sigcomm.org/hotnets/2014/papers/hotnets-XIII-final17.pdf

Although SDN argues for a logically centralized controller, centralization is not always an option: networks demand scalability and performance. Unfortunately, existing solutions to scalability are point solutions that only further complicate the interface for writing SDN applications. More concretely, SDN programmers still need to write distributed programs and deal with issues such as concurrency.

The goal of Beehive is to provide an API or abstraction that allows SDN developers to create centralized applications while the underlying platform scales by automatically distributing the SDN applications.

The key insight underlying Beehive is to demarcate application state and tightly control interactions between event handlers and that state. By providing a fine-grained API between the event handler and the application state, Beehive is able to map state usage to network events. In Beehive, all application state is stored in dictionaries (key-value stores). Each event handler must specify the dictionary and the set of keys required to handle a message. Building on this, Beehive distributes and parallelizes applications by colocating the state used by the same event on the same server. By ensuring that all state used to handle an event is colocated, Beehive ensures consistency across servers: no two servers will ever operate on the same state.
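A hypothetical sketch of this handler style in Python (Beehive itself is a separate platform; the names here are invented to illustrate a handler declaring the dictionary cells it touches):

    # Application state lives in named dictionaries (key-value stores).
    state = {"mac_to_port": {}}

    def mapped_cells(event):
        """Declare the (dictionary, key) cells this event's handler will touch."""
        return [("mac_to_port", event["src_mac"]), ("mac_to_port", event["dst_mac"])]

    def handle_packet_in(event):
        """Learning-switch-style handler, written as if the app were centralized."""
        table = state["mac_to_port"]
        table[event["src_mac"]] = event["in_port"]       # learn the source port
        return table.get(event["dst_mac"], "FLOOD")      # forward or flood

    # The platform would route the event to the server owning these cells, so no
    # two servers ever operate on the same key concurrently.
    evt = {"src_mac": "aa:bb", "dst_mac": "cc:dd", "in_port": 3}
    print(mapped_cells(evt), handle_packet_in(evt))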

Beehive allows for a variety of use cases: migration, fault tolerance, analytics, and runtime analysis of resource utilization.

Q&A: 
--------
Keith: How about storing all state in a distributed database?
Yashar: A distributed database gives no control over the placement of networking state, and there is some communication overhead to that subsystem.




Panel Discussion:
---------------
Keith Winstein (Stanford): What is the next step to take these systems to the real world?
Yashar Ganjali: The system is already built; now we need to add new applications.
Nikola Gvozdiev: We have a preliminary prototype; we need to optimize it to respond better to network events. Furthermore, it would be nice to explore a variety of network policies.
Junchen Jiang: Our stance is to have a framework that provides a systematic way to do this.

Anees Shaikh (Google): The I2A interface may be problematic. Have you thought about the range of information that can cross the boundary?
Junchen Jiang: I2A as described may be too idealized and expose too much information, so you want a narrow interface. That said, information is already being shared, just in an ad-hoc way.

Minlan Yu (USC): Much prior work operates in one direction; a feature of yours is being bi-directional. Can you give some intuition as to why bi-directional is better than two unidirectional interfaces?
Junchen Jiang: It is valuable to include the ISP in the loop; in the past the ISP was passive. There are things that can only be done when both directions are involved in the loop.

Minlan Yu (USC): will there be a feedback between the two?
Junchen Jiang: There may be some bad interactions, but that is better than no interaction.

Anduo Wang (UIUC): What are the objectives? Are they general enough to characterize user experience? Will many applications have conflicting user requirements?
Junchen Jiang: User experience depends on the application; one needs to be careful when designing objective functions. Collaboration between the two sides should be aware of these distinctions. 
Nikola Gvozdiev: Raw performance metrics aren’t sufficient; how the metrics are exposed is very app-specific.

Dave Oran (Cisco): How does Beehive’s API work when you have to compose multiple applications? Do you assume they don’t interact in funny ways?
Yashar Ganjali: Application developers need to be careful to allow it to be scalable. One can have a series of smaller functions that aggregate information between apps.

Andrew Ferguson (Google): How rich are the set of messages being captured?
Yashar Ganjali: BeeHive only captures the dependencies between the current events and state. However it can’t capture dependencies between state and past events.

Anduo Wang (UIUC): An ISP's infrastructure is optimized for a specific use-case and thus the set of knobs exposed are for that specific use-case. How will the ISPs implement the new knobs?
Junchen Jiang:  We only expose existing knobs thus there is little need to change the infrastructure.  A better question to ask is this: 'what new knobs can EONA expose? and when they are added how should they be exposed.'

Li: It is not profitable to improve QoE beyond a certain point, as it will lead to diminishing returns.
Junchen Jiang:  We have yet to reach the point of diminishing returns.

Anja Feldmann (TU Berlin): How much complexity do FUBAR and Beehive add? At what granularity should information be shared, and when?
Nikola Gvozdiev (FUBAR): Timescales of about a minute; we collect per-aggregate measurements and do the calculation.
Yashar Ganjali: If you compare Beehive to POX/NOX, there is some added complexity, but the hope is that it reduces the complexity required of app developers. 
Junchen Jiang: The time granularity in EONA is that of a control channel between control planes.

Anja Feldmann (TU Berlin): What is the granularity for EONA?
Junchen Jiang: EONA doesn’t have to operate at the per-flow level. The granularity you operate at depends on the problem you are troubleshooting. 

Tolerating SDN Application Failures with LegoSDN

Authors: Balakrishnan Chandrasekaran, Theophilus Benson (Duke University)

Bugs in software are endemic. In SDN applications, bugs cause many issues (e.g., cascading crashes caused by a single application), hindering the adoption of SDN. The paper explores the problem of tolerating SDN application failures, leading to a rethinking/reframing of the SDN architecture. The position the paper takes is that one should not sacrifice availability: SDN networks should be resilient to application failures. Specifically, the paper presents LegoSDN, a tool that provides abstractions for isolating SDN apps from the controller and from the network.

Q (Bruce Davie from VMware): Can you give more examples of failures tolerated by LegoSDN?
A: A crash of an application in LegoSDN (which is built on top of Floodlight) does not cascade.

Q: What about applications that share state?
A: LegoSDN handles that. LegoSDN applications still run in separate containers.

Q (Ying Zhang from Ericsson): Regarding an SDN app store and tackling availability: what about malicious apps?
A: The focus of LegoSDN now is on crashes, but one could build a solution for malicious apps on top of LegoSDN.

Q: The solution is not restricted to SDN; does LegoSDN generalize to other settings?
A: The authors are interested in looking at other domains.

Q (from USC): Is the state shared on the controller itself, or shared across the whole network? Does LegoSDN handle each case?
A: LegoSDN considers the latter.

Q: Rollback requires domain specific knowledge
A: LegoSDN abstraction provides a way to do rollback with domain knowledge.

Q: On rollback, what about the issue of rolling back one app affecting others?
A: Still working towards a solution.

Session 3, Paper 2: Characterizing Load Imbalance in Real-World Networked Caches

Authors: Qi Huang (Cornell University), Helga Gudmundsdottir, Ymir Vigfusson (Reykjavik University), Daniel Freedman (Technion), Ken Birman, Robbert van Renesse (Cornell University)

Link to paper: Characterizing Load Imbalance in Real-World Networked Caches
Modern web services rely heavily on in-memory caches to reduce request latency for users and to reduce the load on back-end storage servers. To avoid overloading a single cache server, data is partitioned into multiple segments (“shards”) and stored across different cache servers. Related objects tend to be mapped to the same shard, and ideally each cache server should see a similar request rate. However, in real-world workloads, load imbalance among the cache servers is often observed. Potential causes of load imbalance include hashing schemes, skewed access popularity, etc. Using a real workload from Facebook’s TAO cache system, this paper tries to answer the following questions:
  • What is the state of load imbalance?
  • What contributes to load imbalance?
  • How effective are current techniques?
  • How to improve?


The authors find significant load imbalance in TAO. Possible causes examined:
  • Content popularity (ranking top 100 shards, and top 20 objects per server)
  • Hot objects: (interestingly) very hot objects alone are not a major cause of cache-server load imbalance.
  • Hot shards: popular shards carry much more traffic than hot objects alone.


Next, the presenter evaluates the current techniques used to mitigate load imbalance (consistent hashing, hot-content replication) and finds room for improvement. Three hashing approaches are considered: the consistent hashing for in-memory networked caches implemented in libketama, TAO’s hashing, and a baseline “perfect” hashing. Three replication approaches are considered: none (no replication), TAO’s replication, and optimal replication. The combinations of (hashing, replication) are evaluated using Facebook’s TAO workload. The metric used is max/avg, where max and avg are the request volumes at the most-loaded and the average server, respectively (a toy sketch of computing this metric under consistent hashing appears after the list below). Some key results:
  • standard hashing (as implemented in libketama): 53% higher load on the most loaded server compared to the average; even with optimal replication, the most loaded server still sees 41% higher load.
  • TAO’s hashing in general performs better than standard hashing, but is still worse than TAO with optimal hashing or optimal replication.
  • The streaming algorithm outperforms other replication approaches, but still has room for further improvement.
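A minimal sketch (not the paper's code) of mapping shards to servers with a libketama-style ring of virtual nodes and computing the max/avg load metric from a request trace:

    import hashlib
    from bisect import bisect
    from collections import Counter

    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def build_ring(servers, vnodes=100):
        """Consistent-hash ring: each server gets `vnodes` points on the ring."""
        ring = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        return [h for h, _ in ring], [s for _, s in ring]

    def lookup(ring_hashes, ring_servers, shard):
        """Map a shard to the first server clockwise from its hash."""
        i = bisect(ring_hashes, _hash(shard)) % len(ring_hashes)
        return ring_servers[i]

    def max_over_avg(requests, servers):
        """requests: iterable of shard ids, one entry per request."""
        hashes, owners = build_ring(servers)
        load = Counter(lookup(hashes, owners, shard) for shard in requests)
        counts = [load.get(s, 0) for s in servers]
        return max(counts) / (sum(counts) / len(counts))

    # Example: a skewed trace where shard "s0" is very hot.
    trace = ["s0"] * 50 + [f"s{i}" for i in range(1, 51)]
    print(max_over_avg(trace, [f"server{i}" for i in range(4)]))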

Future work:
  • Characterize how hashing and replication affect load imbalance.
  • Can the streaming algorithm replicate content before its popularity surges?
  • Can we predict popularity spikes and prevent hotspots?

A Highly Available Software Defined Fabric

Presenter: Aditya Akella

Authors: Aditya Akella (University of Wisconsin-Madison), Arvind Krishnamurthy (University of Washington)


This work is about providing high availability in SDNs. Aditya starts by arguing that SDNs today cannot guarantee high availability; the reasons include distributed consensus protocols that run in isolation from the data network, and separate controller-controller and controller-switch mechanisms. He then discusses two strawman solutions for guaranteeing high availability, using a running example. Strawman 1 consists of reliable flooding plus controller replication; Strawman 2 consists of partitioned consensus with reliable flooding. Neither strawman can guarantee high availability.

He then proposes a solution consisting of Strawman 2 plus a global distributed snapshot protocol. The distributed snapshot protocol tries to build tight consensus while ensuring consistency.

Aditya notes that this is just one solution, and that they haven't explored the tradeoffs associated with it; there could be tradeoffs in terms of performance and complexity.

Q: The mechanisms you are proposing (reliable flooding, global distributed snapshot, partitioned consensus) seem to be sufficient, but are they necessary? 
A: This is one possible solution, and we haven't explored the tradeoffs associated with it. There may be other proposals that provide better performance and complexity tradeoffs.

Q: You have assumed that your network implements end-to-end routing; what about other management applications? Would they require different mechanisms to ensure high availability?
A: Yes, it's possible that these applications may require different mechanisms to provide high availability.

FUBAR: Flow Utility Based Routing

Presenter: Nikola Gvozdiev

Authors: Nikola Gvozdiev, Brad Karp, Mark Handley (UCL)


FUBAR presents an application-centric routing algorithm that aims to improve the utility of all participants while requiring no modifications to end hosts or switches. FUBAR is motivated by the insight that existing routing and traffic engineering techniques have drawbacks. For example, ECMP uses load balancing to better utilize the network but is prone to errors when the topology changes, while shortest-path-based routing techniques overcome these problems but utilize the network poorly. To overcome these limitations, operators often perform traffic engineering (TE) by manually moving traffic and assigning utilities.

These approaches are manual, slow, and error-prone. FUBAR proposes an approach that automatically performs routing and traffic engineering, essentially combining routing and TE. To achieve this, FUBAR requires a utility function that guides its efforts to mitigate congestion. Defining such a utility function is complicated by the fact that users employ many applications with a diversity of network requirements.

To address this, FUBAR defines a utility function that captures both delay and throughput, allowing FUBAR to optimize for applications with a variety of requirements. Operators provide the delay utility function, while FUBAR empirically learns the application’s capacity utility; the two are combined by multiplying them.
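A hedged sketch of what such a combined per-flow utility might look like; the functional forms below are invented for illustration and are not FUBAR's:

    def delay_utility(delay_ms, tolerable_ms=100.0):
        """Operator-provided: utility decays as path delay grows (illustrative)."""
        return max(0.0, 1.0 - delay_ms / (2 * tolerable_ms))

    def capacity_utility(throughput_mbps, demand_mbps):
        """Learned empirically in FUBAR; here a simple saturating curve."""
        return min(1.0, throughput_mbps / demand_mbps)

    def flow_utility(delay_ms, throughput_mbps, demand_mbps):
        """FUBAR combines the two components by multiplying them."""
        return delay_utility(delay_ms) * capacity_utility(throughput_mbps, demand_mbps)

    # Example: a flow wanting 10 Mb/s that gets 8 Mb/s over a 40 ms path.
    print(flow_utility(delay_ms=40, throughput_mbps=8, demand_mbps=10))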

To optimize the global utility of the network, FUBAR applies an iterative approach that first assigns traffic to the lowest-delay paths and then eliminates congestion by moving traffic away from congested links.

FUBAR was evaluated on the Hurricane Electric backbone topology and initial results are promising: FUBAR provides close-to-optimal utility. Furthermore, the algorithm runs quickly, producing results within 40 seconds.


Question: how do you deal with deadline flows?
Answer: 

Question: how do you ensure the LP solves within the timeline?
Answer: compromises need to be made between granularity of input and time to compute a solution. 

NetEgg: Programming Network Policies by Examples

Authors: Yifei Yuan, Rajeev Alur, Boon Thau Loo (University of Pennsylvania)

Emerging SDN DSLs may not be the most intuitive interface for programmers. The paper thus explores the problem of completely automating SDN programming. The proposed solution is NetEgg, a tool that synthesizes the controller program from examples. That is, instead of programming in an SDN DSL, the programmer specifies application scenarios, or, in other words, examples.

Given the user-supplied examples, NetEgg generates a solution comprising three components: the policy table, the state table, and the generic program that processes traffic. The key idea in the synthesis process is to enumerate pre-sketched policy tables, exhaustively checking each against the examples for consistency (a toy sketch of this search appears below).
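A toy illustration of the enumerate-and-check idea, far simpler than NetEgg: stateless rules, a single match field, and a tiny action set are all assumptions of this sketch:

    from itertools import product

    ACTIONS = ["forward", "drop"]                 # toy action set

    def consistent(policy, examples):
        """policy: dict match-value -> action; examples: list of (value, action)."""
        return all(policy.get(value) == action for value, action in examples)

    def synthesize(examples):
        """Enumerate policy tables over the observed match values and return the
        first one consistent with every example, or None if none exists."""
        values = sorted({value for value, _ in examples})
        for assignment in product(ACTIONS, repeat=len(values)):
            policy = dict(zip(values, assignment))
            if consistent(policy, examples):
                return policy
        return None

    # Example scenario: "traffic to port 22 is dropped, port 80 is forwarded".
    print(synthesize([(22, "drop"), (80, "forward")]))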

Q (Laurent Vanbever from Princeton): The output policy seems to be highly dependent on the examples supplied. Could more complex policies be synthesized, for example TE or routing?  
A: Examples 

Q (Anduo Wang from UIUC): How to characterize/quantify input examples? What if corner cases are missing? 
A: It is debatable how complex the examples shall be. NetEgg provides no guarantee on correctness. NetEgg guarantees consistency with examples.

Q: Examples may conflict with each other, how does NetEgg deal with this?
A: NetEgg will detect inconsistency in examples.

Q: How to handle prioritized rules for IP prefixes.
A: NetEgg does not deal with priority. 

Q: How compact are the generated policies?
A: NetEgg first generates a strawman policy, then compresses it.

Q (Junchen Jiang from CMU): What is the connection to the DSL research in the programming languages community?
A: NetEgg is similar in spirit to example-driven synthesis in Microsoft Excel spreadsheets. In fact, NetEgg cannot synthesize a Turing-complete program; the synthesis approach in general is restricted to DSL programs.

Q (Anduo Wang from UIUC): Will NetEgg synthesize policy updates (changes)?
A: Yes, NetEgg deals with this. But if the update involves change to "state table", it becomes complicated since NetEgg needs to modify the underlying logical formula.

DNS Resolvers Considered Harmful

Authors: Kyle Schomp (CWRU), Mark Allman (ICSI), Michael Rabinovich (CWRU)
Presenter: Kyle Schomp

DNS resolvers abstract away the complexity of name resolution and improve the scalability of the DNS system, but are vulnerable to attacks such as cache injection and DNS amplification. Such attacks continue to persist, and are difficult to prevent. In addition, resolvers have been caught intentionally providing spurious responses (for censorship and for serving ads, for example). Yet, we continue to trust them. Further, resolvers also obscure client location, which can cause CDNs to pick the wrong CDN node for a client to connect to, thus increasing latency. These are well-known issues, and the community is developing solutions (for example, getting rid of open resolvers, DNSSEC, etc.), but these solutions aren't really working (there are still millions of open resolvers, and DNSSEC is 10 years old, but has low adoption).

Kyle proposes that perhaps we should try something different, i.e., getting rid of shared resolvers entirely! Today, the client sends name resolution requests to the DNS infrastructure, which sees a request, processes it in some opaque way, and sends a response back. This work suggests getting rid of the middlemen, and having clients communicate directly with authoritative DNS servers.
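For readers unfamiliar with resolver-less operation, here is a rough sketch of a client performing its own iterative resolution with the dnspython library; it ignores CNAMEs, missing glue, TCP fallback, retries, and DNSSEC, and the root-server address is one of the well-known ones:

    import dns.message, dns.query, dns.rdatatype   # pip install dnspython

    ROOT = "198.41.0.4"   # a.root-servers.net

    def resolve(name, server=ROOT, depth=0):
        """Toy iterative resolution: follow referrals from the root down to the
        authoritative server, with no shared resolver in the path."""
        if depth > 10:
            raise RuntimeError("too many referrals")
        query = dns.message.make_query(name, dns.rdatatype.A)
        resp = dns.query.udp(query, server, timeout=3)
        for rrset in resp.answer:                  # authoritative answer reached
            if rrset.rdtype == dns.rdatatype.A:
                return rrset[0].address
        for rrset in resp.additional:              # otherwise follow a glue record
            if rrset.rdtype == dns.rdatatype.A:
                return resolve(name, rrset[0].address, depth + 1)
        raise RuntimeError("no answer and no glue to follow")

    print(resolve("example.com."))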

The advantages of this approach are:

a. reduced complexity of the name resolution system
b. removal of the target of many attacks, i.e., a reduction of the attack surface
c. better edge-server selection and load balancing for CDNs

However, there is also the potential loss of scalability of the DNS system, anonymity (in that the auth servers would then know what requests a client makes, instead of say, just the local resolver knowing), and the performance benefits of caching. The lack of anonymity may be somewhat fundamental to better, more transparent decision making, Kyle says. The work focuses on the performance and scalability implications.

They make 4 months of observations of DNS traffic from 100 residences, and compare the increase in name resolution time in these traces to that in simulated client resolutions absent any resolvers. Results do show that name resolution without the shared resolvers takes a bit longer on the whole, but 45% of the time it doesn't take longer, and 85% of the time the increase in latency is less than 50ms. Further, DNS responses are not used immediately in subsequent traffic, so there is some slack here. In fact, 50% of the responses are not used at all, due to aggressive prefetching, with 36% being used within 50ms. Combining these observations, Kyle concludes that only 5% of connections would be delayed by 50ms or more.

For DNS scalability, the authors compare the number of queries that reach the authoritative DNS servers in the traces versus in simulations without shared resolvers: 93% of authoritative domains don't see any increase in load. Most of these are unpopular ones, so they weren't benefiting from caching anyway, but the rest are popular domains for which caching is useful. For 'com', for example, without resolvers the load increases by a factor of 3.4 on average and 1.14 at the peak. The authors suggest increasing the TTL of records to mitigate this problem. In addition, the DNS protocol supports multiple simultaneous queries, so one can increase the number of questions per protocol message to reduce load. Together these approaches can cut the impact on scalability and performance to reasonable levels (the load increase being 1.33 on average with both ideas in place).

Broadly, this work questions whether the performance and scalability gains from using shared resolvers is worth the expanded attack surface.

Q: (Brad Karp) How did you decide that some of the queries were not used?
A: We paired requests to responses. Even if you get 4 A records and use just 1 of them, we count that as used.

Q: (Brad Karp) There's a DNS study from 2001 (http://nms.csail.mit.edu/papers/index.php?detail=63); some of these observations were known before.
A: I'll have to take a look.

Q: Do you have any data on what happens when auth is very far, but the resolvers are pretty close?
A: Not enough data yet.

Q: The numbers in your measurements can be very different if you're in Asia or other places. At USC, for example, I'm 50ms away from the authoritative server, but only 5ms from my resolver.
A: The prefetching can mitigate this issue to some extent.

Q: It might be harder to remove resolvers than to fix them. Clients are by default set up to use a particular resolver, and many resolvers are very good. Perhaps a more useful strategy is to get rid of bad resolvers?
A: Definitely, not all resolvers are bad. All this doesn't have to be done across all clients for it to work; if you trust your resolver, don't get rid of it.

Q: (Anja Feldmann) In principle, this could be good. But how does it work with NATs and other "nasty little devices" in the network?
A: I have in fact seen problems using this on my laptop; definitely needs more work.

Q: (John Heidemann). Why would switching to recursive resolution fix some of these attack problems?
A: The Kaminsky attack, for example, requires timing knowledge which would disappear here.

Good Network Updates for Bad Packets

Authors: Arne Ludwig, Matthias Rost, Damien Foucard (TU Berlin), Stefan Schmid (TU Berlin and T-Labs, Germany)
Presenter: Arne Ludwig

Arne notes that networks are becoming more and more dynamic, partly an effect of SDN. And network updates happen for a variety of reasons, for example, due to changing policies. How do network updates impact security? For example, we don't want path changes to allow bad packets to bypass network firewalls. That leads into the motivating example of waypoint enforcement, where we want all packets of a flow to pass through a network box (say a firewall). It is easy to build scenarios (like the example in the talk) where the paths before and after the network update pass through the network box, and there are no loops in either configuration, but when changing from using one path to another both these problems can occur.

Arne notes that we can simplify the model under consideration to involve only nodes in the intersection of the old and new paths because other nodes can be updated without issue. Of primary concern in this work, are the properties of waypoint enforcement (WPE) and loop-freedom (LF).

At the core of the problem is that we cannot update network configurations "simultaneously": this is not possible in practice, and the time between an update being issued and it taking effect varies a lot (as past work has shown). But can we plan a sequence of updates such that at each step we have WPE and LF? This is the question this work addresses.

The overall idea is to organize the update process into a series of rounds, such that within a round the order of updates cannot break WPE and LF, and then to order the rounds, again preserving WPE and LF. The goal, then, is to minimize the number of rounds (to cut the time during which the network is transient, and thus limit negative effects on traffic). The paper shows that a greedy approach which attempts to maximize the number of updated links within a round is both incorrect (it sometimes fails to find a solution, even when one exists) and inefficient.
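A simplified sketch of such a greedy round-based scheduler. It is a simplification of the setting in the paper: it checks safety as updates are applied one at a time, whereas the paper requires any interleaving within a round to be safe:

    def walk(next_hop, src, dst):
        """Follow next-hop pointers from src; return the node sequence,
        or None if the packet loops or hits a dead end."""
        path, node = [src], src
        while node != dst:
            node = next_hop.get(node)
            if node is None or node in path:
                return None
            path.append(node)
        return path

    def greedy_rounds(old_hop, new_hop, src, dst, waypoint):
        """Greedily group per-switch updates into rounds so every intermediate
        configuration stays loop-free (LF) and traverses the waypoint (WPE).
        Returns a list of rounds, or None if greedy gets stuck (which, as the
        paper notes, can happen even when a valid schedule exists)."""
        current = dict(old_hop)
        pending = {n for n in new_hop if old_hop.get(n) != new_hop[n]}
        rounds = []
        while pending:
            this_round = []
            for node in sorted(pending):
                trial = dict(current)
                trial[node] = new_hop[node]
                path = walk(trial, src, dst)
                if path is not None and waypoint in path:   # both LF and WPE hold
                    current = trial
                    this_round.append(node)
            if not this_round:
                return None          # no single safe update left: greedy is stuck
            pending -= set(this_round)
            rounds.append(this_round)
        return rounds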

More broadly, this work shows that WPE and LF are in conflict, and in general a solution enforcing both at each step does not always exist. To guarantee the discovery of a solution when one exists, the authors frame the problem as a mixed-integer program (MIP) with the objective of minimizing the number of rounds and constraints encoding the LF and WPE properties. The MIP either yields an optimal solution or reveals that an instance is not solvable. The MIP's complexity is large, though, leaving some instances unclassified. Arne discussed their analysis of how often the MIP finds a solution when the greedy approach fails, and how often the MIP leaves an instance unresolved due to time complexity. The greedy approach finds a solution in ~80% of the instances offered; the MIP solves a few percent of additional cases. As the number of switches increases, the number of unclassified cases increases (due to increasing complexity).

The overall conclusion of this work is that update sequences guaranteeing WPE and LF simultaneously are hard to find (and sometimes do not exist) and sometimes, "maybe we should just concentrate on the security part" (presumably, tolerating transient loops).

Q: What happens if we have more than one waypoint?
A: We're currently working on this; it's not an easy problem.

Q: As the number of switches increases, the MIP doesn't seem to work. Should we just use greedy?
A: The MIP does find a solution in many scenarios where greedy doesn't work.

Q: Did you look at the union of the cases where MIP and greedy find a solution?
A: All scenarios solvable with greedy were also solvable by the MIP, except where the problem size got in the way.

Q: (Andrew Ferguson) Does the structure of the network have an impact on whether you can find a solution?
A: We don't consider the full topology; our model allows us to look only at the switches on the paths under consideration.

Q: Have you compared your algorithms with techniques based on dependency graphs?
A: Dependencies between updates? We haven't looked into that for the MIP, but finding a better / faster algorithm based on these dependencies is something we're looking into.

Q: Can you go through additional nodes to solve the unsolvable instances?
A: The current MIP wouldn't handle that, but if we had knowledge of the topology and the other paths available, we could look into this.

Anonymity on QuickSand: Using BGP to Compromise Tor

Authors: Laurent Vanbever, Oscar Li, Jennifer Rexford, Prateek Mittal (Princeton University)
Presenter: Laurent Vanbever

Laurent begins by reminding us of how We are Anonymous (punning on the hacker collective) ... but not really ... because of how routing works. Even if communication is encrypted, we can figure out who's talking to whom, infer location, and track behavior and interests. Tor is the best-known system for preventing the discovery of associations between the endpoints of a communication. Tor works by bouncing traffic over relays: an entry relay, a middle relay, and an exit relay. It forms a set of tunnels, with the result that no single entity can map the source to the destination: the entry relay knows the source, and the exit relay knows the destination. But because Tor is a low-delay system, entry and exit traffic is highly correlated; by observing correlations at entry and exit points, its privacy can be broken. The question this work addresses is: can you increase an attacker's ability to make the necessary observations by (a) manipulating Tor itself (which is in the paper, but not part of the talk); and (b) manipulating the underlying routing?

The short answer to that question, this work shows, is yes. There are in fact, multiple BGP characteristics that increase the attack surface: (a) as BGP converges, more ASes see the traffic needed to make correlations; (b) an AS can also actively hijack routes to be able to see traffic; and (c) asymmetric routing in the Internet makes it worse because more ASes see the traffic (and it's enough to look at traffic in any direction).

To see how much impact these problems can have, the authors collected data on entry and exit Tor relays. It turns out that just 3 ASes host 20% of Tor entry and exit relays, putting these ASes in a strong position as attackers. As for path changes, how often do they happen in a way that impacts Tor? The authors use data from RIPE to answer this question, and find that Tor prefixes change more often than other BGP prefixes; in 60% of these changes more than 2 extra ASes receive traffic, which is a lot, considering most routes don't involve many ASes (typically 4 or fewer).

So what suggestions do the authors have for mitigation?  To protect from BGP's natural dynamism, one could prefer stable relays; to protect from route manipulation, preferring close relays helps;  to protect against asymmetric analysis, one can encrypt the transport header. However, each of these solutions has its own difficulties, so ultimately, it's unclear how to solve the problem.


Q & A


Q: (Bruce Maggs) In Tor, don't they usually have more than one middle node?
A: Yes, it is just one node in the middle, at least logically. In any case, even if multiple middle nodes were being used (which I'm not sure about), not much would change in this analysis because we're primarily concerned about the entry and exit here.

Q: What about the correlations and timing? (Scribe note: I might have missed some part of the question.)
A: For attack mitigation, one can insert random delays, and people have worked on that. Effective approaches do impose additional latency, and can be used for mail etc., but not interactive application traffic.

Q: How transient are the BGP path changes? What time frame do you need for the attacks to work?
A: BGP convergence can often take minutes, so there's plenty of time to observe enough traffic for attacks.

Q: A few ASes can capture lots of the Tor traffic (as your results show). Is that specific to Tor, or if you took any service, could you expect to find a similar skew just based on randomness?
A: This is probably specific to Tor, because lots of providers worry about hosting exit relays - shady things happen on Tor, and they don't want to be associated with them.

Session 3, Paper 4: Using Video-Based Measurements to Generate a Real-Time Network Traffic Map (Sun, Jiang, Sekar, Zhang, Lin, Wang)

(the paper can be found here: http://conferences.sigcomm.org/hotnets/2014/papers/hotnets-XIII-final81.pdf)

Vyas Sekar is the presenter, and he opens the presentation with an interesting analogy: in the real world, real-time road-traffic information is shown on Google Maps. How come we don't have real-time Internet traffic information shown somewhere?

Why is this problem hard? The authors provide three reasons:
1. It needs millions of vantage points
2. Bandwidth measurement incurs non-trivial cost
3. Real-time view requires continuous updates

However, the authors argue, the rise of the Internet video can be an opportunity. The reason being:
1. Internet video accounts for 30-50% of the traffic. Netflix alone has 50M users (i.e., 50M vantage points). 
2. The cost is low, since now we can do passive throughput measurement during each video session. 
3. Continuous updates are already collected by various service providers. 

The authors envision an Internet Traffic Map Service, which takes as input measurement results from various providers and the network topology, runs their proposed inference algorithm, and provides insights on link capacity and link utilization across the Internet. 

Vyas briefly talked through how the inference algorithm works and showed preliminary results on accuracy.
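These notes do not detail the inference algorithm. As a rough illustration of the general flavor of the problem (inferring per-link quantities from per-session path measurements), here is a toy sketch under the assumption that session throughputs add up on shared links; it is not the authors' algorithm:

    import numpy as np

    # routing[i][j] = 1 if video session i traverses link j (assumed known
    # from topology and path data). Three sessions over four links.
    routing = np.array([[1, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 0, 1, 1]], dtype=float)

    # Per-session throughputs (Mb/s) measured passively during video playback.
    session_tput = np.array([4.0, 6.0, 3.0])

    # Aggregate load on each link = sum of the sessions crossing it.
    link_load = routing.T @ session_tput
    print("estimated per-link load (Mb/s):", link_load)

    # A crude per-link capacity lower bound: the largest aggregate load ever
    # observed on the link across measurement epochs (here just one epoch).
    capacity_lower_bound = link_load.copy()
    print("capacity lower bound (Mb/s):", capacity_lower_bound)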

Q&A:

Q: (Anja Feldmann from TU Berlin) Video traffic is rate-limited, so how do you use rate-limited traffic to derive link capacity?
A: That's a fair point; one thing you can do is look at each burst. But that is definitely something we need to take into account in the inference algorithm. 

Q: We can look at each chunk, but the throughput will be limited by last mile. As a result, you only have visibility at the bottleneck link. Also, how can a service provider, such as Akamai, monetize such a service?
A: How they monetize this is a separate problem, I am not going to answer that, but I want to emphasize that all these service providers already have these measurements today. 

Q: A quick follow-up: do you think Google and Netflix have good coverage of the Internet?
A: Yes, we believe so. 

Q: (Nina from Google) What’s the incentive for service providers to open up this information? How do you address privacy issues?
A: I am not one of these service providers, but again, this is what they have already been doing, and they have all this data. 

Q: (Jeff Mogul from Google) Any ISP has this data, but the problem is they will never want to reveal it! (Comments from the audience on network neutrality concerns.)
A: There is a SIGCOMM paper showing that we can infer network neutrality even without ISPs providing data. 

Q: (Dave Oran from Cisco) How confident are you that you are not measuring a middlebox?
A: It’s very possible, especially if they are the bottleneck.  

Q: (An audience member from USC) Video traffic is served from CDNs; how much of the measured traffic goes through the core network?
A: That is a fair point; our data set might be biased towards edge networks. 

Q: (An audience member from Google) Regarding the analogy at the beginning: the reason we can have road-traffic information is that we still drive our own cars, but today users don’t drive their own traffic. Do you think this kind of information will ever be public?
A: I don't know the answer. (TY Huang from Netflix) Netflix already publishes the ISP Speed Index, and we are fairly open as long as there is a need. Our motivation is really to share with our users the knowledge of how the Internet works. (Dave Oran from Cisco) Google also had a SIGCOMM demo sharing this kind of information this year. 


Session 3, Paper 3: Toward a Principled Framework to Design Dynamic Adaptive Streaming Algorithms over HTTP (Yin, Sekar, Sinopoli)

(The paper can be found here: http://conferences.sigcomm.org/hotnets/2014/papers/hotnets-XIII-final116.pdf )

Xiaoqi Yin is the presenter; he introduces himself as more of a control theory person than a networking person.

The question this paper tries to address is: how to design a dynamic bitrate adaptation algorithm that provides better QoE, and how to do it in a systematic way. 

In the current system, a bitrate controller residing in the video client decides what video rate to request given the network environment. There are many challenges for rate adaptation: it's hard to predict future bandwidth, there are interactions with TCP, etc.

As a result, there are 50+ papers in this field, but there is no way to compare one to another. Moving forward, the author argues that the community needs a systematic framework, instead of yet another point in the design space. This work tries to provide such a framework, allowing us to systematically explore the design space and evaluate these algorithms. Within the framework, they further propose an algorithm based on model predictive control (MPC).

The proposed framework uses a QoE model to compute each algorithm's QoE "score", and asks (1) how sensitive the algorithms are to the operating environment, and (2) how far the algorithms are from optimal.

The authors classify existing algorithms into two classes: (1) rate-based (RB) algorithms and (2) buffer-based (BB) algorithms. They then compare the performance of RB, BB, and their proposed MPC algorithm.

In general, they found MPC > BB > RB. In other words, even though BB performs better than RB, MPC performs even better in terms of QoE. However, both MPC and RB suffer if the prediction error is large. Nevertheless, if the prediction error is bounded, MPC can still perform better than BB.
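A heavily simplified sketch of the MPC idea: brute-force lookahead over a short horizon, maximizing a linear QoE score of bitrate minus rebuffering and switching penalties. The weights, bitrate ladder, and buffer model below are invented, not the paper's:

    from itertools import product

    BITRATES = [0.35, 0.75, 1.5, 3.0]        # Mb/s, illustrative ladder
    CHUNK_SEC = 4.0                           # seconds of video per chunk
    W_REBUF, W_SWITCH = 10.0, 1.0             # made-up QoE weights

    def qoe(rates, rebufs):
        smoothness = sum(abs(a - b) for a, b in zip(rates, rates[1:]))
        return sum(rates) - W_REBUF * sum(rebufs) - W_SWITCH * smoothness

    def mpc_choose(buffer_sec, last_rate, predicted_tput, horizon=3):
        """Pick the next bitrate by simulating every rate sequence over the
        horizon against predicted throughput and keeping the best first step."""
        best_score, best_rate = float("-inf"), BITRATES[0]
        for plan in product(BITRATES, repeat=horizon):
            buf, rebufs = buffer_sec, []
            for rate, tput in zip(plan, predicted_tput):
                download = rate * CHUNK_SEC / tput          # seconds to fetch chunk
                rebufs.append(max(0.0, download - buf))     # stall if buffer empties
                buf = max(0.0, buf - download) + CHUNK_SEC
            score = qoe((last_rate,) + plan, rebufs)
            if score > best_score:
                best_score, best_rate = score, plan[0]
        return best_rate

    print(mpc_choose(buffer_sec=8.0, last_rate=1.5, predicted_tput=[2.0, 1.0, 1.0]))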

Other insights from the talk include:
- All algorithms benefit from a finer-grained set of bitrates.
- MPC and BB can achieve near-zero rebuffering, compared to RB.

Q&A:

Q: (TY Huang from Netflix) As an ABR algorithm designer, I really like the spirit of pursuing a framework for algorithm design. However, the solution you propose depends heavily on the QoE model, which is still not well understood. Any comment on how we can go forward with this framework without a well-understood QoE model?
A: The QoE model we propose is a fairly general linear combination of some very well-known factors. Designers can play with the weights between factors to suit different preferences.

Q: (an audience member from UIUC) Since videos have a rate associated with them, should we use UDP now?
A: As I said in the beginning, I am not a networking person (laughter in the room), but the framework should be general enough to work with different transport algorithms.

Q: (Bruce Maggs) Here you talk about VoD; how about live streaming? For live streaming, jitter matters too. How do we know what should be counted in the objective function?
A: It is indeed very important to incorporate domain knowledge into the objective function.

Q: (Dave Oran from Cisco) There is no commercial player that doesn't use buffer occupancy; it is always a hybrid model. Also, prediction error is correlated with the number of flows sharing the bottleneck link. How can you incorporate that information into your framework?
A: We can further improve the model if there is more information available.

Q: (Nina Taft from Google) What function do you use as the objective function? It seems to be the key to your framework.
A: The QoE function we use is a linear combination between several key factors, such as rebuffer rates and video rates.

Q: (TY Huang from Netflix) I have a question regarding the use of a linear combination. How do you deal with confounding factors in this case?
A: We take care of confounding factors by taking both buffer occupancy and video rate into account.



Monday, October 27, 2014

Session 2: How to manage networks. Panel discussion. (Akella, Donovan, Wu)

Q: (Jeff Mogul) There are a couple of existing works on natural language processing. You should look at them.
A: (Aditya Akella) There is indeed a lot of related work that we missed.

Q: (Nina Taft) What are your ideas to solve the scalability challenges in your respective problems?
A: (Aditya Akella)

  • Management-type activities are typically slow-paced. In most cases, there is no need for a fast answer.
  • The need for scale presupposes the availability of a lot of data in the first place. For that, we are actually in favor of an anonymized repository where people can post their management data.

A: (Sean Donovan) A lot of the optimizations that we are considering are specific to one area. More particularly, we are thinking of leveraging overlapping rules and pruning the rules that are unused on-the-fly. We could also aggregate rules that are contiguous, again, on-the-fly. Finally, we are thinking of leveraging new SDN features like switches with multiple tables.
     
Q: (Vyas Sekar) Trouble tickets are usually unstructured data. Can your work be used to inform the design of structured trouble ticket systems to enable better management plane analytics? Apparently, forensic processing in crime analysis has this kind of structure in its "tickets" system.

A: (Aditya Akella):

  • Tickets are both structured and unstructured. For some of the outages, the structured data is 3 pages long. Regarding what data to provide: if we see some common root-cause events, then we can come up with a structure that captures those root causes systematically. In general, though, some kind of unstructured data will always be there, and dealing with big data will still be necessary.
  • Another possibility is to relate the trouble ticket back to the control-plane actions that led to this trouble ticket being generated in the first place. These would constitute more structured data that could be of use.

Q: (Brad Karp) If you build a system that shifts tasks around as yours does, it seems that Aditya's results show you'll have more tickets. How can you provide hard guarantees to make sure that you are not hit by adverse events?

A: (Wenfei Wu) Providing a 100% guarantee is indeed difficult. We actually profile resource utilization, e.g., every 100 ms, and, based on these 100 ms profiles, we can then adapt the load to provide a given number of nines.

Q: (Theo Benson, to Aditya) What kind of information should Sean provide you in order for your work to be better off?

A: (Aditya Akella) The kind of changes that are happening to the networks, what are the attributes of these changes, and what is the impact on the network.

Q (Brighten Godfrey) How do you integrate diverse sources of data?

A: (Aditya Akella) Normalizing data across different networks is indeed a challenge. One approach would be to normalize them according to some impact metric, for instance, the outage time. Within one network, though, trouble tickets are fairly consistent. It is possible that other data sets might be brought in and add value; we haven't looked at this yet. For that to be useful, we would need to come up with a way to map these external data to "change events".

Q: What/Where is the boundary between the management- and the control-plane?

A: (Aditya Akella) The management plane is where humans interact with the network.

A: (Sean Donovan) The control-plane is where the system takes over.

Q: (Anja Feldmann) According to your talks, isn't it the case that the best network is one that is perfectly homogeneous and never changes? (laughs) In contrast, the research community is moving towards more change and more flexibility. What tools or hooks do you think we need in order to make those network changes more easily manageable?

A: (Aditya Akella) We need ways to relate the data that we generate to the corresponding root causes; then we would be in a better position to understand the impact of network changes.

Q: Sean has the challenge of going from general to specific. Aditya has the opposite problems. Can one help the other?

A: (Aditya Akella) We are using natural language processing on tickets to understand the root cause and impact of a ticket. Our experience with that tool is that it required a lot of manual input to make useful predictions. There seems to be a rich avenue for tools that take those raw data and produce more actionable inputs from them.

A: (Sean Donovan) We haven't explored that connection yet, but that's definitely interesting.