Wednesday, October 29, 2014

Session 3, Paper 1: Reclaiming the Brain: Useful OpenFlow Functions in the Data Plane

Authors: Michael Borokhovich (Ben Gurion University, Israel), Liron Schiff (Tel Aviv University, Israel), Stefan Schmid (TU Berlin and T-Labs, Germany)

Link to paper: Reclaiming the Brain: Useful OpenFlow Functions in the Data Plane

SDN simplifies network management by providing a programmatic interface to a logically centralized controller, splitting the network into a “dumb” data plane and a “smart” control plane. This, however, comes at a cost: fine-grained control of the data plane introduces computational overhead and latency in the control plane. This paper investigates which functionality to add to the OpenFlow data plane (“south”) to make it smarter, reducing interactions with the control plane and making the network more robust.
The approach in this paper relies on a simple template called “SmartSouth”, an in-band DFS graph traversal implemented using the match-action paradigm and the fast-failover technique. In general, monitoring and communication functions are added to the “south” to make it more robust, to react proactively to link failures, and to reduce interaction with the control plane. Specifically, the following functions are provided in the south:
  • Topology snapshot: collects the current view of the network topology; fault-tolerant, no connectivity assumed; a single connection to the controller is required
  • Blackhole detection: detects connectivity loss, regardless of the cause (e.g., physical failure, configuration errors, unsupervised carrier network errors). Two implementations are proposed:
    • multiple DFS traversals, each with a different time-to-live (TTL), using binary search to find the point where the packet is lost. Complexity: log n
    • smart in-band counter: a counter is read and updated during packet processing, and its value can be written to a packet header field. One smart counter is proactively installed per switch port, and two DFS traversals are needed: the first goes back and forth once on each new link, and the second detects the blackhole (the link whose counter has value 1).
  • Critical node detection: checks whether a node is critical for connectivity; a non-critical node may be removed for maintenance and energy conservation; cheaper than a snapshot; only one DFS traversal with root.
  • Anycast: supports specification of multiple unknown destinations; it can be extended to specify service chains; useful for finding an alternative path to the controller when a link fails. Complexity: one DFS traversal
What’s good about the approach in this paper:
  • no new hardware or protocol features are required 
  • keeps state formally verifiable
  • some techniques may extend to other functions (e.g., using the smart counter to infer network load).
This work serves as a first step, and more discussion on how to partition functionality between the data plane and the control plane is encouraged.
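
As a rough illustration of the TTL-based blackhole search described above, the sketch below binary-searches over TTL values. The `probe` callback is hypothetical: it stands in for launching one DFS traversal with the given TTL and reporting whether the packet survived that many hops. The paper realizes this with OpenFlow match-action rules, so this Python version only mirrors the search logic, not the implementation.

```python
from typing import Callable, Optional


def locate_blackhole(num_hops: int, probe: Callable[[int], bool]) -> Optional[int]:
    """Return the first hop of the traversal at which a packet is lost.

    `probe(ttl)` is an assumed hook: it returns True if a traversal limited
    to `ttl` hops completes, and False if the packet disappears.
    Returns None when the full traversal succeeds (no blackhole).
    """
    if probe(num_hops):
        return None  # the full traversal survived, nothing to locate

    # Invariant: a traversal of `lo` hops survives, one of `hi` hops is lost.
    lo, hi = 0, num_hops
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Simulated network: the packet is dropped when it tries to cross hop 6.
    lost_at = 6
    print(locate_blackhole(10, lambda ttl: ttl < lost_at))
```

Each call to `probe` corresponds to one traversal, so the number of traversals grows logarithmically with the length of the DFS walk.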

Tuesday, October 28, 2014

HotNets 2014: Infrastructure Mobility: A What-if Analysis

Scribe notes by Athina Markopoulou.

Infrastructure Mobility: A What-if Analysis

Today’s wireless access networks cannot keep up with demand. Today’s network infrastructure (APs, cell towers) is static, while users may be mobile. The main idea proposed in this paper is to make the infrastructure itself mobile, in order to exploit diversity. There is a wide range of mobility options (tethering, on the order of feet; ceiling railing, on the order of meters; cell tower drones, on the order of kilometers) and of timescales at which to adapt. The authors added the disclaimer that they do not yet know the killer app and that they consider this bottom-up, enabling research. The speaker argued that the idea can be made practical through robots (“robotic WiFi”) and that it can provide compelling gains (in terms of SNR variation, throughput and other metrics, as evidenced by experimental results for micro, mini and macro mobility) without actually moving the APs much. He compared the approach to overprovisioning and argued that mobility is complementary to density. He also envisioned that the monitoring and control of mobility should be coordinated and optimized by the cloud. He listed challenges including how to move the AP and how to coordinate this decision with other optimizations (e.g., channel selection, coding).

Q&A (during panel discussion, some of them addressed to all papers)

Q: If you make the AP mobile, things may break down (physically) more easily. How do you trade off higher throughput against a higher chance of failure?
A: These risks, as well as psychological discomfort, are increasing with the use of robots in our lives.

Q: Psychology aside, the question is about reliability.
A: We can optimize for different utility functions. So far, we optimized throughput, but we could include reliability in our objective function.

Q: What is your baseline for comparison? Could you get the same benefits by simply using MIMO?
A: We are currently using a single antenna. Mobility is complementary to MIMO.

Q: All 3 papers require help from participants. How reliable are these participants and how sensitive is the outcome to optimal choices?
A: The precise placement of AP is not critical, since there is a lot of diversity, and many positions of the AP are good enough.

Q: Rather than mobilizing the AP, we could move the antennas, etc.
A: Yes, but restricted mobility means less opportunity.

Q: You seem to need a lot of computation in real time. Where should this computation be done?
A: The computation bottleneck is the search space. It can be done at the local AP. In the case of multiple APs, it can happen in the cloud.

HotNets 2014: An AS for Us

Scribe notes by Athina Markopoulou.

PEERING: An AS for Us (presented by Ethan Katz-Bassett)
This work developed a testbed that allows researchers to configure an ISP (PEERING) and experiment with it. PEERING has its own AS number and IP address space, and it peers with real ISPs. In particular, PEERING routers peer with 6 universities and providers. Richer connectivity is provided via peers at AMS-IX (Amsterdam Internet Exchange) and Phoenix-IX. Researchers who configure this AS are allowed to advertise only prefixes that PEERING owns, not other people’s prefixes (so that it does not become a transit network). The speaker then explained the use of the testbed through the motivating example of ARROW.

PEERING provides a sweet spot between realism (running things over the Internet) and control (by configuring this ISP). It can be used to enable running experiments for inter-domain routing research. The speaker concluded with a call to the community to use the testbed and propose new features.

Q&A (during panel discussion, some of them addressed to all papers)

Q: How do we know that we measure the Internet and not your infrastructure?
A: Right now, through documentation; we also plan to keep logs.

Q: Scalability issues?
A: It mainly depends on the number of prefixes we own.

Q: Have you thought about using IPv6?
A: We plan to look into that.

Q: Are ISPs concerned about their policies being inferred/exposed?
A: In our experience, there is no pushback from ISPs. They know their own policy but they don’t have the global picture, so they actually want more visibility.

HotNets 2014: Crowdsourcing Access Network Spectrum Allocation Using Smartphones

Scribe notes by Athina Markopoulou.

Crowdsourcing Access Network Spectrum Allocation Using Smartphones

(presented by Jinghao Shi)

The main idea proposed in this paper was to use a smartphone “within proximity” of the primary device (laptop or tablet) to collect measurements (channel utilization and WiFi scan results) without disrupting the primary device. The key participants in the PocketSniffer system are: the phone, the laptop, the PocketSniffer AP, the PocketSniffer server. Challenges that need to be addressed include the following:
  • Physical proximity: use a phone next to the laptop to collect measurements on the laptop’s behalf.
  • Incentives for the phone: one way is to use the user’s own phone to collect measurements for the user’s laptop. If Bob’s phone is used to help Alice’s laptop, credits can be offered to Bob, to use later in exchange for QoS.
  • Measurement efficiency: pick the phone to use based on criteria such as proximity and battery level.
  • Measurement validation: the phone may provide wrong measurements (because it is lazy or selfish). Solution: trust the AP and cross-validate.
The author also talked about the bigger picture (global information, cooperation, use of game theory, WiFi-cellular interaction) and their implementation on the Nexus 5 and a public testbed.
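
To make the “measurement efficiency” criterion concrete, here is a minimal sketch of how a phone might be chosen by proximity and battery level. The `Phone` fields, the 20% battery cutoff and the tie-breaking rule are all assumptions made for illustration; the talk did not specify the selection policy beyond naming these criteria.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Phone:
    name: str
    distance_m: float   # estimated distance to the laptop (hypothetical input)
    battery_pct: float  # remaining battery, 0-100


def pick_phone(phones: Sequence[Phone], min_battery_pct: float = 20.0) -> Optional[Phone]:
    """Pick the closest phone that still has enough battery.

    The threshold and the ordering (distance first, battery as tie-breaker)
    are illustrative assumptions, not the PocketSniffer policy.
    """
    eligible = [p for p in phones if p.battery_pct >= min_battery_pct]
    if not eligible:
        return None
    return min(eligible, key=lambda p: (p.distance_m, -p.battery_pct))


if __name__ == "__main__":
    candidates = [
        Phone("alice-phone", distance_m=0.3, battery_pct=10.0),
        Phone("bob-phone", distance_m=1.5, battery_pct=80.0),
        Phone("carol-phone", distance_m=4.0, battery_pct=95.0),
    ]
    chosen = pick_phone(candidates)
    print(chosen.name if chosen else "no eligible phone")
```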

Q&A (during panel discussion, some of them addressed to all papers)
Q: How do you decide which phone, within proximity of the laptop, to use?
A: We can have this information and pick the closest one.
Followup Q: Using the closest phone to the laptop may not be the best proxy for measurements, due to fading. Locations very close to each other (exact distance depending on frequency/wavelength) may have very different signal strengths. Have you actually done measurements to validate that?
A: Yes, we pick the closest phone. Our measurements so far put the phone on top of the laptop. For the purpose of picking which channel to use, this is good enough.

Q: Where do you envision the computation to happen?
A: Where to implement the algorithm (to decide which phone to use for measurement and which WiFi channel to use) must take into account security and privacy concerns. In enterprise environments, there is a centralized controller.

Theia: (Simple and Cheap) Networking for Ultra-Dense Data Centers

Paper Title: Theia: (Simple and Cheap) Networking for Ultra-Dense Data Centers

Authors: Meg Walraed-Sullivan, Jitu Padhye (Microsoft Research), Dave Maltz (Microsoft)

Presenter: Meg Walraed-Sullivan

Paper Link:

Data centers are expensive to build; therefore, in Ultra-Dense Data Centers (UDDCs) more CPUs are packed into each rack, which poses a number of challenges: 1) power and cooling, 2) failure recovery, 3) tailoring applications, and 4) networking. This talk focuses on the networking problem that arises from packing a huge number of CPUs into a rack. Theia suggests we rethink the ToR architecture used in many data centers, which does not scale to connect thousands of CPUs. We should get rid of the star topology in favor of a fixed direct-connect topology. The upside is that it is much cheaper, reduces the power requirement significantly and requires less physical space. However, this causes a loss of full bisection bandwidth and constrains the flexibility of the topology. Theia suggests replacing switches with patch panels for connectivity among sub-racks and to the rest of the data center. It is clear that over-subscription is unavoidable.
We require a direct topology that minimizes through traffic and supports a wide range of graph sizes. The current best fit is a circular graph, though it might not be the best option.


Q: What is the topology between the racks?
A: Imaginary. We don't have a DC yet.

Q: Different racks should be able to make different trade-offs? What do you think about having heterogeneous racks?
A: It's hard to do, but we might consider it.

Q: At this scale, don't you think you need clever placement, as oversubscription is unavoidable?
A: We need clever placement, we do it today. We will keep it in mind.

PIAS: Practical Information-Agnostic Flow Scheduling for Data Center Networks

Paper Title: PIAS: Practical Information-Agnostic Flow Scheduling for Data Center Networks

Authors: Wei Bai, Li Chen, Kai Chen (Hong Kong University of Science and Technology), Dongsu Han (KAIST), Chen Tian (HUST), Weicheng Sun (Hong Kong University of Science and Technology and SJTU)

Presenter: Li Chen

Paper Link:

Existing data center flow scheduling schemes minimize flow completion time (FCT); however, they assume a priori knowledge of flow sizes in order to approximate ideal preemptive shortest-job-first scheduling, and they require customized switch hardware. PIAS minimizes FCT without requiring prior knowledge of flow sizes by leveraging the multilevel feedback queue (MLFQ) support that exists in many commodity switches. The goal is an information-agnostic approach that minimizes FCT and is readily deployable. Initially, a flow gets the highest priority, and it is demoted as it transfers more bytes. This is achieved by tagging packets, keeping per-flow state and using the switches' priority queues.
The first challenge is how to choose the demotion thresholds. This is addressed by modeling the choice as a static FCT minimization problem. The second challenge arises when the thresholds do not match the actual traffic distribution; this is mitigated by using Explicit Congestion Notification (ECN).
In the end, the system is practical and effective: it is information-agnostic, achieves FCT minimization, and is readily deployable.
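
The demotion rule described above can be sketched as follows. This is only an illustration of the MLFQ idea under stated assumptions: the class name `PiasTagger` and the threshold values are made up for the example, whereas in the paper the thresholds come out of the FCT minimization problem. Priority 0 is treated as the highest priority.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Hashable, Sequence


class PiasTagger:
    """Tags each packet with a priority based on bytes already sent by its flow."""

    def __init__(self, demotion_thresholds: Sequence[int]) -> None:
        # With K thresholds there are K + 1 priority levels (0 = highest).
        self.thresholds = sorted(demotion_thresholds)
        self.bytes_sent: Dict[Hashable, int] = defaultdict(int)  # per-flow state

    def tag(self, flow_id: Hashable, packet_len: int) -> int:
        """Return the priority for this packet, then account for its bytes."""
        priority = bisect_right(self.thresholds, self.bytes_sent[flow_id])
        self.bytes_sent[flow_id] += packet_len
        return priority


if __name__ == "__main__":
    # Hypothetical thresholds: demote after 1,000 bytes and again after 10,000.
    tagger = PiasTagger([1_000, 10_000])
    print([tagger.tag("flow-a", 1_000) for _ in range(12)])
```

A short flow finishes while still in the top queue, while a long flow drifts down to lower priorities without anyone having to know its size in advance.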

Q (Brighten Godfrey, UIUC): How do you handle the situation where different hardware has a different number of queues? Do you use the minimum?
A: Yes, currently we use the minimum.

Q (Brighten Godfrey, UIUC): Do you think putting storage at switches in the data center and using it in the way proposed in the "Revisiting Resource Pooling: The Case for In-Network Resource Sharing" paper would yield a better flow completion time?
A: Yes, it is possible.

Revisiting Resource Pooling: The Case for In-Network Resource Sharing

Paper Title: Revisiting Resource Pooling: The Case for In-Network Resource Sharing

Authors: Ioannis Psaras, Lorenzo Saino, George Pavlou (University College London)

Presenter: Ioannis Psaras

Paper Link:

The resource pooling principle is leveraged to manage shared resources in networks. The main goal is to maintain stability and guarantee fairness. TCP effectively deals with uncertainty by suppressing demand and moving traffic only as fast as the path's slowest link. The approach taken in this paper is to push as much traffic as possible into the network; once a bottleneck is hit, data is stored temporarily in router caches and detoured accordingly. Note that in-network storage (caches) is not used to hold the most popular content; instead, it is used to store incoming content temporarily. The assumptions are: 1) contents have names, 2) clients send network-layer contents. In this approach, clients, rather than senders, regulate the traffic that is pushed into the network. Fairness and stability are achieved in three phases: 1) a push-data phase, 2) a cache-and-detour phase, and 3) a back-pressure phase. Evaluation shows that detours are highly available in real topologies.
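
One way to picture the three phases is as a per-chunk decision at a router. The sketch below is an assumption-laden simplification: the function name, its inputs and the exact ordering of detour versus cache are ours, since the notes only say that the router caches and detours once it hits a bottleneck and falls back to back-pressure.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward on the primary path"           # push-data phase
    DETOUR = "forward on a detour path"               # cache-and-detour phase
    CACHE = "store temporarily in the router cache"   # cache-and-detour phase
    BACK_PRESSURE = "signal upstream to slow down"    # back-pressure phase


def handle_chunk(primary_has_capacity: bool,
                 detour_has_capacity: bool,
                 cache_free_chunks: int) -> Action:
    """Decide what a router does with one incoming chunk (illustrative only)."""
    if primary_has_capacity:
        return Action.FORWARD
    if detour_has_capacity:
        return Action.DETOUR
    if cache_free_chunks > 0:
        return Action.CACHE
    return Action.BACK_PRESSURE


if __name__ == "__main__":
    print(handle_chunk(False, True, 0).name)
```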

Q: In the table that shows the available detour paths in real topologies, does 2-hop detour availability mean that there is no 1-hop detour but there is a 2-hop detour?
A: Yes.
Q (Brighten Godfrey, UIUC): Do you think putting storage at switches in the data center and using it in the way you suggested would yield a better flow completion time?
A: Yes, it makes sense. The approach could fit the data center.