Layer 9

Thursday, November 21, 2013

HotNets '13: Tiny Packet Programs for low-latency network control and monitoring

Authors: Vimalkumar Jeyakumar (Stanford University), Mohammad Alizadeh (Insieme Networks), Changhoon Kim (Windows Azure), David Mazieres (Stanford University).

With efforts such as OpenFlow, network operators have been able to exercise a great deal of control and programmability in the control plane, but the data plane has been a good deal less flexible. Tasks such as congestion control are better handled in the hardware of the data plane, but it takes years to develop improvements here. Similarly, control plane tasks are too slow to satisfy some monitoring needs and lack context in debugging.

Any programmable protocol for the data plane is bound by stringent performance needs and would ideally be capable of handling relatively complex tasks, such as control protocols (e.g., RCP).

This paper proposes a system by which end-hosts can write a set of very simple instructions and embed it in a packet, where it is executed on the data plane. This set of instructions, known as a Tiny Packet Program (TPP), has access to all switch memory in hardware. A TPP can load statistics from the switch, and the resulting data is embedded in the exiting packet. Once that data is read, an end-host can use it to write a new TPP and affect network policy.
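
To make the idea concrete, here is a toy Python sketch of a TPP being executed hop by hop. The `LOAD` opcode, the `queue_size` statistic, and the dictionaries standing in for switch memory are all assumptions made for illustration; the paper defines its own instruction set and memory layout.

```python
# Toy model of a Tiny Packet Program (TPP); every name here is hypothetical.
# A TPP is modelled as a list of (opcode, switch_memory_key) instructions.

def execute_tpp(program, switch_memory, packet_memory):
    """Run each instruction against one switch, appending results to the packet."""
    for opcode, key in program:
        if opcode == "LOAD":
            # Copy a statistic from switch memory into the packet.
            packet_memory.append((switch_memory["switch_id"], key, switch_memory[key]))
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return packet_memory


# Two switches on the path, each exposing a queue-size statistic.
path = [
    {"switch_id": 1, "queue_size": 4},
    {"switch_id": 2, "queue_size": 17},
]
tpp = [("LOAD", "queue_size")]

packet_memory = []
for switch in path:
    execute_tpp(tpp, switch, packet_memory)

# The receiving end-host reads the per-hop statistics out of the packet.
most_congested = max(packet_memory, key=lambda entry: entry[2])
print(most_congested)
```

An end-host could use a result like `most_congested` to decide what its next TPP should do, which is the feedback loop the summary describes.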

Q: What about malicious hosts inserting TPPs in their packets?
A: All TPPs can be stripped from packets originating from untrusted sources.

Q: Is this comparable to dedicated RCP hardware?
A: They are not comparable. TPP systems may have more information. They do not have to send TPP with each packet.

Q: You mention the naming of certain variables to fetch data from the switch's memory, but ASICs do not have uniform memory layouts.
A: For simplification in the talk, devices were assumed to be homogeneous, but this is not true in reality. Hopefully vendors will provide data sheets that allow the creation of a memory map. Ultimately some standardization would be helpful but not necessary.

HotNets '13: Full Duplex Backscatter

Authors: Dinesh Bharadia, Kiran Raj Joshi, Sachin Katti

Can we add powerful imaging capabilities to the radios in our gadgets (e.g., smartphones), such as weapon detection, indoor imaging, motion tracking, etc.? Radio transmissions produce backscatter, which reflects off of objects in the environment around the transmission. This backscatter contains information about the reflectors in the environment, which can be recovered by estimating the amplitude, delay, and angle of arrival (AoA) of each reflection.

The key challenges with respect to the targeted radios are the limited available bandwidth (e.g., WiFi at 80 MHz) and dynamic range (e.g., ADCs with only 12 bits of resolution). Limited bandwidth (and therefore lower sampling rate) affects the accuracy of the estimation of the delay (and therefore distance) of the reflectors in the environment. Dynamic range affects how well the system can detect both nearby reflectors (with strong reflections) and far away reflectors (with weak reflections); the strong reflections may completely drown out the weak reflections within the digital representation of the signal.
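
As a rough feel for these two limits, the following Python sketch plugs the numbers above into two textbook approximations. Neither formula comes from the talk: the range resolution c / (2B) and the ideal N-bit ADC figure of 6.02 N + 1.76 dB are standard rules of thumb used here as assumptions.

```python
# Back-of-the-envelope numbers for the two constraints named above.
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def range_resolution_m(bandwidth_hz):
    """Smallest reflector separation resolvable by delay alone: c / (2 * B)."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


def ideal_adc_dynamic_range_db(bits):
    """Ideal quantization-limited SNR of an N-bit ADC: 6.02 * N + 1.76 dB."""
    return 6.02 * bits + 1.76


print(round(range_resolution_m(80e6), 2))         # 80 MHz channel
print(round(ideal_adc_dynamic_range_db(12), 2))   # 12-bit ADC
```

Under these assumptions, an 80 MHz channel separates reflectors by delay only when they are a couple of metres apart, which is why the additional AoA dimension matters.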

In order to handle limited bandwidth, whereby multiple reflections can more easily appear within the same sampling interval, angle of arrival can be used to actually determine that there are two (or more) such reflections assuming that the reflectors are spatially diverse.

With respect to limited dynamic range, it is possible to perform successive estimation and cancelation of components in decreasing order of signal power. However, this cancelation must be performed prior to the ADC (and therefore in the analog signal itself) in order to allow the dynamic range to shift with respect to the signal power in the subsequently strongest signals.
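
The sketch below illustrates, in a simplified way, why the cancelation has to happen before the ADC. It is not the authors' method: it models a single cancelation step, assumes the strong reflection is estimated perfectly, and uses an idealized quantizer with made-up amplitudes.

```python
import math


def quantize(samples, bits, full_scale):
    """Ideal ADC model: round to a grid of 2**bits steps spanning +/- full_scale."""
    step = 2.0 * full_scale / (2 ** bits)
    return [round(s / step) * step for s in samples]


n = 64
strong = [1.0 * math.sin(2 * math.pi * 3 * k / n) for k in range(n)]   # nearby reflector
weak = [1e-4 * math.sin(2 * math.pi * 5 * k / n) for k in range(n)]    # far reflector
received = [s + w for s, w in zip(strong, weak)]

# Without cancelation the ADC range must fit the strong reflection, so the
# weak one is smaller than half a quantization step.
step_before = 2.0 * 1.0 / 2 ** 12
weak_is_below_half_step = max(abs(w) for w in weak) < step_before / 2

# Cancel the (assumed perfectly estimated) strong reflection before the ADC,
# then rescale the ADC range to the residual signal.
residual = [r - s for r, s in zip(received, strong)]
full_scale_after = max(abs(x) for x in residual)
digitized = quantize(residual, bits=12, full_scale=full_scale_after)
worst_error = max(abs(d - w) for d, w in zip(digitized, weak))

print(weak_is_below_half_step, worst_error)
```

Once the strong component is removed in the analog domain, the same 12 bits are spent on the weak reflection alone, so it is captured with a much finer step.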

In order to evaluate the cancelation method, a WARP radio is used with the transmit port split into three separate wires of different lengths (which correspond to different delays). MATLAB simulations are used to determine the overall accuracy of the backscatter approach, using up to 6 total reflectors (and up to 3 within the same sampling interval). The simulations support this approach's feasibility for a variety of imaging applications.

--------------------

Q: Were the antennas the same form factor as that in most cellphones?

A: While the form factor is the same, this technique (for AoA) required 4 antennas in the evaluation. In addition, fitting the required antenna spacing within a smartphone form factor calls for higher frequencies (5 GHz, 60 GHz).

Q/A: Previous work in RF self-interference cancellation wanted to cancel all reflections, whereas in this work we are interested in successively canceling each individual reflection in order to support low-power (far) reflections while using ADCs with limited accuracy.

Q: How many reflections can you detect?

A: We used up to 6 reflections, with 3 in the same sampling interval. However, we have not tested its performance as the number of reflections increases.

Q: If there are several mobile phones that are nearby, can they cooperate to perform some imaging?

A: Cooperation between devices could certainly help.

HotSDN '13: Answering Why-Not Queries in Software-Defined Networks with Negative Provenance

Presenter: Yang Wu

Authors: Yang Wu, Andreas Haeberlen, Wenchao Zhou, Boon Thau Loo

Network debuggers like ndb work by generating a backtrace: a causal chain of events from an observed problem back to a set of root causes. But to generate a backtrace, one needs an observed problem to start from. This means that network debuggers cannot be used to answer questions about events that did not happen, such as "why is the HTTP server not receiving traffic?"

This paper proposes a methodology for answering these "why-not queries" using negative provenance. Provenance is a concept from databases; it models causal relationships between inputs and outputs. Provenance is often represented as a DAG that can be calculated in a straightforward way from programs written in languages such as NDlog:

  PacketSent :- FlowEntry, PacketReceived

This paper proposes an extended model of provenance that includes negative information, and also develops techniques for doing counter-factual reasoning on these representations. To provide programmers with simpler explanations, it also presents algorithms for compressing provenance graphs, achieving a 90% reduction in size on average and typically yielding fewer than 20 nodes. A prototype implementation has been developed using Mininet.
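
As a very small illustration of the why-not idea, the Python sketch below explains a missing `PacketSent` fact by walking the rule above backwards and reporting which preconditions are absent. It is a toy under simplifying assumptions (one rule, no time, no counter-factual reasoning or graph compression) and not the paper's algorithm; the function and dictionary names are hypothetical.

```python
# Toy "why-not" explanation for a single rule; not the paper's algorithm.
# Rule, in the style shown above: PacketSent :- FlowEntry, PacketReceived
RULES = {"PacketSent": ["FlowEntry", "PacketReceived"]}


def why_not(fact, known_facts, rules=RULES):
    """Return a tree of the missing preconditions that explain an absent fact."""
    if fact in known_facts:
        return None  # The fact exists, so there is nothing to explain.
    if fact not in rules:
        return {"missing": fact, "because": "base event never occurred"}
    causes = [why_not(body, known_facts, rules) for body in rules[fact]]
    return {"missing": fact, "because": [c for c in causes if c is not None]}


# The packet arrived, but no flow entry was installed.
explanation = why_not("PacketSent", known_facts={"PacketReceived"})
print(explanation)
```

Here the explanation points at the missing `FlowEntry`, which is the kind of negative cause a positive backtrace has no starting point for.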

This work is significant because it shows how to extend network debuggers to explain negative questions in addition to positive ones.

Q: Are there limits on the kinds of queries you can express using negative provenance? For example, why is the traffic not being load balanced?

A: Our current focus is on debugging logical properties and not quantitative properties.

Q: In how many of these examples would a forward trace catch the bugs as opposed to a backward trace?

A: In many situations you don't know which trace to issue and where to start. This is especially true in complex and dynamic environments.

Q: This sounds related to some older work from the knowledge plane world: something goes wrong, how do you figure out what happened? Well, someone has to start by identifying a problem. In a network like the Internet, you're not going to have global visibility. There's a big scaling problem, probably a machine learning problem, etc.

A: We would be interested in understanding that work better.

Q: It seems like why-not queries could be understood in terms of classic notions of safety and liveness. For example, given the liveness property "if HTTP traffic is arriving at the ingress to the network, then it should eventually be delivered to a server," your system generates conditions that can be readily checked (using specific assumptions about the domain). Have you thought about modeling things this way to get a handle on the kinds of properties you can check?

A: Not yet. Regarding which properties we can check, as stated previously, our current focus is on logical properties and not quantitative properties.

HotNets '13: Applying Operating System Principles to SDN Controller Design

Presented by: Matthew Monaco

Authors: Matthew Monaco, Oliver Michel, Eric Keller

SDN controller platforms are often compared to operating systems, but existing controllers are more like kernels. This means that programmers must re-implement common functionality such as event handlers, timers, etc. from scratch in each new application.

Yanc (Yet Another Network Controller) is a new SDN controller platform based on classic Unix principles. It follows the "everything is a file" philosophy, which leads naturally to a simple and lightweight interface for accessing network hardware, and enables re-using standard off-the-shelf utilities in network control programs. As an example, the directory structure for a single OpenFlow switch is organized as follows:

   /sw1
    |-- counters/
    |-- flows/
    |-- ports/
    |-- actions
    |-- capabilities
    |-- id
    +-- num_buffers
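
To show what "everything is a file" could look like from a control program, the Python sketch below builds a stand-in for this layout in a temporary directory and reads it with ordinary file operations. The file contents (`0x1`, `256`) and the helper names are invented for illustration; with a real Yanc mount the directory would be provided by the FUSE file system rather than created by the script.

```python
import os
import tempfile


def make_fake_switch(root, name="sw1"):
    """Create a stand-in for the layout above (contents are made up)."""
    switch_dir = os.path.join(root, name)
    for sub in ("counters", "flows", "ports"):
        os.makedirs(os.path.join(switch_dir, sub))
    for filename, value in (("id", "0x1"), ("num_buffers", "256")):
        with open(os.path.join(switch_dir, filename), "w") as f:
            f.write(value + "\n")
    return switch_dir


def read_attr(switch_dir, attr):
    """Read one switch attribute exactly as one would read any small text file."""
    with open(os.path.join(switch_dir, attr)) as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as root:
    sw = make_fake_switch(root)
    switch_id = read_attr(sw, "id")
    num_buffers = int(read_attr(sw, "num_buffers"))
    subdirs = sorted(e for e in os.listdir(sw) if os.path.isdir(os.path.join(sw, e)))

print(switch_id, num_buffers, subdirs)
```

Because the interface is just files and directories, the same information would be reachable from a shell with standard tools such as `cat` and `ls`.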

This work is important because it represents a serious attempt to deliver on the vision of a "network operating system" and it proposes a fresh SDN controller architecture that makes it possible to put existing operating system abstractions to work. For example, programmers can re-use existing Linux abstractions such as:

inotify for event processing
file permissions and access control lists for security
namespaces and cgroups for performance isolation
distributed file systems for simple forms of state replication and coordination

rather than having to implement their own from scratch.

The Yanc prototype is based on Filesystem in Userspace (FUSE), a C++ OpenFlow driver, a Python discovery module, and a shell script to push rules. Future work includes defining new drivers for other back-ends such as Snort, developing richer operators for composing network programs, and further investigating issues related to distributed control.

Q: What about performance? The POSIX filesystem imposes all kinds of semantics that might limit parallelism. Does the FUSE implementation serialize a list of creations?

A: No, many file system operations can be implemented in parallel or asynchronously. Moreover, even if they were sequential, the latency of a packet_in "miss" far exceeds the latency of these file system operations anyway.

Q: Can distributed file systems be used to coordinate SDN controllers?

A: Perhaps! We are exploring the use of distributed file systems to implement functionality such as distributing locks for concurrency control, etc.

HotNets '13: Corybantic: Towards Modular Composition of SDN Control Programs

Presented by: Alvin AuYoung

Authors: Jeff Mogul, Alvin AuYoung, Sujata Banerjee, Jeongkeun "JK" Lee, Jayaram Mudigonda, Lucian Popa, Puneet Sharma, Yoshio Turner

In many other areas of computing, programs are expressed using modular programming abstractions. It would be nice to be able to write network control programs this way too, but different programs often impose conflicting requirements on the network. For example, in a datacenter network, a program trying to minimize bandwidth usage across a core switch may wish to group tenant VMs on the same rack, while a program trying to minimize the effects of link failures may wish to distribute VMs across different racks.

Corybantic is a system for enabling modular composition of network control programs. This work is important because it represents the first attempt to address the general problem of modular composition of network control programs with conflicting requirements. Corybantic defines an API that allows controllers to make proposals for how network resources should be used and define a function for assigning value to other proposals (in a universal "currency"). The Corybantic system collects the proposals and their evaluation by each controller and selects the overall configuration that maximizes value. A prototype system has been implemented in Python and evaluated on the bandwidth/fault-tolerance example in a tree topology, comparing the value computed by Corybantic against the optimal value as computed offline.
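
The propose/evaluate/select loop described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the class and method names are hypothetical rather than Corybantic's API, each module makes exactly one proposal, and the values are invented numbers in a shared currency.

```python
# Toy coordinator in the spirit of the description above; names are hypothetical.

class Module:
    def __init__(self, name, proposal, valuation):
        self.name = name
        self.proposal = proposal      # this module's proposed configuration
        self.valuation = valuation    # value of each proposal in the common currency

    def propose(self):
        return self.proposal

    def evaluate(self, proposal):
        return self.valuation[proposal]


def select_configuration(modules):
    """Pick the proposal with the highest total value across all modules."""
    proposals = [m.propose() for m in modules]
    totals = {p: sum(m.evaluate(p) for m in modules) for p in proposals}
    return max(totals, key=totals.get), totals


bandwidth = Module("bandwidth", "same_rack", {"same_rack": 10, "spread_racks": 2})
fault_tolerance = Module("fault_tolerance", "spread_racks", {"same_rack": 1, "spread_racks": 7})

winner, totals = select_configuration([bandwidth, fault_tolerance])
print(winner, totals)
```

With these made-up numbers the bandwidth module's proposal wins; changing either module's valuation changes the outcome without touching the other module's code, which is the modularity argument.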

Q: Each module is responsible for formulating a complete proposal, rather than a partial proposal. Is that correct?

A: That's correct in what we've actually implemented so far.

Q: You only evaluate proposals that someone makes. What about proposals that nobody makes but that would serve everyone better?

A: We're currently thinking about this. For example, we could add another module that watches what proposals are being made, and inserts its own proposals that would be better than any of the individual modules have been making.

Q: Are the programmers all from the same organization and agree with each other or have you considered adversarial models?

A: The assumption for now is that the programmers are not adversarial.

Q: The theory community has lots of ways for dealing with more complicated constraint problems. What are the right abstractions for expressing and resolving constraints in networking?

A: We think many composition problems can be naturally expressed in terms of a simple currency instead of more complicated weights, because prices can be grounded in some economic reality, whereas weights are an abstraction of reality.

Q: How do you model things like SLAs as constraints? How do you even know they're satisfiable?

A: There seem to be very few real constraints. Most things can be assigned a value -- for example, one can violate an SLA and pay a penalty. For the true hard constraints (for example, don't burn down the data center, or don't allow traffic from certain users to flow across certain paths) the system allows each module to check proposals against its local set of constraints, and effectively veto constraint-violating proposals. The tricky part is to find ways to generate proposals that don't violate constraints. I don't think we have a general solution to that, yet.

Q: What about the dynamic behavior of the system? Suppose a link goes down in a fault-tolerant program? Who decides whether to batch recompute or just incrementally adjust? To put it another way, does Corybantic introduce more dynamism?

A: Corybantic makes it easier to develop a controller that supports more dynamism. (More dynamism is the goal!)

Acknowledgments: Jeff Mogul and Alvin AuYoung added some clarification on the Q&A.

Tuesday, November 19, 2013

Layer 9 is gearing up for HotNets XII

HotNets-XII (which is neither a wrestling event nor a version of the X Window System, despite initial appearances) is starting this coming Thursday at the University of Maryland, and Layer 9 will be here to chronicle it.

The PC chairs have asked us to organize volunteers to write brief blog posts summarizing the presented talks and discussion, for posterity and for those who can't attend in person. The summaries will be posted right here.

If you are able to contribute one session of blog coverage, we'd be very grateful!

Please sign up at the Doodle poll before 8 p.m. ET tomorrow, November 20.

For reference, the HotNets program is here: http://conferences.sigcomm.org/hotnets/2013/program.shtml

We will send out an email tomorrow evening giving everybody a session. If more than one person claims the same session, that's great and you can split up the papers between you. There is no formal conflict-of-interest policy for Layer9.org, but please use common sense in signing up for a session that you can summarize fairly.

We're not aiming for publication quality on the summaries—essentially the quicker posted, the better. Five to six sentences about the most important points of the paper, plus an outline of the Q-and-A, is plenty!

Please sign up for an account on the blog, if you do not have one already. Workshop attendees should have received an invitation from Blogger to make an account that will let you post.

Saturday, September 7, 2013

SIGCOMM'13: Expressive Privacy Control with Pseudonyms

Authors: Seungyeop Han, Vincent Liu, Qifan Pu, Simon Peter, Thomas Anderson, Arvind Krishnamurthy, David Wetherall

The authors have designed a cross-layer architecture that provides users with a pseudonym abstraction. A pseudonym represents a set of activities that the user is fine with linking, and each pseudonym gives the illusion of a single machine. The system provides pseudonyms without modification to the browser, operating system, or network. Note, however, that IP address separation across pseudonyms only works when the destination server uses IPv6 addresses; cookie separation works even with IPv4 servers.

The number of pseudonyms supported by the system is limited by the number of IP addresses we can assign concurrently to a network interface without performance degradation. For example, the Linux operating system enforces a configurable default limit of 4096 addresses. Each privacy policy results in a different number of generated pseudonyms.
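
As an illustration of giving each pseudonym its own IPv6 address, the Python sketch below draws distinct addresses from a single prefix using the standard `ipaddress` module. The prefix is a placeholder from the documentation range, and handing out addresses sequentially is a simplification assumed here; the summary does not say how the system chooses addresses.

```python
import ipaddress


def pseudonym_addresses(prefix, count):
    """Return `count` distinct addresses from an IPv6 prefix, one per pseudonym."""
    network = ipaddress.IPv6Network(prefix)
    # Skip offset 0 (the network address itself) and hand out the next `count`.
    return [network[i] for i in range(1, count + 1)]


# 2001:db8::/32 is reserved for documentation; this /64 is only a placeholder.
addresses = pseudonym_addresses("2001:db8:0:1::/64", 4096)
print(addresses[0], addresses[-1], len(set(addresses)))
```

Even 4096 addresses use a vanishingly small part of one /64, which is why the per-interface limit, rather than the address space, bounds the number of pseudonyms.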

Thus, this paper presents an abstraction called a pseudonym, with which each device, and therefore each user, is able to control and use many indistinguishable identities. The pseudonym abstraction gives users control over which activities can be linked at remote services and which cannot. The authors have designed a cross-layer architecture that exploits the ample IPv6 address space and provides application-layer mechanisms for management. The design lets users choose expressive policies for controlling the privacy/functionality tradeoff on the web. The proposed prototype system consists of a browser extension and a gateway proxy.

Sunday, September 1, 2013

SIGCOMM'13: Mosaic: Quantifying Privacy Leakage in Mobile Networks

Authors: Ning Xia, Han Hee Song, Yong Liao, Marios Iliofotou, Antonio Nucci, Zhi-Li Zhang and Aleksandar Kuzmanovic.

For a growing number of users, online social networking (OSN) sites such as Facebook and Twitter have become an integral part of their online activities. This paper calls attention to privacy leakage in mobile network data, and in particular to an important aspect of the problem: the potential danger to user privacy posed by a third party that does not simply crawl data directly from OSN sites but gathers the digital footprints left by users in cyberspace. GPS and other location information in mobile cellular data make it possible to tie users’ cyber activities to their presence in the physical world. The confluence of smartphones and OSNs renders the ability to glean personal information from mobile data a far more potent threat to user privacy than attacks on each individual service. The leakage stems from shortcomings in the design of certain OSNs, as well as from fundamental limitations of the current Web and Internet from a user privacy perspective, such as the cookie mechanism used with the stateless HTTP protocol.

They refer to this problem as constructing a MOSAIC of a user from their online digital footprints, and correspondingly refer to the gathered footprint pieces as TESSERAE.

As a solution, they have developed the Tessellation methodology. Through Tessellation, they show how user identity information such as OSN IDs and device tracking cookies can be extracted from the traffic. Furthermore, they describe how the remaining pieces of traffic with no identity leakages can be attributed to the known user identities.

They claim that Tessellation can attribute 50% of traffic to its owners with only 5% error. Optionally, the coverage can be increased to 80%, with just a 2% increase in the error rate. Using this methodology, they were able to create mosaics for more than 16,000 users and classify their personal information into 59 categories including user demographics, locations, affiliations, social activities, interests, etc. Finally, they suggest possible countermeasures to safeguard against the alarming leakage of private information.

====================== Q/A====================

Q. From where do they obtain OSN user identifiers and information?

A: Many OSN sites, owing to weak design, “leak” their user identifiers, which allows Tessellation to attribute traffic to real users. HTTP headers are used to obtain URL, cookie, and payload information, from which user login and session key information is extracted.

Q. How to get the value of coverage? What are the types of coverage?

A: There are two types of coverage: a) session-level coverage and b) user-level coverage. Session-level coverage is the number of sessions that are given a prediction (i.e., sum of sessions in all Ts), divided by the total number of sessions. User-level coverage is the number of ground truth users for whom Tessellation identified all or a subset of their sessions divided by the total number of ground truth users.
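
The two definitions in this answer can be written out as a short Python sketch. The session IDs and user names are invented, and the sketch assumes "identified all or a subset of their sessions" means at least one session received a prediction; it does not check whether a prediction is correct.

```python
def session_level_coverage(predicted_sessions, all_sessions):
    """Fraction of all sessions that were given a prediction."""
    return len(predicted_sessions & all_sessions) / len(all_sessions)


def user_level_coverage(predicted_sessions, sessions_by_user):
    """Fraction of ground-truth users with at least one predicted session."""
    covered = [u for u, sessions in sessions_by_user.items() if sessions & predicted_sessions]
    return len(covered) / len(sessions_by_user)


# Hypothetical ground truth: which sessions belong to which user.
sessions_by_user = {"alice": {1, 2, 3}, "bob": {4, 5}, "carol": {6}}
all_sessions = set().union(*sessions_by_user.values())
predicted = {1, 2, 4}  # sessions for which a prediction was made

print(session_level_coverage(predicted, all_sessions))   # 3 of 6 sessions
print(user_level_coverage(predicted, sessions_by_user))  # 2 of 3 users
```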

Friday, August 16, 2013

SIGCOMM13: SplitX: High-Performance Private Analytics

Presented by: Ruichuan Chen

Authors: Ruichuan Chen, Istemi Ekin Akkus, Paul Francis

SplitX is a high-performance private analytics system resistant to answer pollution. It is designed under the assumption that analysts and clients are potentially malicious while servers are honest.

The key factors differentiating SplitX from other analytics systems are XOR encryption and query buckets. SplitX achieves high performance in terms of bandwidth and computation by substituting cryptographic encryption with XOR operation. In order to limit answer pollution, clients are restricted to answer queries in binary format.

In the SplitX system, clients subscribe to the queries published by the analysts. Clients split their answers and send them to mixes, which add differentially private noise to the messages. Aggregators generate query results by combining the outputs of the mixes. Double-splitting is used at the mixes to guarantee privacy.
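
The basic XOR splitting step can be sketched as follows. This is a minimal Python illustration of two-way XOR secret splitting in general, not SplitX's actual message format: the one-byte-per-bucket encoding and the function names are assumptions, and noise addition and double-splitting at the mixes are left out.

```python
import secrets


def xor_split(message: bytes):
    """Split a message into two shares; either share alone looks random."""
    pad = secrets.token_bytes(len(message))
    masked = bytes(m ^ p for m, p in zip(message, pad))
    return pad, masked


def xor_join(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine two shares by XORing them byte by byte."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


answer = bytes([1, 0, 0, 1, 0])   # binary answers, one byte per query bucket
share_for_mix_1, share_for_mix_2 = xor_split(answer)
recovered = xor_join(share_for_mix_1, share_for_mix_2)
print(recovered == answer)
```

Each share is sent to a different mix, so neither mix sees the answer, and the only per-byte work is a single XOR.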

Q: What are the long-term incentives in using this system?
A: SplitX is highly relevant in the current scenario where users are increasingly concerned about privacy.

Q: SplitX uses splitting at several stages. What is the time required per splitting?
A: Splitting involves XOR operation only. Since XOR is extremely efficient, the time required for splitting is negligible.

Thursday, August 15, 2013

SIGCOMM2013: Integrating Microsecond Circuit Switching into the Data Center

Presenter: George Porter

Co-authors: Richard Strong, Nathan Farrington, Alex Forencich, Pang Chen-Sun, Tajana Rosing, Yeshaiahu Fainman, George Papen, Amin Vahdat

This paper designs and builds an optical circuit switching (OCS) prototype (called Mordia) which achieves a switching time of 11us. The authors then identify a set of challenges in using existing control planes for such microsecond-latency switching. To address these challenges, the authors propose TMS, a control plane for fast circuit switching that uses application information and short-term demand estimates to compute schedules and proactively communicates circuit assignments to communicating entities.

To achieve high utilization, the computed schedules are sent to ToRs connected to Mordia. The ToRs in turn adjust the transmission of packets into the network to match the scheduled switch reconfigurations, with complete knowledge of when bandwidth will be most available to a particular destination. Thus, both short and long flows can be offloaded into the OCS.

TMS can achieve 65% of the bandwidth of an identical link rate electronic packet switch (EPS) with circuits as short as 61us duration, and 95% of EPS performance with 300us circuits using commodity hardware.
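
One way to see why circuit duration matters is a simple duty-cycle calculation, sketched below in Python. It assumes every circuit is followed by one 11us reconfiguration gap during which no traffic flows; this is a simplification made here, not a model from the paper, and it only gives an upper bound that ignores every other source of loss.

```python
def duty_cycle(circuit_us, reconfiguration_us):
    """Fraction of time spent carrying traffic if each circuit is followed by one gap."""
    return circuit_us / (circuit_us + reconfiguration_us)


SWITCHING_TIME_US = 11.0
for circuit_us in (61.0, 300.0):
    print(circuit_us, round(duty_cycle(circuit_us, SWITCHING_TIME_US), 3))
```

Under that assumption, 61us circuits cannot exceed roughly 85% of the link rate and 300us circuits roughly 96%, which is consistent with the measured figures sitting below these bounds.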

======================Q/A=====================

Q: How do you configure queues on your system?

A: Queue classification is based on IP address.

Q: Control plane and circuit switching, have you put these two pieces in your work together?

A: We have integrated the two.

SIGCOMM2013: Got Loss? Get zOVN!

Presenter: Daniel Crisan
Co-authors: Robert Birke, Gilles Cressier, Cyriel Minkenberg and Mitch Gusat

This paper identifies and characterizes packet loss in virtualized data center networks. It quantifies the losses for several common combinations of hypervisors and virtual switches, and shows their detrimental effects on application performance. The authors propose a zero-loss Overlay Virtual Network (zOVN) designed to reduce the query and flow completion time of latency-sensitive datacenter applications. They describe its architecture and detail the design of its key component, the zVALE lossless virtual switch.

As part of the evaluation, they implemented a zOVN prototype and benchmarked it with Partition-Aggregate in two testbeds, achieving up to a 15-fold reduction in mean completion time with three widespread TCP versions. For larger-scale validation and deeper introspection into zOVN, they developed an OMNeT++ model for accurate cross-layer simulations of a virtualized datacenter.

===================Q/A========================================

Q: Whenever the queue is full, you put the hypervisor to sleep? How do you deal with deadlock situations?

A: Multiple threads.

Q: How many VMs per CPU do you use?

A: One VM at the sender and one at the receiver.

SIGCOMM2013: zUpdate: Updating Data Center Networks with Zero Loss

Presenter: Hongqiang Harry Liu

Co-authors: Xin Wu, Ming Zhang, Lihua Yuan, Roger Wattenhofer, David A. Maltz

Network updates (e.g., switch updates, VM migration) in data centers can be painful and lead to disruptions because of transient link load spikes and congestion. The goal of this paper is to perform congestion-free, low-latency network updates so that services are not disrupted. This paper introduces zUpdate, which aims to compute a congestion-free update plan and executes it in a two-phase commit.

In zUpdate, when an operator wants to perform a DCN update, she will submit a request containing the update requirements to the update scenario translator. The latter converts the operator’s request into the formal update constraints. The zUpdate engine takes the update constraints together with the current network topology, traffic matrix, and flow rules and attempts to produce a lossless transition plan. The zUpdate plan is formulated as a linear program subject to the constraints of flow conservation and traffic delivery.
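
The Python sketch below is not the paper's linear program. It only checks a candidate transition against a simplified congestion-free condition, assuming that while switches update asynchronously each flow can load a link with at most the larger of its old and new load. The topology, loads, and function name are made up for illustration.

```python
def is_congestion_free(old_load, new_load, capacity):
    """Check every link against the worst case of per-flow old and new loads."""
    for link, cap in capacity.items():
        old, new = old_load.get(link, {}), new_load.get(link, {})
        worst = sum(max(old.get(f, 0.0), new.get(f, 0.0)) for f in set(old) | set(new))
        if worst > cap:
            return False
    return True


capacity = {"A": 10.0, "B": 10.0}
before = {"A": {"f1": 6.0}, "B": {"f2": 6.0}}
after = {"A": {"f2": 6.0}, "B": {"f1": 6.0}}
halfway = {"A": {"f1": 3.0, "f2": 3.0}, "B": {"f1": 3.0, "f2": 3.0}}

# Swapping the two flows in one step could transiently put 12 units on a link.
direct_ok = is_congestion_free(before, after, capacity)
# Going through an intermediate split keeps every link at or below 9 units.
staged_ok = (is_congestion_free(before, halfway, capacity)
             and is_congestion_free(halfway, after, capacity))
print(direct_ok, staged_ok)
```

The example shows why a multi-step plan can succeed where a one-step update risks transient overload, which is the situation a planner has to search for.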

The evaluation consists of tests on a testbed of 22 switches using OpenFlow, comparing zUpdate with other solutions (zUpdate-onestep, ECMP-onestep, ECMP-planned). For larger topologies, they use trace-driven simulations on a production-level data center topology.

====================Q/A=========================

Q: If the network resources are limited, and the network update is done, how do you handle this situation?

A: In such a situation we cannot guarantee a congestion-free transition; instead, we recommend that network operators do updates in off-peak times.

Q: Can you guarantee that you can find a congestion-free update? Is your LP formulation optimal?

A: In certain scenarios, we can only guarantee a best-effort solution.

SIGCOMM2013: pFabric: Minimal Near-Optimal Datacenter Transport

Presenter: Mohammad Alizadeh

Co-authors: Shuang Yang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar, and Scott Shenker

This paper proposes pFabric, which aims to minimize average flow completion times in data centers. pFabric achieves this by approximating shortest remaining processing time (SRPT) scheduling in a distributed manner. With pFabric, switches employ small buffers and use priority scheduling and priority dropping based on priorities in packet headers that are assigned by end-hosts. pFabric end-hosts always send at line rate except under high packet loss rate, in which case they reduce the window size and apply slow start. In pFabric, losses are only recovered with timeouts through the use of a fixed, small RTO.
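
The switch behaviour described above (small buffer, priority scheduling, priority dropping) can be sketched as a toy Python class. This is a simplification under stated assumptions: the class name is hypothetical, the priority is taken to be the flow's remaining size with lower meaning more urgent, and packet reordering within a flow is ignored.

```python
class PFabricPort:
    """Toy switch port: tiny buffer, priority dequeue, priority drop."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []  # list of (priority, flow_id); lower number = more urgent

    def enqueue(self, priority, flow_id):
        """Add a packet; on overflow, drop the least urgent packet and return it."""
        self.buffer.append((priority, flow_id))
        if len(self.buffer) > self.capacity:
            dropped = max(self.buffer, key=lambda pkt: pkt[0])
            self.buffer.remove(dropped)
            return dropped
        return None

    def dequeue(self):
        """Transmit the most urgent packet currently buffered."""
        if not self.buffer:
            return None
        packet = min(self.buffer, key=lambda pkt: pkt[0])
        self.buffer.remove(packet)
        return packet


port = PFabricPort(capacity=2)
port.enqueue(100, "long")             # priority = remaining flow size (assumed)
port.enqueue(3, "short")
dropped = port.enqueue(50, "medium")  # buffer overflows, least urgent is dropped
sent = port.dequeue()
print(dropped, sent)
```

In this run the long flow's packet is dropped and the short flow's packet is sent first, which is how short flows finish quickly even with very small buffers.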

This work falls in the category of protocols which aim to approximate different scheduling algorithms (e.g., SRPT, EDF). Prior works include D3 (SIGCOMM'11), PDQ (SIGCOMM'12), D2TCP (SIGCOMM'12) and L2DCT (INFOCOM'13). Protocols like D3 and PDQ introduce complexity inside the network, whereas pFabric aims to reduce this complexity. Moreover, D3 and PDQ require one RTT to obtain rate feedback before sending traffic; pFabric aims to overcome this by allowing senders to transmit at the interface line rate in the first RTT.

The evaluation was done using ns-2 simulations. They evaluate pFabric's performance across a range of scenarios using empirically derived traffic distributions and show that pFabric achieves close to ideal mean flow completion times.

==========================Q/A====================================

Q: Your evaluation focuses on a non-oversubscribed data center topology; how would the performance change under oversubscribed topologies? And how sensitive are your results to the use of packet spraying?

A: We conducted experiments under oversubscribed topologies (the results are in the technical report cited in the paper) and we found the performance to be similar.

We compared packet-spraying with ECMP, and found that packet spraying improved performance.

Q: How do you deal with scenarios with real time traffic like VOIP, for which flow completion time is not a meaningful metric?

A: Our work does not deal with such applications.

Q: How would you deal with jumbo frames in the short buffers you use?

A: We use 1500 byte packets, which provides us with about 20 packets of buffer. With jumbo frames, we need to have more buffering, however the performance is unlikely to be significantly impacted due to the use of priority scheduling and dropping.

Q: What happens to fairness? Do you care about fairness?

A: We do not care much about fairness; however, you could achieve improved fairness at the cost of an increase in completion times.

Q: How does pFabric deal with a group of flows?

A: We do not deal with such scenarios. We specifically deal with scenarios of short and long flows.

SIGCOMM13: Towards Efficient Traffic-analysis Resistant Anonymity Networks

Presented by: Stevens Le Blond

Authors: Stevens Le Blond, David Choffnes, Wenxuan Zhou, Peter Druschel, Hitesh Ballani, Paul Francis

Aqua, a k-anonymity system, improves on existing systems by providing low latency, high bandwidth, and resistance to traffic analysis. This is achieved by exploiting existing correlations in BitTorrent traffic.

The key feature of Aqua is the use of distinct traffic anonymization techniques at the core and the edges. At the core, payloads are split and sent over multiple paths to reduce the peak payload rate. At the edges, clients with similar traffic patterns are grouped together and forced to transmit at similar rates to realize k-anonymity.

Performance of Aqua was compared with other systems such as constant-rate systems, peer-to-peer systems and broadcast systems. While other systems had more than 80% overhead, Aqua provided bandwidth efficiency with less than 30% overhead. Throttling at the edges in Aqua was only 20%, much lower than 50-80% throttling observed in peer-to-peer and broadcast systems.

Q: Why would you run BitTorrent on top of Tor? Tor is too slow.
A: Tor is slow because there are very few servers hosting the service. We expect to have providers hosting Aqua services. If users are willing to pay for it, we could have a large number of hosts offering better service.

SIGCOMM2013: BigStation: Enable Scalable Real-time Signal Processing in Large MU-MIMO Systems

Presenter: Qing Yang
Co-authors: Xiaoxiao Li, Kun Tan, Hongyi Yao, Fang Ji, Wenjun Hu, Jiansong Zhang, and Yongguang Zhang

Today we see more demand for wireless capacity. We would like to engineer the next wireless network to match the capacity of existing wired networks. However, we can only increase the spectrum so much and reusing the spectrum increases complexity.

This work significantly increases wireless capacity using spatial multiplexing with many antennas. To handle so many antennas, computation is parallelized with many simple processing units. This work establishes a distributed processing pipeline, which exploits data parallelism across servers at each processing stage.

This work was prototyped using a Sora MIMO Kit. They found that with overprovisioned AP antennas, the peak transmission rate scales linearly with the number of antennas, with 9 antennas demonstrating a scaling of 6.8x over a single antenna.

Q: If you had to guess what the scaling limitation is, what do you think the fundamental boundary is? Is it infinitely scalable?
A: My estimation is that we can probably handle about 100 antennas.

Q: Do you think we can continue such an improvement up to 40 antennas in production?
A: There are still critical challenges that we haven't solved yet. These challenges will need to be worked out before this will be suitable for industry.

Q: You talk about the long tail of processing, is there packet loss?
A: Yes, when there is a high data rate, we do see packet loss, but I think it could be reduced.

Q: For production, you really need to design the interconnect to be synchronous, instead of using Ethernet.
A: I don't know if this would actually work - we need to investigate more.

SIGCOMM2013: Bringing Cross-Layer MIMO to Today's Wireless LANs

Presenter: Swarun Kumar
Co-authors: Diego Cifuentes, Shyamnath Gollakota, and Dina Katabi

Major recent advances in cross-layer MIMO don't work on today's Wi-Fi cards, because chip manufacturers hesitate to make investments in new hardware that hasn't been fully tested on real networks.

OpenRF brings MIMO techniques to today's Wi-Fi cards. OpenRF's data plane needs to be able to apply PHY techniques to commodity cards and its control plane needs to self-configure to dynamism in the network. OpenRF handles these challenges by using two transmit queues and by handling some scheduling locally.

OpenRF was implemented on Intel 5300 Wi-Fi cards. OpenRF demonstrated an average gain of 1.6x in TCP throughput over 802.11 in a large-scale experiment, as well as clear improvements in video quality in real applications.

Q: How would you carry the same techniques to cellular networks?
A: This is a lot more natural for cellular networks because they already have some notion of scheduling.

Q: What if Alice and Bob move? How do you locate them?
A: The location of Alice and Bob really doesn't matter. These access points can track the channels as they change at the clients.

Q: You seem to use the same matching as OpenFlow for your flow tables. Wouldn't just the MAC address be sufficient? Why do you have to look at flows?
A: Sometimes you might care about different flows (for example long-lived flows but not bursty traffic).

Q: As you add more antennas you can move towards a wire abstraction. Have you thought about how this would affect your techniques?
A: Our techniques work even better with more antennas (e.g., 3), so this would work.

Q: Today's mobile networks have a coordination mechanism. To what extent do we need centralized control?
A: The centralized controller is important for dealing with interference. You need to coordinate interference between access points.

SIGCOMM2013: Full Duplex Radios

Presenter: Dinesh Bharadia
Co-authors: Emily McMilin and Sachin Katti

Achieving full duplex radio is difficult because self-interference is a hundred billion times stronger than the received signal. Canceling the transmitted signal is difficult because the signal transmitted is actually quite different from what you think you are transmitting, due to noise and non-linear effects called harmonics. You need 110 dB of cancellation (at least 70 dB of which is analog) in order to eliminate the transmitter noise.
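The two figures in this paragraph are the same quantity in different units. As a quick check (the 40 dB digital remainder is simply what is left after the stated 70 dB analog minimum, not a number given in the talk):

```latex
\[
10 \log_{10}\!\left(10^{11}\right) = 110~\text{dB},
\qquad
\underbrace{70~\text{dB}}_{\text{analog (minimum)}}
+ \underbrace{40~\text{dB}}_{\text{digital (at most)}}
= 110~\text{dB}
\;\Longleftrightarrow\;
10^{7} \times 10^{4} = 10^{11}.
\]
```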

This paper presents the first full duplex radio, which uses a new cancellation technique that eliminates all self-interference. Their approach is a hybrid design, with both analog and digital components.

Using commodity radios, this work is able to reduce self-interference to within 1 dB of the noise floor. This approach significantly outperforms previous work in reducing the self-interference residue. This design achieves a 1.97x increase in throughput over non-duplex designs, which is very close to the optimal 2x throughput increase.

Q: How does this compare with a previous work from Waterloo?
A: They achieved 50 dB of cancellation, but they didn't do any non-linear cancellation.

Q: You are suggesting that your own work from SIGCOMM 2011 does worse than half-duplex?
A: That wasn't our work.

Q: Would this scale?
A: I still believe this would scale.

SIGCOMM2013: An In-depth Study of LTE: Effect of Network Protocol and Application Behavior on Performance

Presenter: Morley Mao
Co-authors: Junxian Huang, Feng Qian, Yihua Guo, Yuanyuan Zhou, Qiang Xu, Subhabrata Sen, and Oliver Spatscheck

LTE is a fairly new technology, so little is known about the bandwidth, latency, and RTTs that its users experience in commercial networks. Information about the properties of LTE in commercial networks would enable transport layers and applications that are more LTE-friendly.

In this work, they analyzed an anonymized packet header trace from a US metropolitan area, which included 3 TB of LTE traffic. They observed undesired slow starts in 12% of large flows. They created an algorithm to estimate available bandwidth and utilization from the trace.

They found the median bandwidth utilization to be 20%, and that for 71% of the large flows, the bandwidth utilization is below 50%. They also found high LTE bandwidth variability, and that TCP's performance was degraded by a limited receive window. They suggest that these problems could be addressed by updating RTT estimates in the transport layer and reading data from TCP buffers more quickly in the application layer.

Q: Which flavors of TCP were you looking at?
A: Cubic.

Q: Some applications intentionally use small TCP windows, did you investigate that?
A: The application may be doing some kind of rate limiting, so that is a possible explanation for the window. However, most applications in the trace only opened one TCP connection.

Q: To what degree do your observations depend on the particular LTE network?
A: Our observations are limited by the trace that we have. We did local experiments on two commercial networks and observed similar behavior. Our study is limited to LTE networks in the US.

Q: You seem to put some blame on the carrier for having large buffers - can this problem be fixed in practice? What should they do?
A: Yes that is a problem (buffer bloat). The loss rate in these networks is already low, so I am not sure if eliminating large buffers is the only solution.

SIGCOMM13: ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing

The paper was presented by Lucian Popa.

Other co-authors are: Praveen Yalagandula, Sujata Banerjee, Jeffrey C. Mogul, Yoshio Turner and Jose Renato Santos.

This talk was about network resource guarantees in cloud computing. The author presented ElasticSwitch, an efficient and practical approach for providing bandwidth guarantees.

Goals of ElasticSwitch:

provide minimum bandwidth guarantees in Clouds

work-conserving allocation

be practical

ElasticSwitch design:

ElasticSwitch resides in the hypervisor of each host, not in the individual VMs. It contains two components: guarantee partitioning and rate allocation. Guarantee partitioning turns the hose-model guarantees into VM-to-VM pipe guarantees. It uses max-min allocation and has three goals: safety, efficiency, and no starvation.
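Guarantee partitioning can be sketched roughly as follows. Each VM divides its hose guarantee among its active peers with max-min fairness, and the pipe guarantee for a pair is taken as the smaller of what the sender and the receiver assign to it. The min rule, the use of per-pair demands as input, and the function names are assumptions for illustration; the paper linked below has the actual algorithm.

```python
def max_min_share(capacity, demands):
    """Max-min fair division of capacity among demands (water-filling)."""
    shares = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        fair = remaining / len(pending)
        peer, demand = pending[0]
        if demand <= fair:
            # The smallest demand fits under the fair share: satisfy it fully.
            shares[peer] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining demand exceeds the fair share: split evenly.
            for peer, _ in pending:
                shares[peer] = fair
            break
    return shares


def pipe_guarantee(hose, demands, src, dst):
    """VM-to-VM guarantee for src -> dst.

    hose: dict VM -> hose guarantee; demands: dict (src, dst) -> demand.
    Taking the min of the two sides is an assumption of this sketch.
    """
    out_demands = {d: v for (s, d), v in demands.items() if s == src}
    in_demands = {s: v for (s, d), v in demands.items() if d == dst}
    send_side = max_min_share(hose[src], out_demands)[dst]
    recv_side = max_min_share(hose[dst], in_demands)[src]
    return min(send_side, recv_side)


# Values in Mbps; X and Z both send to Y, and X also sends a little to Z.
hose = {"X": 100, "Y": 100, "Z": 100}
demands = {("X", "Y"): 80, ("X", "Z"): 10, ("Z", "Y"): 80}
```

The sum of pipe guarantees into Y never exceeds Y's hose guarantee, which corresponds to the safety goal. Guarantee left over after capping at demand is simply left unassigned in this sketch.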

In rate allocation, ElasticSwitch uses rate limiters and increases the rate between X and Y above its partitioned guarantee when there is no congestion between X and Y. Congestion is detected through dropped packets or ECN. The adaptive algorithm used in ElasticSwitch is from Seawall (NSDI 11).
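A minimal sketch of one rate-limiter update for a single X-Y pair follows. It uses a plain additive-increase, multiplicative-decrease rule as a stand-in; this is not the Seawall algorithm, and the `step` and `backoff` parameters are hypothetical. The point it illustrates is that the rate may grow above the guarantee when there is no congestion but never drops below it.

```python
def next_rate(rate, guarantee, congested, link_capacity,
              step=10.0, backoff=0.5):
    """Return the next rate limit for one VM-to-VM pair (values in Mbps).

    AIMD stand-in for illustration, not Seawall's actual algorithm.
    """
    if congested:
        # Give back only part of the bandwidth used above the guarantee.
        rate = guarantee + (rate - guarantee) * backoff
    else:
        # No loss or ECN marks seen: probe for spare bandwidth.
        rate = rate + step
    # Never go below the guarantee or above the link capacity.
    return max(guarantee, min(rate, link_capacity))
```

Clamping at the guarantee keeps the minimum bandwidth promise, while the increase branch is what makes the allocation work-conserving.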

Evaluation:

The testbed contains 100 servers connected by a 1 Gbps tree network. The first workload tested is a many-to-one workload. The results show that ElasticSwitch can provide bandwidth guarantees and achieve the ideal behavior. The second workload is MapReduce. In the worst case, job completion time is much shorter with ElasticSwitch than with no bandwidth guarantees.

More details can be found: http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p351.pdf

SIGCOMM13: Developing a Predictive Model of Quality of Experience for Internet Video

The paper was presented by Athula Balachandran.

Other co-authors are: Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica and Hui Zhang.

The metrics used to measure Internet video QoE have shifted from traditional methods to newer ones. This paper develops a predictive model of Internet video QoE. The model must meet two requirements: (1) it must be tied to observable user engagement; (2) it must be actionable, so that it can guide practical system design decisions.

Commonly used quality metrics include join time, buffering ratio, rate of buffering, rate of switching, average bitrate, etc. However, it is not known which of these metrics should be used for QoE. This work develops a unified and quantitative QoE model to solve the problem.

Challenges:

complex engagement-to-metric relationships and complex metric interdependencies

identify confounding factors

incorporate confounding factors

Approaches:

Cast complex relationships as a machine learning problem.

Design different tests to examine potential factors and then identify key confounding factors.

Two methods are proposed to refine the model to incorporate confounding factors: adding the confounding factors as features, and splitting the data by confounding factor. The speaker showed that splitting works better than adding features, allowing the model to achieve 70% accuracy.
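The mechanics of the split approach can be sketched as below: one model is trained per value of the confounding factor, and each prediction is routed to the matching model. The toy mean predictor, the `device` field, and the sample numbers are hypothetical stand-ins; the talk did not specify them here.

```python
from collections import defaultdict


def fit_mean(rows):
    """Toy 'model' that predicts the mean engagement of its training rows.

    Stand-in for a real learner, used only to keep the sketch runnable.
    """
    avg = sum(r["engagement"] for r in rows) / len(rows)
    return lambda row: avg


def train_split(rows, confounder, fit=fit_mean):
    """Train one model per distinct value of the confounding factor."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[confounder]].append(r)
    return {value: fit(members) for value, members in groups.items()}


def predict_split(models, row, confounder):
    """Route the row to the model trained on its confounder group."""
    return models[row[confounder]](row)


# Made-up sessions: engagement in minutes, buffering ratio as a fraction.
sessions = [
    {"device": "mobile", "buffering_ratio": 0.05, "engagement": 20},
    {"device": "mobile", "buffering_ratio": 0.02, "engagement": 40},
    {"device": "tv", "buffering_ratio": 0.05, "engagement": 60},
    {"device": "tv", "buffering_ratio": 0.02, "engagement": 80},
]
models = train_split(sessions, "device")
```

With the feature approach, `device` would instead be appended to each row's feature vector and a single model trained on all rows.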

Evaluation:

The speaker showed a 100% improvement in average engagement compared with the baseline and a 20% improvement compared with other strategies.

Q: Is there any correlation between some confounding factors?

A: Some correlations are seen because of user behaviors.

Q: How does this approach capture real user effects in the wild?

A: The data used in the analysis is from real user data.

Q: Why not use other machine learning models?

A: Other models could be used.

More details can be found:
http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p339.pdf