Thursday, November 21, 2013
HotNets '13: Tiny Packet Programs for low-latency network control and monitoring
With efforts such as OpenFlow, network operators have gained a great deal of control and programmability in the control plane, but the data plane has remained much less flexible. Tasks such as congestion control are better handled in data-plane hardware, but it takes years for improvements to reach that hardware. Meanwhile, the control plane is too slow to satisfy some monitoring needs and lacks the context needed for debugging.
Any programmable data-plane mechanism must meet stringent performance requirements, yet ideally it should still be able to handle relatively complex tasks such as control protocols (e.g., RCP).
This paper proposes a system in which end-hosts write a small set of very simple instructions and embed them in a packet, where they are executed by switches in the data plane. These instructions, known as Tiny Packet Programs (TPPs), have access to all of the switch's memory in hardware. For example, a TPP can load statistics from a switch, which are then written into the packet as it leaves that switch. Once the packet is read, another end-host can use this data to write a new TPP and affect network policy.
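To make the execution model concrete, here is a minimal Python simulation. It assumes a hypothetical one-instruction set (`PUSH <statistic>`) and made-up statistic names such as `queue_len`; the real TPP instruction set and switch memory layout are not described in this summary, so treat this only as a sketch of "the packet carries a program, each switch runs it and writes results into the packet."

```python
from dataclasses import dataclass, field

# Hypothetical instruction set: ("PUSH", name) appends one switch statistic
# to the packet's scratch memory. Real TPP opcodes/memory maps are not modeled.

@dataclass
class Packet:
    program: list                               # list of (opcode, operand) tuples
    memory: list = field(default_factory=list)  # values written by switches en route

@dataclass
class Switch:
    switch_id: int
    stats: dict                                 # e.g. {"queue_len": 12}

    def execute(self, pkt: Packet) -> Packet:
        """Run the packet's tiny program against this switch's statistics."""
        for opcode, operand in pkt.program:
            if opcode == "PUSH":
                pkt.memory.append((self.switch_id, operand, self.stats[operand]))
            else:
                raise ValueError(f"unsupported opcode {opcode!r}")
        return pkt

# An end-host asks every hop on the path for its queue length.
path = [Switch(1, {"queue_len": 3}), Switch(2, {"queue_len": 17})]
pkt = Packet(program=[("PUSH", "queue_len")])
for sw in path:
    sw.execute(pkt)
print(pkt.memory)  # [(1, 'queue_len', 3), (2, 'queue_len', 17)]
```

A receiving end-host could then, for instance, pick the most congested hop with `max(pkt.memory, key=lambda e: e[2])` and use it to decide on its next TPP.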
Q: What about malicious hosts inserting TPPs in their packets?
A: All TPPs can be stripped from packets originating from untrusted sources.
Q: Is this comparable to dedicated RCP hardware?
A: They are not directly comparable. A TPP-based system may have more information available, and it does not have to send a TPP with every packet.
Q: You mention naming certain variables to fetch data from the switch's memory, but ASICs do not have uniform memory layouts.
A: To simplify the talk, devices were assumed to be homogeneous, but this is not true in reality. Hopefully vendors will provide data sheets that allow a memory map to be created. Ultimately some standardization would be helpful, but it is not necessary.
HotNets '13: Full Duplex Backscatter
HotSDN '13: Answering Why-Not Queries in Software-Defined Networks with Negative Provenance
Presenter: Yang Wu
Authors: Yang Wu, Andreas Haeberlen, Wenchao Zhou, Boon Thau Loo
Network debuggers like ndb work by generating a backtrace: a causal chain of events from an observed problem back to a set of root causes. But to generate a backtrace, one needs an observed problem to start from. This means that network debuggers cannot be used to answer questions about events that did not happen, such as "why is the HTTP server not receiving traffic?"
This paper proposes a methodology for answering these "why-not queries" using negative provenance. Provenance is a concept from databases; it models causal relationships between inputs and outputs. Provenance is often represented as a DAG that can be calculated in a straightforward way from programs written in languages such as NDLog:
PacketSent :- FlowEntry, PacketReceived
This paper proposes an extended model of provenance that includes negative information, and develops techniques for counterfactual reasoning on these representations. To give programmers simpler explanations, it also presents algorithms for compressing provenance graphs, which shrink them by 90% on average, typically to fewer than 20 nodes. A prototype has been implemented using Mininet.
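As a rough illustration of what a why-not explanation looks like, the sketch below treats the rule above as a single Python dictionary entry and explains an absent `PacketSent` by recursing into whichever premises are missing. It is a deliberate simplification: tuples are plain strings with no attributes or timestamps, each head has exactly one rule, and the extra `FlowEntry :- RuleInstalled` rule is hypothetical. The paper's model (variables, time, multiple derivations, graph compression) is much richer.

```python
# Each derived tuple maps to the body tuples that must all hold.
RULES = {
    "PacketSent": ["FlowEntry", "PacketReceived"],  # PacketSent :- FlowEntry, PacketReceived
    "FlowEntry": ["RuleInstalled"],                 # hypothetical extra rule
}

def derivable(goal, facts, rules):
    """True if goal is a base fact or every premise of its rule is derivable."""
    if goal in facts:
        return True
    body = rules.get(goal)
    return body is not None and all(derivable(p, facts, rules) for p in body)

def why_not(goal, facts, rules):
    """Explain why goal is absent; returns None if it is actually derivable."""
    if derivable(goal, facts, rules):
        return None
    body = rules.get(goal)
    if body is None:
        return {"absent": goal}          # a base tuple that never appeared
    causes = []
    for premise in body:
        explanation = why_not(premise, facts, rules)
        if explanation is not None:      # only missing premises explain the absence
            causes.append(explanation)
    return {"absent": goal, "because": causes}

print(why_not("PacketSent", {"PacketReceived"}, RULES))
# {'absent': 'PacketSent', 'because': [{'absent': 'FlowEntry', 'because': [{'absent': 'RuleInstalled'}]}]}
```

Here the packet did arrive, so the explanation points only at the missing flow entry and, transitively, the rule that was never installed.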
This work is significant because it shows how to extend network debuggers to explain negative questions in addition to positive ones.
Q: Are there limits on the kinds of queries you can express using negative provenance? For example, why is the traffic not being load balanced?
A: Our current focus is on debugging logical properties and not quantitative properties.
Q: In how many of these examples would a forward trace catch the bugs as opposed to a backward trace?
A: In many situations you don't know which trace to issue and where to start. This is especially true in complex and dynamic environments.
Q: This sounds related to some older work from the knowledge plane world: something goes wrong, how do you figure out what happened? Well, someone has to start by identifying a problem. In a network like the Internet, you're not going to have global visibility. There's a big scaling problem, probably a machine learning problem, etc.
A: We would be interested in understanding that work better.
Q: It seems like why-not queries could be understood in terms of classic notions of safety and liveness. For example, given the liveness property "if HTTP traffic is arriving at the ingress to the network, then it should eventually be delivered to a server," your system generates conditions that can be readily checked (using specific assumptions about the domain). Have you thought about modeling things this way to get a handle on the kinds of properties you can check?
A: Not yet. Regarding which properties we can check, as stated previously, our current focus is on logical properties and not quantitative properties.
HotNets '13: Applying Operating System Principles to SDN Controller Design
Presented by: Matthew Monaco
Authors: Matthew Monaco, Oliver Michel, Eric Keller
SDN controller platforms are often compared to operating systems, but existing controllers are more like kernels. This means that programmers must re-implement common functionality such as event handlers, timers, etc. from scratch in each new application.
Yanc (Yet Another Network Controller) is a new SDN controller platform based on classic Unix principles. It follows the "everything is a file" philosophy, which leads naturally to a simple and lightweight interface for accessing network hardware, and enables re-using standard off-the-shelf utilities in network control programs. As an example, the directory structure for a single OpenFlow switch is organized as follows:
/sw1
|-- counters/
|-- flows/
|-- ports/
|-- actions
|-- capabilities
|-- id
+-- num_buffers
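The appeal of this layout is that ordinary file I/O becomes the controller API. The sketch below builds a throwaway directory that mimics the /sw1 layout above and reads switch attributes the way `cat` or `ls` would; the file contents (`0x1`, `256`, and so on) are made up, and it does not talk to a real yanc/FUSE mount.

```python
import os
import tempfile

def build_mock_switch(root):
    """Create a toy directory mirroring the /sw1 layout (contents are made up)."""
    sw = os.path.join(root, "sw1")
    for d in ("counters", "flows", "ports"):
        os.makedirs(os.path.join(sw, d), exist_ok=True)
    attrs = {"id": "0x1", "num_buffers": "256",
             "actions": "output,drop", "capabilities": "flow_stats"}
    for name, value in attrs.items():
        with open(os.path.join(sw, name), "w") as f:
            f.write(value + "\n")
    return sw

def read_attr(sw_dir, name):
    """Read one switch attribute, exactly as `cat sw1/<name>` would."""
    with open(os.path.join(sw_dir, name)) as f:
        return f.read().strip()

with tempfile.TemporaryDirectory() as root:
    sw = build_mock_switch(root)
    switch_id = read_attr(sw, "id")
    entries = sorted(os.listdir(sw))

print(switch_id, entries)
```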
This work is important because it represents a serious attempt to deliver on the vision of a "network operating system" and it proposes a fresh SDN controller architecture that makes it possible to put existing operating system abstractions to work. For example, programmers can re-use existing Linux abstractions such as:
- inotify for event processing
- file permissions and access control lists for security
- namespaces and cgroups for performance isolation
- distributed file systems for simple forms of state replication and coordination
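For event processing, a file-based controller can react to files appearing or disappearing under a directory like flows/. inotify is Linux-specific and is not part of Python's standard library, so this portable sketch substitutes simple snapshot-and-diff polling to show the idea; a real implementation would use inotify to avoid polling. The flow file name `arp_flood` and its contents are hypothetical.

```python
import os
import tempfile

def snapshot(path):
    """Return the set of entry names currently in a directory."""
    return set(os.listdir(path))

def diff(before, after):
    """Report entries that appeared or disappeared between two snapshots."""
    return {"added": sorted(after - before), "removed": sorted(before - after)}

with tempfile.TemporaryDirectory() as flows:
    before = snapshot(flows)
    # Hypothetical: a control program installs a flow by creating a file.
    with open(os.path.join(flows, "arp_flood"), "w") as f:
        f.write("match: dl_type=0x0806\naction: flood\n")
    changes = diff(before, snapshot(flows))

print(changes)  # {'added': ['arp_flood'], 'removed': []}
```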
The Yanc prototype is built from Filesystem in Userspace (FUSE), a C++ OpenFlow driver, a Python discovery module, and a shell script to push rules. Future work includes defining new drivers for other back-ends such as Snort, developing richer operators for composing network programs, and further investigating issues related to distributed control.
Q: What about performance? The POSIX filesystem imposes all kinds of semantics that might limit parallelism. Does the FUSE implementation serialize a list of creations?
A: No, many file system operations can be implemented in parallel or asynchronously. Moreover, even if they were sequential, the latency of a packet_in "miss" far exceeds the latency of these file system operations anyway.
Q: Can distributed file systems be used to coordinate SDN controllers?
A: Perhaps! We are exploring the use of distributed file systems to implement functionality such as distributed locks for concurrency control.
HotNets '13: Corybantic: Towards Modular Composition of SDN Control Programs
Authors: Jeff Mogul, Alvin AuYoung, Sujata Banerjee, Jeongkeun "JK" Lee, Jayaram Mudigonda, Lucian Popa, Puneet Sharma, Yoshio Turner
In many other areas of computing, programs are expressed using modular programming abstractions. It would be nice to be able to write network control programs this way too, but different programs often impose conflicting requirements on the network. For example, in a datacenter network, a program trying to minimize bandwidth usage across a core switch may wish to group tenant VMs on the same rack, while a program trying to minimize the effects of link failures may wish to distribute VMs across different racks.
Corybantic is a system for enabling modular composition of network control programs. This work is important because it represents the first attempt to address the general problem of modular composition of network control programs with conflicting requirements. Corybantic defines an API that allows controllers to make proposals for how network resources should be used and define a function for assigning value to other proposals (in a universal "currency"). The Corybantic system collects the proposals and their evaluation by each controller and selects the overall configuration that maximizes value. A prototype system has been implemented in Python and evaluated on the bandwidth/fault-tolerance example in a tree topology, comparing the value computed by Corybantic against the optimal value as computed offline.
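The propose-then-value loop can be sketched in a few lines. Everything below is hypothetical: the `Module`/`choose` API, the VM and rack names, and the currency values are invented to mirror the bandwidth vs. fault-tolerance example, and are not Corybantic's actual interface. Note that, as in the Q&A, only proposals some module actually makes are ever evaluated.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List

Proposal = Dict[str, str]  # VM name -> rack name

@dataclass
class Module:
    """Hypothetical module: makes proposals and assigns a value to any proposal."""
    name: str
    proposals: List[Proposal]
    value: Callable[[Proposal], float]  # value in the shared "currency"

def choose(modules: List[Module]) -> Proposal:
    """Collect every module's proposals and pick the highest total value."""
    candidates = [p for m in modules for p in m.proposals]
    return max(candidates, key=lambda p: sum(m.value(p) for m in modules))

def same_rack_pairs(p: Proposal) -> int:
    return sum(1 for a, b in combinations(sorted(p), 2) if p[a] == p[b])

def bandwidth_value(p: Proposal) -> float:
    return 4 * same_rack_pairs(p)  # co-located VM pairs save core bandwidth

def fault_value(p: Proposal) -> float:
    return 10 if len(set(p.values())) >= 2 else 0  # survives one rack failure

bandwidth = Module("bandwidth", [{"a": "r1", "b": "r1", "c": "r1"}], bandwidth_value)
fault = Module("fault-tolerance",
               [{"a": "r1", "b": "r2", "c": "r3"}, {"a": "r1", "b": "r1", "c": "r2"}],
               fault_value)

best = choose([bandwidth, fault])
print(best)  # {'a': 'r1', 'b': 'r1', 'c': 'r2'}
```

With these made-up values, the fully packed placement is worth 12, the fully spread one 10, and the compromise 14, so the compromise wins even though neither module prefers it most.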
Q: Each module is responsible for formulating a complete proposal, rather than a partial proposal. Is that correct?
A: That's correct in what we've actually implemented so far.
Q: You only evaluate proposals that someone makes. What about proposals that nobody makes but that would serve everyone better?
A: We're currently thinking about this. For example, we could add another module that watches the proposals being made and inserts its own proposals that are better than any the individual modules have made.
Q: Are the programmers all from the same organization and agree with each other or have you considered adversarial models?
A: The assumption for now is that the programmers are not adversarial.
Q: The theory community has lots of ways for dealing with more complicated constraint problems. What are the right abstractions for expressing and resolving constraints in networking?
A: We think many composition problems can be naturally expressed in terms of a simple currency instead of more complicated weights, because prices can be grounded in some economic reality, whereas weights are an abstraction of reality.
Q: How do you model things like SLAs as constraints? How do you even know they're satisfiable?
A: There seem to be very few real constraints. Most things can be assigned a value -- for example, one can violate an SLA and pay a penalty. For the true hard constraints (for example, don't burn down the data center, or don't allow traffic from certain users to flow across certain paths) the system allows each module to check proposals against its local set of constraints, and effectively veto constraint-violating proposals. The tricky part is to find ways to generate proposals that don't violate constraints. I don't think we have a general solution to that, yet.
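The distinction between soft and hard constraints in this answer can be sketched as follows: soft constraints such as SLAs are just negative value (a penalty), while hard constraints act as vetoes that remove a proposal before value is maximized. The path names, values, and the `select` helper are hypothetical illustrations, not Corybantic's API.

```python
def select(proposals, valuations, vetoes):
    """Pick the highest-value proposal that no hard constraint vetoes.

    valuations: functions proposal -> value (shared currency; penalties are negative)
    vetoes: predicates proposal -> bool, where False means the proposal is vetoed
    Returns None if every proposal is vetoed.
    """
    allowed = [p for p in proposals if all(ok(p) for ok in vetoes)]
    if not allowed:
        return None
    return max(allowed, key=lambda p: sum(v(p) for v in valuations))

def latency_value(p):
    return 20 - 2 * len(p["path"])  # shorter paths are worth more

def sla_penalty(p):
    return -5 if len(p["path"]) > 4 else 0  # soft: violate the SLA, pay a penalty

def avoids_core2(p):
    return "core2" not in p["path"]  # hard: this traffic must never cross core2

proposals = [
    {"name": "short", "path": ["edge1", "core2", "edge2"]},
    {"name": "long", "path": ["edge1", "agg1", "core1", "agg2", "edge2"]},
]
choice = select(proposals, [latency_value, sla_penalty], [avoids_core2])
print(choice["name"])  # long
```

Without the veto, the short path (value 14) beats the long one (10 - 5 = 5); with it, the long path is the only admissible choice despite its SLA penalty.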
Q: What about the dynamic behavior of the system? Suppose a link goes down in a fault-tolerant program? Who decides whether to batch recompute or just incrementally adjust? To put it another way, does Corybantic introduce more dynamism?
A: Corybantic makes it easier to develop a controller that supports more dynamism. (More dynamism is the goal!)
Acknowledgments: Jeff Mogul and Alvin AuYoung added some clarification on the Q&A.
Tuesday, November 19, 2013
Layer 9 is gearing up for HotNets XII
The PC chairs have asked us to organize volunteers to write brief blog posts summarizing the presented talks and discussion, for posterity and for those who can't attend in person. The summaries will be posted right here.
If you are able to contribute one session of blog coverage, we'd be very grateful!
Please sign up at the Doodle poll before 8 p.m. ET tomorrow, November 20.
For reference, the HotNets program is here: http://conferences.
We will send out an email tomorrow evening giving everybody a session. If more than one person claims the same session, that's great and you can split up the papers between you. There is no formal conflict-of-interest policy for Layer9.org, but please use common sense in signing up for a session that you can summarize fairly.
We're not aiming for publication quality on the summaries; essentially, the quicker they're posted, the better. Five to six sentences about the most important points of the paper, plus an outline of the Q-and-A, is plenty!
Please sign up for an account on the blog, if you do not have one already. Workshop attendees should have received an invitation from Blogger to make an account that will let you post.
Saturday, September 7, 2013
SIGCOMM'13: Expressive Privacy Control with Pseudonyms
Thomas Anderson, Arvind Krishnamurthy, David Wetherall
The authors have designed a cross-layer architecture that provides users with a pseudonym abstraction. A pseudonym represents a set of activities that the user is fine with having linked, and each pseudonym gives the illusion of a single machine. Pseudonyms are provided without modifying the browser, operating system, or network. Note, however, that IP address separation across pseudonyms only works when the destination server uses IPv6 addresses; cookie separation works even with IPv4 servers.
The number of pseudonyms supported by the system is limited by the number of IP addresses we can assign concurrently to a network interface without performance degradation. For example, the Linux operating system enforces a configurable default limit of 4096 addresses. Each privacy policy results in a different number of generated pseudonyms.
In summary, this paper presents an abstraction called a pseudonym, which lets users control and use many indistinguishable identities on each of their devices. The pseudonym abstraction gives users control over which activities can be linked at remote services and which cannot. The authors designed a cross-layer architecture that exploits the ample IPv6 address space and provides application-layer mechanisms for management. The design lets users choose expressive policies for controlling the privacy/functionality tradeoff on the web. The prototype system consists of a browser extension and a gateway proxy.
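The reason IPv6 matters is that a single /64 prefix leaves 64 bits for interface identifiers, so each pseudonym can get its own unlinkable source address. The sketch below only computes such addresses with Python's `ipaddress` and `secrets` modules; it assumes the documentation prefix `2001:db8:1:2::/64` and made-up pseudonym names, and it does not configure the addresses on an interface or reproduce the paper's gateway logic.

```python
import ipaddress
import secrets

def allocate_pseudonym_addresses(prefix, pseudonyms):
    """Give each pseudonym its own random address inside an IPv6 prefix."""
    net = ipaddress.IPv6Network(prefix)
    host_bits = 128 - net.prefixlen
    used, table = set(), {}
    for name in pseudonyms:
        iid = 0
        # Skip 0 (the subnet-router anycast address) and any repeated identifier.
        while iid == 0 or iid in used:
            iid = secrets.randbits(host_bits)
        used.add(iid)
        table[name] = net.network_address + iid
    return table

addrs = allocate_pseudonym_addresses("2001:db8:1:2::/64", ["banking", "shopping", "social"])
for name, addr in addrs.items():
    print(name, addr)
```

Random interface identifiers keep addresses from being trivially linkable to each other, which is the property the pseudonym abstraction relies on.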
Sunday, September 1, 2013
SIGCOMM'13 : Mosaic: Quantifying Privacy Leakage in Mobile Networks
Friday, August 16, 2013
SIGCOMM13: SplitX: High-Performance Private Analytics
Q: What are the long-term incentives in using this system?
A: SplitX is highly relevant in the current scenario where users are increasingly concerned about privacy.
A: Splitting involves only XOR operations. Since XOR is extremely efficient, the time required for splitting is negligible.
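To show why XOR splitting is so cheap, here is a minimal sketch of two-way XOR secret splitting: a random pad and the data XORed with that pad. Each share alone is uniformly random, and XORing both recovers the data. This illustrates only the splitting primitive mentioned in the answer, not SplitX's full analytics protocol; the sample answer `b"age=34"` is made up.

```python
import secrets

def split(data: bytes):
    """Split data into two shares; either share alone reveals nothing."""
    pad = secrets.token_bytes(len(data))
    share = bytes(a ^ b for a, b in zip(data, pad))
    return pad, share

def combine(share1: bytes, share2: bytes) -> bytes:
    """XOR the two shares back together to recover the original data."""
    return bytes(a ^ b for a, b in zip(share1, share2))

s1, s2 = split(b"age=34")
print(combine(s1, s2))  # b'age=34'
```

Splitting costs one random byte and one XOR per data byte, which is why the splitting time is negligible.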
Thursday, August 15, 2013
SIGCOMM2013: Integrating Microsecond Circuit Switching into the Data Center
SIGCOMM2013: Got Loss? Get zOVN!
Co-authors: Robert Birke, Gilles Cressier, Cyriel Minkenberg and Mitch Gusat
This paper identifies and characterizes packet loss in virtualized data center networks.
This paper quantifies the losses for several common combinations of hypervisors and virtual switches, and shows their detrimental effects on application performance. The authors propose a zero-loss Overlay Virtual Network (zOVN) designed to reduce the query and flow completion time of latency-sensitive datacenter applications. They describe its architecture and detail the design of its key component, the zVALE lossless virtual switch.
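The core idea of a lossless virtual switch is to push back on the sender when a queue fills instead of dropping packets. The toy, tick-based model below contrasts those two behaviors for a single bounded queue; it is a hypothetical illustration with made-up rates, not zVALE's actual design.

```python
def simulate(packets, queue_cap, send_per_tick, drain_per_tick, lossless):
    """Toy tick-based model of a sender feeding a bounded virtual-switch queue.

    lossless=True: when the queue is full the sender waits (backpressure),
    so packets stay at the sender instead of being dropped.
    lossless=False: packets that do not fit in the queue are dropped.
    Returns (delivered, dropped, ticks).
    """
    pending, queue = packets, 0
    delivered = dropped = ticks = 0
    while pending or queue:
        ticks += 1
        offered = min(send_per_tick, pending)
        accepted = min(offered, queue_cap - queue)
        queue += accepted
        if lossless:
            pending -= accepted            # rejected packets wait at the sender
        else:
            pending -= offered
            dropped += offered - accepted  # rejected packets are lost
        served = min(drain_per_tick, queue)
        queue -= served
        delivered += served
    return delivered, dropped, ticks

print(simulate(100, 10, 8, 4, lossless=False))  # (58, 42, 15)
print(simulate(100, 10, 8, 4, lossless=True))   # (100, 0, 25)
```

In the lossy case, 42 of 100 packets would have to be recovered by TCP retransmissions and timeouts, which is what inflates completion times; the lossless case simply takes as long as the drain rate requires.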
As part of the evaluation, they implemented a zOVN prototype and benchmarked it with a Partition-Aggregate workload in two testbeds, achieving up to a 15-fold reduction in mean completion time with three widespread TCP versions. For larger-scale validation and deeper introspection into zOVN, they developed an OMNeT++ model for accurate cross-layer simulations of a virtualized datacenter.
===================Q/A========================================
Q: Whenever the queue is full, you put the hypervisor to sleep? How do you deal with deadlock situations?
A: Multiple threads.
Q: How many VMs per CPU do you use?
A: One VM at the sender and one at the receiver.
SIGCOMM2013: zUpdate: Updating Data Center Networks with Zero Loss
Network updates (e.g., switch updates, VM migration) in data centers can be painful and can disrupt services because of transient link load spikes and congestion. The goal of this paper is to perform congestion-free, low-latency network updates so that services are not disrupted. The paper introduces zUpdate, which computes a congestion-free update plan and executes it using a two-phase commit.
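A minimal sketch of the per-step check that a congestion-free plan has to pass, under the assumption that while switches apply one step asynchronously, each flow may be on either its old or its new paths, so each link is charged the larger of the two loads. The flow and link names and loads are made up, and this only checks a given plan; it is not zUpdate's LP formulation for finding one.

```python
def transition_is_congestion_free(old, new, capacity):
    """Check one step of an update plan.

    old, new: {flow: {link: load}} placements before/after the step.
    capacity: {link: capacity}.
    Each link is charged, per flow, the larger of its old and new load,
    since switches may apply the step in any order.
    """
    flows = set(old) | set(new)
    for link, cap in capacity.items():
        worst = sum(max(old.get(f, {}).get(link, 0), new.get(f, {}).get(link, 0))
                    for f in flows)
        if worst > cap:
            return False
    return True

links = {"L1": 10, "L2": 10}
start = {"f1": {"L1": 6}, "f2": {"L2": 6}}
mid = {"f1": {"L1": 3, "L2": 3}, "f2": {"L1": 3, "L2": 3}}
end = {"f1": {"L2": 6}, "f2": {"L1": 6}}

print(transition_is_congestion_free(start, end, links))  # False: up to 12 units per link
print(transition_is_congestion_free(start, mid, links),
      transition_is_congestion_free(mid, end, links))    # True True
```

Swapping the two flows in one step can transiently put 12 units on a 10-unit link, but going through an intermediate split keeps every step within capacity, which is why a plan may need several steps.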
The evaluation uses a 22-switch OpenFlow testbed to compare zUpdate with other approaches (zUpdate-onestep, ECMP-onestep, ECMP-planned). For larger topologies, they use trace-driven simulations on a production-level data center topology.
====================Q/A=========================
A: In such a situation we can not guarantee a congestion free transition, instead we recommend network operators to do updates in off-peak times.
Q: Can you guarantee you can find a congestion free update, is your LP formulation optimal?
A: In certain scenarios, we can only guarantee a best-effort solution.
SIGCOMM2013: pFabric: Minimal Near-Optimal Datacenter Transport
SIGCOMM13: Towards Efficient Traffic-analysis Resistant Anonymity Networks
Authors: Stevens Le Blond, David Choffnes, Wenxuan Zhou, Peter Druschel, Hitesh Ballani, Paul Francis
The key feature of Aqua is the use of distinct traffic anonymization techniques at the core and the edges. At the core, payloads are split and sent over multiple paths to reduce the peak payload rate. At the edges, clients with similar traffic patterns are grouped together and forced to transmit at similar rates to realize k-anonymity.
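The edge technique can be illustrated with a small grouping sketch: sort clients by rate, form groups of at least k similar clients, and have every member send at the group's maximum rate, padding with cover traffic. This is a hypothetical simplification; it models only padding (not the throttling Aqua also uses), and the client names and rates are made up.

```python
def group_clients(rates, k):
    """Group clients with similar rates into sets of at least k members.

    Every member of a group sends at the group's maximum rate (padding with
    cover traffic). Returns (groups, padding overhead relative to real traffic).
    """
    if len(rates) < k:
        raise ValueError("need at least k clients")
    ordered = sorted(rates.items(), key=lambda kv: kv[1])  # similar rates end up adjacent
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups[-1]) < k:          # fold a short tail into the previous group
        groups[-2].extend(groups.pop())
    real = sum(rates.values())
    sent = sum(max(r for _, r in g) * len(g) for g in groups)
    return [[c for c, _ in g] for g in groups], (sent - real) / real

groups, overhead = group_clients({"a": 1.0, "b": 1.2, "c": 5.0, "d": 5.5}, k=2)
print(groups, f"{overhead:.3f}")  # [['a', 'b'], ['c', 'd']] 0.055
```

Grouping by similar rate is what keeps the overhead low: pairing a slow client with a fast one would force the slow one to pad all the way up to the fast rate.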
Performance of Aqua was compared with other systems such as constant-rate systems, peer-to-peer systems and broadcast systems. While other systems had more than 80% overhead, Aqua provided bandwidth efficiency with less than 30% overhead. Throttling at the edges in Aqua was only 20%, much lower than 50-80% throttling observed in peer-to-peer and broadcast systems.
Q: Why would you run BitTorrent on top of Tor? Tor is too slow.
A: Tor is slow because there are very few servers hosting the service. We expect to have providers hosting Aqua services. If users are willing to pay for it, we could have a large number of hosts offering better service.
SIGCOMM2013: BigStation: Enable Scalable Real-time Signal Processing in Large MU-MIMO Systems
Co-authors: Xiaoxiao Li, Kun Tan, Hongyi Yao, Fang Ji, Wenjun Hu, Jiansong Zhang, and Yongguang Zhang
Today we see more demand for wireless capacity. We would like to engineer the next wireless network to match the capacity of existing wired networks. However, we can only increase the spectrum so much and reusing the spectrum increases complexity.
This work significantly increases wireless capacity using spatial multiplexing with many antennas. To handle so many antennas, computation is parallelized with many simple processing units. This work establishes a distributed processing pipeline, which exploits data parallelism across servers at each processing stage.
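To make "data parallelism at each processing stage" concrete, the sketch below shows a single stage, zero-forcing detection, with subcarriers split into contiguous slices that are processed independently and then reassembled in order. The assumptions are loud ones: 2x2 channels, perfect channel knowledge, no noise, and threads standing in for servers (so there is no real speedup). It is not BigStation's actual pipeline.

```python
from concurrent.futures import ThreadPoolExecutor

def invert_2x2(h):
    """Inverse of a 2x2 complex channel matrix ((a, b), (c, d))."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def zero_force(h, y):
    """Zero-forcing detection on one subcarrier: estimate x = H^-1 y."""
    (p, q), (r, s) = invert_2x2(h)
    return (p * y[0] + q * y[1], r * y[0] + s * y[1])

def process_partition(items):
    """Work done by one 'server': detect every subcarrier in its slice."""
    return [zero_force(h, y) for h, y in items]

def detect_all(channels, received, n_workers):
    """Split subcarriers into contiguous slices, one per worker, then reassemble."""
    items = list(zip(channels, received))
    size = -(-len(items) // n_workers)  # ceiling division
    slices = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return [x for part in pool.map(process_partition, slices) for x in part]

# Made-up example: 8 subcarriers, each with its own invertible channel.
chans = [((1 + 0.1j * k, 0.5), (0.2j, 1.0)) for k in range(8)]
xs = [((-1) ** k + 1j, 1 - 1j) for k in range(8)]
ys = [(h[0][0] * x[0] + h[0][1] * x[1], h[1][0] * x[0] + h[1][1] * x[1])
      for h, x in zip(chans, xs)]
recovered = detect_all(chans, ys, n_workers=3)
```

Because each subcarrier is independent, adding antennas or bandwidth can be absorbed by adding workers to a stage, which is the scaling argument behind the design.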
This work was prototyped using a Sora MIMO Kit. They found that with overprovisioned AP antennas, the peak transmission rate scales linearly with the number of antennas, with 9 antennas demonstrating a scaling of 6.8x over a single antenna.
Q: If you had to guess what the scaling limitation is, what do you think the fundamental boundary is? Is it infinitely scalable?
A: My estimation is that we can probably handle about 100 antennas.
Q: Do you think we can continue such an improvement up to 40 antennas in production?
A: There are still critical challenges that we haven't solved yet. These challenges will need to be worked out before this will be suitable for industry.
Q: You talk about the long tail of processing, is there packet loss?
A: Yes, when there is a high data rate, we do see packet loss, but I think it could be reduced.
Q: For production, you really need to design the interconnect to be synchronous, instead of using ethernet.
A: I don't know if this would actually work - we need to investigate more.
SIGCOMM2013: Bringing Cross-Layer MIMO to Today's Wireless LANs
Co-authors: Diego Cifuentes, Shyamnath Gollakota, and Dina Katabi
Major recent advances in cross-layer MIMO don't work on today's Wi-Fi cards, because chip manufacturers hesitate to make investments in new hardware that hasn't been fully tested on real networks.
OpenRF brings MIMO techniques to today's Wi-Fi cards. OpenRF's data plane needs to be able to apply PHY techniques to commodity cards and its control plane needs to self-configure to dynamism in the network. OpenRF handles these challenges by using two transmit queues and by handling some scheduling locally.
A: This is a lot more natural for cellular networks because they already have some notion of scheduling.
A: The location of Alice and Bob really doesn't matter. These access points can track the channels as they change at the clients.
A: Sometimes you might care about different flows (for example long-lived flows but not bursty traffic).
A: Our techniques work even better with more antennas (e.g., 3), so this would work.
A: The centralized controller is important for dealing with interference. You need to coordinate interference between access points.
SIGCOMM2013: Full Duplex Radios
Co-authors: Emily McMilin and Sachin Katti
Achieving full duplex radio is difficult because self-interference is a hundred billion times stronger than the received signal. Canceling the transmitted signal is difficult because the signal actually transmitted is quite different from what you think you are transmitting, due to noise and non-linear effects called harmonics. You need 110 dB of cancellation (at least 70 dB of which is analog) to eliminate the transmitter noise.
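The two numbers in this paragraph are the same quantity in different units. Converting the decibel figure to a power ratio, and splitting it into the analog portion and the remainder left for digital cancellation:

$$
10\log_{10}\frac{P_{\text{self}}}{P_{\text{rx}}} = 110\ \text{dB}
\;\Longrightarrow\;
\frac{P_{\text{self}}}{P_{\text{rx}}} = 10^{110/10} = 10^{11},
\qquad
\underbrace{70\ \text{dB}}_{\text{analog},\ 10^{7}} + \underbrace{40\ \text{dB}}_{\text{digital},\ 10^{4}} = 110\ \text{dB}.
$$

So "a hundred billion times" is $10^{11}$, and with at least 70 dB done in analog, at most 40 dB remains for the digital stage.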
This paper presents the first full duplex radio, which uses a new cancellation technique that eliminates all self-interference. Their approach is a hybrid design, with both analog and digital components.
Using commodity radios, this work is able to reduce self-interference to within 1 dB of the noise floor. This approach significantly outperforms previous works in reducing the self-interference residue. This design achieves a 1.97x increase in throughput over non-duplex designs, which is very close to the optimal 2x throughput increase.
Q: How does this compare with a previous work from Waterloo?
A: They achieved 50 dB of cancellation, but they didn't do any non-linear cancellation.
Q: You are suggesting that your own work from SIGCOMM 2011 does worse than half-duplex?
A: That wasn't our work.
Q: Would this scale?
A: I still believe this would scale.
SIGCOMM2013: An In-depth Study of LTE: Effect of Network Protocol and Application Behavior on Performance
Co-authors: Junxian Huang, Feng Qian, Yihua Guo, Yuanyuan Zhou, Qiang Xu, Subhabrata Sen, and Oliver Spatscheck
LTE is a fairly new technology, so little is known about the bandwidth, latency, and RTTs that its users experience in commercial networks. Information about the properties of LTE in commercial networks would enable transport layers and applications that are more LTE-friendly.
In this work, they analyzed an anonymized packet header trace from a US metropolitan area, which included 3 TB of LTE traffic. They observed undesired slow starts in 12% of large flows. They created an algorithm to estimate available bandwidth and utilization from the trace.
They found the median bandwidth utilization to be 20%, and that for 71% of the large flows, the bandwidth utilization is below 50%. They also found high LTE bandwidth variability, and that TCP's performance was degraded by a limited receive window. They suggest that these problems could be addressed by updating RTT estimates in the transport layer and reading data from TCP buffers more quickly in the application layer.
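The receive-window finding follows from a simple bound: a TCP flow can have at most one receive window in flight per round trip, so its throughput cannot exceed rwnd/RTT no matter how much LTE bandwidth is available. The sketch computes that bound and the two summary statistics quoted above from per-flow (throughput, available bandwidth) pairs. The 65,535-byte window, 70 ms RTT, and example flows are made-up inputs, and the paper's bandwidth-estimation algorithm is not reproduced here.

```python
import statistics

def window_limited_throughput_bps(rwnd_bytes, rtt_s):
    """Upper bound on one TCP flow: at most one receive window per RTT."""
    return rwnd_bytes * 8 / rtt_s

def summarize_utilization(flows):
    """flows: (throughput_bps, available_bps) per large flow.

    Returns (median utilization, fraction of flows below 50% utilization).
    """
    utils = [min(1.0, t / a) for t, a in flows]
    return statistics.median(utils), sum(u < 0.5 for u in utils) / len(utils)

cap = window_limited_throughput_bps(65535, 0.070)
print(f"{cap / 1e6:.2f} Mbps")  # 7.49 Mbps
```

With these inputs, a flow on a link with 20 Mbps available could use at most about 37% of it, which is the kind of under-utilization that faster reads from TCP buffers and larger advertised windows would address.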
Q: Which flavors of TCP were you looking at?
A: Cubic.
Q: Some applications intentionally use small TCP windows, did you investigate that?
A: The application may be doing some kind of rate limiting, so that is a possible explanation for the window. However, most applications in the trace only opened one TCP connection.
Q: To what degree do your observations depend on the particular LTE network?
A: Our observations are limited by the trace that we have. We did local experiments on two commercial networks and observed similar behavior. Our study is limited to LTE networks in the US.
Q: You seem to put some blame on the carrier for having large buffers - can this problem be fixed in practice? What should they do?
A: Yes, that is a problem (bufferbloat). The loss rate in these networks is already low, so I am not sure eliminating large buffers is the only solution.
SIGCOMM13: ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing
- provide minimum bandwidth guarantees in clouds
- work-conserving allocation
- be practical
More details can be found in the paper: http://conferences.sigcomm.org/sigcomm/2013/papers/sigcomm/p351.pdf
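The first two goals can coexist. Here is a centralized, single-link sketch: each VM pair first gets its demand up to its guarantee, and any leftover capacity is shared among still-unsatisfied pairs in proportion to their guarantees, so no bandwidth sits idle while someone wants it. ElasticSwitch itself works in a distributed way at the hypervisors, so this is an illustration of the goals, not its algorithm. It assumes positive guarantees that sum to at most the link capacity, and the VM names and numbers are made up.

```python
def allocate(capacity, guarantees, demands):
    """Work-conserving allocation with minimum guarantees on one link.

    Assumes positive guarantees with sum(guarantees) <= capacity.
    Step 1: each pair gets min(demand, guarantee).
    Step 2: spare capacity is shared among unsatisfied pairs in proportion
    to their guarantees, repeating until capacity or demand runs out.
    """
    alloc = {v: min(demands[v], guarantees[v]) for v in guarantees}
    spare = capacity - sum(alloc.values())
    while spare > 1e-9:
        hungry = [v for v in alloc if demands[v] - alloc[v] > 1e-9]
        if not hungry:
            break
        weight = sum(guarantees[v] for v in hungry)
        given = 0.0
        for v in hungry:
            extra = min(spare * guarantees[v] / weight, demands[v] - alloc[v])
            alloc[v] += extra
            given += extra
        spare -= given
    return alloc

alloc = allocate(10, {"A": 2, "B": 2, "C": 1}, {"A": 1, "B": 8, "C": 8})
print(alloc)  # {'A': 1, 'B': 6.0, 'C': 3.0}
```

A only needs 1 unit, so its unused guarantee is not reserved; the remaining 6 units go to B and C in a 2:1 ratio, and the link stays fully used.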
SIGCOMM13: Developing a Predictive Model of Quality of Experience for Internet Video
- complex engagement-to-metric relationships and complex metric interdependencies
- identify confounding factors
- incorporate confounding factors
- Cast complex relationships as a machine learning problem.
- Design different tests to examine potential factors and then identify key confounding factors.
- Two methods are proposed to refine the model to incorporate confounding factors: add confounding factors as a feature and split the data by confounding factors. The speaker shows that split is better than feature, allowing the model to achieve 70% accuracy.
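The "split" idea can be illustrated with a deliberately simple model: fit one regression per value of a confounding factor instead of one pooled model. The sketch uses plain least-squares lines rather than the talk's machine-learning model, and the confounder (live vs. VoD), the metric (buffering ratio), and the engagement numbers are all invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def fit_split(rows):
    """rows: (confounder, metric, engagement). Fit one model per confounder value."""
    groups = {}
    for confounder, x, y in rows:
        xs, ys = groups.setdefault(confounder, ([], []))
        xs.append(x)
        ys.append(y)
    return {c: fit_line(xs, ys) for c, (xs, ys) in groups.items()}

def predict(models, confounder, x):
    a, b = models[confounder]
    return a + b * x

# Made-up data: engagement (minutes) vs. buffering ratio for live and VoD content.
ratios = [0.0, 0.05, 0.1, 0.2]
rows = ([("live", r, 30 - 100 * r) for r in ratios]
        + [("vod", r, 20 - 40 * r) for r in ratios])
models = fit_split(rows)
pooled = fit_line([r for _, r, _ in rows], [y for _, _, y in rows])
print(predict(models, "live", 0.1), pooled[0] + pooled[1] * 0.1)
```

The split models recover each group's relationship exactly (20 minutes for live content at 10% buffering), while the pooled line averages the two and predicts about 18, mispredicting both groups.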