Friday, August 16, 2013

SIGCOMM13: SplitX: High-Performance Private Analytics

Presented by: Ruichuan Chen

Authors: Ruichuan Chen, Istemi Ekin Akkus, Paul Francis

SplitX is a high-performance private analytics system resistant to answer pollution. It is designed under the assumption that analysts and clients are potentially malicious while servers are honest. 

The key factors differentiating SplitX from other analytics systems are XOR encryption and query buckets. SplitX achieves high performance in terms of bandwidth and computation by substituting cheap XOR operations for cryptographic encryption. To limit answer pollution, clients are restricted to answering queries in binary format.

In the SplitX system, clients subscribe to the queries published by analysts. Clients split their answers and send the shares to mixes, which add differentially private noise to the messages. Aggregators generate query results by combining the outputs of the mixes. Double-splitting is used at the mixes to guarantee privacy.
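As an illustration of the splitting step (a sketch, not SplitX's actual code): a client's bucketed answer can be viewed as a short bit vector, and XOR secret-sharing splits it into two random-looking shares that reveal the answer only when recombined:

```python
import secrets

def split_answer(answer, nbits):
    """XOR-split an answer into two shares: each share alone is a uniformly
    random bit string, but XORing the two shares recovers the answer."""
    share1 = secrets.randbits(nbits)
    share2 = answer ^ share1
    return share1, share2

# A client's answer to a hypothetical 8-bucket query, encoded one-hot
# (bucket 2 chosen).
answer = 0b00000100
s1, s2 = split_answer(answer, 8)
assert s1 ^ s2 == answer  # recombining the shares recovers the answer
```

Since splitting and recombining are plain XORs, the per-answer cost is a few machine instructions, which is where the bandwidth and computation savings over public-key schemes come from.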

Q: What are the long-term incentives in using this system?
A: SplitX is highly relevant in the current scenario where users are increasingly concerned about privacy.

Q: SplitX uses splitting at several stages. What is the time required per splitting?
A: Splitting involves XOR operation only. Since XOR is extremely efficient, the time required for splitting is negligible.

Thursday, August 15, 2013

SIGCOMM2013: Integrating Microsecond Circuit Switching into the Data Center

Presenter: George Porter 
Co-authors: Richard Strong, Nathan Farrington, Alex Forencich, Pang Chen-Sun, Tajana Rosing, Yeshaiahu Fainman, George Papen, Amin Vahdat

This paper designs and builds an optical circuit switch (OCS) prototype (called Mordia) that achieves a switching time of 11us. The authors then identify a set of challenges in using existing control planes for such microsecond-latency switching. To address these challenges, they propose TMS, a control plane for fast circuit switching that uses application information and short-term demand estimates to compute schedules, and proactively communicates circuit assignments to the communicating entities.

To achieve high utilization, the computed schedules are sent to ToRs connected to Mordia. The ToRs in turn adjust the transmission of packets into the network to match the scheduled switch reconfigurations, with complete knowledge of when bandwidth will be most available to a particular destination. Thus, both short and long flows can be offloaded into the OCS.

TMS can achieve 65% of the bandwidth of an electronic packet switch (EPS) with an identical link rate using circuits as short as 61us, and 95% of EPS performance with 300us circuits on commodity hardware.

Q: How do you configure queues on your system?
A: Queue classification is based on IP address.

Q: Control plane and circuit switching, have you put these two pieces in your work together?
A: We have integrated the two.

SIGCOMM2013: Got Loss? Get zOVN!

Presenter: Daniel Crisan
Co-authors: Robert Birke, Gilles Cressier, Cyriel Minkenberg and Mitch Gusat 

This paper identifies and characterizes losses in virtualized data center networks. It quantifies the losses for several common combinations of hypervisors and virtual switches, and shows their detrimental effects on application performance. The authors propose a zero-loss Overlay Virtual Network (zOVN) designed to reduce the query and flow completion times of latency-sensitive datacenter applications. They describe its architecture and detail the design of its key component, the zVALE lossless virtual switch.
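To illustrate the core idea of a lossless virtual switch, here is a minimal sketch (assumed behavior for illustration; zVALE's real implementation differs): a conventional queue tail-drops on overflow, while a flow-controlled queue exposes a backpressure signal so the producer pauses instead of losing packets.

```python
from collections import deque

class LossyQueue:
    """Conventional virtual switch queue: tail-drops on overflow."""
    def __init__(self, capacity):
        self.q, self.capacity, self.drops = deque(), capacity, 0

    def enqueue(self, pkt):
        if len(self.q) >= self.capacity:
            self.drops += 1      # packet lost; TCP must recover it later
            return False
        self.q.append(pkt)
        return True

class LosslessQueue:
    """Flow-controlled queue: the producer is paused via backpressure
    instead of losing packets."""
    def __init__(self, capacity):
        self.q, self.capacity = deque(), capacity

    def can_send(self):          # backpressure signal to the producer
        return len(self.q) < self.capacity

    def enqueue(self, pkt):
        assert self.can_send(), "producer must respect backpressure"
        self.q.append(pkt)

# Push a 10-packet burst into a 4-slot lossy queue: 6 packets are dropped.
lossy = LossyQueue(4)
for i in range(10):
    lossy.enqueue(i)
print(lossy.drops)  # → 6
```

Avoiding those drops is what shortens the completion times of latency-sensitive Partition-Aggregate workloads, since TCP never stalls on retransmission timeouts.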

As part of the evaluation, they implemented a zOVN prototype and benchmarked it with Partition-Aggregate in two testbeds, achieving up to a 15-fold reduction in mean completion time with three widespread TCP versions. For larger-scale validation and deeper introspection into zOVN, they developed an OMNeT++ model for accurate cross-layer simulations of a virtualized datacenter.


Q: Whenever the queue is full, you put the hypervisor to sleep? How do you deal with deadlock situations?

A: Multiple threads.

Q: How many VMs per CPU do you use?

A: One VM at the sender and one at the receiver.

SIGCOMM2013: zUpdate: Updating Data Center Networks with Zero Loss

Presenter: Hongqiang Harry Liu
Co-authors: Xin Wu, Ming Zhang, Lihua Yuan, Roger Wattenhofer, David A. Maltz

Network updates (e.g., switch updates, VM migration) in data centers can be painful and lead to disruptions because of transient link load spikes and congestion. The goal of this paper is to perform congestion-free, low-latency network updates so that services are not disrupted. The paper introduces zUpdate, which computes a lossless transition plan and executes it as a two-phase commit.

In zUpdate, when an operator wants to perform a DCN update, she will submit a request containing the update requirements to the update scenario translator. The latter converts the operator’s request into the formal update constraints. The zUpdate engine takes the update constraints together with the current network topology, traffic matrix, and flow rules and attempts to produce a lossless transition plan. The zUpdate plan is formulated as a linear program subject to the constraints of flow conservation and traffic delivery.
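A toy sketch of why a naive one-step update can be lossy (topology, link names, and rates below are made up for illustration, not from the paper): during an asynchronous transition, each flow may still be on its old path or already on its new one, so the worst-case transient load on a link must count every flow whose old or new path crosses it.

```python
def transient_link_loads(flows, links):
    """Worst-case link load during an asynchronous one-step transition:
    each flow may still be on its old path or already on its new one,
    so it can contribute load to any link on either path."""
    load = {link: 0.0 for link in links}
    for rate, old_path, new_path in flows:
        for link in set(old_path) | set(new_path):
            load[link] += rate
    return load

# Hypothetical 4-link topology and two flows (rates in Gbps).
links = ["s1-s2", "s1-s3", "s2-s4", "s3-s4"]
flows = [
    (4.0, ["s1-s2", "s2-s4"], ["s1-s3", "s3-s4"]),  # flow being rerouted
    (5.0, ["s1-s3", "s3-s4"], ["s1-s3", "s3-s4"]),  # flow that stays put
]
load = transient_link_loads(flows, links)
print(load["s1-s3"])  # → 9.0
```

Here the transient load on s1-s3 (9.0 Gbps) exceeds what either the old or new configuration alone would place on it, which is exactly the kind of spike zUpdate's LP-computed intermediate steps are designed to avoid.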

The evaluation consists of tests on a 22-switch testbed using OpenFlow, comparing zUpdate with other solutions (zUpdate-onestep, ECMP-onestep, ECMP-planned). For larger topologies, they use trace-driven simulations on a production-level data center topology.


 Q: If the network resources are limited, and the network update is done, how do you handle this situation?

A: In such a situation we cannot guarantee a congestion-free transition; instead, we recommend that network operators perform updates at off-peak times.

Q: Can you guarantee you can find a congestion free update, is your LP formulation optimal?

A: In certain scenarios, we can only guarantee a best-effort solution.

SIGCOMM2013: pFabric: Minimal Near-Optimal Datacenter Transport

Presenter: Mohammad Alizadeh
Co-authors: Shuang Yang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar, and Scott Shenker

This paper proposes pFabric, which aims to minimize average flow completion times in data centers. pFabric achieves this by approximating SRPT (shortest remaining processing time) scheduling in a distributed manner. In pFabric, switches employ small buffers and use priority scheduling and priority dropping based on priorities carried in packet headers, which are assigned by end-hosts. pFabric end-hosts always send at line rate except under high packet loss, in which case they reduce the window size and apply slow start. In pFabric, losses are recovered only through timeouts, using a fixed, small RTO.
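The switch behavior can be sketched in a few lines (a simplification for illustration; real pFabric operates on priority fields in packet headers and is meant for switch hardware): packets carry the flow's remaining size as their priority, the port evicts the lowest-priority packet on overflow, and always dequeues the highest-priority packet.

```python
class PFabricPort:
    """Toy pFabric port. Priority = remaining flow size carried in the
    packet header (smaller = higher priority)."""
    def __init__(self, buffer_slots):
        self.buf, self.slots = [], buffer_slots

    def arrive(self, pkt):  # pkt = (remaining_flow_size, flow_id)
        self.buf.append(pkt)
        if len(self.buf) > self.slots:
            self.buf.remove(max(self.buf))  # priority dropping: evict lowest priority

    def transmit(self):
        pkt = min(self.buf)                 # priority scheduling: send highest priority
        self.buf.remove(pkt)
        return pkt

port = PFabricPort(buffer_slots=3)
for pkt in [(900, "A"), (40, "B"), (700, "C"), (10, "D")]:
    port.arrive(pkt)                        # (900, "A") is evicted on overflow
print(port.transmit())  # → (10, 'D'): the shortest remaining flow goes first
```

Because scheduling and dropping both favor the flows closest to completion, buffers can stay tiny without hurting short-flow latency.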

This work falls in the category of protocols that aim to approximate different scheduling algorithms (e.g., SRPT, EDF). Prior works include D3 (SIGCOMM'11), PDQ (SIGCOMM'12), D2TCP (SIGCOMM'12), and L2DCT (INFOCOM'13). Protocols like D3 and PDQ introduce complexity inside the network, whereas pFabric aims to reduce this complexity. Moreover, D3 and PDQ require one RTT to obtain rate feedback before sending traffic; pFabric overcomes this by allowing senders to transmit at the interface line rate in the first RTT.

The evaluation was done using ns-2 simulations. They evaluate pFabric's performance across a range of scenarios using empirically derived traffic distributions and show that pFabric achieves close to ideal mean flow completion times.


Q: Your evaluation focuses on non-over subscribed data center topology, how would the performance change under over-subscribed topologies? And how sensitive are your results to the use of packet spraying?

A: We conducted experiments under over-subscribed topologies (the results are in the technical report cited in the paper) and found the performance to be similar.
We compared packet-spraying with ECMP, and found that packet spraying improved performance.

Q: How do you deal with scenarios with real time traffic like VOIP, for which flow completion time is not a meaningful metric?

A: Our work does not deal with such applications.

Q: How would you deal with jumbo frames in the short buffers you use?

A: We use 1500 byte packets, which provides us with about 20 packets of buffer. With jumbo frames, we need to have more buffering, however the performance is unlikely to be significantly impacted due to the use of priority scheduling and dropping.

Q: What happens to fairness? Do you care about fairness?

A: We do not care much about fairness; however, you could achieve improved fairness at the cost of increased completion times.

Q: How does pFabric deal with a group of flows?

A: We do not deal with such scenarios. We specifically deal with scenarios of short and long flows.

SIGCOMM13: Towards Efficient Traffic-analysis Resistant Anonymity Networks

Presented by: Stevens Le Blond

Authors: Stevens Le Blond, David Choffnes, Wenxuan Zhou, Peter Druschel, Hitesh Ballani, Paul Francis

Aqua, a k-anonymity system, improves on existing systems by providing performance guarantees in terms of low latency and high bandwidth together with resistance to traffic analysis. This is achieved by exploiting existing correlations in BitTorrent traffic.

The key feature of Aqua is the use of distinct traffic anonymization techniques at the core and the edges. At the core, payloads are split and sent over multiple paths to reduce the peak payload rate. At the edges, clients with similar traffic patterns are grouped together and forced to transmit at similar rates to realize k-anonymity.

Performance of Aqua was compared with other systems such as constant-rate systems, peer-to-peer systems and broadcast systems. While other systems had more than 80% overhead, Aqua provided bandwidth efficiency with less than 30% overhead. Throttling at the edges in Aqua was only 20%, much lower than 50-80% throttling observed in peer-to-peer and broadcast systems.

Q: Why would you run BitTorrent on top of Tor? Tor is too slow.
A: Tor is slow because there are very few servers hosting the service. We expect to have providers hosting Aqua services. If users are willing to pay for it, we could have a large number of hosts offering better service.

SIGCOMM2013: BigStation: Enable Scalable Real-time Signal Processing in Large MU-MIMO Systems

Presenter: Qing Yang
Co-authors: Xiaoxiao Li, Kun Tan, Hongyi Yao, Fang Ji, Wenjun Hu, Jiansong Zhang, and Yongguang Zhang

Today we see more demand for wireless capacity. We would like to engineer the next wireless network to match the capacity of existing wired networks. However, we can only increase the spectrum so much and reusing the spectrum increases complexity.

This work significantly increases wireless capacity using spatial multiplexing with many antennas. To handle so many antennas, computation is parallelized with many simple processing units. This work establishes a distributed processing pipeline, which exploits data parallelism across servers at each processing stage.

This work was prototyped using a Sora MIMO Kit. They found that with overprovisioned AP antennas, the peak transmission rate scales linearly with the number of antennas, with 9 antennas demonstrating a scaling of 6.8x over a single antenna.

Q: If you had to guess what the scaling limitation is, what do you think the fundamental boundary is? Is it infinitely scalable?
A: My estimation is that we can probably handle about 100 antennas.

Q: Do you think we can continue such an improvement up to 40 antennas in production?
A: There are still critical challenges that we haven't solved yet. These challenges will need to be worked out before this will be suitable for industry.

Q: You talk about the long tail of processing, is there packet loss?
A: Yes, when there is a high data rate, we do see packet loss, but I think it could be reduced.

Q: For production, you really need to design the interconnect to be synchronous, instead of using ethernet.
A: I don't know if this would actually work - we need to investigate more.

SIGCOMM2013: Bringing Cross-Layer MIMO to Today's Wireless LANs

Presenter: Swarun Kumar
Co-authors: Diego Cifuentes, Shyamnath Gollakota, and Dina Katabi

Major recent advances in cross-layer MIMO don't work on today's Wi-Fi cards, because chip manufacturers hesitate to make investments in new hardware that hasn't been fully tested on real networks.

OpenRF brings MIMO techniques to today's Wi-Fi cards. OpenRF's data plane needs to be able to apply PHY techniques to commodity cards and its control plane needs to self-configure to dynamism in the network. OpenRF handles these challenges by using two transmit queues and by handling some scheduling locally.

OpenRF was implemented on Intel 5300 Wi-Fi cards. OpenRF demonstrated an average gain of 1.6x in TCP throughput over 802.11 in a large-scale experiment, as well as clear improvements in video quality in real applications.

Q: How would you carry the same techniques to cellular networks?
A: This is a lot more natural for cellular networks because they already have some notion of scheduling.

Q: What if Alice and Bob move? How do you locate them?
A: The location of Alice and Bob really doesn't matter. These access points can track the channels as they change at the clients.

Q: You seem to use the same matching as OpenFlow for your flow tables. Wouldn't just the MAC address be sufficient? Why do you have to look at flows?
A: Sometimes you might care about different flows (for example long-lived flows but not bursty traffic).

Q: As you add more antennas you can move towards a wire abstraction. Have you thought about how this would affect your techniques?
A: Our techniques work even better with more antennas (e.g., 3), so this would work.

Q: Today's mobile networks have a coordination mechanism. To what extent do we need centralized control?
A: The centralized controller is important for dealing with interference. You need to coordinate interference between access points.

SIGCOMM2013: Full Duplex Radios

Presenter: Dinesh Bharadia
Co-authors: Emily McMilin and Sachin Katti

Achieving full duplex radio is difficult because self-interference is a hundred billion times stronger than the received signal. Canceling the transmitted signal is difficult because the signal transmitted is actually quite different from what you think you are transmitting, due to noise and non-linear effects called harmonics. You need 110 dB of cancellation (at least 70 dB of which must be analog) in order to eliminate the transmitter noise.

This paper presents the first full duplex radio, which uses a new cancellation technique that eliminates all self-interference. Their approach is a hybrid design, with both analog and digital components.

Using commodity radios, this work is able to reduce self interference to within 1 dB of the noise floor. This approach significantly outperforms previous works in reducing the self-interference residue. This design achieves a 1.97x increase in throughput over non-duplex designs, which is very close to the optimal 2x throughput increase.

Q: How does this compare with a previous work from Waterloo?
A: They achieved 50 dB of cancellation, but they didn't do any non-linear cancellation.

Q: You are suggesting that your own work from SIGCOMM 2011 does worse than half-duplex?
A: That wasn't our work.

Q: Would this scale?
A: I still believe this would scale.

SIGCOMM2013: An In-depth Study of LTE: Effect of Network Protocol and Application Behavior on Performance

Presenter: Morley Mao
Co-authors: Junxian Huang, Feng Qian, Yihua Guo, Yuanyuan Zhou, Qiang Xu, Subhabrata Sen, and Oliver Spatscheck

LTE is a fairly new technology, so little is known about the bandwidth, latency, and RTTs that its users experience in commercial networks. Information about the properties of LTE in commercial networks would enable transport layers and applications that are more LTE-friendly.

In this work, they analyzed an anonymized packet header trace from a US metropolitan area, which included 3 TB of LTE traffic. They observed undesired slow starts in 12% of large flows. They created an algorithm to estimate available bandwidth and utilization from the trace.

They found the median bandwidth utilization to be 20%, and that for 71% of the large flows, the bandwidth utilization is below 50%. They also found high LTE bandwidth variability, and that TCP's performance was degraded by a limited receive window. They suggest that these problems could be addressed by updating RTT estimates in the transport layer and reading data from TCP buffers more quickly in the application layer.

Q: Which flavors of TCP were you looking at?
A: Cubic.

Q: Some applications intentionally use small TCP windows, did you investigate that?
A: The application may be doing some kind of rate limiting, so that is a possible explanation for the window. However, most applications in the trace only opened one TCP connection.

Q: To what degree do your observations depend on the particular LTE network?
A: Our observations are limited by the trace that we have. We did local experiments on two commercial networks and observed similar behavior. Our study is limited to LTE networks in the US.

Q: You seem to put some blame on the carrier for having large buffers - can this problem be fixed in practice? What should they do?
A: Yes that is a problem (buffer bloat). The loss rate in these networks is already low, so I am not sure if eliminating large buffers is the only solution.

SIGCOMM13: ElasticSwitch: Practical Work-Conserving Bandwidth Guarantees for Cloud Computing

The paper was presented by Lucian Popa.
Other co-authors are: Praveen Yalagandula, Sujata Banerjee, Jeffrey C. Mogul, Yoshio Turner and Jose Renato Santos.

This talk was about network resource guarantees in cloud computing. The author presented ElasticSwitch, an efficient and practical approach for providing bandwidth guarantees.

Goal of ElasticSwitch:
  • provide minimum bandwidth guarantees in Clouds
  • work-conserving allocation
  • be practical

ElasticSwitch design:
ElasticSwitch resides in the hypervisor of each host, not in the individual VMs. It contains two components: guarantee partitioning and rate allocation. In guarantee partitioning, ElasticSwitch turns a hose-model guarantee into VM-to-VM pipe guarantees. Guarantee partitioning leverages max-min allocation and has three goals: safety, efficiency, and no starvation.
In rate allocation, ElasticSwitch uses rate limiters and increases the rate of a VM-to-VM pipe X-Y above its guarantee when there is no congestion between X and Y. Congestion is detected through dropped packets or via ECN. The adaptive algorithm used in ElasticSwitch is from Seawall (NSDI '11).
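A rough sketch of a work-conserving rate limiter in the spirit of the rate allocation component (the constants and exact update rules here are assumptions for illustration, not ElasticSwitch's actual Seawall-derived algorithm):

```python
def next_rate(rate, guarantee, congested, link_capacity,
              increase=0.1, decrease=0.5):
    """One step of rate allocation for a VM-to-VM pipe. Hypothetical
    constants; the real algorithm keeps more state and uses weighted AIMD."""
    if congested:
        # multiplicative decrease, but never below the bandwidth guarantee
        return max(guarantee, rate * decrease)
    # no congestion: be work-conserving, grab spare bandwidth above the guarantee
    return min(link_capacity, rate + increase * guarantee)

rate = 100.0  # Mbps, starting at the pipe's guarantee
for congested in [False, False, False, True]:
    rate = next_rate(rate, guarantee=100.0, congested=congested,
                     link_capacity=1000.0)
print(rate)  # → 100.0
```

The key property is visible in the trace: the rate climbs above the guarantee while the path is idle (work conservation), yet congestion-triggered backoff can never push the pipe below its minimum guarantee (safety).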

The testbed contains 100 servers in a 1 Gbps tree network. The first workload tested is a many-to-one workload. The results show that ElasticSwitch provides bandwidth guarantees and comes close to the ideal allocation. The second workload is MapReduce; it shows that, in the worst case, job completion time is much shorter with ElasticSwitch than without bandwidth guarantees.


SIGCOMM13: Developing a Predictive Model of Quality of Experience for Internet Video

The paper was presented by Athula Balachandran.
Other co-authors are: Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica and Hui Zhang.

The metrics used to measure Internet video QoE have shifted from traditional network-level measures to newer engagement-centric ones. This paper develops a predictive model of Internet video QoE. The model meets two requirements: (1) it is tied to observable user engagement; (2) it is actionable, guiding practical system design decisions.

Commonly used quality metrics include join time, buffering ratio, rate of buffering, rate of switching, average bitrate, etc. However, it is unclear which of these metrics should be used for QoE. This work develops a unified and quantitative QoE model to solve the problem.

Challenges:
  • complex engagement-to-metric relationships and complex metric interdependencies
  • identifying confounding factors
  • incorporating confounding factors

Approach:
  • Cast the complex relationships as a machine learning problem.
  • Design different tests to examine potential factors, and identify the key confounding factors.
  • Two methods are proposed to refine the model to incorporate confounding factors: adding confounding factors as features, and splitting the data by confounding factors. The speaker shows that splitting works better than adding features, allowing the model to achieve 70% accuracy.

The speaker shows a 100% improvement in average engagement compared with the baseline, and a 20% improvement compared with other strategies.

Q: Is there any correlation between some confounding factors?
A: Some correlations are seen because of user behaviors.
Q: How does this approach capture real user effects in the wild?
A: The data used in the analysis comes from real users.
Q: Why not use other machine learning models?
A: Other models could be used.


SIGCOMM13: Participatory Networking: An API for Application Control of SDNs

The paper was presented by Andrew D. Ferguson

Other co-authors are: Arjun Guha, Chen Liang, Rodrigo Fonseca and Shriram Krishnamurthi.

This talk was about PANE, an API that lets applications control SDNs. With PANE, users can work with the network to achieve better performance, security, and predictable behavior.

Features of Participatory Networking:
  • End-user API for SDNs
  • Exposes existing mechanisms
  • No effect on unmodified applications

Challenges:
  • how to decompose control and visibility of the network
  • how to resolve conflicts between untrusted users and across requests, while maintaining baseline levels of fairness and security

Approaches to address the challenges:
  • Decompose control and visibility:
    • Control and visibility are defined in terms of shares. A share contains three parts: a flowgroup, principals, and privileges.
    • A share can be decomposed into sub-shares, forming a share tree (a hierarchy of shares).
  • Resolving conflicts:
    • Policy tree: a hierarchical flow table that resolves conflicts from the leaf nodes up to the root.
    • The only requirement on the conflict-resolution operator is associativity; no identity element is needed.


The system is evaluated on a Hadoop platform. In the test, there are three sort jobs: two low-priority and one high-priority. PANE speeds up the high-priority job by 23%, and fewer than 30 rules coexist in the system.

Q: What about other resources like latency?
A: This will be considered in future development; for example, switch hops could serve as an indication of latency.


Wednesday, August 14, 2013

SIGCOMM2013: FCP: A Flexible Transport Framework for Accommodating Diversity

Presenter: Dongsu Han


The talk was about FCP, a flexible framework for network resource allocation that accommodates diversity by exposing a simple abstraction for resource allocation.
Congestion control is concerned with resource allocation, which requires coordination among all participants to ensure high utilization and fairness. Two different styles of congestion control cannot coexist, as they interfere with each other's resource allocation. There are two families of congestion control: end-to-end (TCP) and router-assisted (XCP, RCP). End-point-based systems like TCP provide the flexibility to deploy different algorithms, and the notion of TCP friendliness provides a mechanism for coexistence between different algorithms and behaviors. But router-assisted congestion control is more efficient than TCP at achieving high utilization, small delays, and fast flow completion times. The presenter therefore presented FCP, a congestion control protocol that combines the best of both worlds.

They designed a protocol that allows each domain to allocate resources (a budget) to a host and makes the network explicitly signal the congestion price. To ensure safe coexistence of different end-point resource allocation strategies, the system maintains a key invariant: the amount of traffic a sender can generate is limited by its budget, the maximum amount it can spend per unit time.
Each sender (host) is assigned a budget ($/sec). At the start of a flow, the sender allocates part of its budget to the flow, taking into account traffic demands and application objectives. The rate of each flow is the budget allocated to it divided by the price of the path that the flow traverses. The network determines the congestion price ($/bit) for each flow in the form of feedback; the price of a path is the sum of its link prices.
They also talked about preloading, which helps increase or decrease the budget amount per flow on a packet-by-packet basis.
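The budget/price relation described above is simple enough to state directly (a sketch; the budget split and link prices below are made-up example values):

```python
def flow_rate(budget_share, link_prices):
    """A flow's rate under FCP's abstraction: allocated budget ($/sec)
    divided by the path price ($/bit), where the path price is the sum
    of the link prices fed back by the network."""
    path_price = sum(link_prices)     # $/bit
    return budget_share / path_price  # bits/sec

# A sender with a 10 $/sec budget splits it 70/30 across two flows.
rate_a = flow_rate(7.0, [0.002, 0.005])  # two-hop path, price 0.007 $/bit
rate_b = flow_rate(3.0, [0.001])         # cheaper one-hop path
print(round(rate_a), round(rate_b))  # → 1000 3000
```

Note how congestion on a path raises its price and automatically lowers the rates of flows crossing it, while the sender's total spending (and hence total traffic) stays bounded by its budget.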

They compared FCP with other schemes (XCP and RCP) using packet-level simulations. FCP converges faster than the other two for long-running flows. Average bottleneck link utilization was 99% with FCP, 93% with XCP, and 97% with RCP. FCP handles mixed flow sizes much more gracefully and efficiently than RCP.

FCP accommodates diverse behaviors in resource allocation while utilizing explicit feedback. FCP maximizes end-point flexibility by simplifying the mechanism of coexistence. FCP's explicit feedback and feed-forward provide a generic interface for efficient resource allocation.

Q:  How do you allocate budget to a new flow?
A: We take from the existing flows and allocate to the new flow.

Q: Are you aware of RUCN? How is your approach different from theirs?
A: We use a similar idea, where we attach the previous price to the packet header and the routers extract the price. The difference is that they allocate budget over a long period of time, such as a month or a billing period, whereas we allocate it for a much shorter duration. Here the unit is $/sec: every second you get this budget, a constant streaming rate of budget. That's the difference!

Q: Why can’t you achieve coexistence using multiple queues?
A: Because that is not scalable.

SIGCOMM13: Trinocular: Understanding Internet Reliability through Adaptive Probing.

Presenter: Lin Quan

This paper studies Internet outages using Trinocular, which models each network block as being in one of three states (up, down, or uncertain). The authors argue that outages mainly stem from three causes: political events, weather, and infrastructure failures. These outages matter equally to users, network operators, and researchers.

Trinocular uses active probes to study the reliability of Internet edges. Probes are sent only when needed, and the additional probing traffic increases Internet background radiation by less than 0.7%. Multiple instances of Trinocular are used to detect global outages. The probing interval is around 11 minutes, with fewer than 20 probes sent per /24 network. Trinocular relies on Bayesian inference to decide how many probes need to be sent.
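A minimal sketch of the Bayesian reasoning (a simplification of Trinocular's actual model, which also handles partial responsiveness of down blocks): each unanswered probe lowers the belief that a block is up, given the block's expected address availability, and probing stops once the belief leaves the uncertain region.

```python
def update_belief(belief_up, got_response, availability):
    """One Bayesian update of P(block is up) after a single probe.
    Simplifying assumption: an up block answers a probe with probability
    `availability` (the fraction of its addresses that respond), and a
    down block never answers."""
    if got_response:
        return 1.0  # any response proves the block is reachable
    p_silent_if_up = 1.0 - availability
    p_silent_if_down = 1.0
    num = p_silent_if_up * belief_up
    return num / (num + p_silent_if_down * (1.0 - belief_up))

belief, availability, probes = 0.8, 0.6, 0
while 0.1 < belief < 0.9:   # keep probing while the block's state is uncertain
    belief = update_belief(belief, got_response=False,
                           availability=availability)
    probes += 1
print(probes, round(belief, 3))  # → 4 0.093
```

With these numbers, four silent probes are enough to conclude "down"; blocks with lower address availability need more probes, which is how the belief threshold translates into an adaptive probe count.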

The main takeaway is that Trinocular is precise and significantly more accurate than existing methods. For details, check out the paper.

Q: Have you differentiated between an outage and a powered-off host?
A: No, we cannot differentiate the two.

Q: Why is it called Trinocular?
A: Because it has three states, up, down, and uncertain.

SIGCOMM13: Privacy in Content-Oriented Networking: Threats and Countermeasures

Work on the design of Content-Oriented Networking (CON, also referred to as ICN/CCN) has focused heavily on mobility and scalability; privacy, however, should also be taken into consideration.

This paper presents a systematic analysis of privacy issues in CON as a generic paradigm, discussing different attacks and detailing their impact on user privacy. The authors also propose several countermeasures while attempting to balance the trade-off between privacy, performance, and changes to the architecture.

SIGCOMM13: A Provider-side View of Web Search Response Time.

Presenter: YingYing Chen

Studies of search engines such as Google and Bing have reported tremendous revenue losses as delay increases. For instance, 0.5 s of extra delay in Google searches results in a 20% drop in revenue. This paper studies the variation in search response time (SRT) between peak and off-peak hours. The authors found the counter-intuitive result that off-peak hours exhibit higher delay.

To find out which factors lead to higher SRT in off-peak hours, the paper investigates major impact factors such as servers, network, browsers, and queries. The authors perform ANOVA to decompose the variation across different time intervals. Their results show that more than 65% of the variance is due to network factors, followed by variation due to browser speed and the nature of the query. Server-side processing time contributes relatively little to the overall delay.
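The ANOVA-style decomposition boils down to asking what fraction of the total variance a factor explains. A small sketch with hypothetical SRT samples (the numbers below are invented, not from the paper):

```python
def variance_explained(groups):
    """Eta-squared: the fraction of total variance explained by a factor,
    computed as between-group sum of squares over total sum of squares."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

# Hypothetical SRT samples (ms), bucketed by a "network type" factor.
fast_net = [200, 210, 190, 205]
slow_net = [400, 390, 410, 395]
print(round(variance_explained([fast_net, slow_net]), 2))  # → 0.99
```

When the group means differ far more than the within-group scatter, as here, nearly all the variance is attributed to that factor, which is the sense in which "more than 65% of the variance was due to network factors."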

Higher network latencies are due to more queries coming from residential networks during off-peak hours. In addition, the authors found that off-peak queries differ in nature, for instance in the average number of images requested, compared with queries sent from enterprise networks. Overall, it was argued that understanding SRT is challenging because changes in user demographics lead to systematic variations in SRT, and performance debugging of SRT is confounded by user behavior.

Q: Do you focus only on browser requests, or have you also considered mobile?

A: We have only considered browser requests.

Q (Dina P.): How do you compute the ground truth for your three techniques?

A: We compared it against the tickets generated by the operators.

Q (Paul Barford): There are many papers on time-series-based techniques for identifying anomalies. Did you consider standard time-series or wavelet-based methods?


(Comment from Paul Barford) When you present your technique, you should by default argue for it against other techniques.

SIGCOMM2013: TCP ex Machina: Computer-Generated Congestion Control

Presenter: Keith Winstein

The talk was about a new approach to end-to-end congestion control on a multi-user network. The main motivation behind the system built by the authors: “Is it possible for a computer to discover the right rules for congestion control in heterogeneous and dynamic networks? Should computers, rather than humans, be tasked with designing congestion control methods? And how well can computers perform this task?”

They presented the system “Remy”, a program that generates congestion control schemes offline. 
Input to the system:
1.  Prior assumptions depicting what the networks may do.
2. Goal to highlight what applications want.
Output: CC algorithm for a TCP sender (RemyCC)
Time: A few hours
Cost: $5-$10 on Amazon EC2
Remy searches for the best congestion-control algorithm, optimizing the expected objective over the prior assumptions, and makes the search tractable by limiting the available state. Remy finds the piecewise-continuous rule that optimizes the expected value of the objective function.
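A toy version of the offline search (vastly simplified: Remy optimizes a rich rule table, not a single fixed window, and the objective weight and network model below are assumptions made for illustration): score each candidate rule's expected objective over sampled prior network conditions and keep the best.

```python
import math

def objective(throughput, delay, delta=0.4):
    """Remy-style objective: reward log throughput, penalize log delay.
    delta (the delay weight) is an assumed value."""
    return math.log(throughput) - delta * math.log(delay)

def expected_objective(window, scenarios):
    """Score one candidate rule (a fixed congestion window) over prior
    assumptions about the network (bandwidth in pkts/RTT, base RTT in ms)."""
    total = 0.0
    for bandwidth, base_rtt in scenarios:
        throughput = min(window, bandwidth)          # can't exceed the link
        queued = max(0, window - bandwidth)          # excess packets sit in a queue
        delay = base_rtt * (1 + queued / bandwidth)  # queueing inflates delay
        total += objective(throughput, delay)
    return total / len(scenarios)

scenarios = [(10, 50), (20, 100), (40, 80)]          # sampled prior assumptions
best = max(range(1, 60), key=lambda w: expected_objective(w, scenarios))
print(best)  # → 40
```

The expensive part, searching a huge rule space against many simulated network draws, is exactly what Remy spends its few hours of EC2 time on; the resulting RemyCC then runs cheaply online.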

The authors evaluated their system in ns-2 and compared the results with end-to-end schemes (NewReno, Cubic, Compound, and Vegas) and in-network solutions (Cubic-over-sfqCoDel, XCP). The evaluation showed that with Remy's algorithms one can achieve the highest throughput and lowest queueing latency on fixed-rate networks. However, the algorithms are not as good on variable-rate networks.

Remy produces complex rules with consistent behavior, as compared to the simple rules and complex behavior of traditional policies. A computer-designed, end-to-end solution turns out to be better than human-designed, in-network congestion control solutions.

Q: Kelly also solves this problem. How is your system different?
A: Remy is targeted at the dynamic case of real networks.
Q: Why were you surprised that your (computer-generated) scheme is better? [Unable to understand the complete question clearly]
A: I am not surprised that a computer-generated end-to-end scheme can beat human-designed end-to-end schemes. I am surprised that an end-to-end scheme can beat in-network schemes; I didn't imagine that computer-generated schemes could outperform XCP.
Q: What about heterogeneous systems running different RemyCCs alongside TCP?
A: We do discuss RemyCC coexisting with existing TCP in the paper.