Wednesday, August 23, 2017

Session 5 Paper 2: Neural Adaptive Video Streaming with Pensieve

Presented by: Hongzi Mao

Authors: Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh (MIT Computer Science and Artificial Intelligence Laboratory)

Today’s video streaming relies on adaptive bitrate (ABR) algorithms, which select a bitrate for each video chunk (e.g. 240P or 1080P) based on network conditions. A low bitrate gives poor video quality, but a bitrate too high for the network to sustain causes the video to pause and rebuffer. The authors propose Pensieve, which uses reinforcement learning to select bitrates, learning from the observed network conditions and the video quality that results from each selection.

In this reinforcement learning problem, the action space is the set of bitrate choices, e.g. 240P or 1080P. The reward function considers bitrate, rebuffering, and smoothness. The state space includes many features, such as chunk throughput, chunk download time, next chunk size, current buffer size, and past chunk bitrate. This richer state is meant to be more informative than the throughput prediction and/or buffer occupancy alone that prior works rely on.
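As a rough illustration of the formulation above, the following Python sketch groups the listed state features into one record and scores a playback session with a reward that rises with bitrate and falls with rebuffering time and bitrate switches. The class name, field names, function name, linear form, and default weights are all assumptions made for this example; the talk summary does not give the exact reward used by Pensieve.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AbrState:
    """State features named in the summary (field names are hypothetical)."""
    chunk_throughputs_mbps: Sequence[float]  # throughput measured for past chunks
    chunk_download_times_s: Sequence[float]  # download time of past chunks
    next_chunk_sizes_mb: Sequence[float]     # next chunk size at each bitrate
    buffer_s: float                          # current playback buffer, in seconds
    last_bitrate_mbps: float                 # bitrate chosen for the previous chunk


def qoe_reward(
    bitrates_mbps: Sequence[float],
    rebuffer_s: Sequence[float],
    rebuf_penalty: float = 4.0,   # illustrative weight, not taken from the talk
    smooth_penalty: float = 1.0,  # illustrative weight, not taken from the talk
) -> float:
    """Score a session: bitrate minus rebuffering and bitrate-switch penalties."""
    if len(bitrates_mbps) != len(rebuffer_s):
        raise ValueError("need one rebuffering time per chunk")
    quality = sum(bitrates_mbps)
    rebuffering = rebuf_penalty * sum(rebuffer_s)
    # Smoothness term: total absolute change in bitrate between adjacent chunks.
    switches = sum(abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:]))
    return quality - rebuffering - smooth_penalty * switches


state = AbrState([1.2, 1.5], [2.0, 1.8], [0.5, 1.0, 2.0], buffer_s=8.0,
                 last_bitrate_mbps=2.0)
print(qoe_reward([1.0, 2.0, 2.0], [0.0, 0.5, 0.0]))
```

Changing the weights shifts the trade-off: a larger `rebuf_penalty` pushes a learned policy toward conservative bitrates, while a larger `smooth_penalty` discourages frequent quality switches.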

They trained and tested Pensieve on real network traces and found that it delivers 12-25% better QoE and 10-30% less rebuffering than previous ABR algorithms.

Q: How do you explain and understand where the benefits of your reinforcement learning algorithm come from?
A: Explaining the neural network remains a hard problem. We find that Pensieve benefits from better control over rebuffering.

Q: What is the cost of computation?  
A: The storage cost is small. Training is computationally expensive, but selecting a bitrate with the trained model requires little computation.

Q: Did you compare with past work on model-based congestion control?
A: It is hard to model the network, and therefore we propose data-driven congestion control.

Q: Did you try user satisfaction for the reward function?
A: No, because user satisfaction is harder and slower to quantify than the metrics in our simulation strategy, but it would be trivial to swap in a different reward function.

Q: How does it scale to many clients?  
A: We can learn policies for different clients and perhaps coordinate ABR decisions across multiple clients.