Wednesday, August 19, 2015

Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale

Authors: Xiaoqi Ren (Caltech), Ganesh Ananthanarayanan (MSR), Adam Wierman (Caltech), Minlan Yu (USC)
Presenter: Xiaoqi Ren 

The presenter introduced Hopper, a speculation-aware job scheduler with both decentralized and centralized prototypes. She began by showing a limitation of current scheduler designs in the literature (both centralized and decentralized): they ignore an important aspect of clusters, straggler mitigation via speculation.
The authors pose the central question in the design of a speculation-aware job scheduler: how to dynamically (online) balance the slots used by speculative and original copies of tasks across jobs.
The key insight behind Hopper is that a scheduler must anticipate the speculation requirements of jobs and dynamically allocate capacity according to the marginal value (in terms of performance) of extra slots, which are likely to be used for speculation. The presentation then covered three prototypes of Hopper: the authors augmented the centralized scheduling frameworks Hadoop (for batch jobs) and Spark (for interactive jobs), and the decentralized framework Sparrow.
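To make the marginal-value idea concrete, here is a toy sketch (not Hopper's actual allocation rule, which is derived analytically in the paper): a greedy allocator that repeatedly hands the next slot to whichever job's estimated completion time improves the most. The cost model (a job with `t` remaining tasks and `s` slots takes roughly `t / s` "waves") and the function names are illustrative assumptions.

```python
# Toy sketch of marginal-value slot allocation -- NOT Hopper's actual
# algorithm. Each extra slot goes to the job that benefits most from it;
# extra slots beyond the remaining tasks can host speculative copies.

def greedy_allocate(remaining_tasks, total_slots):
    """remaining_tasks: per-job counts of unfinished tasks (hypothetical input).
    Returns a per-job list of slot counts summing to total_slots."""
    n = len(remaining_tasks)
    assert total_slots >= n
    slots = [1] * n  # every job gets at least one slot

    def waves(t, s):
        # Crude completion-time estimate: t tasks over s slots.
        return t / s

    for _ in range(total_slots - n):
        # Marginal value of one more slot for job i.
        gain = lambda i: waves(remaining_tasks[i], slots[i]) - \
                         waves(remaining_tasks[i], slots[i] + 1)
        best = max(range(n), key=gain)
        slots[best] += 1
    return slots

# A large job receives more slots than a small one under this model.
print(greedy_allocate([10, 100], 12))
```

The diminishing-returns shape of the cost model is what makes the greedy choice meaningful: each additional slot helps a job less than the previous one did.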

Besides being compatible with all current speculation algorithms, Hopper speeds up jobs by 66% in decentralized settings and 50% in centralized settings compared to current state-of-the-art schedulers.

Q: The optimization depends on data parameters that you learn. Given these learned parameters, do you see repeatability across jobs? Do you need to maintain state from previously run jobs in order to learn the data correctly, and how sensitive are the results to accuracy across all straggler conditions?
A: In our evaluation we don't artificially add any probability distribution to manually create stragglers; we just use the workload and fit the tasks' completion times to the distribution to estimate beta. We also show in the paper that if users can provide the probabilities based on their own characterization of how likely stragglers are to occur, we can get more efficient results.
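The fitting step mentioned in the answer can be illustrated with a small sketch. Assuming task durations follow a Pareto distribution (the heavy-tailed model used in Hopper's analysis), the shape parameter beta can be estimated from observed completion times with the standard maximum-likelihood estimator; the function name and the synthetic data below are illustrative, not from the paper.

```python
# Toy sketch: estimate the Pareto shape parameter (beta) from observed
# task completion times via maximum likelihood. Illustrative only.
import math
import random

def pareto_beta_mle(durations):
    """MLE of the Pareto shape: n / sum(log(x_i / x_min))."""
    xmin = min(durations)
    n = len(durations)
    return n / sum(math.log(d / xmin) for d in durations)

# Sanity check on synthetic data with a known shape parameter.
random.seed(0)
samples = [random.paretovariate(1.5) for _ in range(10000)]
print(round(pareto_beta_mle(samples), 2))  # should be close to 1.5
```

In practice the durations would come from the workload's own completed tasks, matching the answer's point that no artificial straggler distribution is injected.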