Thursday, December 15, 2016

CoNEXT 2016: Session 9: Measurement and diagnosis: EYEORG: A Platform For Crowdsourcing Web Quality Of Experience Measurements

Session 9: Measurement and diagnosis
EYEORG: A Platform For Crowdsourcing Web Quality Of Experience Measurements (25 mins) (paper)
Matteo Varvello (Telefonica Research), Jeremy Blackburn (Telefonica Research), David Naylor (Carnegie Mellon University), and Konstantina Papagiannaki (Google)

Web quality of experience matters: Amazon estimated that being 1 second slower would mean a $1.6 billion loss yearly; Google found that being 0.4s slower cost 8 million searches per day. Page load time is a hot topic! It is usually measured with OnLoad: fetch HTML, parse HTML, load JS, run JS, parse HTML and load images. But OnLoad might not be ideal: it can overestimate by counting things users do not care about, and it can underestimate by missing things users do care about (deferred scripts, for example). So how do we measure user-perceived page load time? We need actual users, which is challenging: we need to do it at scale...

Eyeorg: a platform for crowdsourcing web quality of experience measurements. Challenges:
- Consistent experience: participants have different software and networks. Non-uniform!
- Quantitative experience: how do we collect user input? It's hard for a user to express that a page "seems loaded".
- Trustworthy results: crowd workers are not always reliable!

How they get it right:
- Consistency: videos of pages loading look the same to everyone, so Eyeorg captures videos in advance and serves videos, not sites, during tests.
- Quantitative input, two test types:
  1) Timeline: when does the page look ready to use? It uses a "scrub bar" rather than the HTML5 video controls. The video is preloaded to avoid the confusion of "is the page in the video still loading?", and when the user submits, a frame rewind snaps to the earliest similar frame. (AUTHOR shows a live demo)
  2) A/B: which version loaded faster? A "visual" head-to-head comparison, rendered as a single video so A and B never get out of sync, with random ordering (left and right) to control for users' attention too. (AUTHOR shows a live demo)
- Trustworthy results: Eyeorg filters responses using techniques from the HCI literature: control questions, engagement checks, soft rules, and wisdom of the crowd.
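The "frame rewind" step of the Timeline test could be sketched as below. This is a minimal illustration, not the authors' implementation: frames are flat lists of grayscale pixel values, and the function names and the similarity threshold are all hypothetical.

```python
# Sketch of an Eyeorg-style "frame rewind": when a user marks frame k as
# the moment the page looks loaded, snap back to the earliest preceding
# frame that is still visually similar to it.
# Assumptions (not from the paper): frames are flat lists of grayscale
# pixel values, and similarity_threshold is an illustrative tuning knob.

def frame_distance(a, b):
    """Mean absolute pixel difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def rewind_to_earliest_similar(frames, k, similarity_threshold=2.0):
    """Walk backwards from frame k while frames remain visually similar."""
    earliest = k
    for i in range(k - 1, -1, -1):
        if frame_distance(frames[i], frames[k]) <= similarity_threshold:
            earliest = i
        else:
            break
    return earliest
```

For example, if the last two frames of a capture are pixel-identical and the user submits the final one, the rewind reports the earlier of the two, correcting for slow reaction times on the scrub bar.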
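The response filtering above could look roughly like this sketch, combining a control-question check with a simple wisdom-of-the-crowd rule (drop answers far from the crowd median in median-absolute-deviation terms). The function name, the tuple format, and the MAD cutoff are illustrative assumptions, not the paper's exact rules.

```python
import statistics

def filter_responses(responses, max_deviation=3.0):
    """
    Keep workers who (1) passed a control question and (2) gave a
    timeline answer within max_deviation MADs of the crowd median.

    responses: list of (answer_seconds, passed_control) tuples.
    NOTE: the MAD-based cutoff is an illustrative choice, not the
    paper's actual filtering rule.
    """
    passed = [t for t, ok in responses if ok]
    med = statistics.median(passed)
    # Median absolute deviation; fall back to 1.0 if all answers agree.
    mad = statistics.median(abs(t - med) for t in passed) or 1.0
    return [t for t in passed if abs(t - med) / mad <= max_deviation]
```

A MAD-based filter is a common robust-statistics choice here because, unlike a mean/standard-deviation cutoff, it is not itself dragged around by the outliers it is meant to remove.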
(AUTHOR shortened the talk a bit - look at the paper.)

Now to the experiments. They ran three measurement campaigns on Eyeorg: PLT metrics, HTTP/1.1 vs HTTP/2, and ad blockers.
- PLT metrics: timeline tests to compare PLT metrics against user-perceived load time (UPLT). Cheap and scalable: 120 USD in 1.5 days with 1000 crowdsourced workers. Metrics compared: OnLoad, First Visual Change (FVC), Last Visual Change (LVC), SpeedIndex. OnLoad and First Visual Change correlate best with UPLT! OnLoad is usually within 1 second of UPLT. This was a surprise to the authors: there is still room for improvement, but OnLoad is better than they thought.

To the community: the experiment data is available on the website!

The AUTHOR finished here due to time limitations. HTTP and ad blocker results are in the paper!

Q&A:
- How does this compare to another PLT metric? (I could not understand the name.)
- Is there consistency with ground truth, e.g. if you show the same video to a user twice? They looked at the distribution: most users tend to agree, and the range of responses is not huge. Some users wait for ads, others don't.
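Of the PLT metrics compared in the experiments, SpeedIndex is the least obvious to compute: it is the area above the visual-completeness curve of the loading page over time (per the WebPagetest definition). A minimal sketch, assuming completeness samples have already been extracted from a loading video:

```python
def speed_index(samples):
    """
    Compute SpeedIndex from visual-completeness samples.

    samples: list of (time_ms, completeness) pairs sorted by time, with
    completeness in [0, 1] and the last sample at full completeness.
    SpeedIndex = sum over intervals of (1 - completeness) * dt, using
    step interpolation between samples. Extracting the completeness
    curve from video frames is assumed done elsewhere.
    """
    si = 0.0
    for (t0, c0), (t1, _c1) in zip(samples, samples[1:]):
        si += (1.0 - c0) * (t1 - t0)
    return si
```

Lower is better: a page that paints most of its content early accumulates little area above the curve even if its OnLoad fires late, which is exactly why SpeedIndex was proposed as a more user-centric alternative.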