Thursday, November 21, 2013

HotNets ’13: On the Validity of Geosocial Mobility Traces

Authors: Zengbin Zhang, Lin Zhou, Xiaohan Zhao, Gang Wang, Yu Su, Miriam Metzger, Haitao Zheng, and Ben Y. Zhao

There has been significant effort toward understanding human mobility. Knowledge of user mobility can significantly improve a number of applications, such as infrastructure deployment and ad hoc vehicular network design. However, obtaining large-scale, accurate, and detailed traces of human movement has proven hard. Geosocial networks such as Foursquare offer a new way to obtain human mobility traces. The goal of this paper is to examine the quality and representativeness of this data and, depending on that quality, to clean it up and possibly extrapolate from it.

The paper evaluates the quality of social network data by comparing it against GPS traces that serve as “ground truth.” The authors wrote a phone app that records users’ locations using GPS (and whatever other localization techniques are available). This location data is then compared with the check-ins that the same users post on social networking sites.
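The comparison can be thought of as a matching step: a check-in is consistent with ground truth if the GPS trace shows the user staying near that spot around that time. The sketch below assumes the raw GPS trace has already been reduced to a list of visits (a place plus start and end times). The `Visit` and `CheckIn` types, `match_checkin`, and the 500 m / 30 minute thresholds are all hypothetical choices for illustration, not the values or code used in the study.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass
class Visit:
    """A stay at one place extracted from a GPS trace (hypothetical format)."""
    lat: float
    lon: float
    start: float  # seconds since epoch
    end: float


@dataclass
class CheckIn:
    """A check-in as reported to the social network (hypothetical format)."""
    lat: float
    lon: float
    time: float  # seconds since epoch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_checkin(c, visits, max_dist_m=500.0, slack_s=1800.0):
    """Return the index of the first GPS visit consistent with check-in c, or None.

    Hypothetical rule: the check-in must lie within max_dist_m of the visit and
    its timestamp must fall inside the visit, widened by slack_s on each side.
    """
    for i, v in enumerate(visits):
        in_time = v.start - slack_s <= c.time <= v.end + slack_s
        if in_time and haversine_m(c.lat, c.lon, v.lat, v.lon) <= max_dist_m:
            return i
    return None  # no GPS evidence: a candidate extraneous check-in
```

Loosening `max_dist_m` and `slack_s` trades false "extraneous" labels for missed ones; a generous threshold tolerates imprecise venue coordinates at the cost of accepting some check-ins made from nearby.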

The authors found that a majority of the check-in data logged by the social network is spurious. Several companies offer discounts and other incentives for checking in frequently, which leads users to make many spurious check-ins. At the same time, users rarely check in at routine places like their home, workplace, or grocery store, so the check-in traces are also heavily down-sampled: many real visits never appear in them.
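Once check-ins have been matched against GPS visits, the two problems above reduce to two rates: the share of check-ins with no matching visit (extraneous) and the share of visits with no check-in (missing). A minimal sketch, assuming matches are supplied as (check-in index, visit index) pairs; `trace_quality` is a hypothetical helper, not code from the paper.

```python
def trace_quality(num_checkins, num_visits, matches):
    """Summarize how well check-ins reflect GPS ground truth.

    matches: iterable of (checkin_index, visit_index) pairs judged consistent.
    Returns (extraneous_rate, missing_rate):
      extraneous_rate: fraction of check-ins that match no GPS visit
      missing_rate:    fraction of GPS visits that have no check-in
    """
    matches = list(matches)
    matched_checkins = {c for c, _ in matches}
    matched_visits = {v for _, v in matches}
    extraneous = num_checkins - len(matched_checkins)
    missing = num_visits - len(matched_visits)
    # Guard against empty traces so the rates are always defined.
    extraneous_rate = extraneous / num_checkins if num_checkins else 0.0
    missing_rate = missing / num_visits if num_visits else 0.0
    return extraneous_rate, missing_rate
```

For a toy user (not data from the study) with 4 check-ins and 10 GPS visits, of which only check-ins 0 and 3 match visits 2 and 7, `trace_quality(4, 10, [(0, 2), (3, 7)])` gives an extraneous rate of 0.5 and a missing rate of 0.8, so this user's trace is both noisy and sparse.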

Despite these flaws, the data may still be useful. Extraneous check-ins can be removed by comparing them with ground-truth data. Further, it may be possible to recover missing locations by up-sampling the observed check-ins based on statistical models of real user mobility.
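As one hypothetical way to up-sample, one could fit a simple first-order Markov chain over places from ground-truth visit sequences and then random-walk between two consecutive check-ins to fill in plausible unobserved stops. The Markov model, `fit_transitions`, `fill_gap`, and the `max_steps` limit are illustrative assumptions, not necessarily the statistical model the authors have in mind.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(place_sequences):
    """Count place-to-place transitions in ground-truth visit sequences."""
    counts = defaultdict(Counter)
    for seq in place_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts


def fill_gap(counts, start, end, max_steps=5, rng=random):
    """Sample unobserved places between two consecutive check-ins.

    Random-walks the fitted chain from `start`, choosing each next place with
    probability proportional to its transition count. Returns the intermediate
    places once `end` is reached ([] if `end` follows `start` directly), or
    None if `end` is not reached within max_steps or the walk hits a place
    with no known successors.
    """
    path, cur = [], start
    for _ in range(max_steps):
        nxt = counts.get(cur)  # .get avoids inserting empty entries
        if not nxt:
            return None
        places, weights = zip(*nxt.items())
        cur = rng.choices(places, weights=weights)[0]
        if cur == end:
            return path
        path.append(cur)
    return None
```

Because the walk is random, repeated calls can return different fills; passing a seeded `random.Random` as `rng` makes runs reproducible, and sampling many fills per gap would give a distribution over likely missing visits rather than a single guess.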

Q) How accurate are the GPS coordinates? People turn off GPS all the time, so perhaps the locations in the study are not accurate?
A) In general, we force GPS to be on as much as possible. When it is not on, we use every other source we can. For most of the data, we had fairly reliable GPS. There is a 5% error region that we are not accounting for; perhaps it is caused by location errors.

Q) Doesn’t Foursquare validate check-ins when they come in?
A) They do not.

Q) Do companies like Foursquare allow spurious check-ins to give users the flexibility to check in from indoors? Without that flexibility, users might only be able to check in from outdoors, which would make for a bad user experience...
A) Yes, that is part of it. They don’t want to ban spurious check-ins. Sometimes the location itself is the problem, as with food trucks, which are mobile and move around a lot. That’s why the threshold in our study is really large.
Foursquare’s own tests show that GPS isn’t highly reliable unless everything else that can be used for location sensing is turned on. We were not focused on saving power, so we turned everything on too.

Q) Twitter also has GPS information. Is there an incentive to show location?
A) My guess is that extraneous check-ins will mostly disappear if GPS coordinates are provided. But there will still be downsampling, with many visits missing. Pruning the data is easy; extrapolating is the real challenge.