Authors: Ning Xia, Han Hee Song, Yong Liao, Marios Iliofotou, Antonio Nucci, Zhi-Li Zhang and Aleksandar Kuzmanovic.
For a growing number of users, online social networking (OSN) sites such as Facebook and Twitter have become an integral part of their online activities. This paper calls attention to privacy leakage in mobile network data, and in particular to one important aspect of the problem: the danger posed by a third party that does not simply crawl data directly from OSN sites but gathers the digital footprints users leave in cyberspace. GPS and other location information in mobile cellular data make it possible to tie users’ cyber activities to their presence in the physical world. The confluence of smartphones and OSNs makes the ability to glean personal information from mobile data a far more potent threat to user privacy than attacks on each individual service. The leakage stems from shortcomings in the design of certain OSNs, as well as from fundamental limitations of the current Web and Internet from a user privacy perspective, such as the cookie mechanism used to keep state over the stateless HTTP protocol.
They refer to this problem as constructing a MOSAIC of a user from their online digital footprints, and correspondingly refer to the gathered footprint pieces as TESSERAE.
To construct these mosaics, they developed the Tessellation methodology. Through Tessellation, they show how user identity information such as OSN IDs and device tracking cookies can be extracted from the traffic. Furthermore, they describe how the remaining pieces of traffic, which leak no identity information, can be attributed to the known user identities.
They claim that Tessellation can attribute 50% of traffic to its owners with only 5% error. Optionally, the coverage can be increased to 80% with just a 2% increase in the error rate. Using this methodology, they were able to create mosaics for more than 16,000 users and classify their personal information into 59 categories, including user demographics, locations, affiliations, social activities, and interests. Finally, they suggest possible countermeasures to safeguard against this alarming leakage of private information.
Q. Where do they obtain OSN user identifiers and information?
A: Many OSN sites, because of weak design, “leak” their user identifiers, which allows Tessellation to attribute traffic to real users. The URL, cookie, and payload fields of HTTP traffic are examined to recover user login and session key information.
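As a rough illustration of how identifiers might be pulled from a URL and a Cookie header, here is a minimal sketch using only the Python standard library. It is not the authors' extractor: the key names in `ID_KEYS`, the function `extract_identifiers`, and the example host are all hypothetical, since the summary does not name the actual parameters that the OSN sites leak.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, parse_qs

# Hypothetical names of parameters that carry identity information.
# The real key names used by any OSN are not given in this summary.
ID_KEYS = {"user_id", "uid", "session_key"}


def extract_identifiers(url, cookie_header):
    """Return identity-like fields found in a request URL and Cookie header."""
    found = {}

    # Query-string parameters of the requested URL.
    query = parse_qs(urlsplit(url).query)
    for key, values in query.items():
        if key in ID_KEYS:
            found["url:" + key] = values[0]

    # Name/value pairs in the Cookie header.
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    for key, morsel in cookie.items():
        if key in ID_KEYS:
            found["cookie:" + key] = morsel.value

    return found
```

For example, calling `extract_identifiers("http://osn.example/profile?user_id=alice42&lang=en", "uid=12345; theme=dark")` picks out the `user_id` query parameter and the `uid` cookie while ignoring the unrelated fields.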
Q. How is coverage measured? What are the types of coverage?
A: There are two types of coverage: a) session-level coverage and b) user-level coverage. Session-level coverage is the number of sessions that are given a prediction (i.e., sum of sessions in all Ts), divided by the total number of sessions. User-level coverage is the number of ground truth users for whom Tessellation identified all or a subset of their sessions, divided by the total number of ground truth users.
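The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dictionary layout are hypothetical, and a user is counted as covered when at least one of their ground truth sessions receives a prediction naming that user, which is one possible reading of "all or a subset of their sessions".

```python
def session_level_coverage(predictions, all_sessions):
    """Sessions that received a prediction, divided by all sessions.

    predictions: dict mapping session id -> predicted user
                 (only sessions that were given a prediction appear here)
    all_sessions: iterable of every session id in the trace
    """
    all_sessions = set(all_sessions)
    predicted = sum(1 for s in all_sessions if s in predictions)
    return predicted / len(all_sessions)


def user_level_coverage(predictions, ground_truth):
    """Ground truth users with at least one identified session, divided by all such users.

    ground_truth: dict mapping session id -> true user
    Assumption: a session counts as identified only if the prediction
    names the true owner of that session.
    """
    users = set(ground_truth.values())
    covered = {
        user
        for session, user in ground_truth.items()
        if predictions.get(session) == user
    }
    return len(covered) / len(users)
```

With five sessions of which two receive a prediction, session-level coverage is 2/5. If the ground truth contains three users and only one of them has a correctly attributed session, user-level coverage is 1/3.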