TY - JOUR
T1 - Extracting user web browsing patterns from non-content network traces
T2 - The online advertising case study
AU - Maciá-Fernández, Gabriel
AU - Wang, Yong
AU - Rodríguez-Gómez, Rafael A.
AU - Kuzmanovic, Aleksandar
N1 - Funding Information: This work is supported by Spanish MEC project TEC2008-06663-C03-02 (70% FEDER funds), NSF CAREER Award No. 0746360, and China Scholarship Council.
PY - 2012/2/2
Y1 - 2012/2/2
AB - Online advertising is a rapidly growing industry currently dominated by the search engine 'giant' Google. In an attempt to tap into this huge market, Internet Service Providers (ISPs) started deploying deep packet inspection techniques to track and collect user browsing behavior. However, these providers fear that such techniques violate wiretap laws that explicitly prohibit intercepting the contents of communication without consumers' consent. In this paper, we explore how ISPs can extract user browsing patterns without inspecting the contents of communication. Our contributions are threefold. First, we develop a methodology and implement a system capable of extracting web browsing features from stored non-content-based network traces, which could be legally shared. When such browsing features are correlated with information collected by independently crawling the Web, it becomes possible to recover the actual web pages accessed by clients. Second, we evaluate our system on the Internet and verify that it can successfully recover user browsing patterns with high accuracy.
KW - Online advertising
KW - Web fingerprinting
KW - Web navigation
UR - http://www.scopus.com/inward/record.url?scp=84856043179&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856043179&partnerID=8YFLogxK
U2 - 10.1016/j.comnet.2011.10.012
DO - 10.1016/j.comnet.2011.10.012
M3 - Article
AN - SCOPUS:84856043179
SN - 1389-1286
VL - 56
SP - 598
EP - 614
JO - Computer Networks
JF - Computer Networks
IS - 2
ER -