Googling the internet: Profiling internet endpoints via the world wide web

Ionut Trestian*, Supranamaya Ranjan, Aleksandar Kuzmanovic, Antonio Nucci

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

Understanding Internet access trends at a global scale, i.e., how people use the Internet, is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges owing to either privacy concerns or to other operational difficulties. The key hypothesis of our work here is that most of the information needed to profile the Internet endpoints is already available around us on the Web. In this paper, we introduce a novel approach for profiling and classifying endpoints. We implement and deploy a Google-based profiling tool that accurately characterizes endpoint behavior by collecting and strategically combining information freely available on the Web. Our Web-based unconstrained endpoint profiling (UEP) approach shows advances in the following scenarios: 1) even when no packet traces are available, it can accurately infer application and protocol usage trends at arbitrary networks; 2) when network traces are available, it outperforms state-of-the-art classification tools such as BLINC; 3) when sampled flow-level traces are available, it retains high classification capabilities. We explore other complementary UEP approaches, such as p2p- and reverse-DNS-lookup-based schemes, and show that they can further improve the results of the Web-based UEP. Using this approach, we perform unconstrained endpoint profiling at a global scale: for clients in four different world regions (Asia, South and North America, and Europe). We provide the first-of-its-kind endpoint analysis that reveals fascinating similarities and differences among these regions.
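The abstract describes profiling an endpoint from information that search results already contain about it, then using those profiles to infer application usage trends for a whole network. The Python sketch below illustrates that idea under stated assumptions: it tags an IP address by matching keywords in the text of search hits that mention the address, then counts how many endpoints carry each tag. The keyword table, the tag names, the functions profile_endpoint and network_trends, and the sample snippets are all hypothetical; fetching the snippets (e.g., by querying a search engine for the IP address) is not shown. This is a toy keyword matcher, not the authors' rule set or classifier.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical keyword -> tag rules (an assumption, not the paper's rules).
KEYWORD_TAGS: Dict[str, str] = {
    "proxy": "proxy",
    "smtp": "mail server",
    "spam": "mail server",
    "bittorrent": "p2p",
    "gnutella": "p2p",
    "counter-strike": "gaming",
    "irc": "chat",
    "forum": "web browsing",
}


def profile_endpoint(snippets: List[str]) -> Counter:
    """Tag one IP address from the text of search hits that mention it."""
    tags: Counter = Counter()
    for snippet in snippets:
        # Whole-token matching avoids hits such as "irc" inside "circuit".
        tokens = set(re.findall(r"[a-z0-9-]+", snippet.lower()))
        # A set counts each tag at most once per snippet.
        tags.update({tag for kw, tag in KEYWORD_TAGS.items() if kw in tokens})
    return tags


def network_trends(hits_by_ip: Dict[str, List[str]]) -> Counter:
    """Count, for each tag, how many endpoints in the network carry it."""
    trends: Counter = Counter()
    for snippets in hits_by_ip.values():
        trends.update(profile_endpoint(snippets).keys())  # one vote per endpoint
    return trends


# Hypothetical search-hit text for documentation-range addresses (RFC 5737).
HITS: Dict[str, List[str]] = {
    "192.0.2.10": ["BitTorrent tracker peer list 192.0.2.10:6881",
                   "uTorrent peers gnutella"],
    "192.0.2.20": ["Open SMTP relay 192.0.2.20 listed in spam blacklist"],
    "192.0.2.30": ["forum post by user from 192.0.2.30",
                   "BitTorrent peer 192.0.2.30"],
}

if __name__ == "__main__":
    print(network_trends(HITS).most_common())
    # → [('p2p', 2), ('mail server', 1), ('web browsing', 1)]
```

Counting each tag once per endpoint, rather than once per snippet, keeps a heavily indexed address from dominating the network-level estimate; whether that weighting suits a given use is a design choice, not something the abstract specifies.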

Original language: English (US)
Article number: 5313845
Pages (from-to): 666-679
Number of pages: 14
Journal: IEEE/ACM Transactions on Networking
Volume: 18
Issue number: 2
State: Published - Apr 2010

Keywords

  • Clustering
  • Endpoint profiling
  • Google
  • Traffic classification
  • Traffic locality

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
