NeTS: Small: Mashup Content Harvesting for an Open Internet

Project: Research project

Project Details


Over the next decade, approximately five billion people will become connected to the Internet. The biggest increase will be in societies where Internet censorship is thriving. The list of censorship events is extensive and global: it includes governments blocking tens of thousands of opposition web sites (e.g., Russia), passing laws that enable a government to block websites without prior court approval (e.g., Turkey), running nationwide firewalls to censor traffic at a large scale (e.g., China), periodically and automatically resetting all encrypted traffic in a country (e.g., Iran), and completely isolating a nation’s Internet from the rest of the world, temporarily (e.g., Egypt) or permanently (e.g., North Korea). While censorship technologies are a multibillion-dollar industry, the tools to measure and assess digital repression receive only a few million dollars in government and private funding. More importantly, while detecting censorship is vital, providing systems to undermine censors is even more essential. Unfortunately, no existing system is capable of comprehensively addressing the problem. In particular, scale, trust, and usability are the main obstacles that have kept any single technology from resolving it. Indeed, numerous reports show that a combination of these issues leaves dissidents and members of harassed minorities out of reach of current systems. We propose mashup content harvesting as a comprehensive approach to these censorship-induced problems. In our approach, users have the ability and means, which we will develop, to create or replicate content by representing it in terms of the significant amount of data publicly available from non-censored web sources. By enabling users to mirror new data as a function of existing data, it becomes feasible to achieve mass-scale content distribution without hosting any new server or peer-to-peer infrastructure.
The proposed research will address fundamental questions that are key to developing and deploying mashup content harvesting. We will conduct research to develop a mathematically rigorous, computationally sound, and scalable framework to solve key challenges associated with mashup content harvesting, including (i) how to embed both data and hopping information via information available on publicly accessible sites, (ii) how to select a set of core data segments to maximize mirroring efficiency, (iii) how to devise and deploy methods for scalable and accurate characterization of mashup content harvesting data carriers, (iv) which fundamental Web carrier properties enable effective mashup content generation and harvesting, (v) what lower bounds on Web page diversity and connectivity affect system performance, (vi) how mashup content harvesting performs in closed national-level Web environments, and (vii) how mashup content harvesting performs when content carriers are limited to Web pages associated with specific world languages. This project has the capacity to make a significant impact by facilitating the development of a free and open society. First, by enabling freedom of information, which is a universally recognized human right, it can help people who have historically been isolated become active, prosperous, and engaged participants in the world community. Second, only a truly open Internet can fuel the economy, increase productivity, and open business and innovation opportunities around the world. Beyond censorship,
Effective start/end date: 10/1/15 to 9/30/19


  • National Science Foundation (CNS-1526052)

