CAREER: Web Information Extraction: Scaling and Integration

Project: Research project

Project Details


This project is aimed at Web Information Extraction (WIE), the task of automatically extracting and integrating knowledge from the World Wide Web. For human readers, a massive amount of valuable knowledge is available on the Web, but new techniques are required to extract this knowledge into a machine-processable form. Such techniques could enable vastly improved search engines and could potentially provide a collection of knowledge that brings human-level artificial intelligence closer to reality. Classically, information extraction has used conventional machine learning, in which extractors are trained on data sets labeled by a trusted expert. The Web, however, includes a vast amount of unlabeled data, and acquiring expert labels for each concept of interest is intractable. The human input we can cost-effectively acquire at scale comes not from a single expert but from heterogeneous annotators across the Web, some far more expert than others. This proposal focuses on new WIE techniques that scale to this setting.
Effective start/end date: 9/1/14 - 8/31/20


  • National Science Foundation (IIS-1351029 005)

