This project is aimed at Web Information Extraction (WIE), the task of automatically extracting and integrating knowledge from the World Wide Web. For human readers, a massive amount of valuable knowledge is available on the Web, but new techniques are required to extract this knowledge into a machine-processable form. Such techniques could enable vastly improved search engines, and potentially provide a collection of knowledge that brings human-level artificial intelligence closer to reality. Classically, information extraction has utilized conventional machine learning, in which extractors are trained on data sets labeled by a trusted expert. The Web, however, includes a vast amount of unlabeled data, and acquiring expert labels for each concept of interest is intractable. The human input we can cost-effectively acquire at scale comes not from a single expert but instead from heterogeneous annotators across the Web, some much more expert than others. This proposal focuses on new WIE techniques that scale to this setting.
Effective start/end date: 9/1/14 → 8/31/20
- National Science Foundation (IIS-1351029/001)