Learning text patterns for web information extraction and assessment

Doug Downey*, Oren Etzioni, Stephen Soderland, Daniel S. Weld

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

29 Scopus citations

Abstract

Learning text patterns that suggest a desired type of information is a common strategy for extracting information from unstructured text on the Web. In this paper, we introduce the idea that learned patterns can be used both as extractors (to generate new information) and as discriminators (to assess the truth of extracted information). We demonstrate experimentally that adding a simple pattern-learning algorithm to a Web information extraction system (KnowItAll) improves both its coverage and its accuracy. Using learned patterns as extractors boosts recall by 50% to 80%, and using them as discriminators reduces classification errors by 28% to 35%. In addition, the paper reports theoretical results on optimally selecting and ordering discriminators, and shows that this theory yields a heuristic that reduces classification errors by a further 19% to 35%, for an overall error reduction of 47% to 53%.
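The core idea, that one learned pattern can serve both as an extractor and as a discriminator, can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration rather than the paper's algorithm: the toy `CORPUS` stands in for Web search results, the helper names (`learn_patterns`, `extract`, `discriminator_score`, `assess`) are hypothetical, a "pattern" is reduced to the fixed window of words preceding a seed instance, and the discriminator score is a simple PMI-style hit-count ratio (the abstract does not specify the scoring function).

```python
from collections import Counter

# Hypothetical toy corpus standing in for sentences returned by Web search queries.
CORPUS = [
    "cities such as Paris and London attract tourists",
    "the city of Paris is large",
    "cities such as Tokyo are crowded",
    "he moved to the city of Berlin last year",
    "cities such as Madrid host festivals",
    "I visited the city of Tokyo",
]


def learn_patterns(corpus, seeds, window=3, min_support=2):
    """Learn patterns as the `window` words that precede a seed instance.

    A context is kept only if it precedes at least `min_support` distinct seeds.
    """
    support = {}
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            if word in seeds and i >= window:
                context = " ".join(words[i - window:i])
                support.setdefault(context, set()).add(word)
    return [ctx for ctx, seen in support.items() if len(seen) >= min_support]


def extract(corpus, patterns):
    """Use patterns as extractors: count the word that follows each pattern match."""
    found = Counter()
    for sentence in corpus:
        words = sentence.split()
        for pattern in patterns:
            n = len(pattern.split())
            # Stopping at len(words) - n guarantees a following word exists.
            for i in range(len(words) - n):
                if " ".join(words[i:i + n]) == pattern:
                    found[words[i + n]] += 1
    return found


def discriminator_score(corpus, pattern, instance):
    """Use a pattern as a discriminator: the fraction of sentences mentioning
    `instance` in which it directly follows `pattern` (a PMI-style hit ratio)."""
    hits_instance = sum(instance in s.split() for s in corpus)
    phrase = f" {pattern} {instance} "  # padded so only whole words match
    hits_phrase = sum(phrase in f" {s} " for s in corpus)
    return hits_phrase / hits_instance if hits_instance else 0.0


def assess(corpus, patterns, instance):
    """Average the discriminator scores of all learned patterns for one instance."""
    if not patterns:
        return 0.0
    return sum(discriminator_score(corpus, p, instance) for p in patterns) / len(patterns)


if __name__ == "__main__":
    patterns = learn_patterns(CORPUS, seeds={"Paris", "Tokyo"})
    print("patterns:", patterns)
    print("extracted:", extract(CORPUS, patterns))
    for candidate in ["Berlin", "Madrid", "London"]:
        print(candidate, assess(CORPUS, patterns, candidate))
```

Here the same two learned contexts first propose new instances (Berlin, Madrid) and are then reused to score candidates; London scores 0.0 because its only mention never directly follows a learned pattern. A real system would query a search engine for hit counts and combine discriminator scores with a trained classifier rather than a plain average.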

Original language: English (US)
Title of host publication: Adaptive Text Extraction and Mining, ATEM-2004 - Papers from the AAAI-04 Workshop, Technical Report
Pages: 50-55
Number of pages: 6
Volume: WS-04-01
State: Published, Dec 1 2004
Event: 19th National Conference on Artificial Intelligence, San Jose, CA, United States
Duration: Jul 25 2004 to Jul 26 2004

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Downey, D., Etzioni, O., Soderland, S., & Weld, D. S. (2004). Learning text patterns for web information extraction and assessment. In Adaptive Text Extraction and Mining, ATEM-2004 - Papers from the AAAI-04 Workshop, Technical Report (Vol. WS-04-01, pp. 50-55).