Abstract
Learning text patterns that suggest a desired type of information is a common strategy for extracting information from unstructured text on the Web. In this paper, we introduce the idea that learned patterns can be used as both extractors (to generate new information) and discriminators (to assess the truth of extracted information). We demonstrate experimentally that a Web information extraction system (KnowItAll) can be improved, in terms of both coverage and accuracy, through the addition of a simple pattern-learning algorithm. By using learned patterns as extractors, we are able to boost recall by 50% to 80%, and by using such patterns as discriminators, we are able to reduce classification errors by 28% to 35%. In addition, the paper reports theoretical results on optimally selecting and ordering discriminators, and shows that this theory yields a heuristic that reduces classification errors by an additional 19% to 35%, giving an overall error reduction of 47% to 53%.
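The error-reduction figures in the abstract can be checked with a little arithmetic. The sketch below assumes that the second reduction applies to the errors remaining after the first, so that the two compound multiplicatively, and it assumes a particular pairing of the reported range endpoints. Neither assumption is stated in the abstract, and the function name `combined_reduction` is purely illustrative.

```python
def combined_reduction(first: float, second: float) -> float:
    """Overall fractional error reduction when `second` is applied
    to the errors that remain after `first` (assumed compounding)."""
    return 1.0 - (1.0 - first) * (1.0 - second)


# Assumed pairing of the reported endpoints (not given in the abstract):
# a 28% reduction followed by a further 35%, and 35% followed by 19%.
print(round(combined_reduction(0.28, 0.35), 2))  # 0.53
print(round(combined_reduction(0.35, 0.19), 2))  # 0.47
```

Under these assumptions the two results match the reported overall range of 47% to 53%, which suggests, but does not confirm, that the figures were combined this way.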
Original language | English (US) |
---|---|
Title of host publication | Adaptive Text Extraction and Mining, ATEM-2004 - Papers from the AAAI-04 Workshop, Technical Report |
Pages | 50-55 |
Number of pages | 6 |
Volume | WS-04-01 |
State | Published - Dec 1 2004 |
Event | 19th National Conference on Artificial Intelligence, San Jose, CA, United States; Jul 25 2004 → Jul 26 2004 |
ASJC Scopus subject areas
- Engineering(all)