Abstract
Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first major run, KNOWITALL extracted over 50,000 facts with high precision, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Rule Learning learns domain-specific extraction rules. Subclass Extraction automatically identifies sub-classes in order to boost recall. List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, no hand-labeled training examples are required. Experiments show the relative coverage of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 19-fold increase in recall, while maintaining high precision, and discovered 10,300 cities missing from the Tipster Gazetteer.
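The List Extraction idea named in the abstract (locate a list, learn a "wrapper", extract the list's elements) can be illustrated with a minimal sketch. This is not KNOWITALL's implementation. The functions `learn_wrapper` and `apply_wrapper`, the sample page, and the decision to model a wrapper as the pair of markup strings immediately surrounding already-known seed instances are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

Wrapper = Tuple[str, str]  # hypothetical representation: (prefix markup, suffix markup)


def learn_wrapper(page: str, seeds: List[str]) -> Optional[Wrapper]:
    """Infer a (prefix, suffix) wrapper from seed instances found in the page.

    Assumption: each seed sits directly between an opening and a closing tag,
    and a usable wrapper exists only if all located seeds share that context.
    """
    contexts = set()
    for seed in seeds:
        pos = page.find(seed)
        if pos == -1:
            continue  # seed does not occur on this page
        start = page.rfind("<", 0, pos)        # start of the tag to the left
        end = page.find(">", pos + len(seed))  # end of the tag to the right
        if start == -1 or end == -1:
            continue
        contexts.add((page[start:pos], page[pos + len(seed):end + 1]))
    # Accept the wrapper only when the located seeds agree on one context.
    if len(contexts) != 1:
        return None
    return contexts.pop()


def apply_wrapper(page: str, wrapper: Wrapper) -> List[str]:
    """Extract every text span enclosed by the learned prefix and suffix."""
    prefix, suffix = wrapper
    pattern = re.escape(prefix) + r"([^<>]+?)" + re.escape(suffix)
    return [match.strip() for match in re.findall(pattern, page)]


# Hypothetical page and seeds, for illustration only.
page = "<ul>\n<li>Paris</li>\n<li>Berlin</li>\n<li>Tokyo</li>\n<li>Lima</li>\n</ul>"
wrapper = learn_wrapper(page, ["Paris", "Berlin"])
if wrapper is not None:
    print(apply_wrapper(page, wrapper))
```

With two seed cities, the sketch recovers the remaining list items (here `Tokyo` and `Lima`) without hand-labeled examples, which is the bootstrapping property the abstract emphasizes. A real system would also need to verify the extracted candidates before accepting them as facts.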
| Field | Value |
|---|---|
| Original language | English (US) |
| Title of host publication | Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-04) |
| Subtitle of host publication | Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004) |
| Pages | 391-398 |
| Number of pages | 8 |
| State | Published - Dec 9 2004 |
| Event | Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-2004): Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004) - San Jose, CA, United States. Duration: Jul 25 2004 → Jul 29 2004 |
Other
| Field | Value |
|---|---|
| Other | Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-2004): Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004) |
| Country/Territory | United States |
| City | San Jose, CA |
| Period | 7/25/04 → 7/29/04 |
ASJC Scopus subject areas
- Software
- Artificial Intelligence