Abstract
Our KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an autonomous, domain-independent, and scalable manner. In its first major run, KNOWITALL extracted over 50,000 facts with high precision, but suggested a challenge: How can we improve KNOWITALL's recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Rule Learning learns domain-specific extraction rules. Subclass Extraction automatically identifies sub-classes in order to boost recall. List Extraction locates lists of class instances, learns a "wrapper" for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL's domain-independent methods, no hand-labeled training examples are required. Experiments show the relative coverage of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4-fold to 19-fold increase in recall, while maintaining high precision, and discovered 10,300 cities missing from the Tipster Gazetteer.
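The List Extraction idea summarized above (locate a list, learn a "wrapper" for it, extract its elements) can be illustrated with a minimal sketch. This is not KNOWITALL's actual algorithm, which the abstract does not specify. It assumes a wrapper is simply the pair of HTML tags that most often surrounds already-known seed instances. The names `learn_wrapper`, `apply_wrapper`, and `PAGE` are hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical sample page; in practice this would be fetched from the Web.
PAGE = """<h1>Cities</h1>
<ul>
<li>Paris</li>
<li>Berlin</li>
<li>Oslo</li>
<li>Lima</li>
</ul>"""


def learn_wrapper(page, seeds):
    """Return the (left_tag, right_tag) pair that most often encloses a seed.

    Assumption: a wrapper is just the tag immediately before and the tag
    immediately after an instance. Returns None if no seed is found
    between two tags.
    """
    contexts = Counter()
    for seed in seeds:
        pattern = r"(<[^<>]+>)\s*" + re.escape(seed) + r"\s*(<[^<>]+>)"
        for match in re.finditer(pattern, page):
            contexts[(match.group(1), match.group(2))] += 1
    if not contexts:
        return None
    return contexts.most_common(1)[0][0]


def apply_wrapper(page, wrapper):
    """Extract every tag-free string enclosed by the learned tag pair."""
    left, right = wrapper
    pattern = re.escape(left) + r"\s*([^<>]+?)\s*" + re.escape(right)
    return re.findall(pattern, page)


if __name__ == "__main__":
    # Seeds stand in for instances already found by domain-independent
    # extraction, so no hand-labeled training examples are involved.
    wrapper = learn_wrapper(PAGE, ["Paris", "Berlin"])
    print(apply_wrapper(PAGE, wrapper))  # ['Paris', 'Berlin', 'Oslo', 'Lima']
```

In this sketch two seed cities are enough to recover the `<li>`/`</li>` wrapper, which then yields the two cities that were not among the seeds. That is the sense in which list extraction can boost recall.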
Original language | English (US) |
---|---|
Title of host publication | Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-04) |
Subtitle of host publication | Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004) |
Pages | 391-398 |
Number of pages | 8 |
State | Published - Dec 9 2004 |
Event | Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-2004): Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004) - San Jose, CA, United States; Duration: Jul 25 2004 → Jul 29 2004 |
ASJC Scopus subject areas
- Software
- Artificial Intelligence