TY - GEN
T1 - Web-scale information extraction in KnowItAll (preliminary results)
AU - Etzioni, Oren
AU - Kok, Stanley
AU - Soderland, Stephen
AU - Cafarella, Michael
AU - Popescu, Ana Maria
AU - Weld, Daniel S.
AU - Downey, Doug
AU - Shaked, Tal
AU - Yates, Alexander
PY - 2004
Y1 - 2004
AB - Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KNOWITALL, running for four days on a single machine, was able to automatically extract 54,753 facts. KNOWITALL associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KNOWITALL's architecture and reports on lessons learned for the design of large-scale information extraction systems.
KW - Information extraction
KW - Mutual Information
KW - Search
UR - http://www.scopus.com/inward/record.url?scp=17644418833&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=17644418833&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:17644418833
SN - 158113844X
T3 - Thirteenth International World Wide Web Conference Proceedings, WWW2004
SP - 100
EP - 110
BT - Thirteenth International World Wide Web Conference Proceedings, WWW2004
T2 - Thirteenth International World Wide Web Conference Proceedings, WWW2004
Y2 - 17 May 2004 through 22 May 2004
ER -