Web-scale information extraction in KnowItAll (preliminary results)

Oren Etzioni*, Stanley Kok, Stephen Soderland, Michael Cafarella, Ana Maria Popescu, Daniel S. Weld, Doug Downey, Tal Shaked, Alexander Yates

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

410 Scopus citations


Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KNOWITALL, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KNOWITALL, running for four days on a single machine, was able to automatically extract 54,753 facts. KNOWITALL associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KNOWITALL's architecture and reports on lessons learned for the design of large-scale information extraction systems.
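The precision/recall trade-off mentioned in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states, namely that each extracted fact carries a probability. The `Fact` type, the `precision_recall` helper, the gold `correct` labels, and the toy facts are all hypothetical and are not taken from the paper; recall here is measured only against the correct facts in the toy list.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    text: str           # extracted fact, e.g. "City(Paris)"
    probability: float  # confidence assigned by the extractor
    correct: bool       # hypothetical gold label, for evaluation only


def precision_recall(facts, threshold):
    """Keep facts with probability >= threshold and score the result."""
    accepted = [f for f in facts if f.probability >= threshold]
    total_correct = sum(f.correct for f in facts)
    true_pos = sum(f.correct for f in accepted)
    # By convention, an empty accepted set is given precision 1.0.
    precision = true_pos / len(accepted) if accepted else 1.0
    recall = true_pos / total_correct if total_correct else 0.0
    return precision, recall


# Made-up toy data, not results from the paper.
FACTS = [
    Fact("City(Paris)", 0.95, True),
    Fact("City(Seattle)", 0.90, True),
    Fact("City(Tuesday)", 0.60, False),
    Fact("City(Oslo)", 0.55, True),
    Fact("City(Banana)", 0.20, False),
]

if __name__ == "__main__":
    for threshold in (0.8, 0.5, 0.1):
        p, r = precision_recall(FACTS, threshold)
        print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold keeps only high-confidence facts, which tends to raise precision at the cost of recall; lowering it does the opposite.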

Original language: English (US)
Title of host publication: Thirteenth International World Wide Web Conference Proceedings, WWW2004
Number of pages: 11
State: Published - 2004
Event: Thirteenth International World Wide Web Conference Proceedings, WWW2004 - New York, NY, United States
Duration: May 17 2004 - May 22 2004

Publication series

Name: Thirteenth International World Wide Web Conference Proceedings, WWW2004

Other: Thirteenth International World Wide Web Conference Proceedings, WWW2004
Country/Territory: United States
City: New York, NY


  • Information extraction
  • Mutual Information
  • Search

ASJC Scopus subject areas

  • Engineering (all)

