KNOWITNOW: Fast, scalable information extraction from the web

Michael J. Cafarella*, Doug Downey, Stephen Soderland, Oren Etzioni

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

79 Scopus citations

Abstract

Numerous NLP applications rely on search-engine queries, both to extract information from and to compute statistics over the Web corpus. But search engines often limit the number of available queries. As a result, query-intensive NLP applications such as Information Extraction (IE) distribute their query load over several days, making IE a slow, offline process. This paper introduces a novel architecture for IE that obviates queries to commercial search engines. The architecture is embodied in a system called KNOWITNOW that performs high-precision IE in minutes instead of days. We compare KNOWITNOW experimentally with the previously published KNOWITALL system, and quantify the tradeoff between recall and speed. KNOWITNOW's extraction rate is two to three orders of magnitude higher than KNOWITALL's.

Original language: English (US)
Title of host publication: HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 563-570
Number of pages: 8
State: Published - Dec 1 2005
Event: Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, HLT/EMNLP 2005, Co-located with the 2005 Document Understanding Conference, DUC and the 9th International Workshop on Parsing Technologies, IWPT - Vancouver, BC, Canada
Duration: Oct 6 2005 to Oct 8 2005

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
