Improved extraction assessment through better language models

Arun Ahuja*, Douglas C Downey

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


Abstract

A variety of information extraction techniques rely on the fact that instances of the same relation are "distributionally similar," in that they tend to appear in similar textual contexts. We demonstrate that extraction accuracy depends heavily on the accuracy of the language model utilized to estimate distributional similarity. An unsupervised model selection technique based on this observation is shown to reduce extraction and type-checking error by 26% over previous results, in experiments with Hidden Markov Models. The results suggest that optimizing statistical language models over unlabeled data is a promising direction for improving weakly supervised and unsupervised information extraction.
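The abstract does not spell out the procedure, so the following is only a minimal Python sketch under stated assumptions. It shows two ideas with a toy discrete HMM. First, a term is represented by its average posterior distribution over hidden states across its occurrences, and two terms are compared with cosine similarity. Using this as the "distributional similarity" score is an assumption made for illustration. Second, unsupervised model selection keeps the candidate HMM with the highest log-likelihood on unlabeled held-out text, which follows from the observation that a more accurate language model yields better similarity estimates. The names `HMM`, `term_vector`, `cosine`, and `select_model`, and the emission floor `unk`, are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List

Sentence = List[str]


class HMM:
    """Toy discrete HMM (hypothetical helper, not the paper's implementation)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[str, float]], unk: float = 1e-6) -> None:
        self.start, self.trans, self.emit, self.unk = start, trans, emit, unk
        self.k = len(start)

    def _b(self, s: int, w: str) -> float:
        # Emission probability; unseen words get a small floor (an assumption).
        return self.emit[s].get(w, self.unk)

    def forward(self, words: Sentence):
        """Scaled forward pass: returns normalized alphas and per-step scales."""
        alphas: List[List[float]] = []
        scales: List[float] = []
        for t, w in enumerate(words):
            if t == 0:
                a = [self.start[s] * self._b(s, w) for s in range(self.k)]
            else:
                prev = alphas[-1]
                a = [sum(prev[r] * self.trans[r][s] for r in range(self.k)) * self._b(s, w)
                     for s in range(self.k)]
            c = sum(a)
            alphas.append([x / c for x in a])
            scales.append(c)
        return alphas, scales

    def log_likelihood(self, words: Sentence) -> float:
        # log P(words) equals the sum of the log scaling factors.
        _, scales = self.forward(words)
        return sum(math.log(c) for c in scales)

    def posteriors(self, words: Sentence) -> List[List[float]]:
        """Forward-backward: P(hidden state at position t | sentence)."""
        alphas, scales = self.forward(words)
        n = len(words)
        betas = [[1.0] * self.k for _ in range(n)]
        for t in range(n - 2, -1, -1):
            betas[t] = [sum(self.trans[s][r] * self._b(r, words[t + 1]) * betas[t + 1][r]
                            for r in range(self.k)) / scales[t + 1]
                        for s in range(self.k)]
        post = []
        for a, b in zip(alphas, betas):
            g = [x * y for x, y in zip(a, b)]
            z = sum(g)
            post.append([x / z for x in g])
        return post


def term_vector(hmm: HMM, corpus: List[Sentence], term: str) -> List[float]:
    """Average state posterior over every occurrence of `term` in `corpus`."""
    total, count = [0.0] * hmm.k, 0
    for sent in corpus:
        if term not in sent:
            continue
        post = hmm.posteriors(sent)
        for t, w in enumerate(sent):
            if w == term:
                total = [x + y for x, y in zip(total, post[t])]
                count += 1
    return [x / count for x in total] if count else total


def cosine(u: List[float], v: List[float]) -> float:
    # Cosine similarity between two state-distribution vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_model(models: List[HMM], heldout: List[Sentence]) -> HMM:
    """Unsupervised selection: keep the HMM that best predicts held-out unlabeled text."""
    return max(models, key=lambda m: sum(m.log_likelihood(s) for s in heldout))
```

Because `select_model` scores candidates only by held-out log-likelihood, it needs no labeled extractions. The selected model's `term_vector` for a candidate extraction can then be compared with the vectors of known seed instances via `cosine` to assess whether the candidate is distributionally similar to them.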

Original language: English (US)
Title of host publication: NAACL HLT 2010 - Human Language Technologies
Subtitle of host publication: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Pages: 225-228
Number of pages: 4
State: Published - 2010
Event: 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010 - Los Angeles, CA, United States
Duration: Jun 2, 2010 to Jun 4, 2010

Publication series

Name: NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference

Other

Other: 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010
Country/Territory: United States
City: Los Angeles, CA
Period: 6/2/10 to 6/4/10

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
