TY - GEN
T1 - Improved extraction assessment through better language models
AU - Ahuja, Arun
AU - Downey, Douglas C
PY - 2010
Y1 - 2010
AB - A variety of information extraction techniques rely on the fact that instances of the same relation are "distributionally similar," in that they tend to appear in similar textual contexts. We demonstrate that extraction accuracy depends heavily on the accuracy of the language model utilized to estimate distributional similarity. An unsupervised model selection technique based on this observation is shown to reduce extraction and type-checking error by 26% over previous results, in experiments with Hidden Markov Models. The results suggest that optimizing statistical language models over unlabeled data is a promising direction for improving weakly supervised and unsupervised information extraction.
UR - http://www.scopus.com/inward/record.url?scp=84858433233&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858433233&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84858433233
SN - 1932432655
SN - 9781932432657
T3 - NAACL HLT 2010 - Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
SP - 225
EP - 228
BT - NAACL HLT 2010 - Human Language Technologies
T2 - 2010 Human Language Technologies Conference of the North American Chapter of the Association for Computational Linguistics, NAACL HLT 2010
Y2 - 2 June 2010 through 4 June 2010
ER -