TY - GEN
T1 - Language models as representations for weakly-supervised NLP tasks
AU - Huang, Fei
AU - Yates, Alexander
AU - Ahuja, Arun
AU - Downey, Douglas C
PY - 2011
Y1 - 2011
N2 - Finding the right representation for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This paper investigates language model representations, in which language models trained on unlabeled corpora are used to generate real-valued feature vectors for words. We investigate ngram models and probabilistic graphical models, including a novel lattice-structured Markov Random Field. Experiments indicate that language model representations outperform traditional representations, and that graphical model representations outperform ngram models, especially on sparse and polysemous words.
UR - http://www.scopus.com/inward/record.url?scp=84862272580&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84862272580&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84862272580
SN - 9781932432923
T3 - CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
SP - 125
EP - 134
BT - CoNLL 2011 - Fifteenth Conference on Computational Natural Language Learning, Proceedings of the Conference
T2 - 15th Conference on Computational Natural Language Learning, CoNLL 2011
Y2 - 23 June 2011 through 24 June 2011
ER -