Learning Representations for Weakly Supervised Natural Language Processing Tasks

Fei Huang, Arun Ahuja, Douglas C Downey, Yi Yang, Yuhong Guo, Alexander Yates

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and information extraction, among other tasks, indicate that features taken from statistical language models, in combination with more traditional features, outperform traditional representations alone, and that graphical model representations outperform n-gram models, especially on sparse and polysemous words.
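To illustrate the general idea behind the article (not the authors' actual implementation), the sketch below derives a latent-state feature for each token by Viterbi-decoding a small Hidden Markov Model, then pairs that feature with a traditional word-identity feature. The two-state HMM, its toy parameters, and the three-word vocabulary are all illustrative assumptions.

```python
# Illustrative sketch, not the paper's system: use the most likely HMM state
# of each token as an extra feature alongside the word itself. All model
# parameters and the vocabulary below are toy assumptions.
import math

STATES = [0, 1]  # two latent HMM states

# Toy parameters in log-space: start, transition, and emission probabilities.
start = {0: math.log(0.6), 1: math.log(0.4)}
trans = {0: {0: math.log(0.3), 1: math.log(0.7)},
         1: {0: math.log(0.8), 1: math.log(0.2)}}
emit = {0: {"the": math.log(0.7), "dog": math.log(0.2), "barks": math.log(0.1)},
        1: {"the": math.log(0.05), "dog": math.log(0.5), "barks": math.log(0.45)}}

def viterbi(tokens):
    """Return the most likely latent-state sequence for the token list."""
    v = [{s: start[s] + emit[s][tokens[0]] for s in STATES}]
    back = []
    for w in tokens[1:]:
        col, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: v[-1][p] + trans[p][s])
            col[s] = v[-1][best_prev] + trans[best_prev][s] + emit[s][w]
            ptr[s] = best_prev
        v.append(col)
        back.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

def features(tokens):
    """Combine a traditional word-identity feature with the HMM-state feature."""
    states = viterbi(tokens)
    return [{"word": w, "hmm_state": s} for w, s in zip(tokens, states)]

print(features(["the", "dog", "barks"]))
```

In a real pipeline these `hmm_state` values would come from an HMM trained on large unlabeled corpora and would feed a downstream tagger together with the conventional features; the gain on sparse and polysemous words comes from the latent state abstracting over individual word identities.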

Original language: English (US)
Pages (from-to): 85-120
Number of pages: 36
Journal: Computational Linguistics
Issue number: 1
State: Published - 2014

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence
