Abstract
Even in a massive corpus such as the Web, a substantial fraction of extractions appear infrequently. This paper shows how to assess the correctness of sparse extractions by utilizing unsupervised language models. The REALM system, which combines HMM-based and n-gram-based language models, ranks candidate extractions by the likelihood that they are correct. Our experiments show that REALM reduces extraction error by 39%, on average, when compared with previous work. Because REALM pre-computes language models based on its corpus and does not require any hand-tagged seeds, it is far more scalable than approaches that learn models for each individual relation from hand-tagged data. Thus, REALM is ideally suited for open information extraction where the relations of interest are not specified in advance and their number is potentially vast.
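The abstract describes ranking candidate extractions by the likelihood assigned by unsupervised language models. As a loose illustration only (not REALM's actual HMM + n-gram combination), the sketch below scores candidate extractions with a toy add-one-smoothed bigram model; the corpus sentences, candidate strings, and relation are invented for this example.

```python
import math
from collections import defaultdict

def train_bigram_counts(corpus):
    """Collect unigram and bigram counts from tokenized sentences."""
    uni, bi = defaultdict(int), defaultdict(int)
    for sent in corpus:
        tokens = ["<s>"] + sent
        for w in tokens:
            uni[w] += 1
        for a, b in zip(tokens, tokens[1:]):
            bi[(a, b)] += 1
    return uni, bi

def loglik(tokens, uni, bi, vocab_size):
    """Add-one-smoothed bigram log-likelihood of a candidate extraction."""
    tokens = ["<s>"] + tokens
    return sum(
        math.log((bi[(a, b)] + 1) / (uni[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

# Invented mini-corpus for a hypothetical HeadquarteredIn relation.
corpus = [
    "microsoft is headquartered in redmond".split(),
    "amazon is headquartered in seattle".split(),
    "google is headquartered in mountain view".split(),
    "boeing is headquartered in chicago".split(),
]
uni, bi = train_bigram_counts(corpus)
V = len(uni)

# Rank two candidate extractions: the plausible one fits the model better.
plausible = "starbucks is headquartered in seattle".split()
implausible = "starbucks is headquartered in banana".split()
print(loglik(plausible, uni, bi, V) > loglik(implausible, uni, bi, V))
```

Because both candidates share the same unseen argument ("starbucks") and the same length, the comparison isolates how well the extracted value fits the contexts observed in the corpus, which is the intuition behind likelihood-based ranking of sparse extractions.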
| Original language | English (US) |
| --- | --- |
| Title of host publication | ACL 2007 - Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics |
| Pages | 696-703 |
| Number of pages | 8 |
| State | Published - Dec 1 2007 |
| Event | 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007 - Prague, Czech Republic; duration: Jun 23 2007 → Jun 30 2007 |
Other
| Other | 45th Annual Meeting of the Association for Computational Linguistics, ACL 2007 |
| --- | --- |
| Country/Territory | Czech Republic |
| City | Prague |
| Period | 6/23/07 → 6/30/07 |
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language