A distributional semantics approach to simultaneous recognition of multiple classes of named entities

Siddhartha Jonnalagadda*, Robert Leaman, Trevor Cohen, Graciela Gonzalez

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Named Entity Recognition and Classification has been studied for the last two decades. Because semantic features require a large amount of training time and are slow at inference, existing tools apply features and rules mainly at the word level or use lexicons. Recent advances in distributional semantics allow us to efficiently create paradigmatic models that encode word order. We used Sahlgren et al.'s permutation-based variant of the Random Indexing model to create a scalable and efficient system that simultaneously recognizes multiple entity classes mentioned in natural language. The system is validated on the GENIA corpus, which has annotations for 46 biomedical entity classes and supports nested entities. Using distributional semantics features only, it achieves an overall micro-averaged F-measure of 67.3% based on fragment matching, with performance ranging from 7.4% for "DNA substructure" to 80.7% for "Bioentity".
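
The idea of encoding word order through permutations can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensionality, the number of non-zero entries, the window size, the use of `np.roll` (a cyclic shift) as the permutation, and all function names are assumptions chosen for illustration. Each word receives a sparse random index vector, and a word's context vector is the sum of its neighbours' index vectors, each shifted by the neighbour's relative position.

```python
import numpy as np


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: half +1 and half -1 entries, zeros elsewhere."""
    vec = np.zeros(dim, dtype=np.int32)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions[: nonzeros // 2]] = 1
    vec[positions[nonzeros // 2:]] = -1
    return vec


def train_order_vectors(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Build order-sensitive context vectors (illustrative parameters only)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {w: make_index_vector(dim, nonzeros, rng) for w in vocab}
    context = {w: np.zeros(dim, dtype=np.int32) for w in vocab}
    for sent in sentences:
        for i, word in enumerate(sent):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or j < 0 or j >= len(sent):
                    continue
                # Assumed permutation: cyclic shift by the relative position,
                # so "gene to the right" and "gene to the left" differ.
                context[word] += np.roll(index[sent[j]], offset)
    return index, context


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Toy, made-up sentences: the two entity tokens share the same neighbours
# at the same relative positions.
toy = [
    ["the", "IL-2", "gene", "is", "expressed"],
    ["the", "NF-kB", "gene", "is", "expressed"],
]
index, context = train_order_vectors(toy)
similarity = cosine(context["IL-2"], context["NF-kB"])
```

In this toy corpus the two entity tokens occur in identical positional contexts, so their context vectors coincide. A recognizer could use such vectors as token features; how the paper turns them into features for its classifier is not stated in this abstract.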

Original language: English (US)
Title of host publication: Computational Linguistics and Intelligent Text Processing - 11th International Conference, CICLing 2010, Proceedings
Pages: 224-235
Number of pages: 12
State: Published - 2010
Externally published: Yes
Event: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010 - Iasi, Romania
Duration: Mar 21 2010 → Mar 27 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6008 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2010
Country/Territory: Romania
City: Iasi
Period: 3/21/10 → 3/27/10

Keywords

  • Biomedical
  • Classification
  • Distributional
  • Entity
  • GENIA
  • Multiple
  • Named
  • Recognition
  • Semantics

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
