Clinical text classification with rule-based features and knowledge-guided convolutional neural networks

Liang Yao, Chengsheng Mao, Yuan Luo*

*Corresponding author for this work

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Background: Clinical text classification is a fundamental problem in medical natural language processing. Existing studies have conventionally focused on rule-based or knowledge-source-based feature engineering, but only a limited number of studies have exploited the effective representation learning capability of deep learning methods. Methods: In this study, we propose a new approach that combines rule-based features and knowledge-guided deep learning models for effective disease classification. Critical steps of our method include recognizing trigger phrases, predicting classes with very few examples using trigger phrases, and training a convolutional neural network (CNN) with word embeddings and Unified Medical Language System (UMLS) entity embeddings. Results: We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results demonstrate that our method outperforms the state-of-the-art methods. Conclusion: We showed that the CNN model is powerful for learning effective hidden features, and that concept unique identifier (CUI) embeddings are helpful for building clinical text representations. This shows that integrating domain knowledge into CNN models is promising.
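The three steps named in the Methods can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the trigger patterns, class names, filter values, and the two-branch layout (one convolution over word vectors, one over entity vectors, concatenated after max pooling) are all assumptions made for illustration, and a real system would learn the filters and weights from data.

```python
import re
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical trigger phrases and labels; the abstract does not list the real ones.
TRIGGERS: Dict[str, str] = {
    r"\bno (history|evidence) of obesity\b": "absent",
    r"\b(possible|questionable) obesity\b": "questionable",
}


def rule_predict(text: str) -> Optional[str]:
    """Return a rare-class label if a trigger phrase matches, else None."""
    lowered = text.lower()
    for pattern, label in TRIGGERS.items():
        if re.search(pattern, lowered):
            return label
    return None


def conv_max_pool(seq: Sequence[Vector], filt: Sequence[Vector]) -> float:
    """Apply one 1-D convolution filter, then ReLU and max-over-time pooling."""
    width = len(filt)
    best = 0.0  # ReLU floor; also returned when seq is shorter than the filter
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(x * w for vec, wvec in zip(window, filt)
                    for x, w in zip(vec, wvec))
        best = max(best, score)
    return best


def features(word_vecs, cui_vecs, word_filters, cui_filters) -> Vector:
    """Concatenate pooled features from the word branch and the entity branch."""
    return ([conv_max_pool(word_vecs, f) for f in word_filters]
            + [conv_max_pool(cui_vecs, f) for f in cui_filters])


def classify(text, word_vecs, cui_vecs, word_filters, cui_filters,
             weights, labels) -> str:
    """Rules decide first; otherwise a linear layer scores the CNN features."""
    ruled = rule_predict(text)
    if ruled is not None:
        return ruled
    feats = features(word_vecs, cui_vecs, word_filters, cui_filters)
    scores = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    return labels[scores.index(max(scores))]
```

In this sketch, a record whose text matches a trigger phrase never reaches the convolutional branch, which mirrors the idea of handling classes with very few examples by rule while leaving the better-populated classes to the learned model.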

Original language: English (US)
Article number: 71
Journal: BMC Medical Informatics and Decision Making
Volume: 19
DOIs: 10.1186/s12911-019-0781-4
State: Published - Apr 4 2019

Fingerprint

Learning
Neural Networks (Computer)
Unified Medical Language System
Natural Language Processing
Informatics
Obesity

Keywords

  • Clinical text classification
  • Convolutional neural networks
  • Entity embeddings
  • Obesity challenge
  • Word embeddings

ASJC Scopus subject areas

  • Health Policy
  • Health Informatics

Cite this

@article{7c798c43938641118e66451ca737451f,
title = "Clinical text classification with rule-based features and knowledge-guided convolutional neural networks",
abstract = "Background: Clinical text classification is a fundamental problem in medical natural language processing. Existing studies have conventionally focused on rule-based or knowledge-source-based feature engineering, but only a limited number of studies have exploited the effective representation learning capability of deep learning methods. Methods: In this study, we propose a new approach that combines rule-based features and knowledge-guided deep learning models for effective disease classification. Critical steps of our method include recognizing trigger phrases, predicting classes with very few examples using trigger phrases, and training a convolutional neural network (CNN) with word embeddings and Unified Medical Language System (UMLS) entity embeddings. Results: We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results demonstrate that our method outperforms the state-of-the-art methods. Conclusion: We showed that the CNN model is powerful for learning effective hidden features, and that concept unique identifier (CUI) embeddings are helpful for building clinical text representations. This shows that integrating domain knowledge into CNN models is promising.",
keywords = "Clinical text classification, Convolutional neural networks, Entity embeddings, Obesity challenge, Word embeddings",
author = "Liang Yao and Chengsheng Mao and Yuan Luo",
year = "2019",
month = "4",
day = "4",
doi = "10.1186/s12911-019-0781-4",
language = "English (US)",
volume = "19",
journal = "BMC Medical Informatics and Decision Making",
issn = "1472-6947",
publisher = "BioMed Central",

}

Clinical text classification with rule-based features and knowledge-guided convolutional neural networks. / Yao, Liang; Mao, Chengsheng; Luo, Yuan.

In: BMC Medical Informatics and Decision Making, Vol. 19, 71, 04.04.2019.


TY - JOUR

T1 - Clinical text classification with rule-based features and knowledge-guided convolutional neural networks

AU - Yao, Liang

AU - Mao, Chengsheng

AU - Luo, Yuan

PY - 2019/4/4

Y1 - 2019/4/4


AB - Background: Clinical text classification is a fundamental problem in medical natural language processing. Existing studies have conventionally focused on rule-based or knowledge-source-based feature engineering, but only a limited number of studies have exploited the effective representation learning capability of deep learning methods. Methods: In this study, we propose a new approach that combines rule-based features and knowledge-guided deep learning models for effective disease classification. Critical steps of our method include recognizing trigger phrases, predicting classes with very few examples using trigger phrases, and training a convolutional neural network (CNN) with word embeddings and Unified Medical Language System (UMLS) entity embeddings. Results: We evaluated our method on the 2008 Integrating Informatics with Biology and the Bedside (i2b2) obesity challenge. The results demonstrate that our method outperforms the state-of-the-art methods. Conclusion: We showed that the CNN model is powerful for learning effective hidden features, and that concept unique identifier (CUI) embeddings are helpful for building clinical text representations. This shows that integrating domain knowledge into CNN models is promising.

KW - Clinical text classification

KW - Convolutional neural networks

KW - Entity embeddings

KW - Obesity challenge

KW - Word embeddings

UR - http://www.scopus.com/inward/record.url?scp=85063970670&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063970670&partnerID=8YFLogxK

U2 - 10.1186/s12911-019-0781-4

DO - 10.1186/s12911-019-0781-4

M3 - Article

C2 - 30943960

VL - 19

JO - BMC Medical Informatics and Decision Making

JF - BMC Medical Informatics and Decision Making

SN - 1472-6947

M1 - 71

ER -