Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network

Hanyin Wang, Yikuan Li, Seema Ahsan Khan, Yuan Luo*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Distant recurrence of breast cancer is associated with high lifetime risk and low 5-year survival rates. Early prediction of distant recurrence could facilitate intervention and improve patients' quality of life. In this study, we designed an electronic health record (EHR)-based predictive model to estimate the probability of distant recurrence in breast cancer patients. We studied the pathology reports and progress notes of 6,447 patients diagnosed with breast cancer at Northwestern Memorial Hospital between 2001 and 2015. Clinical notes were mapped to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) using natural language processing tools. Bag-of-words representations and pre-trained embeddings were used to vectorize the word and CUI sequences. These features, combined with clinical features from structured data, were fed into conventional machine learning classifiers and a Knowledge-guided Convolutional Neural Network (K-CNN). The best model configuration achieved an AUC of 0.888 and an F1-score of 0.5. Our work provides an automated method for predicting breast cancer distant recurrence using natural language processing and deep learning. We expect that more advanced feature engineering could further improve predictive performance.
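The abstract describes the feature pipeline only at a high level. Below is a minimal, standard-library-only Python sketch, under our own assumptions, of the generic steps it names: bag-of-words counts over a patient's CUI sequence, averaging pre-trained embedding vectors, concatenation with structured clinical features, and the AUC and F1 metrics used to report results. It is not the authors' code and does not implement K-CNN. It assumes CUI extraction from the notes has already been done, and it treats the embedding table as a plain dict from CUI to vector. All function names are hypothetical, the `CUI_*` tokens are placeholders rather than real UMLS identifiers, and the structured values are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def build_vocab(docs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign a column index to each distinct CUI, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for cui in doc:
            if cui not in vocab:
                vocab[cui] = len(vocab)
    return vocab


def bag_of_cuis(doc: Sequence[str], vocab: Mapping[str, int]) -> List[float]:
    """Count how often each vocabulary CUI occurs in one patient's notes."""
    vec = [0.0] * len(vocab)
    for cui, count in Counter(doc).items():
        idx = vocab.get(cui)
        if idx is not None:  # CUIs not seen when building the vocabulary are ignored
            vec[idx] = float(count)
    return vec


def mean_embedding(doc: Sequence[str], emb: Mapping[str, Sequence[float]], dim: int) -> List[float]:
    """Average the pre-trained vectors of CUIs that have one; all zeros if none do."""
    vecs = [emb[cui] for cui in doc if cui in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def combine(text_features: Sequence[float], structured: Sequence[float]) -> List[float]:
    """Concatenate note-derived features with structured clinical features."""
    return list(text_features) + list(structured)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (recurrence) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder CUI tokens (not real UMLS identifiers) for two training patients.
    train_docs = [["CUI_A", "CUI_B", "CUI_A"], ["CUI_C"]]
    vocab = build_vocab(train_docs)
    # Hypothetical structured features, e.g. an encoded tumor grade and age at diagnosis.
    x = combine(bag_of_cuis(["CUI_A", "CUI_A", "CUI_D"], vocab), [1.0, 54.0])
    print(x)  # [2.0, 0.0, 0.0, 1.0, 54.0]
    print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.6]))  # 0.75
    print(f1_score([1, 0, 1, 0], [1, 0, 0, 1]))  # 0.5
```

One design point worth noting: AUC is computed from the raw scores and does not depend on a decision threshold, whereas F1 requires first converting scores into binary predictions at some chosen threshold, so the two metrics can tell quite different stories about the same model.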

Original language: English (US)
Article number: 101977
Journal: Artificial Intelligence In Medicine
Volume: 110
State: Published (Nov 2020)

Keywords

  • Breast cancer
  • Distant recurrence
  • Entity embeddings
  • Knowledge-guided convolutional neural network
  • Word embeddings

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Artificial Intelligence

