Using clinical narratives and structured data to identify distant recurrences in breast cancer

Zexian Zeng, Ankita Roy, Xiaoyu Li, Sasa Espino, Susan E Clare, Seema Ahsan Khan, Yuan Luo*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Accurately identifying distant recurrences in breast cancer from the Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from EHR. We apply MetaMap to extract features from clinical narratives and also retrieve structured clinical data from EHR. Using these features, we train a support vector machine model to identify distant recurrences in breast cancer patients. We train the model using 1,396 double-annotated subjects and validate the model using 599 double-annotated subjects. In addition, we validate the model on a set of 4,904 single-annotated subjects as a generalization test. We obtained a high area under curve (AUC) score of 0.92 (SD=0.01) in the cross-validation using the training dataset, then obtained AUC scores of 0.95 and 0.93 in the held-out test and generalization test using 599 and 4,904 samples respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data.
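The abstract reports AUC as a mean with a standard deviation across cross-validation folds. As a rough illustration of how such a summary could be computed, the sketch below derives AUC from its pairwise-ranking definition and then aggregates per-fold values. It is not the authors' code: the function names `roc_auc` and `summarize_folds` are hypothetical, and the fold data are made up purely for demonstration and bear no relation to the study's subjects or scores.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def summarize_folds(fold_results):
    """Return (mean, sample standard deviation) of the per-fold AUCs."""
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    return mean(aucs), stdev(aucs)


# Made-up folds for illustration only: (true labels, classifier scores).
toy_folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.7, 0.4]),
    ([1, 0, 0, 1], [0.8, 0.6, 0.1, 0.5]),
    ([0, 1, 0, 1], [0.3, 0.3, 0.1, 0.9]),
]
mean_auc, sd_auc = summarize_folds(toy_folds)
print(f"AUC = {mean_auc:.2f} (SD = {sd_auc:.2f})")
```

The pairwise loop is quadratic in the number of subjects, which is fine for a small demonstration; a rank-based formulation would be the usual choice at the scale of thousands of subjects.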

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 44-52
Number of pages: 9
ISBN (Electronic): 9781538653777
DOIs: https://doi.org/10.1109/ICHI.2018.00013
State: Published - Jul 24 2018
Event: 6th IEEE International Conference on Healthcare Informatics, ICHI 2018 - New York, United States
Duration: Jun 4 2018 → Jun 7 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

Other

Other: 6th IEEE International Conference on Healthcare Informatics, ICHI 2018
Country: United States
City: New York
Period: 6/4/18 → 6/7/18

Fingerprint

Breast Neoplasms
Electronic Health Records
Recurrence
Health
Area Under Curve
Secondary Care
Support vector machines

Keywords

  • Breast cancer
  • Computational phenotyping
  • Distant recurrence
  • EHR
  • Metastasis

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Health Informatics

Cite this

Zeng, Z., Roy, A., Li, X., Espino, S., Clare, S. E., Khan, S. A., & Luo, Y. (2018). Using clinical narratives and structured data to identify distant recurrences in breast cancer. In Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018 (pp. 44-52). [8419346] (Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICHI.2018.00013
Zeng, Zexian ; Roy, Ankita ; Li, Xiaoyu ; Espino, Sasa ; Clare, Susan E ; Khan, Seema Ahsan ; Luo, Yuan. / Using clinical narratives and structured data to identify distant recurrences in breast cancer. Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 44-52 (Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018).
@inproceedings{1ea2ca487b574e9cb8c9553e9c474de5,
title = "Using clinical narratives and structured data to identify distant recurrences in breast cancer",
abstract = "Accurately identifying distant recurrences in breast cancer from the Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from EHR. We apply MetaMap to extract features from clinical narratives and also retrieve structured clinical data from EHR. Using these features, we train a support vector machine model to identify distant recurrences in breast cancer patients. We train the model using 1,396 double-annotated subjects and validate the model using 599 double-annotated subjects. In addition, we validate the model on a set of 4,904 single-annotated subjects as a generalization test. We obtained a high area under curve (AUC) score of 0.92 (SD=0.01) in the cross-validation using the training dataset, then obtained AUC scores of 0.95 and 0.93 in the held-out test and generalization test using 599 and 4,904 samples respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data.",
keywords = "Breast cancer, Computational phenotyping, Distant recurrence, EHR, Metastasis",
author = "Zexian Zeng and Ankita Roy and Xiaoyu Li and Sasa Espino and Clare, {Susan E} and Khan, {Seema Ahsan} and Yuan Luo",
year = "2018",
month = "7",
day = "24",
doi = "10.1109/ICHI.2018.00013",
language = "English (US)",
series = "Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "44--52",
booktitle = "Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018",
address = "United States",
}

Zeng, Z, Roy, A, Li, X, Espino, S, Clare, SE, Khan, SA & Luo, Y 2018, Using clinical narratives and structured data to identify distant recurrences in breast cancer. in Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018., 8419346, Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018, Institute of Electrical and Electronics Engineers Inc., pp. 44-52, 6th IEEE International Conference on Healthcare Informatics, ICHI 2018, New York, United States, 6/4/18. https://doi.org/10.1109/ICHI.2018.00013

Using clinical narratives and structured data to identify distant recurrences in breast cancer. / Zeng, Zexian; Roy, Ankita; Li, Xiaoyu; Espino, Sasa; Clare, Susan E; Khan, Seema Ahsan; Luo, Yuan.

Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018. Institute of Electrical and Electronics Engineers Inc., 2018. p. 44-52 8419346 (Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018).

TY - GEN

T1 - Using clinical narratives and structured data to identify distant recurrences in breast cancer

AU - Zeng, Zexian

AU - Roy, Ankita

AU - Li, Xiaoyu

AU - Espino, Sasa

AU - Clare, Susan E

AU - Khan, Seema Ahsan

AU - Luo, Yuan

PY - 2018/7/24

Y1 - 2018/7/24

AB - Accurately identifying distant recurrences in breast cancer from the Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, distant recurrence identification still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using clinical narratives and structured data from EHR. We apply MetaMap to extract features from clinical narratives and also retrieve structured clinical data from EHR. Using these features, we train a support vector machine model to identify distant recurrences in breast cancer patients. We train the model using 1,396 double-annotated subjects and validate the model using 599 double-annotated subjects. In addition, we validate the model on a set of 4,904 single-annotated subjects as a generalization test. We obtained a high area under curve (AUC) score of 0.92 (SD=0.01) in the cross-validation using the training dataset, then obtained AUC scores of 0.95 and 0.93 in the held-out test and generalization test using 599 and 4,904 samples respectively. Our model can accurately and efficiently identify distant recurrences in breast cancer by combining features extracted from unstructured clinical narratives and structured clinical data.

KW - Breast cancer

KW - Computational phenotyping

KW - Distant recurrence

KW - EHR

KW - Metastasis

UR - http://www.scopus.com/inward/record.url?scp=85051118308&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051118308&partnerID=8YFLogxK

U2 - 10.1109/ICHI.2018.00013

DO - 10.1109/ICHI.2018.00013

M3 - Conference contribution

AN - SCOPUS:85051118308

T3 - Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

SP - 44

EP - 52

BT - Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Zeng Z, Roy A, Li X, Espino S, Clare SE, Khan SA et al. Using clinical narratives and structured data to identify distant recurrences in breast cancer. In Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018. Institute of Electrical and Electronics Engineers Inc. 2018. p. 44-52. 8419346. (Proceedings - 2018 IEEE International Conference on Healthcare Informatics, ICHI 2018). https://doi.org/10.1109/ICHI.2018.00013