Rich text formatted EHR narratives: A hidden and ignored trove

Zexian Zeng, Yuan Zhao, Mengxin Sun, Andy H. Vo, Justin B Starren, Yuan Luo*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


This study presents an approach for mining structured information from clinical narratives in Electronic Health Records (EHRs) by using Rich Text Format (RTF) records. RTF is adopted by many medical information management systems. These files contain rich structural information that can be extracted and interpreted, yet this information is largely ignored. We investigate multiple types of EHR narratives in the Enterprise Data Warehouse of a large multisite healthcare chain consisting of both an academic medical center and community hospitals. We focus on the RTF constructs related to tables and sections, which are not available in plain-text EHR narratives. We show how to parse these RTF constructs and analyze their prevalence and characteristics across multiple types of EHR narratives. Our case study demonstrates the additional utility of features derived from RTF constructs over plain-text-oriented NLP.
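The abstract does not spell out the parsing procedure. To illustrate what an RTF table construct looks like to a parser, here is a minimal Python sketch that assumes only the basic RTF table control words (`\trowd` starts a row definition, `\cell` ends a cell, `\row` ends a row) and a Windows-1252 code page for hex escapes. The names `TOKEN_RE`, `DESTINATIONS`, and `extract_tables` are hypothetical. This is not the authors' parser: it ignores nested tables (`\nestcell`, `\nestrow`), Unicode escapes (`\uN`), and section-level constructs.

```python
import re

# Hypothetical tokenizer for a small subset of RTF: control words with an
# optional numeric parameter (e.g. \trowd, \cellx2000), hex escapes (\'e9),
# control symbols (\*, \{), group braces, plain text, and raw line breaks.
TOKEN_RE = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word + optional parameter
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte
    r"|\\(.)"                    # control symbol
    r"|([{}])"                   # group open/close
    r"|([^\\{}\r\n]+)"           # plain text
    r"|[\r\n]+"                  # raw newlines carry no content in RTF
)

# Destination groups whose text is not narrative content (assumed list).
DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict", "header", "footer"}


def extract_tables(rtf: str) -> list[list[list[str]]]:
    """Return tables as lists of rows, each row a list of cell strings."""
    tables, rows, row, cell = [], [], [], []
    in_row = False
    skip, stack = False, []
    for m in TOKEN_RE.finditer(rtf):
        word, _param, hexbyte, symbol, brace, text = m.groups()
        # Track group nesting so a skipped destination ends with its group.
        if brace == "{":
            stack.append(skip)
            continue
        if brace == "}":
            skip = stack.pop() if stack else False
            continue
        if symbol == "*" or word in DESTINATIONS:
            skip = True
            continue
        if skip:
            continue
        if word == "trowd":                  # row definition: a row starts
            in_row = True
            continue
        if word == "cell" and in_row:        # end of one cell
            row.append("".join(cell).strip())
            cell = []
            continue
        if word == "row" and in_row:         # end of one row
            rows.append(row)
            row, in_row = [], False
            continue
        # Map the token to visible text, if it has any.
        piece = None
        if text is not None:
            piece = text
        elif hexbyte is not None:            # assumes ANSI code page 1252
            piece = bytes([int(hexbyte, 16)]).decode("cp1252", errors="replace")
        elif symbol in ("\\", "{", "}"):
            piece = symbol
        elif word in ("par", "line", "tab"):
            piece = " "
        if piece is None:
            continue
        if in_row:
            cell.append(piece)
        elif piece.strip() and rows:         # narrative text closes the table
            tables.append(rows)
            rows = []
    if rows:
        tables.append(rows)
    return tables


if __name__ == "__main__":
    note = (
        r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\pard Lab results\par"
        r"\trowd\cellx2000\cellx4000\pard\intbl Test\cell Value\cell\row"
        r"\trowd\cellx2000\cellx4000\pard\intbl WBC\cell 7.2\cell\row"
        r"\pard Plan: follow up\par}"
    )
    print(extract_tables(note))  # → [[['Test', 'Value'], ['WBC', '7.2']]]
```

Treating any non-blank text outside a row as the end of a table is a simplifying heuristic; real EHR exports may interleave rows with formatting groups or use nested tables, which a production parser would need to handle.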

Original language: English (US)
Title of host publication: MEDINFO 2019
Subtitle of host publication: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics
Editors: Brigitte Seroussi, Lucila Ohno-Machado
Publisher: IOS Press
Number of pages: 5
ISBN (Electronic): 9781643680026
State: Published - Aug 21 2019
Event: 17th World Congress on Medical and Health Informatics, MEDINFO 2019 - Lyon, France
Duration: Aug 25 2019 to Aug 30 2019

Publication series

Name: Studies in Health Technology and Informatics
ISSN (Print): 0926-9630
ISSN (Electronic): 1879-8365




Keywords

  • Electronic Health Records
  • Information Management
  • Natural Language Processing

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

