TY - JOUR
T1 - Development of an optical character recognition pipeline for handwritten form fields from an electronic health record
AU - Rasmussen, Luke V.
AU - Peissig, Peggy L.
AU - McCarty, Catherine A.
AU - Starren, Justin B
PY - 2012/6
Y1 - 2012/6
N2 - Background: Although the penetration of electronic health records is increasing rapidly, much of the historical medical record is only available in handwritten notes and forms, which require labor-intensive, human chart abstraction for some clinical research. The few previous studies on automated extraction of data from these handwritten notes have focused on monolithic, custom-developed recognition systems or third-party systems that require proprietary forms. Methods: We present an optical character recognition processing pipeline, which leverages the capabilities of existing third-party optical character recognition engines, and provides the flexibility offered by a modular custom-developed system. The system was configured and run on a selected set of form fields extracted from a corpus of handwritten ophthalmology forms. Observations: The processing pipeline allowed multiple configurations to be run, with the optimal configuration consisting of the Nuance and LEADTOOLS engines running in parallel with a positive predictive value of 94.6% and a sensitivity of 13.5%. Discussion: While limitations exist, preliminary experience from this project yielded insights on the generalizability and applicability of integrating multiple, inexpensive general-purpose third-party optical character recognition engines in a modular pipeline.
AB - Background: Although the penetration of electronic health records is increasing rapidly, much of the historical medical record is only available in handwritten notes and forms, which require labor-intensive, human chart abstraction for some clinical research. The few previous studies on automated extraction of data from these handwritten notes have focused on monolithic, custom-developed recognition systems or third-party systems that require proprietary forms. Methods: We present an optical character recognition processing pipeline, which leverages the capabilities of existing third-party optical character recognition engines, and provides the flexibility offered by a modular custom-developed system. The system was configured and run on a selected set of form fields extracted from a corpus of handwritten ophthalmology forms. Observations: The processing pipeline allowed multiple configurations to be run, with the optimal configuration consisting of the Nuance and LEADTOOLS engines running in parallel with a positive predictive value of 94.6% and a sensitivity of 13.5%. Discussion: While limitations exist, preliminary experience from this project yielded insights on the generalizability and applicability of integrating multiple, inexpensive general-purpose third-party optical character recognition engines in a modular pipeline.
UR - http://www.scopus.com/inward/record.url?scp=84863539091&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863539091&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2011-000182
DO - 10.1136/amiajnl-2011-000182
M3 - Article
C2 - 21890871
AN - SCOPUS:84863539091
SN - 1067-5027
VL - 19
SP - e90-e95
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
IS - E1
ER -