Design and implementation of a privacy preserving electronic health record linkage tool in Chicago

Abel N. Kho, John P. Cashy, Kathryn L. Jackson, Adam R. Pah, Satyender Goel, Jörn Boehnke, John Eric Humphries, Scott Duke Kominers, Bala N. Hota, Shannon A. Sims, Bradley A. Malin, Dustin D. French, Theresa L. Walunas, David O. Meltzer, Erin O. Kaleba, Roderick C. Jones, William L. Galanter

Research output: Contribution to journal, Article

  • 16 Citations

Abstract

Objective: To design and implement a tool that creates a secure, privacy preserving linkage of electronic health record (EHR) data across multiple sites in a large metropolitan area in the United States (Chicago, IL), for use in clinical research.

Methods: The authors developed and distributed a software application that performs standardized data cleaning, preprocessing, and hashing of patient identifiers to remove all protected health information. The application creates seeded hash code combinations of patient identifiers using a Health Insurance Portability and Accountability Act compliant SHA-512 algorithm that minimizes re-identification risk. The authors subsequently linked individual records using a central honest broker with an algorithm that assigns weights to hash combinations in order to generate high specificity matches.

Results: The software application successfully linked and de-duplicated 7 million records across 6 institutions, resulting in a cohort of 5 million unique records. Using a manually reconciled set of 11 292 patients as a gold standard, the software achieved a sensitivity of 96% and a specificity of 100%, with a majority of the missed matches accounted for by patients with both a missing social security number and last name change. Using 3 disease examples, it is demonstrated that the software can reduce duplication of patient records across sites by as much as 28%.

Conclusions: Software that standardizes the assignment of a unique seeded hash identifier merged through an agreed upon third-party honest broker can enable large-scale secure linkage of EHR data for epidemiologic and public health research. The software algorithm can improve future epidemiologic research by providing more comprehensive data given that patients may make use of multiple healthcare systems.
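The approach described in the abstract (clean the identifiers, hash seeded combinations with SHA-512, then score agreement between hash combinations) can be illustrated with a minimal Python sketch. This is not the authors' software: the seed value, the field names, the identifier combinations, and the weights below are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import hashlib

# Hypothetical shared seed; participating sites would need to agree on one.
SEED = "example-shared-seed"

# Hypothetical identifier combinations and weights (not taken from the paper).
COMBINATIONS = {
    ("first", "last", "dob", "ssn"): 1.0,
    ("last", "dob", "ssn"): 0.8,
    ("first", "last", "dob"): 0.6,
}


def normalize(value):
    """Basic cleaning: trim, uppercase, and keep only letters and digits."""
    return "".join(ch for ch in value.strip().upper() if ch.isalnum())


def hash_record(record, seed=SEED):
    """Return {combination: SHA-512 hex digest}, skipping combinations
    for which any field is missing or empty after cleaning."""
    hashes = {}
    for fields in COMBINATIONS:
        parts = [normalize(record.get(f, "")) for f in fields]
        if all(parts):
            payload = seed + "|" + "|".join(parts)
            hashes[fields] = hashlib.sha512(payload.encode("utf-8")).hexdigest()
    return hashes


def match_score(hashes_a, hashes_b):
    """Highest weight among combinations whose hashes agree (0.0 if none)."""
    return max(
        (
            weight
            for fields, weight in COMBINATIONS.items()
            if fields in hashes_a and hashes_a[fields] == hashes_b.get(fields)
        ),
        default=0.0,
    )


# Two sites hold the same patient with different formatting; site B lacks the SSN.
site_a = {"first": "Jane", "last": "O'Neil", "dob": "1980-02-03", "ssn": "123-45-6789"}
site_b = {"first": " jane ", "last": "ONeil", "dob": "19800203", "ssn": ""}

score = match_score(hash_record(site_a), hash_record(site_b))
print(score)
```

In this sketch only the hex digests would leave each site, so a central broker could compare records without seeing the underlying identifiers. A record with a missing social security number can still match on a lower-weight combination, whereas a record that also has a changed last name would match on none of these combinations, which is consistent with the kind of missed match the abstract reports.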

Language: English (US)
Pages: 1072-1080
Number of pages: 9
Journal: Journal of the American Medical Informatics Association
Volume: 22
Issue number: 5
DOIs: 10.1093/jamia/ocv038
State: Published - Jan 1 2015

Fingerprint

Electronic Health Records
Privacy
Software
Research
Health Insurance Portability and Accountability Act
Social Security
Names
Public Health
Delivery of Health Care
Weights and Measures
Health

Keywords

  • Health information exchange
  • Privacy protection
  • Record linkage

ASJC Scopus subject areas

  • Health Informatics

Cite this

Kho, Abel N. ; Cashy, John P. ; Jackson, Kathryn L. ; Pah, Adam R. ; Goel, Satyender ; Boehnke, Jörn ; Humphries, John Eric ; Kominers, Scott Duke ; Hota, Bala N. ; Sims, Shannon A. ; Malin, Bradley A. ; French, Dustin D. ; Walunas, Theresa L. ; Meltzer, David O. ; Kaleba, Erin O. ; Jones, Roderick C. ; Galanter, William L. / Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. In: Journal of the American Medical Informatics Association. 2015 ; Vol. 22, No. 5. pp. 1072-1080.
@article{827d5c7329d8421e8c3e6cd5b4e16174,
title = "Design and implementation of a privacy preserving electronic health record linkage tool in Chicago",
abstract = "Objective To design and implement a tool that creates a secure, privacy preserving linkage of electronic health record (EHR) data across multiple sites in a large metropolitan area in the United States (Chicago, IL), for use in clinical research. Methods The authors developed and distributed a software application that performs standardized data cleaning, preprocessing, and hashing of patient identifiers to remove all protected health information. The application creates seeded hash code combinations of patient identifiers using a Health Insurance Portability and Accountability Act compliant SHA-512 algorithm that minimizes re-identification risk. The authors subsequently linked individual records using a central honest broker with an algorithm that assigns weights to hash combinations in order to generate high specificity matches. Results The software application successfully linked and de-duplicated 7 million records across 6 institutions, resulting in a cohort of 5 million unique records. Using a manually reconciled set of 11 292 patients as a gold standard, the software achieved a sensitivity of 96{\%} and a specificity of 100{\%}, with a majority of the missed matches accounted for by patients with both a missing social security number and last name change. Using 3 disease examples, it is demonstrated that the software can reduce duplication of patient records across sites by as much as 28{\%}. Conclusions Software that standardizes the assignment of a unique seeded hash identifier merged through an agreed upon third-party honest broker can enable large-scale secure linkage of EHR data for epidemiologic and public health research. The software algorithm can improve future epidemiologic research by providing more comprehensive data given that patients may make use of multiple healthcare systems.",
keywords = "Health information exchange, Privacy protection, Record linkage",
author = "Kho, {Abel N.} and Cashy, {John P.} and Jackson, {Kathryn L.} and Pah, {Adam R.} and Goel, Satyender and Boehnke, J{\"o}rn and Humphries, {John Eric} and Kominers, {Scott Duke} and Hota, {Bala N.} and Sims, {Shannon A.} and Malin, {Bradley A.} and French, {Dustin D.} and Walunas, {Theresa L.} and Meltzer, {David O.} and Kaleba, {Erin O.} and Jones, {Roderick C.} and Galanter, {William L.}",
year = "2015",
month = "1",
day = "1",
doi = "10.1093/jamia/ocv038",
language = "English (US)",
volume = "22",
pages = "1072--1080",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

Kho, AN, Cashy, JP, Jackson, KL, Pah, AR, Goel, S, Boehnke, J, Humphries, JE, Kominers, SD, Hota, BN, Sims, SA, Malin, BA, French, DD, Walunas, TL, Meltzer, DO, Kaleba, EO, Jones, RC & Galanter, WL 2015, 'Design and implementation of a privacy preserving electronic health record linkage tool in Chicago', Journal of the American Medical Informatics Association, vol. 22, no. 5, pp. 1072-1080. DOI: 10.1093/jamia/ocv038

Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. / Kho, Abel N.; Cashy, John P.; Jackson, Kathryn L.; Pah, Adam R.; Goel, Satyender; Boehnke, Jörn; Humphries, John Eric; Kominers, Scott Duke; Hota, Bala N.; Sims, Shannon A.; Malin, Bradley A.; French, Dustin D.; Walunas, Theresa L.; Meltzer, David O.; Kaleba, Erin O.; Jones, Roderick C.; Galanter, William L.

In: Journal of the American Medical Informatics Association, Vol. 22, No. 5, 01.01.2015, p. 1072-1080.

TY - JOUR

T1 - Design and implementation of a privacy preserving electronic health record linkage tool in Chicago

AU - Kho,Abel N.

AU - Cashy,John P.

AU - Jackson,Kathryn L.

AU - Pah,Adam R.

AU - Goel,Satyender

AU - Boehnke,Jörn

AU - Humphries,John Eric

AU - Kominers,Scott Duke

AU - Hota,Bala N.

AU - Sims,Shannon A.

AU - Malin,Bradley A.

AU - French,Dustin D.

AU - Walunas,Theresa L.

AU - Meltzer,David O.

AU - Kaleba,Erin O.

AU - Jones,Roderick C.

AU - Galanter,William L.

PY - 2015/1/1

Y1 - 2015/1/1

AB - Objective To design and implement a tool that creates a secure, privacy preserving linkage of electronic health record (EHR) data across multiple sites in a large metropolitan area in the United States (Chicago, IL), for use in clinical research. Methods The authors developed and distributed a software application that performs standardized data cleaning, preprocessing, and hashing of patient identifiers to remove all protected health information. The application creates seeded hash code combinations of patient identifiers using a Health Insurance Portability and Accountability Act compliant SHA-512 algorithm that minimizes re-identification risk. The authors subsequently linked individual records using a central honest broker with an algorithm that assigns weights to hash combinations in order to generate high specificity matches. Results The software application successfully linked and de-duplicated 7 million records across 6 institutions, resulting in a cohort of 5 million unique records. Using a manually reconciled set of 11 292 patients as a gold standard, the software achieved a sensitivity of 96% and a specificity of 100%, with a majority of the missed matches accounted for by patients with both a missing social security number and last name change. Using 3 disease examples, it is demonstrated that the software can reduce duplication of patient records across sites by as much as 28%. Conclusions Software that standardizes the assignment of a unique seeded hash identifier merged through an agreed upon third-party honest broker can enable large-scale secure linkage of EHR data for epidemiologic and public health research. The software algorithm can improve future epidemiologic research by providing more comprehensive data given that patients may make use of multiple healthcare systems.

KW - Health information exchange

KW - Privacy protection

KW - Record linkage

UR - http://www.scopus.com/inward/record.url?scp=84953342589&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84953342589&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocv038

DO - 10.1093/jamia/ocv038

M3 - Article

VL - 22

SP - 1072

EP - 1080

JO - Journal of the American Medical Informatics Association : JAMIA

T2 - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 5

ER -