TY - JOUR
T1 - PheKB
T2 - A catalog and workflow for creating electronic phenotype algorithms for transportability
AU - Kirby, Jacqueline C.
AU - Speltz, Peter
AU - Rasmussen, Luke V.
AU - Basford, Melissa
AU - Gottesman, Omri
AU - Peissig, Peggy L.
AU - Pacheco, Jennifer A.
AU - Tromp, Gerard
AU - Pathak, Jyotishman
AU - Carrell, David S.
AU - Ellis, Stephen B.
AU - Lingren, Todd
AU - Thompson, Will K.
AU - Savova, Guergana
AU - Haines, Jonathan
AU - Roden, Dan M.
AU - Harris, Paul A.
AU - Denny, Joshua C.
N1 - Funding Information:
We gratefully acknowledge Stephanie Pretel (NIH/NLM/NCBI) for assistance and discussion for the development of the Data Dictionary/Data Validation. This study was supported by the National Human Genome Research Institute, grant numbers: U01HG006828 (Cincinnati Children's Hospital Medical Center/Harvard), U01HG006830 (Children's Hospital of Philadelphia), U01HG006389 (Essentia Institute of Rural Health), U01HG006382 (Geisinger Clinic), U01HG006375 (Group Health Cooperative), U01HG006379 (Mayo Clinic), U01HG006380 (Mount Sinai School of Medicine), U01HG006388 (Northwestern University), U01HG006378 (Vanderbilt University), and U01HG006385 (Vanderbilt University serving as the Coordinating Center). Additionally, this study was supported by the National Institute of General Medical Sciences, grant numbers R01GM105688 and R01GM103859.
Publisher Copyright:
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
PY - 2016/11/1
Y1 - 2016/11/1
AB - Objective: Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. Materials and Methods: We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. Results: As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). Discussion: These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. Conclusion: By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
KW - Clinical research
KW - Electronic health records
KW - Electronic phenotyping
KW - Genomic research
KW - Natural language processing
UR - http://www.scopus.com/inward/record.url?scp=84994697920&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994697920&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocv202
DO - 10.1093/jamia/ocv202
M3 - Article
C2 - 27026615
AN - SCOPUS:84994697920
SN - 1067-5027
VL - 23
SP - 1046
EP - 1052
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 6
ER -