Abstract
Objective: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information.

Materials and Methods: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning.

Results: Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations.

Conclusions: The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.
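The aggregation step described under Materials and Methods can be illustrated with a small sketch. The abstract does not spell out the optimization, so the code below assumes the generic maximin-aggregation form: given per-center coefficient vectors b_1, ..., b_L and a target covariance matrix Sigma, it looks for simplex weights gamma that minimize gamma' (B' Sigma B) gamma (plus an optional ridge term) and returns the weighted combination of the center coefficients. The function names (`maximin_weights`, `aggregate`, `project_to_simplex`), the projected-gradient solver, and the toy numbers are all hypothetical choices made for illustration, not the authors' implementation.

```python
# Minimal sketch of maximin aggregation of per-center Cox coefficients.
# ASSUMPTION: the weights solve  min_gamma gamma' (B' Sigma B + ridge*I) gamma
# over the probability simplex; the abstract does not confirm this exact form.

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]

def maximin_weights(site_coefs, target_cov, ridge=0.0, n_iter=5000):
    """Projected gradient descent for the simplex-constrained quadratic."""
    n_sites = len(site_coefs)
    # Sigma @ b_l for every center, then the L x L matrix B' Sigma B.
    sigma_b = [[dot(row, b) for row in target_cov] for b in site_coefs]
    gram = [[dot(site_coefs[k], sigma_b[l]) for l in range(n_sites)]
            for k in range(n_sites)]
    for k in range(n_sites):
        gram[k][k] += ridge
    # The max absolute row sum bounds the largest eigenvalue, which gives
    # a safe step size for the gradient 2 * gram @ weights.
    step = 1.0 / (2.0 * max(sum(abs(g) for g in row) for row in gram) + 1e-12)
    weights = [1.0 / n_sites] * n_sites
    for _ in range(n_iter):
        grad = [2.0 * dot(row, weights) for row in gram]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, grad)])
    return weights

def aggregate(site_coefs, weights):
    """Weighted combination of the center coefficient vectors."""
    n_features = len(site_coefs[0])
    return [sum(w * b[j] for w, b in zip(weights, site_coefs))
            for j in range(n_features)]

if __name__ == "__main__":
    # Toy inputs (made up): three centers, two features, identity covariance.
    coefs = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
    sigma = [[1.0, 0.0], [0.0, 1.0]]
    gamma = maximin_weights(coefs, sigma)
    print("weights:", gamma)
    print("aggregated coefficients:", aggregate(coefs, gamma))
```

Only the per-center coefficient vectors and the target covariance matrix enter the computation, which matches the one-time exchange of summary information described in the abstract. In practice the per-center vectors would come from penalized Cox fits, and a dedicated quadratic-programming solver would likely replace the simple projected-gradient loop used here.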
Original language | English (US) |
---|---|
Article number | 104176 |
Journal | Journal of Biomedical Informatics |
Volume | 134 |
State | Published - Oct 2022 |
Funding
GMW is supported by National Institutes of Health (NIH)/National Center for Advancing Translational Sciences (NCATS) UL1TR002541, NIH/NCATS UL1TR000005, NIH/National Library of Medicine (NLM) R01LM013345, and NIH/National Human Genome Research Institute (NHGRI) 3U01HG008685-05S2. YL is supported by NIH/NCATS U01TR003528 and NLM 1R01LM013337. KC is supported by VA MVP000 and CIPHER. NGB is supported by PI18/00981, funded by the Carlos III Health Institute. DAH is supported by NCATS UL1TR002240. MSK is supported by NHGRI 5T32HG002295-18. JHM is supported by NLM 010098. MM is supported by NCATS UL1TR001857. DLM is supported by NIH/NCATS CTSA Award #UL1-TR001878. SNM is supported by NCATS 5UL1TR001857-05 and NHGRI 5R01HG009174-04. GSO is supported by NIH U24CA210867 and P30ES017885. LPP is supported by NCATS Clinical and Translational Science Award (CTSA) #UL1TR002366. FJSV is supported by NIH/NCATS UL1TR001881. AMS is supported by NIH/National Heart, Lung, and Blood Institute (NHLBI) K23HL148394 and L40HL148910, and NIH/NCATS UL1TR001420. SV is supported by NCATS UL1TR001857. ZX is supported by National Institute of Neurological Disorders and Stroke (NINDS) R01NS098023. WY is supported by NIH T32HD040128.

IRB approval was obtained at Assistance Publique - Hôpitaux de Paris, Beth Israel Deaconess Medical Center, Bordeaux University Hospital, Istituti Clinici Scientifici Maugeri Hospitals, University of Kansas Medical Center, Massachusetts General Brigham, Northwestern University, Medical Center University of Freiburg, University of Pittsburgh, and VA North Atlantic, Southwest, Midwest, Continental and Pacific. An exempt determination was made by Institutional Review Boards at Hospital Universitario 12 de Octubre, University of California Los Angeles, University of Michigan, and University of Pennsylvania.

XW, GMW, GAB, ISK, PA, and TC contributed to design and conceptualization of the study.
GMW, GAB, YL, MRH, AGS, RB, LC, KC, AD, HE, NGB, RG, DAH, YLH, JHH, JGK, SEM, AM, AM, BM, JHM, MM, DLM, AN, KYN, LPP, MPJ, AP, MJS, FJSV, ERS, PS, PSB, ALMT, VT, PT, SV, ZX, DZ, ISK, and PA contributed to data collection. XW, HGZ, CH, GMW, GAB, CLB, YL, JHH, JGK, SLZ, AM, AM, LPP, AP, AMS, ALMT, BWLT, PT, WY, DZ, ISK, PA, ZG, and TC contributed to data analysis or interpretation. GMW, RB, JHM, DLM, SNM, AMS, WY, ISK, and ZG supplied grant funding for the work. All authors contributed to drafting the work or revising it critically for important intellectual content and approved the final version. All authors are responsible for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
ASJC Scopus subject areas
- Health Informatics
- Computer Science Applications