TY - JOUR
T1 - Detecting the Presence of an Individual in Phenotypic Summary Data
AU - Liu, Yongtai
AU - Wan, Zhiyu
AU - Xia, Weiyi
AU - Kantarcioglu, Murat
AU - Vorobeychik, Yevgeniy
AU - Clayton, Ellen Wright
AU - Kho, Abel
AU - Carrell, David
AU - Malin, Bradley A.
PY - 2018
Y1 - 2018
N2 - As the quantity and detail of association studies between clinical phenotypes and genotypes grow, there is a push to make summary statistics widely available. Genome-wide summary statistics have been shown to be vulnerable to the inference of a targeted individual's presence. In this paper, we show that presence attacks are feasible with phenome-wide summary statistics as well. We use data from three healthcare organizations and an online resource that publishes summary statistics. We introduce a novel attack that achieves over 80% recall and precision within a population of 16,346, where 8,173 individuals are targets. However, the feasibility of the attack depends on the attacker's knowledge about (1) the targeted individual and (2) the reference dataset. Within a population of over 2 million, where 8,173 individuals are targets, our attack achieves 31% recall and 17% precision. As a result, it is plausible that sharing of phenomic summary statistics may be accomplished with an acceptable level of privacy risk.
AB - As the quantity and detail of association studies between clinical phenotypes and genotypes grow, there is a push to make summary statistics widely available. Genome-wide summary statistics have been shown to be vulnerable to the inference of a targeted individual's presence. In this paper, we show that presence attacks are feasible with phenome-wide summary statistics as well. We use data from three healthcare organizations and an online resource that publishes summary statistics. We introduce a novel attack that achieves over 80% recall and precision within a population of 16,346, where 8,173 individuals are targets. However, the feasibility of the attack depends on the attacker's knowledge about (1) the targeted individual and (2) the reference dataset. Within a population of over 2 million, where 8,173 individuals are targets, our attack achieves 31% recall and 17% precision. As a result, it is plausible that sharing of phenomic summary statistics may be accomplished with an acceptable level of privacy risk.
UR - http://www.scopus.com/inward/record.url?scp=85062380759&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062380759&partnerID=8YFLogxK
M3 - Article
C2 - 30815118
AN - SCOPUS:85062380759
SN - 1559-4076
VL - 2018
SP - 760
EP - 769
JO - AMIA ... Annual Symposium proceedings. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings. AMIA Symposium
ER -