Detecting the Presence of an Individual in Phenotypic Summary Data

Yongtai Liu, Zhiyu Wan, Weiyi Xia, Murat Kantarcioglu, Yevgeniy Vorobeychik, Ellen Wright Clayton, Abel Kho, David Carrell, Bradley A. Malin

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


As the quantity and detail of association studies between clinical phenotypes and genotypes grow, there is a push to make summary statistics widely available. Genome-wide summary statistics have been shown to be vulnerable to the inference of a targeted individual's presence. In this paper, we show that presence attacks are feasible with phenome-wide summary statistics as well. We use data from three healthcare organizations and an online resource that publishes summary statistics. We introduce a novel attack that achieves over 80% recall and precision within a population of 16,346, where 8,173 individuals are targets. However, the feasibility of the attack depends on the attacker's knowledge about 1) the targeted individual and 2) the reference dataset. Within a population of over 2 million, where 8,173 individuals are targets, our attack achieves 31% recall and 17% precision. As a result, it is plausible that sharing of phenomic summary statistics may be accomplished with an acceptable level of privacy risk.
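To make the large-population figures concrete, the following minimal sketch backs out approximate confusion counts from the reported recall and precision. It is illustrative arithmetic only, not the authors' evaluation code; the function name `implied_counts` is hypothetical, and because the abstract reports rounded percentages, the resulting counts are estimates.

```python
def implied_counts(n_targets: int, recall: float, precision: float) -> dict:
    """Estimate confusion counts implied by a reported recall and precision.

    Assumes recall = TP / n_targets and precision = TP / (TP + FP).
    Inputs are rounded figures from the abstract, so outputs are approximate.
    """
    true_pos = recall * n_targets      # targets correctly flagged as present
    flagged = true_pos / precision     # everyone the attack flags as present
    false_pos = flagged - true_pos     # non-targets wrongly flagged
    false_neg = n_targets - true_pos   # targets the attack misses
    return {
        "true_pos": round(true_pos),
        "flagged": round(flagged),
        "false_pos": round(false_pos),
        "false_neg": round(false_neg),
    }


# Large-population setting from the abstract: 8,173 targets,
# 31% recall and 17% precision.
large = implied_counts(n_targets=8173, recall=0.31, precision=0.17)
print(large)
```

Under these assumptions, the attack in the large-population setting would correctly flag roughly 2,500 of the 8,173 targets while raising roughly 12,400 false alarms, which is consistent with the abstract's conclusion that the risk is limited when the attacker's knowledge is limited.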

Original language: English (US)
Pages (from-to): 760-769
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium
State: Published - 2018

ASJC Scopus subject areas

  • Medicine(all)

