Abstract
People and machines are collecting data at an unprecedented rate. Despite this newfound abundance of data, progress has been slow in sharing it for open science, business, and other data-intensive endeavors. Many such efforts are stymied by privacy concerns and regulatory compliance issues. For example, many hospitals are interested in pooling their medical records for research, but none may disclose arbitrary patient records to researchers or other healthcare providers. In this context we propose the Private Data Network (PDN), a federated database for querying over the collective data of mutually distrustful parties. In a PDN, no member database reveals its tuples to its peers or to the query writer. Instead, the user submits a query to an honest broker that plans and coordinates its execution over multiple private databases using secure multiparty computation (SMC). Here, each database's query execution is oblivious, and its program counters and memory traces are agnostic to the inputs of others. We introduce a framework for executing PDN queries named SMCQL. This system translates SQL statements into SMC primitives to compute query results over the union of its source databases without revealing sensitive information about individual tuples to peer data providers or the honest broker. Only the honest broker and the querier receive the results of a PDN query. For fast, secure query evaluation, we explore a heuristics-driven optimizer that minimizes the PDN's use of secure computation and partitions its query evaluation into scalable slices.
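To give a rough sense of how parties can compute over pooled data without disclosing their individual inputs, the sketch below uses additive secret sharing to evaluate a federated `SUM`. It is a toy illustration under stated assumptions, not the protocol SMCQL uses: the names `share`, `secure_sum`, and `MODULUS` are hypothetical, all parties are assumed to follow the protocol honestly, and inputs are assumed to be non-negative integers smaller than `MODULUS`.

```python
import secrets

# Hypothetical modulus for the toy scheme; all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret.
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs):
    """Simulate a sum over private inputs, one input per party."""
    n = len(private_inputs)
    # Each party splits its input and would send share j to party j.
    distributed = [share(v, n) for v in private_inputs]
    # Party j adds the shares it received; each share on its own looks random.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # A coordinator (standing in for the honest broker) combines the partial sums.
    return sum(partial_sums) % MODULUS


# Example: three hospitals each hold a private patient count.
hospital_counts = [12, 7, 30]
total = secure_sum(hospital_counts)
```

In this simulation the coordinator only ever sees partial sums, never a single hospital's count. A real deployment would also need authenticated channels and a stronger threat model than the one assumed here.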
Original language | English (US) |
---|---|
Pages (from-to) | 673-684 |
Number of pages | 12 |
Journal | Proceedings of the VLDB Endowment |
Volume | 10 |
Issue number | 6 |
State | Published - 2016 |
Event | 43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany; Duration: Aug 28 2017 → Sep 1 2017 |
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- General Computer Science