Aggregate query answering under uncertain schema mappings

Avigdor Gal*, Maria Vanina Martinez, Gerardo I. Simari, V. S. Subrahmanian

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

18 Scopus citations

Abstract

Recent interest in managing uncertainty in data integration has led to the introduction of probabilistic schema mappings and the use of probabilistic methods to answer queries across multiple databases using two semantics: by-table and bytuple. In this paper, we develop three possible semantics for aggregate queries: the range, distribution, and expected value semantics, and show that these three semantics combine with the by-table and by-tuple semantics in six ways. We present algorithms to process COUNT, AVG, SUM, MIN, and MAX queries under all six semantics and develop results on the complexity of processing such queries under all six semantics. We show that computing COUNT is in PTIME for all six semantics and computing SUM is in PTIME for all but the by-tuple/distribution semantics. Finally, we show that AVG, MIN, and MAX are PTIME computable for all by-table semantics and for the by-tuple/range semantics.We developed a prototype implementation and experimented with both real-world traces and simulated data. We show that, as expected, naive processing of aggregates does not scale beyond small databases with a small number of mappings. The results also show that the polynomial time algorithms are scalable up to several million tuples as well as with a large number of mappings.

Original languageEnglish (US)
Title of host publicationProceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
Pages940-951
Number of pages12
DOIs
StatePublished - 2009
Externally publishedYes
Event25th IEEE International Conference on Data Engineering, ICDE 2009 - Shanghai, China
Duration: Mar 29 2009Apr 2 2009

Publication series

NameProceedings - International Conference on Data Engineering
ISSN (Print)1084-4627

Conference

Conference25th IEEE International Conference on Data Engineering, ICDE 2009
Country/TerritoryChina
CityShanghai
Period3/29/094/2/09

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Fingerprint

Dive into the research topics of 'Aggregate query answering under uncertain schema mappings'. Together they form a unique fingerprint.

Cite this