Query planning in the presence of overlapping sources

Jens Bleiholder*, Samir Khuller, Felix Naumann, Louiqa Raschid, Yao Wu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

16 Scopus citations

Abstract

Navigational queries on Web-accessible life science sources pose unique query optimization challenges. The objects in these sources are interconnected to objects in other sources, forming a large and complex graph, and there is an overlap of objects in the sources. Answering a query requires the traversal of multiple alternate paths through these sources. Each path can be associated with the benefit or the cardinality of the target object set (TOS) of objects reached in the result. There is also an evaluation cost of reaching the TOS. We present dual problems in selecting the best set of paths. The first problem is to select a set of paths that satisfy a constraint on the evaluation cost while maximizing the benefit (number of distinct objects in the TOS). The dual problem is to select a set of paths that satisfies a threshold of the TOS benefit with minimal evaluation cost. The two problems can be mapped to the budgeted maximum coverage problem and the maximal set cover with a threshold. To solve these problems, we explore several solutions including greedy heuristics, a randomized search, and a traditional IP/LP formulation with bounds. We perform experiments on a real-world graph of life sciences objects from NCBI and report on the computational overhead of our solutions and their performance compared to the optimal solution.

Original languageEnglish (US)
Title of host publicationAdvances in Database Technology - EDBT 2006 - 10th International Conference on Extending Database Technology, Proceedings
Pages811-828
Number of pages18
DOIs
StatePublished - 2006
Event10th International Conference on Extending Database Technology, EDBT 2006 - Munich, Germany
Duration: Mar 26 2006Mar 31 2006

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume3896 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other10th International Conference on Extending Database Technology, EDBT 2006
Country/TerritoryGermany
CityMunich
Period3/26/063/31/06

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Query planning in the presence of overlapping sources'. Together they form a unique fingerprint.

Cite this