TY - GEN
T1 - Query planning in the presence of overlapping sources
AU - Bleiholder, Jens
AU - Khuller, Samir
AU - Naumann, Felix
AU - Raschid, Louiqa
AU - Wu, Yao
PY - 2006
Y1 - 2006
N2 - Navigational queries on Web-accessible life science sources pose unique query optimization challenges. The objects in these sources are interconnected to objects in other sources, forming a large and complex graph, and there is an overlap of objects in the sources. Answering a query requires the traversal of multiple alternate paths through these sources. Each path can be associated with the benefit or the cardinality of the target object set (TOS) of objects reached in the result. There is also an evaluation cost of reaching the TOS. We present dual problems in selecting the best set of paths. The first problem is to select a set of paths that satisfy a constraint on the evaluation cost while maximizing the benefit (number of distinct objects in the TOS). The dual problem is to select a set of paths that satisfies a threshold of the TOS benefit with minimal evaluation cost. The two problems can be mapped to the budgeted maximum coverage problem and the maximal set cover with a threshold. To solve these problems, we explore several solutions including greedy heuristics, a randomized search, and a traditional IP/LP formulation with bounds. We perform experiments on a real-world graph of life sciences objects from NCBI and report on the computational overhead of our solutions and their performance compared to the optimal solution.
AB - Navigational queries on Web-accessible life science sources pose unique query optimization challenges. The objects in these sources are interconnected to objects in other sources, forming a large and complex graph, and there is an overlap of objects in the sources. Answering a query requires the traversal of multiple alternate paths through these sources. Each path can be associated with the benefit or the cardinality of the target object set (TOS) of objects reached in the result. There is also an evaluation cost of reaching the TOS. We present dual problems in selecting the best set of paths. The first problem is to select a set of paths that satisfy a constraint on the evaluation cost while maximizing the benefit (number of distinct objects in the TOS). The dual problem is to select a set of paths that satisfies a threshold of the TOS benefit with minimal evaluation cost. The two problems can be mapped to the budgeted maximum coverage problem and the maximal set cover with a threshold. To solve these problems, we explore several solutions including greedy heuristics, a randomized search, and a traditional IP/LP formulation with bounds. We perform experiments on a real-world graph of life sciences objects from NCBI and report on the computational overhead of our solutions and their performance compared to the optimal solution.
UR - http://www.scopus.com/inward/record.url?scp=33745604912&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745604912&partnerID=8YFLogxK
U2 - 10.1007/11687238_48
DO - 10.1007/11687238_48
M3 - Conference contribution
AN - SCOPUS:33745604912
SN - 3540329609
SN - 9783540329602
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 811
EP - 828
BT - Advances in Database Technology - EDBT 2006 - 10th International Conference on Extending Database Technology, Proceedings
T2 - 10th International Conference on Extending Database Technology, EDBT 2006
Y2 - 26 March 2006 through 31 March 2006
ER -