Dense subgraphs with restrictions and applications to gene annotation graphs

Barna Saha*, Allison Hoch, Samir Khuller, Louiqa Raschid, Xiao Ning Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

58 Scopus citations

Abstract

In this paper, we focus on finding complex annotation patterns representing novel and interesting hypotheses from gene annotation data. We define a generalization of the densest subgraph problem by adding an additional distance restriction (defined by a separate metric) to the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, when the metric comes from the distance metric of a tree, or an interval graph, the problem can be solved optimally in polynomial time. We also show that the densest subgraph problem with a specified subset of vertices that have to be included in the solution can be solved optimally in polynomial time. In addition, we consider other extensions when not just one solution needs to be found, but we wish to list all subgraphs of almost maximum density as well. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance restricted densest subgraph for a dataset of photomorphogenesis genes are indeed validated in the literature; a control dataset validates that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs, as we vary parameters, including the number of genes and the distance.

Original languageEnglish (US)
Title of host publicationResearch in Computational Molecular Biology - 14th Annual International Conference, RECOMB 2010, Proceedings
Pages456-472
Number of pages17
DOIs
StatePublished - Dec 23 2010
Event14th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2010 - Lisbon, Portugal
Duration: Apr 25 2010Apr 28 2010

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume6044 LNBI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other14th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2010
CountryPortugal
CityLisbon
Period4/25/104/28/10

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Dense subgraphs with restrictions and applications to gene annotation graphs'. Together they form a unique fingerprint.

Cite this