With each successive discovery in genetics, the dynamic complexity of the human genome has become increasingly apparent, requiring repeated updates to the technical definition of the word “gene”. It is now understood that the notion of “one gene makes one protein that functions in one signaling pathway” in human cells is overly simplistic, because the majority of human genes produce multiple functional products (transcript variants and protein isoforms) through alternative transcription and/or alternative splicing. Therefore, our central hypothesis is that the isoform-level gene products, “transcript variants” and “protein isoforms”, are the basic functional units in a mammalian cell and that, accordingly, the informatics platforms for managing and analyzing gene regulation data in both normal and diseased cells should adopt “gene isoform-centric” rather than “gene-centric” approaches. Toward the goal of broadly impacting gene regulation and functional studies at the gene isoform level, we have been developing novel algorithms for the analysis of genome-wide transcriptome (RNA-seq and exon-array) and protein-DNA binding (ChIP-seq) data, and for extending gene-level orthology mapping to exon- and transcript-level mapping between orthologous human and mouse genes. By applying these algorithms to public datasets, we have observed significant expression differences between sample groups (e.g., developmental stages, cancer subtypes, normal vs. cancer) for numerous genes at the isoform level but not at the overall gene level, and we have experimentally validated the significant isoforms using RT-qPCR in independent biospecimens.
While the application of these algorithms has led to the development of new methods for diagnosis of glioblastoma or a subtype thereof, the results of the isoform-level transcriptome analyses have also raised challenging questions. For example, how do the alternative promoters of a gene show switch-like opposing patterns of activity (one promoter is up-regulated while the other is down-regulated in one condition versus the other), and how do different splice variants of a gene show opposing expression patterns in cancer versus normal tissue samples? We currently lack informatics methods to address these questions. Therefore, we propose to develop novel statistical methods (1) for integrative cluster analysis of isoform-level gene expression information from exon-array and RNA-seq platforms, (2) for identification of differential transcript/isoform usage in heterogeneous cancer samples, and (3) for identification of alternative transcription/splicing quantitative trait loci (sQTL) in tumors, adjusted for somatic genetic and epigenetic changes. In addition, (4) the novel predictions from these algorithms will be experimentally validated by chromatin immunoprecipitation (ChIP), dual-luciferase reporter assays, and CRISPR/Cas9 genome editing in U87 and A172 cells. The novel bioinformatics methods developed by this project will aid in silico discovery and research, accelerating the linkage of phenotypic and genomic information at the gene isoform level.
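As a purely illustrative aside, the switch-like pattern described above, in which isoform usage changes while overall gene expression does not, can be sketched with a toy calculation. The isoform names and expression values below are hypothetical and are not drawn from the project's data; the simple comparison of mean usage fractions stands in for, and is not, the statistical methods the project proposes.

```python
from statistics import mean

# Hypothetical toy expression values (arbitrary units) for two isoforms
# of a single gene, three samples per group. Not real data.
normal = {"isoform_A": [80, 82, 78], "isoform_B": [20, 18, 22]}
tumor = {"isoform_A": [21, 19, 20], "isoform_B": [79, 81, 80]}


def gene_level(samples):
    """Sum the isoform values per sample to get overall gene expression."""
    return [sum(values) for values in zip(*samples.values())]


def isoform_fractions(samples):
    """Mean fraction of the gene's expression contributed by each isoform."""
    totals = gene_level(samples)
    return {
        isoform: mean(v / t for v, t in zip(values, totals))
        for isoform, values in samples.items()
    }


# Gene-level view: difference in mean total expression between groups.
gene_diff = mean(gene_level(tumor)) - mean(gene_level(normal))

# Isoform-level view: change in each isoform's share of the gene's output.
frac_normal = isoform_fractions(normal)
frac_tumor = isoform_fractions(tumor)
usage_shift = {iso: frac_tumor[iso] - frac_normal[iso] for iso in frac_normal}

print("gene-level difference:", gene_diff)
print("isoform usage shift:", usage_shift)
```

In this toy case the gene-level totals are identical in both groups, so a gene-centric comparison would report no change, while the usage fractions of the two isoforms move in opposite directions.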
Effective start/end date: 8/1/17 → 4/30/21
- National Library of Medicine (5R01LM011297-07)
Clustered Regularly Interspaced Short Palindromic Repeats
Quantitative Trait Loci
Gene Expression Profiling