Use of a Machine Learning Approach to Impute Gene Expression in African Americans

Project: Research project

Project Details


Multi-omics data have been invaluable in understanding the potential mechanisms behind SNP associations. Paired genomic and transcriptomic data allow investigators to determine the tissue-specific effects of non-coding variation. However, most data of this type come from populations of predominantly European ancestry. Linear models that impute gene expression from genotype data have been developed, mostly from the GTEx resource, which contains paired genotype and gene expression data on 44 human tissues. Unfortunately, because these models are built mostly on European data, they do not perform as well in African American (AA) cohorts. To alleviate this disparity in both knowledge and data, we propose to use both our own African American paired data and public African American data to create linear and machine learning models to impute gene expression. We will then assess the utility of these models in predicting the risk of venous thromboembolism in our ACCOuNT cohort. By building on our current knowledge of transcriptome imputation, we will extend these methods to understudied admixed populations.
Effective start/end date: 6/10/21 to 5/31/23


  • National Human Genome Research Institute (5R21HG011695-02)

