TY - JOUR
T1 - A qualitative modeling approach for whole genome prediction using high-throughput toxicogenomics data and pathway-based validation
AU - Haider, Saad
AU - Black, Michael B.
AU - Parks, Bethany B.
AU - Foley, Briana
AU - Wetmore, Barbara A.
AU - Andersen, Melvin E.
AU - Clewell, Rebecca A.
AU - Mansouri, Kamel
AU - McMullen, Patrick D.
N1 - Funding Information:
This work was funded through American Chemistry Council’s Long-Range Research Initiative.
Conflict of Interest Statement: All authors were affiliated with and employed by ScitoVation, LLC. This manuscript is a product of ScitoVation funded by American Chemistry Council’s Long-Range Research Initiative. ScitoVation is not an academic institution. All authors declare no competing interest.
Publisher Copyright:
© 2007 - 2018 Frontiers Media S.A. All Rights Reserved.
PY - 2018/10/2
Y1 - 2018/10/2
N2 - Efficient high-throughput transcriptomics (HTT) tools promise inexpensive, rapid assessment of possible biological consequences of human and environmental exposures to tens of thousands of chemicals in commerce. HTT systems have used relatively small sets of gene expression measurements coupled with mathematical prediction methods to estimate genome-wide gene expression and are often trained and validated using pharmaceutical compounds. It is unclear whether these training sets are suitable for general toxicity testing applications and the more diverse chemical space represented by commercial chemicals and environmental contaminants. In this work, we built predictive computational models that inferred whole genome transcriptional profiles from a smaller sample of surrogate genes. The model was trained and validated using a large-scale toxicogenomics database with gene expression data from exposure to heterogeneous chemicals from a wide range of classes (the Open TG-GATEs database). The method of predictor selection was designed to allow high-fidelity gene prediction from any pre-existing gene expression data set, regardless of animal species or data measurement platform. Predictive qualitative models were developed with the TG-GATEs data, which contained gene expression data of human primary hepatocytes with over 941 samples covering 158 compounds. A sequential forward search-based greedy algorithm, combining different fitting approaches and machine learning techniques, was used to find an optimal set of surrogate genes that predicted differential expression changes of the remaining genome. We then used pathway enrichment of up-regulated and down-regulated genes to assess the ability of a limited gene set to determine relevant patterns of tissue response. In addition, we compared prediction performance using the surrogate genes found from our greedy algorithm (referred to as the SV2000) with the landmark genes provided by existing technologies such as L1000 (Genometry) and S1500 (Tox21), finding better predictive performance for the SV2000. The ability of these predictive algorithms to predict pathway-level responses is a positive step toward incorporating mode of action (MOA) analysis into the high-throughput prioritization and testing of the large number of chemicals in need of safety evaluation.
AB - Efficient high-throughput transcriptomics (HTT) tools promise inexpensive, rapid assessment of possible biological consequences of human and environmental exposures to tens of thousands of chemicals in commerce. HTT systems have used relatively small sets of gene expression measurements coupled with mathematical prediction methods to estimate genome-wide gene expression and are often trained and validated using pharmaceutical compounds. It is unclear whether these training sets are suitable for general toxicity testing applications and the more diverse chemical space represented by commercial chemicals and environmental contaminants. In this work, we built predictive computational models that inferred whole genome transcriptional profiles from a smaller sample of surrogate genes. The model was trained and validated using a large-scale toxicogenomics database with gene expression data from exposure to heterogeneous chemicals from a wide range of classes (the Open TG-GATEs database). The method of predictor selection was designed to allow high-fidelity gene prediction from any pre-existing gene expression data set, regardless of animal species or data measurement platform. Predictive qualitative models were developed with the TG-GATEs data, which contained gene expression data of human primary hepatocytes with over 941 samples covering 158 compounds. A sequential forward search-based greedy algorithm, combining different fitting approaches and machine learning techniques, was used to find an optimal set of surrogate genes that predicted differential expression changes of the remaining genome. We then used pathway enrichment of up-regulated and down-regulated genes to assess the ability of a limited gene set to determine relevant patterns of tissue response. In addition, we compared prediction performance using the surrogate genes found from our greedy algorithm (referred to as the SV2000) with the landmark genes provided by existing technologies such as L1000 (Genometry) and S1500 (Tox21), finding better predictive performance for the SV2000. The ability of these predictive algorithms to predict pathway-level responses is a positive step toward incorporating mode of action (MOA) analysis into the high-throughput prioritization and testing of the large number of chemicals in need of safety evaluation.
KW - Cellular mode-of-action
KW - High-throughput toxicogenomics
KW - Pathway enrichment analysis
KW - Predictive toxicology
KW - Whole genome prediction
UR - http://www.scopus.com/inward/record.url?scp=85055312768&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055312768&partnerID=8YFLogxK
U2 - 10.3389/fphar.2018.01072
DO - 10.3389/fphar.2018.01072
M3 - Article
AN - SCOPUS:85055312768
SN - 1663-9812
VL - 9
JO - Frontiers in Pharmacology
JF - Frontiers in Pharmacology
IS - OCT
M1 - 1072
ER -