Analyzing the Mutation Frequencies and Correlation of Genetic Diseases in Worldwide Populations Using Big Data Processing, Clustering, and Predictive Analytics

Kae Sawada, Michael W. Clark, Nabil Alshurafa, Mohammad Pourhomayoun

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

In this paper, we use big data processing and predictive analytics models to analyze mutations associated with osteoporosis and cardiovascular disease (CVD). The dataset consists of genomic information from over 2,500 individuals, collected from around the world. Data visualization revealed geographical/regional clustering patterns in these mutations: the visualized data show a high correlation between a person's regional background and the occurrence of 35 single nucleotide polymorphisms (SNPs) that are specifically associated with osteoporosis and/or CVD. A predictive analytics model based on machine learning algorithms was developed to predict an individual's risk of manifesting osteoporosis in later life. The results of this model confirmed the links, established in preceding studies, between osteoporosis and cardiovascular-related parameters such as high-density lipoprotein (HDL) and systolic blood pressure (SBP).
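
The regional comparison described in the abstract rests on per-population allele frequencies. The following is a minimal sketch of that calculation, not the authors' pipeline, which the abstract does not describe. The function name, the region codes, the SNP identifier `snp_A`, and the toy genotypes are all hypothetical and chosen only for illustration; genotypes are assumed to be encoded as the number of alternate alleles (0, 1, or 2) carried by a diploid individual.

```python
from collections import defaultdict


def allele_frequencies(samples):
    """Return {region: {snp_id: alternate_allele_frequency}}.

    `samples` is an iterable of (region, genotypes) pairs, where
    `genotypes` maps a SNP id to the alternate-allele count (0, 1, or 2)
    of one diploid individual. This encoding is an assumption.
    """
    alt_counts = defaultdict(lambda: defaultdict(int))
    chrom_counts = defaultdict(lambda: defaultdict(int))
    for region, genotypes in samples:
        for snp_id, alt in genotypes.items():
            if alt not in (0, 1, 2):
                raise ValueError(f"invalid allele count {alt!r} for {snp_id}")
            alt_counts[region][snp_id] += alt
            # Each diploid individual contributes two chromosomes per SNP.
            chrom_counts[region][snp_id] += 2
    return {
        region: {
            snp_id: alt_counts[region][snp_id] / chrom_counts[region][snp_id]
            for snp_id in alt_counts[region]
        }
        for region in alt_counts
    }


# Hypothetical toy data: two individuals in each of two regions.
TOY_SAMPLES = [
    ("EUR", {"snp_A": 0}),
    ("EUR", {"snp_A": 1}),
    ("EAS", {"snp_A": 2}),
    ("EAS", {"snp_A": 1}),
]

if __name__ == "__main__":
    for region, freqs in sorted(allele_frequencies(TOY_SAMPLES).items()):
        print(region, freqs)
```

A large gap between regions in the frequency of one SNP, as in this toy input, is the kind of pattern that a regional clustering visualization would surface; with real data the same table would be computed for each SNP of interest.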

Original language: English (US)
Title of host publication: Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017
Editors: Fernando G. Tinetti, Quoc-Nam Tran, Leonidas Deligiannidis, Mary Qu Yang, Hamid R. Arabnia
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1459-1464
Number of pages: 6
ISBN (Electronic): 9781538626528
DOI: https://doi.org/10.1109/CSCI.2017.255
State: Published - Dec 4 2018
Event: 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017 - Las Vegas, United States
Duration: Dec 14 2017 - Dec 16 2017

Publication series

Name: Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017

Other

Other: 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017
Country: United States
City: Las Vegas
Period: 12/14/17 - 12/16/17

Fingerprint

Nucleotides
Polymorphism
Lipoproteins
Data visualization
Blood pressure
Learning algorithms
Learning systems
Predictive analytics
Big data

Keywords

  • 1000 Genome Project
  • Classifiers
  • Clustering
  • Data Visualization
  • Genome Wide Association Study (GWAS)
  • Machine Learning
  • Predictive Model
  • Supervised Learning
  • Osteoporosis

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Safety, Risk, Reliability and Quality

Cite this

Sawada, K., Clark, M. W., Alshurafa, N., & Pourhomayoun, M. (2018). Analyzing the Mutation Frequencies and Correlation of Genetic Diseases in Worldwide Populations Using Big Data Processing, Clustering, and Predictive Analytics. In F. G. Tinetti, Q-N. Tran, L. Deligiannidis, M. Q. Yang, & H. R. Arabnia (Eds.), Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017 (pp. 1459-1464). [8561018] (Proceedings - 2017 International Conference on Computational Science and Computational Intelligence, CSCI 2017). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CSCI.2017.255