SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network

Reda Al-Bahrani, Dipendra Jha, Qiao Kang, Sunwoo Lee, Zijiang Yang, Wei Keng Liao, Ankit Agrawal, Alok Choudhary

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Machine learning models trained on imbalanced datasets tend to produce sub-optimal results because the learning of the minority classes is dominated by the learning of the majority class. Recommendations to overcome this obstacle include oversampling the minority class by synthesizing new instances and using different performance measures. We propose a novel approach to handling imbalance in datasets that uses a sequence-to-sequence recurrent neural network to synthesize minority class instances. The generative neural network is first trained on the minority class instances to learn their data distribution. It is then used to synthesize new minority class instances, which are added to the original dataset to balance the minority class. We evaluate the proposed approach on several imbalanced datasets. We train Decision Tree models on the original and augmented datasets and compare their results against the Synthetic Minority Over-sampling TEchnique (SMOTE), Adaptive Synthetic sampling (ADASYN) and Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC). All results are averaged over multiple runs and compared across four performance metrics. SIGRNN performs well compared to SMOTE and ADASYN, particularly at lower percentage increments to the minority class. SIGRNN also outperforms SMOTE-NC on datasets with nominal features.
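The augmentation workflow the abstract describes (synthesize minority instances, then append them to the original dataset by a chosen percentage increment) can be illustrated with a small sketch. This is not SIGRNN: the generator here is a simple SMOTE-style nearest-neighbor interpolation, standing in for the recurrent network, and it assumes purely continuous features. The function names `interpolate_minority` and `augment`, and the parameters `pct`, `k`, and `seed`, are hypothetical and do not come from the paper.

```python
import numpy as np


def interpolate_minority(X_min, n_new, k=3, seed=0):
    """SMOTE-style stand-in generator (an assumption, not the paper's RNN).

    Each synthetic point lies on the segment between a randomly chosen
    minority instance and one of its k nearest minority neighbors.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    # Pairwise Euclidean distances between minority instances.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbor
    k = min(k, n - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)           # seed instances
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                    # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def augment(X, y, minority_label, pct, **generator_kwargs):
    """Append synthetic minority instances equal to pct% of the minority count."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_new = int(round(len(X_min) * pct / 100))
    X_new = interpolate_minority(X_min, n_new, **generator_kwargs)
    y_new = np.full(n_new, minority_label)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

A call such as `augment(X, y, minority_label=1, pct=50)` would add half as many synthetic minority rows as the dataset already contains. The augmented arrays could then be passed to any classifier, for example a decision tree, and scored against the same model trained on the original data.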

Original language: English (US)
Title of host publication: ICPRAM 2021 - Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods, Volume 1
Editors: Maria De Marsico, Gabriella Sanniti di Baja, Ana L.N. Fred
Publisher: Science and Technology Publications, Lda
Pages: 349-356
Number of pages: 8
ISBN (Print): 9789897584862
State: Published - 2021
Event: 10th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2021 - Virtual, Online
Duration: Feb 4 2021 to Feb 6 2021

Publication series

Name: International Conference on Pattern Recognition Applications and Methods
Volume: 1
ISSN (Electronic): 2184-4313

Conference

Conference: 10th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2021
City: Virtual, Online
Period: 2/4/21 to 2/6/21

Keywords

  • Balancing
  • Classification
  • Imbalanced Dataset
  • Oversampling
  • Synthetic Data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
