Generalized Gini Correlation and its Application in Data-Mining

Yi Gao*, Wenxin Jiang, Martin A Tanner

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

An asymmetric correlation measure commonly used in socioeconomics, called the Gini correlation, is defined between a numerical response and a rank. We generalize the definition of this correlation so that it can be applied to data mining. The new definition, called the generalized Gini correlation, includes special cases that are equivalent to common evaluation measures used in data mining. Examples are the LIFT measures for a binary response and the expected profit measure for a monetary response. We consider estimation and inference for this generalized Gini correlation. The asymptotic distribution of the estimated correlation is derived using empirical process theory. We consider several ways of constructing confidence intervals and demonstrate their performance numerically. Our paper is interdisciplinary and contributes both to the Gini literature and to the literature on statistical inference for performance measures in data mining.
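The abstract does not state the formulas, so the following is only a minimal illustrative sketch. It is written in plain Python and assumes the classical (not the generalized) Gini correlation in the form usually given in the Gini literature: Γ(Y, X) = cov(Y, F_X(X)) / cov(Y, F_Y(Y)), where F_X and F_Y are the CDFs of the score X and the response Y. It also assumes a common "top-fraction" variant of lift. The paper's generalized definition and its exact LIFT variant are not reproduced here. All function names (`ranks`, `gini_correlation`, `top_fraction_lift`) are hypothetical. Ranks stand in for the empirical CDFs, which is valid because the common scale factor 1/n cancels in the ratio.

```python
from statistics import mean


def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> ranks i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def gini_correlation(y, score):
    """Classical sample Gini correlation of response y with the ranking by score.

    Assumption: classical definition cov(y, rank(score)) / cov(y, rank(y)).
    Raises ZeroDivisionError if y is constant.
    """
    return cov(y, ranks(score)) / cov(y, ranks(y))


def top_fraction_lift(y, score, fraction=0.1):
    """Mean response in the top `fraction` of cases by score over the overall mean.

    Assumption: one common lift variant, not necessarily the paper's.
    """
    n = len(y)
    k = max(1, int(round(fraction * n)))
    top = sorted(range(n), key=lambda i: score[i], reverse=True)[:k]
    return mean(y[i] for i in top) / mean(y)
```

Consider a binary response with `y = [0, 0, 1, 1]` and a score that puts both responders first. The sketch then gives a Gini correlation of 1.0 and a top-50% lift of 2.0. Reversing the score gives -1.0. This illustrates the asymmetry the abstract mentions: the numerator uses the ranks of the score, while the denominator uses the ranks of the response itself.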

Original language: English (US)
Pages (from-to): 1455-1479
Number of pages: 25
Journal: Data Mining and Knowledge Discovery
Volume: 30
Issue number: 6
State: Published - Nov 1 2016

Keywords

  • Asymptotic distribution
  • Confidence interval
  • Data mining
  • Empirical process
  • Gini correlation
  • LIFT measures

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

