On Asymptotic Distributions and Confidence Intervals for LIFT Measures in Data Mining

Wenxin Jiang, Yu Zhao

Research output: Contribution to journal › Article › peer-review

Abstract

A LIFT measure, such as the response rate, lift, or the percentage of captured response, is a fundamental measure of effectiveness for a scoring rule obtained from data mining, which is estimated from a set of validation data. In this article, we study how to construct confidence intervals of the LIFT measures. We point out the subtlety of this task and explain how simple binomial confidence intervals can have incorrect coverage probabilities, due to omitting variation from the sample percentile of the scoring rule. We derive the asymptotic distribution using some advanced empirical process theory and the functional delta method in the Appendix. The additional variation is shown to be related to a conditional mean response, which can be estimated by a local averaging of the responses over the scores from the validation data. Alternatively, a subsampling method is shown to provide a valid confidence interval, without needing to estimate the conditional mean response. Numerical experiments are conducted to compare these different methods regarding the coverage probabilities and the lengths of the resulting confidence intervals.
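To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes one LIFT measure (the response rate among the top fraction `p` of validation records ranked by score), the simple binomial interval the abstract warns about, and a generic subsampling interval. The function names, the subsample size `b`, the number of replications, and the assumed root-n convergence rate are all illustrative assumptions; the abstract does not specify the paper's exact subsampling scheme or tuning choices.

```python
import math
import random


def top_response_rate(scores, responses, p):
    """Response rate among the top 100*p% of records, ranked by score."""
    n = len(scores)
    k = max(1, math.ceil(p * n))  # number of records in the top group
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(responses[i] for i in order[:k]) / k


def naive_binomial_ci(rate, k, z=1.96):
    """Wald binomial interval that treats the score cutoff as fixed.

    This ignores the variation from the sample percentile of the scores,
    which is the source of incorrect coverage described in the abstract.
    """
    half = z * math.sqrt(rate * (1 - rate) / k)
    return max(0.0, rate - half), min(1.0, rate + half)


def subsampling_ci(scores, responses, p, b, reps=500, alpha=0.05, seed=0):
    """Generic subsampling interval (assumed sqrt(n) rate, size-b subsamples).

    Each subsample is drawn without replacement and the cutoff is recomputed
    inside it, so percentile variation is reflected in the interval.
    """
    rng = random.Random(seed)
    n = len(scores)
    theta_n = top_response_rate(scores, responses, p)
    roots = []
    for _ in range(reps):
        idx = rng.sample(range(n), b)
        theta_b = top_response_rate(
            [scores[i] for i in idx], [responses[i] for i in idx], p
        )
        roots.append(math.sqrt(b) * (theta_b - theta_n))
    roots.sort()
    lo_q = roots[math.floor((alpha / 2) * (reps - 1))]
    hi_q = roots[math.ceil((1 - alpha / 2) * (reps - 1))]
    return theta_n - hi_q / math.sqrt(n), theta_n - lo_q / math.sqrt(n)
```

On a validation set, one would call `top_response_rate` once for the point estimate and then compare the two intervals. The subsample size `b` must be chosen by the user (small relative to the sample size but large enough for the top group to be non-trivial), and this sketch offers no guidance on that choice.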

Original language: English (US)
Pages (from-to): 1717-1725
Number of pages: 9
Journal: Journal of the American Statistical Association
Volume: 110
Issue number: 512
State: Published - Oct 2 2015

Keywords

  • %response
  • Empirical process
  • Functional delta method
  • Subsampling
  • Validation data

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
