Direct marketers commonly assess their scoring models with a single-split gains chart method: They split the available data into "training" and "test" sets, estimate their models on the training set, apply them to the test set, and generate gains charts. They use the results to compare models (which model should be used), assess overfitting, and estimate how well the mailing will do. It is well known that the results from this approach depend heavily on the particular split of the data used, because of sampling variation across splits. This paper examines the single-split method. Does the sampling variation across splits affect one's ability to distinguish between superior and inferior models? How can one estimate the overall performance of a mailing accurately? I consider two ways of reducing the variation across splits: Winsorization and stratified sampling. The paper gives an empirical study of these questions and of the two variance-reduction methods using the DMEF data sets.
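To make the source of the variation concrete, the following is a minimal sketch of the single-split procedure repeated over many random splits. It is not the paper's code or data. The synthetic customers, the segment-mean "model", the 20% mailing depth, the 95th-percentile Winsorization cap, and stratification on responder status are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random
import statistics


def make_customers(n, seed=0):
    """Synthetic (segment, order amount) pairs. Hypothetical data, not DMEF."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n):
        segment = rng.randrange(10)            # stand-in for a predictor
        p_respond = 0.02 + 0.01 * segment      # higher segments respond more
        responded = rng.random() < p_respond
        # Heavy-tailed amounts for responders, zero otherwise.
        amount = rng.lognormvariate(3.0, 1.0) if responded else 0.0
        customers.append((segment, amount))
    return customers


def winsorize(amounts, upper_q=0.95):
    """Cap amounts at the upper_q quantile of the positive amounts (assumed rule)."""
    positives = sorted(a for a in amounts if a > 0)
    if not positives:
        return list(amounts)
    cap = positives[min(len(positives) - 1, int(upper_q * len(positives)))]
    return [min(a, cap) for a in amounts]


def split(customers, rng, stratified=False, test_frac=0.5):
    """Random train/test split, optionally stratified on responder status."""
    if stratified:
        groups = [[c for c in customers if c[1] > 0],
                  [c for c in customers if c[1] == 0]]
    else:
        groups = [list(customers)]
    train, test = [], []
    for group in groups:
        rng.shuffle(group)
        k = int(len(group) * test_frac)
        test.extend(group[:k])
        train.extend(group[k:])
    return train, test


def fit_segment_means(train):
    """Toy scoring model: mean order amount per segment in the training set."""
    totals, counts = {}, {}
    for segment, amount in train:
        totals[segment] = totals.get(segment, 0.0) + amount
        counts[segment] = counts.get(segment, 0) + 1
    return {segment: totals[segment] / counts[segment] for segment in totals}


def top_fraction_gain(model, test, depth=0.2):
    """Share of test-set amount captured by the top `depth` of scored names."""
    ranked = sorted(test, key=lambda c: model.get(c[0], 0.0), reverse=True)
    k = int(len(ranked) * depth)
    total = sum(amount for _, amount in ranked)
    return sum(amount for _, amount in ranked[:k]) / total if total > 0 else 0.0


def gains_across_splits(customers, n_splits=30, stratified=False, seed=1):
    """One gains-chart point per random split, to expose split-to-split spread."""
    rng = random.Random(seed)
    gains = []
    for _ in range(n_splits):
        train, test = split(customers, rng, stratified)
        gains.append(top_fraction_gain(fit_segment_means(train), test))
    return gains


customers = make_customers(5000)
capped = list(zip((s for s, _ in customers),
                  winsorize([a for _, a in customers])))
for label, data, strat in [("raw, simple split", customers, False),
                           ("winsorized", capped, False),
                           ("stratified", customers, True)]:
    g = gains_across_splits(data, stratified=strat)
    print(f"{label:18s} mean={statistics.mean(g):.3f} sd={statistics.stdev(g):.3f}")
```

The standard deviation printed for each row is the quantity of interest: it shows how much a single gains-chart point can move purely because of which names landed in the test set. Whether Winsorization or stratification narrows that spread on real data is the empirical question the paper addresses; this sketch only shows one way to measure it.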
Original language: English (US)
Number of pages: 14
Journal: Journal of Interactive Marketing
State: Published - Jan 1 2001
ASJC Scopus subject areas
- Business and International Management