Methods for assessing replication

Project: Research project

Project Details


This is a level 2 proposal on STEM learning. Replication is fundamental to the rhetoric of the scientific method and is part of the logic supporting the claim that science is self-correcting. It is surprising, therefore, that there is so little literature on the methodology for studying replication. The literature on systematic reviewing and meta-analysis is related but has a somewhat different focus: meta-analysis and systematic reviewing summarize a collection of research studies rather than address the narrower question of whether the findings from a series of experiments replicate one another. There is no clearly defined, widely accepted definition of a successful replication study, nor is there a statistical literature providing methodological guidelines on how to design single replication studies or ensembles of replication studies. Despite this ambiguity, concern is growing about a replication ‘crisis’ in fields as diverse as medicine and psychology, as an increasing number of empirical studies suggest that published scientific studies do not replicate as well as expected. Moreover, there is evidence that scientists themselves are very concerned about whether scientific research replicates as well as it should. Discussions of the replication crisis are not limited to the scientific literature; they have also appeared in the popular press (e.g., Newsweek, the Economist, and the New York Times).
The lack of a well-established methodology for defining replication and for the statistical analysis of replication studies is demonstrated by two prominent replication projects: the Replication Project in Psychology and the Replication Project in Experimental Economics. They carried out 100 and 18 replications, respectively, of prominent studies in their fields. Yet both acknowledged that there was no precise definition of replication in their fields, nor any generally accepted statistical analysis for determining whether such a definition had been satisfied, so each used several different criteria. Moreover, neither project articulated design principles showing that the ensemble of studies it created was adequate to draw unambiguous conclusions about whether the study findings replicated. Interestingly, subsequent studies suggest that neither project was large enough to be conclusive.
This proposal will address three fundamental problems. The first is how we should define replication: what, precisely, should it mean to say that the results in a collection of k ≥ 2 studies replicate one another? Second, given a definition of replication, what statistical analyses should be done to decide whether the studies in the collection replicate one another, and what are the properties (e.g., sensitivity or statistical power) of these analyses? Third, how should we design one or more replication studies to provide conclusive answers to questions of replication?
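To make the second question concrete, consider one possible (and deliberately narrow) definition: k studies replicate one another if they all estimate a common true effect, θ_1 = θ_2 = ... = θ_k. The Python sketch below tests that hypothesis with Cochran's Q statistic, a standard heterogeneity test borrowed from meta-analysis, and estimates the test's power by simulation. It assumes independent, normally distributed effect estimates with known sampling variances. The choice of Q, the exact-equality definition, and the function names (chi2_sf, q_test, q_test_power) are assumptions made for illustration only, not the methods this project proposes.

```python
import math
import random


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function;
    adequate for the moderate values of x that arise here (roughly x < 1000).
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def q_test(estimates, variances):
    """Cochran's Q test of H0: all k studies share one true effect.

    Illustrative only: assumes the exact-equality definition of replication.
    Returns (Q, p_value); under H0 with known variances, Q ~ chi-square(k - 1).
    """
    k = len(estimates)
    if k < 2 or len(variances) != k:
        raise ValueError("need k >= 2 estimates with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * t for w, t in zip(weights, estimates)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, estimates))
    return q, chi2_sf(q, k - 1)


def q_test_power(true_effects, variances, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo estimate of the probability that the Q test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Draw one hypothetical ensemble of study estimates around the true effects.
        simulated = [rng.gauss(theta, math.sqrt(v))
                     for theta, v in zip(true_effects, variances)]
        _, p = q_test(simulated, variances)
        if p < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical original study plus two replications (standardized effects).
    q, p = q_test([0.45, 0.20, 0.15], [0.02, 0.01, 0.01])
    print(f"Q = {q:.2f}, p = {p:.3f}")
    # Hypothetical power to detect a 0.3 difference between two studies of this precision.
    print(q_test_power([0.45, 0.15], [0.02, 0.02]))
```

A non-significant Q does not by itself establish replication; it is informative only if the test had a reasonable chance of detecting meaningful differences among the true effects. With few studies or imprecise estimates that power can be low, which is one way an ensemble of replication studies can fail to be conclusive.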
This project has intellectual merit: it will help formalize subjective ideas about the important concept of replication, provide statistical analyses for evaluating replication studies, characterize properties for evaluating the conclusiveness of replication studies, and provide principles for designing conclusive and efficient programs of replication studies. The project has the potential for broader impact on a range of empirical sciences by providing statistical tools to evaluate the replicability of experimental findings and to assess the conclusiveness of replication attempts, as well as software tools to help plan programs of replication studies.
Effective start/end date: 9/1/18 to 8/31/21


  • National Science Foundation (DRL-1841075)

