Largest-ever reproducibility test finds half of social science claims don't replicate

By Newzino Staff

Seven-year, 865-researcher project also shows open-science reforms are measurably fixing the problem

Today: SCORE publishes largest-ever reproducibility results in Nature

Overview

For decades, social science findings shaped everything from classroom teaching methods to criminal sentencing guidelines—yet no one had systematically checked whether those findings held up. Now the results are in. A seven-year project involving 865 researchers, nearly 3,900 papers, and 62 journals across 11 disciplines found that only about 55% of published claims successfully replicate, and just 54% of studies are precisely computationally reproducible. The project, called SCORE and funded by the United States Defense Advanced Research Projects Agency (DARPA), is the largest and most comprehensive assessment of research reliability ever conducted.

Why it matters

Half the research informing public policy, education, and medicine may not hold up—but proven fixes exist and are spreading.

Key Indicators

55%
Claims that successfully replicated
Of 274 tested claims from 164 papers, 151 showed statistically significant results matching the originals.
54%
Papers precisely computationally reproducible
Of 600 papers tested, only about half could be exactly reproduced from shared data and code.
91%
Reproducibility when data and code are shared
Approximate reproducibility rose from 73.5% overall to 91% when the original data and code were available.
865
Researchers involved
The SCORE project mobilized researchers worldwide across 11 social and behavioral science disciplines.
24%
Papers with openly shared data
Only 144 of 600 papers published between 2009 and 2018 made their data publicly available.
85%+
Reproducibility in economics and political science
Fields with mandatory data and code sharing policies showed dramatically higher reproducibility.

Timeline

  1. SCORE publishes largest-ever reproducibility results in Nature

    Publication

    Four papers spanning 865 researchers, nearly 3,900 articles, and 62 journals reveal that about 55% of social science claims replicate and 54% are computationally reproducible—but that open-science reforms measurably improve both numbers.

  2. DARPA funds SCORE program with $8 million

    Funding

    The United States Defense Advanced Research Projects Agency begins funding the Systematizing Confidence in Open Research and Evidence program, the largest-ever effort to assess research reliability.

  3. Social Sciences Replication Project finds 62% replication rate

    Publication

    A team led by Colin Camerer replicates 13 of 21 social science experiments from Nature and Science, with effect sizes about half the originals. Published in Nature Human Behaviour.

  4. Reproducibility Project finds only 36% of psychology studies replicate

    Publication

    The first large-scale systematic replication effort, involving 270 researchers testing 100 psychology studies, publishes in Science. The low success rate triggers alarm across the scientific community.

  5. Transparency and Openness Promotion guidelines published

    Policy

    A coalition of journals, funders, and scientific societies publishes eight modular standards for research transparency, giving journals a concrete framework for requiring open data and pre-registration.

  6. Center for Open Science founded

    Institutional

    Brian Nosek and Jeffrey Spies establish the Center for Open Science to build infrastructure for transparent research practices, including the Open Science Framework.

  7. Diederik Stapel fraud scandal breaks

    Investigation

    Dutch social psychologist Diederik Stapel is suspended after colleagues discover he fabricated data across decades of publications. Fifty-eight papers are eventually retracted.

  8. Daryl Bem publishes precognition paper, exposing methodological flaws

    Publication

    A respected psychologist publishes evidence for extrasensory perception in a top journal using standard methods, demonstrating that accepted research practices can produce absurd results.

  9. Ioannidis publishes 'Why Most Published Research Findings Are False'

    Publication

    Stanford statistician John Ioannidis publishes a mathematical proof in PLoS Medicine that most published research findings are likely false, given prevailing study designs and incentives.

Scenarios

1

Open science mandates become universal, reproducibility climbs above 80%

Discussed by: Nature editorial board, Center for Open Science, National Institutes of Health policy analysts

The SCORE data showing 91% reproducibility when data and code are shared becomes the basis for funders and journals to make transparency non-negotiable. The United States National Institutes of Health and National Science Foundation—which already require data management plans—tighten enforcement and tie grant renewals to compliance. Within five years, most major journals adopt mandatory data and code sharing, and reproducibility rates across social science approach the 85% level already seen in economics and political science.

2

Policymakers begin auditing social science behind existing regulations

Discussed by: DARPA program managers, government accountability researchers, congressional staff

DARPA funded SCORE in part because unreliable social science feeds directly into defense and policy decisions. With concrete evidence that roughly half of published claims may not hold, government agencies begin systematic reviews of the research underlying major regulations and programs—particularly in education, criminal justice, and public health. This creates a new category of policy risk: programs built on findings that don't replicate may face defunding or restructuring.

3

Analytical robustness findings reshape how studies are reported

Discussed by: Balazs Aczel's research team, statistical methodology reformers, journal editors

The finding that 81% of studies yield different statistical results depending on equally valid analytical choices forces a rethinking of how results are presented. Rather than reporting a single analysis, journals begin requiring "multiverse analysis"—showing results across the range of defensible analytical decisions. This makes individual papers more honest about uncertainty but also more complex to read, creating tension between transparency and accessibility.
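To make the idea concrete, here is a minimal toy sketch of a multiverse analysis. It is not SCORE's or Aczel's actual method, and the data, the outlier rule, and the set of analytical choices are all invented for illustration: the same comparison is run under every combination of defensible choices, and the full range of results is reported instead of one hand-picked analysis.

```python
# Toy "multiverse analysis": run the same group comparison under every
# combination of defensible analytical choices and report all results.
# Data and choices are invented for illustration only.
from itertools import product
from statistics import mean, median

group_a = [2.1, 2.4, 2.2, 9.5, 2.3, 2.0]  # contains one extreme value
group_b = [1.9, 2.0, 2.1, 2.2, 1.8, 2.0]

def drop_extremes(xs):
    """One of many possible outlier rules: drop values above 3x the median."""
    cutoff = 3 * median(xs)
    return [x for x in xs if x <= cutoff]

def effect(estimator, outlier_rule):
    """Estimated group difference under one combination of choices."""
    a, b = group_a, group_b
    if outlier_rule == "drop":
        a, b = drop_extremes(a), drop_extremes(b)
    return estimator(a) - estimator(b)

# The "multiverse": every combination of estimator x outlier handling.
for estimator, outlier_rule in product([mean, median], ["keep", "drop"]):
    print(f"{estimator.__name__:6s} {outlier_rule:4s} "
          f"effect = {effect(estimator, outlier_rule):.2f}")
# The estimated effect ranges from 0.20 to 1.42 depending on choices that
# are each individually defensible -- the pattern the 81% figure describes.
```

Reporting that whole range, rather than the single most favorable cell, is what makes the resulting paper more honest about its uncertainty.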

4

Reform stalls as incentive structures resist change

Discussed by: Critics of open science mandates, early-career researchers concerned about workload, university tenure committees

Despite strong evidence that reforms work, the academic incentive structure—which still rewards novel, statistically significant findings published in high-impact journals—proves resistant to change. Data sharing remains burdensome, pre-registration feels restrictive to exploratory researchers, and tenure committees continue to count publications rather than assess reproducibility. Progress continues but slowly, and reproducibility rates plateau in the 60-70% range for most fields.

Historical Context

Ioannidis paper and the birth of metascience (2005)

August 2005

What Happened

Stanford statistician John Ioannidis published a paper in PLoS Medicine titled "Why Most Published Research Findings Are False." Using Bayesian reasoning, he demonstrated that given typical study sizes, effect magnitudes, and the ratio of true to false hypotheses tested, the majority of published positive findings in many fields are likely wrong. The paper has been viewed over 1.8 million times.
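The core of that Bayesian argument can be sketched with the positive predictive value (PPV) of a claimed finding: the probability that a statistically significant result is actually true, given the study's power, its significance threshold, and the pre-study odds that the hypothesis is correct. The numbers below are illustrative, not taken from the paper:

```python
# Positive predictive value of a significant finding, in the spirit of
# Ioannidis (2005): PPV = (power * R) / (power * R + alpha), where R is
# the pre-study odds that a tested hypothesis is true. Numbers are
# illustrative only.

def ppv(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a statistically significant finding is true."""
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

# Well-powered test of a plausible hypothesis: most positives are real.
print(round(ppv(power=0.80, alpha=0.05, prior_odds=0.5), 2))  # 0.89

# Underpowered test of a long-shot hypothesis: most positives are false.
print(round(ppv(power=0.20, alpha=0.05, prior_odds=0.1), 2))  # 0.29
```

When fields routinely run underpowered studies on unlikely hypotheses, the second regime dominates, which is how a literature full of "significant" results can still be mostly wrong.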

Outcome

Short Term

The paper was widely discussed but did not immediately change research practices. Many researchers dismissed the argument as overly pessimistic or inapplicable to their field.

Long Term

It became the intellectual foundation of the replication crisis and the metascience movement—the idea that science itself should be studied scientifically. Every major replication project since, including SCORE, traces its intellectual lineage to this paper.

Why It's Relevant Today

SCORE represents the empirical verification—at massive scale—of what Ioannidis argued theoretically twenty-one years ago. The 55% replication rate is remarkably close to his predictions.

Reproducibility Project: Psychology (2015)

August 2015

What Happened

Brian Nosek and 269 co-authors attempted to replicate 100 published psychology studies. Only 36% produced statistically significant results on replication, and effect sizes were roughly half the originals. Published in Science, the study dominated headlines and forced psychology to confront systemic methodological failures.

Outcome

Short Term

Immediate soul-searching within psychology. Some researchers pushed back, arguing the replication attempts were flawed. Others called for fundamental reform.

Long Term

Catalyzed the open science movement: pre-registration, Registered Reports, data sharing mandates, and the founding of the Society for the Improvement of Psychological Science. Created the template that SCORE scaled across eleven disciplines.

Why It's Relevant Today

SCORE is the direct successor to the 2015 project—same lead investigator, same organization, but vastly expanded in scope. The 2026 replication rate of 55% is notably higher than the 2015 rate of 36%, suggesting either that social science beyond psychology replicates somewhat better, or that methodological improvements in the intervening decade are already visible in the literature.

Cochrane Collaboration and evidence-based medicine (1993–present)

1993–present

What Happened

British epidemiologist Archie Cochrane argued in the 1970s that medical practice should be based on systematic reviews of evidence, not individual studies or clinical intuition. In 1993, the Cochrane Collaboration was founded to produce rigorous systematic reviews of healthcare interventions. It now maintains over 8,000 reviews covering virtually every area of medicine.

Outcome

Short Term

Initially met with resistance from clinicians who saw it as cookbook medicine. Adoption was slow through the 1990s.

Long Term

Evidence-based medicine became the dominant paradigm. Systematic reviews now guide treatment decisions worldwide, and the infrastructure Cochrane built—registries, protocols, standardized methods—became a model for other fields.

Why It's Relevant Today

Medicine faced a similar credibility challenge decades earlier and built institutional infrastructure to address it. The open science movement in social science is following a parallel path: from identifying the problem, to building tools and standards, to changing institutional incentives. SCORE's findings suggest social science is roughly where medicine was in the early 2000s—past denial, building infrastructure, but not yet at universal adoption.
