Largest-ever reproducibility test finds half of social science claims don't replicate

By Newzino Staff

Seven-year, 865-researcher project also shows open-science reforms are measurably fixing the problem

Today: SCORE publishes largest-ever reproducibility results in Nature

Overview

For decades, social science findings shaped everything from classroom teaching methods to criminal sentencing guidelines—yet no one had systematically checked whether those findings held up. Now the results are in. A seven-year project involving 865 researchers, nearly 3,900 papers, and 62 journals across 11 disciplines found that only about 55% of published claims successfully replicate, and just 54% of studies are precisely computationally reproducible. The project, called SCORE and funded by the United States Defense Advanced Research Projects Agency (DARPA), is the largest and most comprehensive assessment of research reliability ever conducted.

Why it matters

Half the research informing public policy, education, and medicine may not hold up—but proven fixes exist and are spreading.

Key Indicators

55%
Claims that successfully replicated
Of 274 tested claims from 164 papers, 151 showed statistically significant results matching the originals.
54%
Papers precisely computationally reproducible
Of 600 papers tested, only about half could be exactly reproduced from shared data and code.
91%
Reproducibility when data and code are shared
Approximate reproducibility rose from 73.5% overall to 91% when the original data and code were available.
865
Researchers involved
The SCORE project mobilized researchers worldwide across 11 social and behavioral science disciplines.
24%
Papers with openly shared data
Only 144 of 600 papers published between 2009 and 2018 made their data publicly available.
85%+
Reproducibility in economics and political science
Fields with mandatory data and code sharing policies showed dramatically higher reproducibility.

Timeline

  1. SCORE publishes largest-ever reproducibility results in Nature

    Publication

    Four papers spanning 865 researchers, nearly 3,900 articles, and 62 journals reveal that about 55% of social science claims replicate and 54% are computationally reproducible—but that open-science reforms measurably improve both numbers.

  2. DARPA funds SCORE program with $8 million

    Funding

    The United States Defense Advanced Research Projects Agency begins funding the Systematizing Confidence in Open Research and Evidence program, the largest-ever effort to assess research reliability.

  3. Social Sciences Replication Project finds 62% replication rate

    Publication

    A team led by Colin Camerer replicates 13 of 21 social science experiments from Nature and Science, with effect sizes about half the originals. Published in Nature Human Behaviour.

  4. Reproducibility Project finds only 36% of psychology studies replicate

    Publication

    The first large-scale systematic replication effort, involving 270 researchers testing 100 psychology studies, publishes in Science. The low success rate triggers alarm across the scientific community.

  5. Transparency and Openness Promotion guidelines published

    Policy

    A coalition of journals, funders, and scientific societies publishes eight modular standards for research transparency, giving journals a concrete framework for requiring open data and pre-registration.

  6. Center for Open Science founded

    Institutional

    Brian Nosek and Jeffrey Spies establish the Center for Open Science to build infrastructure for transparent research practices, including the Open Science Framework.

  7. Diederik Stapel fraud scandal breaks

    Investigation

    Dutch social psychologist Diederik Stapel is suspended after colleagues discover he fabricated data across decades of publications. Fifty-eight papers are eventually retracted.

  8. Daryl Bem publishes precognition paper, exposing methodological flaws

    Publication

    A respected psychologist publishes evidence for extrasensory perception in a top journal using standard methods, demonstrating that accepted research practices can produce absurd results.

  9. Ioannidis publishes 'Why Most Published Research Findings Are False'

    Publication

    Stanford statistician John Ioannidis publishes a mathematical proof in PLoS Medicine that most published research findings are likely false, given prevailing study designs and incentives.

Scenarios

1

Open science mandates become universal, reproducibility climbs above 80%

Discussed by: Nature editorial board, Center for Open Science, National Institutes of Health policy analysts

The SCORE data showing 91% reproducibility when data and code are shared becomes the basis for funders and journals to make transparency non-negotiable. The United States National Institutes of Health and National Science Foundation—which already require data management plans—tighten enforcement and tie grant renewals to compliance. Within five years, most major journals adopt mandatory data and code sharing, and reproducibility rates across social science approach the 85% level already seen in economics and political science.

2

Policymakers begin auditing social science behind existing regulations

Discussed by: DARPA program managers, government accountability researchers, congressional staff

DARPA funded SCORE in part because unreliable social science feeds directly into defense and policy decisions. With concrete evidence that roughly half of published claims may not hold, government agencies begin systematic reviews of the research underlying major regulations and programs—particularly in education, criminal justice, and public health. This creates a new category of policy risk: programs built on findings that don't replicate may face defunding or restructuring.

3

Analytical robustness findings reshape how studies are reported

Discussed by: Balazs Aczel's research team, statistical methodology reformers, journal editors

The finding that 81% of studies yield different statistical results depending on equally valid analytical choices forces a rethinking of how results are presented. Rather than reporting a single analysis, journals begin requiring "multiverse analysis"—showing results across the range of defensible analytical decisions. This makes individual papers more honest about uncertainty but also more complex to read, creating tension between transparency and accessibility.
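To make the idea concrete, here is a minimal toy sketch of a multiverse analysis. It is not SCORE's or Aczel's actual method, and the data, the outlier rule, and the set of analytical choices are all invented for illustration: the same comparison is run under every combination of defensible choices, and the full range of results is reported instead of one hand-picked analysis.

```python
# Toy "multiverse analysis": run the same group comparison under every
# combination of defensible analytical choices and report all results.
# Data and choices are invented for illustration only.
from itertools import product
from statistics import mean, median

group_a = [2.1, 2.4, 2.2, 9.5, 2.3, 2.0]  # contains one extreme value
group_b = [1.9, 2.0, 2.1, 2.2, 1.8, 2.0]

def drop_extremes(xs):
    """One of many possible outlier rules: drop values above 3x the median."""
    cutoff = 3 * median(xs)
    return [x for x in xs if x <= cutoff]

def effect(estimator, outlier_rule):
    """Estimated group difference under one combination of choices."""
    a, b = group_a, group_b
    if outlier_rule == "drop":
        a, b = drop_extremes(a), drop_extremes(b)
    return estimator(a) - estimator(b)

# The "multiverse": every combination of estimator x outlier handling.
for estimator, outlier_rule in product([mean, median], ["keep", "drop"]):
    print(f"{estimator.__name__:6s} {outlier_rule:4s} "
          f"effect = {effect(estimator, outlier_rule):.2f}")
# The estimated effect ranges from 0.20 to 1.42 depending on choices that
# are each individually defensible -- the pattern the 81% figure describes.
```

Reporting that whole range, rather than the single most favorable cell, is what makes the resulting paper more honest about its uncertainty.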

4

Reform stalls as incentive structures resist change

Discussed by: Critics of open science mandates, early-career researchers concerned about workload, university tenure committees

Despite strong evidence that reforms work, the academic incentive structure—which still rewards novel, statistically significant findings published in high-impact journals—proves resistant to change. Data sharing remains burdensome, pre-registration feels restrictive to exploratory researchers, and tenure committees continue to count publications rather than assess reproducibility. Progress continues but slowly, and reproducibility rates plateau in the 60-70% range for most fields.

Historical Context

Ioannidis paper and the birth of metascience (2005)

August 2005

What Happened

Stanford statistician John Ioannidis published a paper in PLoS Medicine titled "Why Most Published Research Findings Are False." Using Bayesian reasoning, he demonstrated that given typical study sizes, effect magnitudes, and the ratio of true to false hypotheses tested, the majority of published positive findings in many fields are likely wrong. The paper has been viewed over 1.8 million times.
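The core of that Bayesian argument can be sketched with the positive predictive value (PPV) of a claimed finding: the probability that a statistically significant result is actually true, given the study's power, its significance threshold, and the pre-study odds that the hypothesis is correct. The numbers below are illustrative, not taken from the paper:

```python
# Positive predictive value of a significant finding, in the spirit of
# Ioannidis (2005): PPV = (power * R) / (power * R + alpha), where R is
# the pre-study odds that a tested hypothesis is true. Numbers are
# illustrative only.

def ppv(power: float, alpha: float, prior_odds: float) -> float:
    """Probability that a statistically significant finding is true."""
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

# Well-powered test of a plausible hypothesis: most positives are real.
print(round(ppv(power=0.80, alpha=0.05, prior_odds=0.5), 2))  # 0.89

# Underpowered test of a long-shot hypothesis: most positives are false.
print(round(ppv(power=0.20, alpha=0.05, prior_odds=0.1), 2))  # 0.29
```

When fields routinely run underpowered studies on unlikely hypotheses, the second regime dominates, which is how a literature full of "significant" results can still be mostly wrong.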

Outcome

Short Term

The paper was widely discussed but did not immediately change research practices. Many researchers dismissed the argument as overly pessimistic or inapplicable to their field.

Long Term

It became the intellectual foundation of the replication crisis and the metascience movement—the idea that science itself should be studied scientifically. Every major replication project since, including SCORE, traces its intellectual lineage to this paper.

Why It's Relevant Today

SCORE represents the empirical verification—at massive scale—of what Ioannidis argued theoretically twenty-one years ago. The 55% replication rate is remarkably close to his predictions.

Reproducibility Project: Psychology (2015)

August 2015

What Happened

Brian Nosek and 269 co-authors attempted to replicate 100 published psychology studies. Only 36% produced statistically significant results on replication, and effect sizes were roughly half the originals. Published in Science, the study dominated headlines and forced psychology to confront systemic methodological failures.

Outcome

Short Term

Immediate soul-searching within psychology. Some researchers pushed back, arguing the replication attempts were flawed. Others called for fundamental reform.

Long Term

Catalyzed the open science movement: pre-registration, Registered Reports, data sharing mandates, and the founding of the Society for the Improvement of Psychological Science. Created the template that SCORE scaled across eleven disciplines.

Why It's Relevant Today

SCORE is the direct successor to the 2015 project—same lead investigator, same organization, but vastly expanded in scope. The 2026 replication rate of 55% is notably higher than the 2015 rate of 36%, suggesting either that social science beyond psychology replicates somewhat better, or that methodological improvements in the intervening decade are already visible in the literature.

Cochrane Collaboration and evidence-based medicine (1993–present)

1993–present

What Happened

British epidemiologist Archie Cochrane argued in the 1970s that medical practice should be based on systematic reviews of evidence, not individual studies or clinical intuition. In 1993, the Cochrane Collaboration was founded to produce rigorous systematic reviews of healthcare interventions. It now maintains over 8,000 reviews covering virtually every area of medicine.

Outcome

Short Term

Initially met with resistance from clinicians who saw it as cookbook medicine. Adoption was slow through the 1990s.

Long Term

Evidence-based medicine became the dominant paradigm. Systematic reviews now guide treatment decisions worldwide, and the infrastructure Cochrane built—registries, protocols, standardized methods—became a model for other fields.

Why It's Relevant Today

Medicine faced a similar credibility challenge decades earlier and built institutional infrastructure to address it. The open science movement in social science is following a parallel path: from identifying the problem, to building tools and standards, to changing institutional incentives. SCORE's findings suggest social science is roughly where medicine was in the early 2000s—past denial, building infrastructure, but not yet at universal adoption.
