Google builds an AI system that generates scientific hypotheses — and some are proving correct

New Capabilities
By Newzino Staff

A multi-agent AI tool built on Gemini 2.0 independently reproduced a decade of microbiology research in 48 hours, raising questions about how science gets done

January 7, 2026: AI Co-Scientist validated results published in peer-reviewed journals

Overview

Google released an AI system in February 2025 that doesn't just search scientific literature — it proposes original hypotheses, then refines them through internal debate among six specialized AI agents. In one early test, the system independently arrived at the same mechanism for how antibiotic-resistance genes spread between bacterial species that a team at Imperial College London had spent a decade proving in the lab. It did so in two days, without access to their unpublished findings.

Why it matters

If AI can reliably generate valid scientific hypotheses, the bottleneck in research shifts from ideas to laboratory capacity.

Key Indicators

48 hours
Time to match a decade of research
The AI Co-Scientist independently reproduced Imperial College London's unpublished findings on antibiotic resistance gene transfer in two days
6
Specialized AI agents in the system
The tool uses generation, reflection, ranking, evolution, proximity, and meta-review agents coordinated by a supervisor
3
Lab-validated discoveries so far
Drug candidates for liver fibrosis, acute myeloid leukemia, and an antimicrobial resistance mechanism have all been confirmed experimentally
p < 0.01
Statistical significance of drug results
Both AI-suggested drug repurposing candidates for liver fibrosis showed statistically significant anti-fibrotic activity in human organoids

Timeline

  1. AI Co-Scientist validated results published in peer-reviewed journals

    Validation

    Drug repurposing results generated by the AI Co-Scientist for liver fibrosis are published in Advanced Science, with the FDA-approved cancer drug vorinostat confirmed as showing significant anti-fibrotic activity in human hepatic organoids. The tool moves from demonstration to published, peer-reviewed science.

  2. Microsoft enters AI-for-science race with Discovery platform

    Competition

    Microsoft announces its own enterprise agentic platform for accelerating scientific research, including tools for materials discovery and protein engineering, broadening the competitive landscape.

  3. Sakana AI releases fully autonomous AI Scientist v2

    Competition

    Japanese startup Sakana AI releases a competing system that goes further than Google's collaborative approach, generating entire research papers autonomously for roughly $15 each. One such paper later becomes the first fully AI-generated work to pass rigorous human peer review.

  4. Scientists push back on 'co-scientist' framing

    Criticism

    TechCrunch publishes expert responses questioning whether the tool truly generates novel hypotheses or merely recombines existing knowledge, and whether automating hypothesis generation diminishes the core intellectual work of science.

  5. Google announces the AI Co-Scientist

    Launch

    Google introduces the AI Co-Scientist, a multi-agent system built on Gemini 2.0 that generates and refines scientific hypotheses. The announcement includes a Trusted Tester Program for research organizations worldwide and details three validated discoveries.

  6. Imperial College London confirms AI matched decade of research

    Validation

    Microbiologists Jose Penades and Tiago Costa reveal that the AI Co-Scientist independently reproduced their unpublished findings on how antibiotic-resistance genes spread between bacterial species — a mechanism their team spent ten years proving experimentally.

  7. OpenAI launches Deep Research

    Competition

    OpenAI releases its own AI research tool, Deep Research, weeks before Google's announcement, intensifying the race to build AI systems that can assist with scientific inquiry.

  8. Hassabis and Jumper win Nobel Prize for AlphaFold

    Recognition

    The Nobel Committee awards the Chemistry prize to Demis Hassabis and John Jumper for AlphaFold's contributions to computational protein structure prediction, validating AI-driven scientific research at the highest level.

  9. AlphaFold 2 solves the protein-folding problem

    Milestone

    Google DeepMind's AlphaFold 2 demonstrates it can predict protein structures with near-experimental accuracy, solving a 50-year grand challenge in biology and establishing AI as a serious tool for scientific discovery.

Scenarios

1

AI hypothesis generation becomes standard lab infrastructure within three years

Discussed by: Google DeepMind leadership, IEEE Spectrum analysis, optimistic researchers in the Trusted Tester Program

As more validated results are published and the Trusted Tester Program expands, AI hypothesis generation tools become as routine in research labs as statistical software. Google, OpenAI, and Microsoft compete on accuracy and domain breadth. The bottleneck in science shifts decisively from idea generation to experimental validation capacity, accelerating discovery timelines in drug development and materials science by years. Funding agencies begin requiring AI-assisted literature reviews in grant applications.

2

Validation failures and reproducibility concerns slow adoption

Discussed by: TechCrunch expert panel, Sony Computer Science Laboratories researchers, academic skeptics

As independent labs attempt to validate AI-generated hypotheses outside Google's curated collaborations, a significant fraction turn out to be sophisticated-sounding recombinations of existing knowledge that don't hold up experimentally. The hallucination problem proves harder to solve in scientific contexts than in general language tasks. Adoption stalls as researchers lose trust in AI-generated outputs, and the tool becomes useful primarily for literature synthesis rather than novel discovery.

3

AI-designed drugs enter clinical trials, proving the full pipeline works

Discussed by: Isomorphic Labs, pharmaceutical industry analysts, DeepMind's published roadmap

Isomorphic Labs advances its first AI-designed drug candidate into clinical trials by late 2026, while AI Co-Scientist-generated hypotheses lead to at least one additional drug entering preclinical development. This would demonstrate that AI can contribute meaningfully not just to hypothesis generation but to the entire research-to-treatment pipeline, potentially compressing drug development timelines from the current average of 10-15 years.

4

Hypothesis flood overwhelms the scientific validation system

Discussed by: European Journal of Cardiovascular Nursing editorial, research methodology commentators

AI tools generate hypotheses orders of magnitude faster than labs can test them, creating a massive backlog of plausible but unverified ideas. Peer review systems, already strained, buckle under a wave of AI-assisted papers. The scientific community is forced to develop new triage mechanisms — potentially AI-powered themselves — to decide which hypotheses merit scarce experimental resources, fundamentally restructuring how research priorities are set.

Historical Context

AlphaFold solves protein folding (2020)

November 2020

What Happened

Google DeepMind's AlphaFold 2 demonstrated it could predict protein 3D structures with near-experimental accuracy at the Critical Assessment of protein Structure Prediction (CASP14) competition. The protein-folding problem — predicting a protein's shape from its amino acid sequence — had been an open grand challenge in biology for 50 years. By 2022, DeepMind had published predicted structures for nearly every known protein, roughly 200 million structures.

Outcome

Short Term

Structural biologists gained instant access to protein structures that would have taken years to determine experimentally. Drug designers could model molecular interactions without waiting for lab results.

Long Term

AlphaFold established AI as a legitimate tool for fundamental scientific discovery, not just data analysis. Hassabis and Jumper received the 2024 Nobel Prize in Chemistry, and the success became the template for Google's broader AI-for-science ambitions.

Why It's Relevant Today

The AI Co-Scientist is a direct extension of the approach that worked with AlphaFold — applying AI to a well-defined scientific problem — but generalized from protein structure to hypothesis generation across all domains. AlphaFold's success gave Google both the credibility and the organizational confidence to attempt this much broader challenge.

Meta's Galactica launch and withdrawal (2022)

November 2022

What Happened

Meta's AI research division released Galactica, a large language model trained on over 48 million scientific papers, textbooks, and datasets. It was designed to summarize literature, solve math problems, and generate scientific text. Within three days of public release, researchers demonstrated it confidently generated racist content and scientifically inaccurate text presented as fact. Meta pulled the public demo.

Outcome

Short Term

The withdrawal embarrassed Meta and fueled skepticism about applying large language models to scientific research. Critics argued that plausible-sounding but wrong scientific text was more dangerous than obviously wrong text.

Long Term

The incident established a cautionary template for AI-in-science tools: the ability to generate fluent scientific prose doesn't mean the content is correct. It pushed subsequent efforts, including Google's, toward architectures with built-in verification and self-critique rather than simple text generation.

Why It's Relevant Today

Google's multi-agent design — with dedicated reflection, ranking, and meta-review agents that critique and challenge the generation agent's output — directly addresses the failure mode that sank Galactica. The AI Co-Scientist's architecture is partly an answer to the question: how do you prevent an AI from being confidently wrong about science?
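That generate-critique-rank pattern can be sketched in miniature. The sketch below is a toy illustration only: the agent names follow the article's description of the system, but the generation, scoring, and ranking logic are invented placeholders, not the Gemini-based implementation.

```python
def generation_agent(topic):
    """Propose candidate hypotheses (stand-in for an LLM generation call)."""
    return [f"{topic} via mechanism {m}" for m in ("A", "B", "C")]

def reflection_agent(hypothesis, evidence):
    """Critique a hypothesis: score it by toy keyword overlap with evidence."""
    return sum(1 for term in evidence if term in hypothesis)

def ranking_agent(hypotheses, evidence):
    """Order candidates by reflection score, best first."""
    return sorted(hypotheses,
                  key=lambda h: reflection_agent(h, evidence),
                  reverse=True)

def co_scientist_round(topic, evidence):
    """Supervisor: run one generate -> reflect -> rank round."""
    return ranking_agent(generation_agent(topic), evidence)

ranked = co_scientist_round("gene transfer", ["mechanism", "B"])
```

The point of the structure, rather than the placeholder logic, is that no hypothesis reaches the top of the ranking without passing through a critic that is separate from its generator, which is exactly the safeguard Galactica lacked.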

IBM Watson for Oncology disappointment (2013-2018)

2013-2018

What Happened

IBM marketed Watson as an AI system that could recommend cancer treatments by analyzing patient records and medical literature. Over five years, hospitals in the United States, India, South Korea, and elsewhere deployed it. Internal IBM documents later revealed the system frequently made unsafe and incorrect treatment recommendations, and its training relied heavily on a small number of doctors at Memorial Sloan Kettering rather than broad medical evidence.

Outcome

Short Term

Several hospitals abandoned Watson for Oncology. IBM's healthcare AI division lost credibility and was eventually sold to Francisco Partners in 2022 for roughly $1 billion — a fraction of the estimated $4 billion IBM had invested.

Long Term

Watson became shorthand for the gap between AI marketing and AI reality in healthcare. It raised lasting questions about validation standards for AI systems making scientific or medical recommendations.

Why It's Relevant Today

The AI Co-Scientist faces the same fundamental challenge Watson did: proving that AI-generated scientific recommendations are reliable enough to act on. Google's strategy of publishing validated results in peer-reviewed journals and running a Trusted Tester Program suggests it has learned from Watson's failure, seeking to establish credibility through rigorous, independent validation rather than marketing claims.
