AI decodes the genome's dark matter

New Capabilities
By Newzino Staff

DeepMind's Open-Source AlphaGenome Gives Scientists New Tools to Understand Non-Coding DNA

January 31st, 2026: Adoption Reaches 3,000 Scientists Globally

Overview

For twenty years after scientists sequenced the human genome, 98% of it remained essentially unreadable. The protein-coding genes were mapped, but the vast regulatory regions—the genome's operating system—stayed opaque. On January 28, 2026, Google DeepMind released the full source code for AlphaGenome, an artificial intelligence model that predicts how genetic variants in these non-coding regions affect gene regulation and disease.

Nearly 3,000 scientists in 160 countries have used the model since its launch seven months ago, applying it to cancer, neurodegenerative disorders, and rare genetic conditions. The open-source release, covering code, model weights, and documentation, means any research institution can now run AlphaGenome locally on a single graphics processing unit (GPU) instead of accessing it only through DeepMind's servers. For the estimated 350 million people worldwide living with rare diseases, many of whom still lack a diagnosis, this significantly expands the tools available for finding answers hidden in their DNA.

Key Indicators

98%
Non-coding genome
The portion of human DNA that doesn't encode proteins, where most genetic variation occurs and where AlphaGenome focuses its predictions.
~3,000
Scientists using AlphaGenome
Researchers across 160 countries who have adopted the model since its June 2025 launch.
1 million
Base-pairs per input
The length of DNA sequence AlphaGenome can process at once—five times longer than its predecessor Enformer.
25 of 26
Benchmarks matched or exceeded
Evaluations of variant effect prediction where AlphaGenome matched or outperformed the best existing models.

People Involved

Žiga Avsec
Research Scientist, Google DeepMind; Lead Author of AlphaGenome (Leading DeepMind's genomics initiative)
Demis Hassabis
Chief Executive Officer and Co-Founder, Google DeepMind (2024 Nobel Prize Laureate in Chemistry)
Eric Klee
Researcher, Mayo Clinic (Testing AlphaGenome for rare disease diagnosis)

Organizations Involved

Google DeepMind
AI Research Laboratory
Status: Released AlphaGenome source code for non-commercial use

Google's AI research division, responsible for AlphaFold, AlphaMissense, and AlphaGenome.

Wilhelm Foundation
Non-Profit Organization
Status: Organizing global Undiagnosed Hackathons

Swedish foundation that organizes Undiagnosed Hackathons to diagnose rare genetic conditions.

Timeline

  1. Adoption Reaches 3,000 Scientists Globally

    Milestone

    Nearly 3,000 scientists across 160 countries are now using AlphaGenome for research into cancer, neurodegeneration, and rare diseases.

  2. AlphaGenome Paper Published and Code Released

    Open Source Release

    Nature publishes the AlphaGenome paper. DeepMind releases the full source code under the Apache 2.0 license, along with model weights for non-commercial use, enabling local deployment on a single GPU.

  3. Mayo Clinic Hosts Undiagnosed Hackathon

    Application

    First U.S. Undiagnosed Hackathon brings 130 researchers from 28 countries. Using AlphaGenome and other tools, teams diagnose 6 cases in 48 hours, with 3 more diagnosed in follow-up.

  4. AlphaGenome Preprint and API Launch

    AI Development

    DeepMind releases AlphaGenome preprint and provides API access for non-commercial research. Scientists can access predictions through DeepMind's servers.

  5. Nobel Prize for AlphaFold

    Recognition

    Demis Hassabis and John Jumper are awarded the Nobel Prize in Chemistry for protein structure prediction, sharing the prize with David Baker, who is recognized for computational protein design.

  6. AlphaMissense Catalogues Protein-Coding Variants

    AI Release

    DeepMind releases AlphaMissense, classifying 89% of the 71 million possible missense variants as likely pathogenic or likely benign. Unlike AlphaGenome, it covers only the roughly 2% of the genome that codes for proteins.

  7. Enformer Model Published

    AI Development

    DeepMind publishes Enformer in Nature Methods, demonstrating that deep learning can predict gene expression from DNA input sequences of about 196,000 base-pairs, integrating regulatory signals up to 100,000 base-pairs away.

  8. DeepMind Open-Sources AlphaFold

    Open Source Release

    DeepMind releases AlphaFold code and launches protein structure database with EMBL-EBI, eventually reaching 2 million users.

  9. AlphaFold Solves Protein Folding Problem

    AI Breakthrough

    DeepMind's AlphaFold2 wins CASP14 with near-experimental accuracy, solving a 50-year grand challenge in biology.

  10. ENCODE Project Publishes Major Findings

    Scientific Milestone

    Encyclopedia of DNA Elements project assigns biochemical functions to 80% of the genome, revealing millions of regulatory elements outside protein-coding regions.

  11. Human Genome Project Completed

    Scientific Milestone

    International consortium announces completion of human genome sequence, covering 99% of gene-containing regions. The $2.7 billion project finished two years ahead of schedule.

Scenarios

1

AlphaGenome Becomes Standard Tool for Rare Disease Diagnosis

Discussed by: Nature News, Mayo Clinic researchers

Genetic testing laboratories integrate AlphaGenome into their diagnostic pipelines, routinely analyzing non-coding variants that were previously classified as 'variants of uncertain significance.' The diagnostic yield for rare genetic conditions—currently around 30-40% using standard methods—increases substantially. Major clinical sequencing providers add AlphaGenome-based annotations to their reports.

2

Pharmaceutical Companies Adopt AlphaGenome for Drug Target Discovery

Discussed by: Drug Discovery and Development, World Economic Forum

Drug developers use AlphaGenome to identify regulatory variants that affect disease-relevant genes, revealing new drug targets in conditions where protein-coding mutations are rare. Combined with AlphaFold for protein structure and AlphaMissense for coding variants, this creates an end-to-end AI pipeline from genetic association to druggable target. The AI drug discovery sector, which drew $3.3 billion in 2024, expands into regulatory genomics.

3

Limitations Emerge in Clinical Translation

Discussed by: DeepMind researchers (acknowledged limitations), Scientific American

Despite strong benchmark performance, clinical implementation reveals gaps. The model struggles with very distant regulatory elements beyond 100,000 base-pairs, and its predictions don't capture the full complexity of tissue-specific and developmental effects. Regulatory agencies require extensive validation before allowing AlphaGenome predictions to influence clinical decisions, slowing adoption in diagnostic settings.

4

Competing Models Challenge DeepMind's Lead

Discussed by: Chemistry World, Lifebit analysis

Other organizations develop competing genomics AI models, either building on the open-sourced AlphaGenome code or pursuing alternative architectures. Academic consortia or well-funded biotechs could produce specialized models optimized for specific disease areas or populations underrepresented in training data. The field fragments into multiple tools rather than converging on a single standard.

Historical Context

Human Genome Project Completion (2003)

October 1990 – April 2003

What Happened

The $2.7 billion international effort sequenced 99% of human gene-containing regions, completing two years ahead of schedule. The project established that humans have approximately 20,000 protein-coding genes—far fewer than expected—accounting for just 1.5% of the genome. Scientists hoped the sequence would rapidly unlock the genetic basis of disease.

Outcome

Short Term

The reference sequence sped up the hunt for disease genes, extending earlier successes such as the cystic fibrosis and breast cancer genes to thousands of other conditions. Genome-wide association studies became the dominant research paradigm.

Long Term

Genome-wide association studies mapped many disease-associated variants, but most fell in non-coding regions where their effects remained mysterious. The '98% problem' became biology's next grand challenge.

Why It's Relevant Today

AlphaGenome directly addresses the unfulfilled promise of the Human Genome Project—understanding what the non-coding 98% actually does and how it contributes to disease.

AlphaFold Open-Source Release (2021)

July 2021

What Happened

After AlphaFold2 solved the protein structure prediction problem at CASP14, DeepMind open-sourced the code and partnered with EMBL-EBI to create a freely accessible database of predicted protein structures, which expanded to more than 200 million entries in 2022. The move represented a departure from traditional commercial AI development.

Outcome

Short Term

Over 2 million scientists accessed the database within two years. Research accelerated on enzyme design, drug discovery, and understanding disease mechanisms.

Long Term

Hassabis and Jumper shared the 2024 Nobel Prize in Chemistry. The open-source approach became the template for DeepMind's scientific AI releases, including AlphaMissense and AlphaGenome.

Why It's Relevant Today

The AlphaGenome release follows the same playbook: publish in Nature, then open-source for non-commercial use. DeepMind is betting that democratizing access accelerates scientific progress.

ENCODE Project Findings (2012)

September 2012

What Happened

The Encyclopedia of DNA Elements project assigned biochemical functions to 80% of the genome, identifying nearly 3 million regulatory sites. The findings challenged the 'junk DNA' concept but also sparked controversy about whether biochemical activity equals biological function.

Outcome

Short Term

Researchers gained a map of potential regulatory elements but lacked tools to predict how specific variants in these regions affected gene expression or disease risk.

Long Term

ENCODE established that non-coding regions contain essential regulatory machinery. The project continues through phase 4, generating data that trained models like AlphaGenome.

Why It's Relevant Today

ENCODE catalogued where regulatory elements exist; AlphaGenome predicts what happens when mutations occur in them. The two represent complementary approaches to decoding the non-coding genome.
