AI models learn to read, predict, and write the genetic code of life


New Capabilities

From protein folding to whole-genome design, biological foundation models are compressing decades of lab work into computation

March 4th, 2026: Evo 2 published in Nature

Overview

It took thirteen years and $2.7 billion to read the first human genome. Now a single AI model, trained on 9.3 trillion DNA base pairs from more than 128,000 species, can predict with 90 percent accuracy whether a previously uncharacterized mutation in the breast cancer gene BRCA1 affects gene function, without any task-specific training.

On March 4, the Arc Institute and NVIDIA published Evo 2 in Nature. It's the largest biological foundation model ever built, with 40 billion parameters, a context window of one million nucleotides, and the ability to design synthetic genomes the size of a simple bacterium. Evo 2 sits at the leading edge of a wave that began when DeepMind's AlphaFold cracked protein structure prediction in 2020, an achievement that won a Nobel Prize.

The next frontier is harder: moving from reading biology to writing it. Evo 2 can generate functional virus genomes that infect bacteria, a capability with direct applications in fighting antibiotic-resistant infections and obvious biosafety risks if misused.

The model's code, training data, and weights are publicly available, while sequences from human pathogens were deliberately excluded from training as a safety measure. What comes next and who governs it remain open questions that researchers, regulators, and biosecurity experts are racing to answer.


Key Indicators

9.3T
DNA base pairs in training data
Evo 2 was trained on 9.3 trillion nucleotides from 128,000+ species spanning all three domains of life.
40B
Model parameters
The largest version of Evo 2 has 40 billion parameters, making it the biggest AI model built for biology.
90%
BRCA1 variant classification accuracy
Evo 2 predicted whether previously uncharacterized BRCA1 mutations affect gene function with 90 percent accuracy, without any task-specific training.
1M
Nucleotide context window
The model can process up to one million nucleotides at once—eight times more than Evo 1—enabling it to capture long-range dependencies across genomes.
16
Viable AI-designed bacteriophages
Out of roughly 300 AI-generated phage genome designs tested, 16 proved functional, with some outperforming natural phages.
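Two of the indicators above imply figures the cards do not state outright. The short Python sketch below works them out as back-of-envelope arithmetic. It uses only the rounded numbers quoted above, so the results are approximate, and the variable and function names are illustrative rather than drawn from the Evo 2 paper or codebase.

```python
# Figures quoted in the Key Indicators above. "Roughly 300" is an
# approximation in the source, so the derived rate is approximate too.
designs_tested = 300        # AI-generated phage genome designs tested
viable_phages = 16          # designs that proved functional
evo2_context = 1_000_000    # Evo 2 context window, in nucleotides
context_ratio = 8           # "eight times more than Evo 1"


def hit_rate(successes: int, trials: int) -> float:
    """Return the fraction of trials that succeeded."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


phage_hit_rate = hit_rate(viable_phages, designs_tested)

# Assumes "eight times more" means an 8x ratio between the two windows.
implied_evo1_context = evo2_context // context_ratio

print(f"Phage design hit rate: {phage_hit_rate:.1%}")
print(f"Implied Evo 1 context: {implied_evo1_context:,} nucleotides")
```

On these inputs, about one design in nineteen produced a working phage, and the implied Evo 1 context window is on the order of 125,000 nucleotides. Both figures are only as precise as the rounded numbers they are built from.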


Timeline

November 2020 to March 2026

9 events. Latest: March 4th, 2026
  1. Evo 2 published in Nature

    Latest Publication

    The peer-reviewed Evo 2 paper appeared in Nature, describing the 40-billion-parameter model's ability to predict pathogenic mutations and design synthetic genomes across all domains of life.

  2. AI-generated bacteriophages shown to be functional

    Research

    Researchers used Evo models to generate synthetic bacteriophage genomes. Of roughly 300 designs tested, 16 proved viable, with some outperforming natural phages and a cocktail overcoming bacterial resistance in three E. coli strains.

  3. Evo 2 preprint released with open-source code and data

    Research

    Arc Institute and NVIDIA posted the Evo 2 preprint on bioRxiv, alongside publicly releasing model weights, training code, and the OpenGenome2 dataset of 9.3 trillion nucleotides.

  4. Evo 1 published in Science

    Research

    Arc Institute published Evo 1 in Science: a 7-billion-parameter model trained on prokaryotic genomes that could generate functional CRISPR systems and transposons, marking the first protein-RNA codesign with a language model.

  5. AlphaFold creators win Nobel Prize in Chemistry

    Recognition

    Demis Hassabis and John Jumper of Google DeepMind received the Nobel Prize in Chemistry for AlphaFold's protein structure predictions. David Baker shared the prize for computational protein design.

  6. ProGen demonstrates AI-designed functional proteins

    Research

    Salesforce Research published results in Nature Biotechnology showing that its ProGen model could generate novel protein sequences. In lab tests, 73 percent of the AI-designed proteins proved functional, compared with 59 percent of the natural proteins tested.

  7. Meta releases ESM-2 and ESMFold protein language models

    Research

    Meta AI released ESM-2, a 15-billion-parameter protein language model, alongside ESMFold for structure prediction. The accompanying Metagenomic Atlas predicted structures for over 617 million proteins.

  8. Arc Institute launches with $650 million in funding

    Institutional

    The Arc Institute launched in Palo Alto with a novel funding model: eight-year unrestricted grants for scientists, in partnership with Stanford, UC Berkeley, and UC San Francisco.

  9. AlphaFold 2 cracks protein folding at CASP14

    Breakthrough

    Google DeepMind's AlphaFold 2 predicted protein structures with accuracy matching laboratory experiments at the CASP14 competition, effectively solving a fifty-year-old problem in biology.

Historical Context

3 moments from history that rhyme with this story — and how they unfolded.

1990–April 2003

Human Genome Project (1990–2003)

An international consortium of researchers spent thirteen years and approximately $2.7 billion to sequence the first human genome's 3 billion base pairs. When completed in April 2003, it covered about 92 percent of the genome and was hailed as biology's equivalent of the Moon landing.

Then

Sequencing costs began a dramatic decline—from $50 million for a second genome in 2003 to under $200 by 2024—as next-generation sequencing technology emerged.

Now

The project created the reference genome that underpins all modern genomics, from cancer diagnostics to ancestry testing. But reading the genome turned out to be far easier than understanding it—the function of most genetic variation remains unknown.

Why this matters now

Evo 2 was trained on the genomic data that the Human Genome Project and its successors generated. Its ability to predict mutational effects without task-specific training directly addresses the interpretation gap that has persisted since 2003: we can read genomes cheaply, but understanding what the variations mean has remained the bottleneck.

November 2020

AlphaFold 2 solves protein structure prediction (2020)

Google DeepMind entered AlphaFold 2 in the CASP14 protein structure prediction competition and achieved accuracy comparable to experimental methods, solving a problem that had stymied biologists for fifty years. The team later predicted structures for virtually all 200 million known proteins and made the database freely available.

Then

The structural biology community gained instant access to predicted structures that would have taken years to determine experimentally. More than two million researchers used the database within two years.

Now

Demis Hassabis and John Jumper won the 2024 Nobel Prize in Chemistry. AlphaFold demonstrated that AI could transform biology, catalyzing a wave of biological foundation models—including ESM-2, ProGen, and Evo—that expanded from protein structure to protein design to whole-genome modeling.

Why this matters now

AlphaFold proved the core premise that Evo 2 extends: biological sequence data contains enough information for AI to learn deep functional relationships. AlphaFold worked on proteins; Evo 2 operates on raw DNA across all of life, a larger and more fundamental challenge.

February 1975

Asilomar Conference on Recombinant DNA (1975)

140 biologists, lawyers, and journalists gathered at Asilomar, California, to address safety concerns about recombinant DNA technology—the ability to splice genes from one organism into another. Scientists had voluntarily paused certain experiments and convened the conference to establish safety guidelines before proceeding.

Then

The conference produced a set of safety guidelines that informed the National Institutes of Health's regulations on recombinant DNA research, allowing the work to continue under oversight.

Now

Asilomar became the defining example of scientific self-regulation. Recombinant DNA technology went on to produce insulin, gene therapy, and genetically modified crops. The guidelines evolved but the framework of voluntary caution followed by formal regulation persisted.

Why this matters now

The Evo 2 team's decision to exclude human pathogen sequences from training data echoes Asilomar's approach: scientists voluntarily limiting their own work before regulators act. But the parallel has limits—Asilomar governed a handful of labs, while Evo 2's open-source release means the model is available to anyone with sufficient computing power.
