Dan Hendrycks

Director, Center for AI Safety

Appears in 1 story

Stories

Google Gemini's push toward scientific reasoning

New Capabilities

Co-creator of Humanity's Last Exam benchmark

OpenAI launched the first commercial reasoning model in September 2024. Seventeen months later, Google claims its upgraded Gemini 3 Deep Think has pulled ahead on the benchmarks that matter most for science. The February 2026 update scored 84.6% on ARC-AGI-2—a test designed to measure how well artificial intelligence generalizes to novel problems—and 48.4% on Humanity's Last Exam, a collection of 2,500 expert-level questions crowdsourced from nearly 1,000 specialists worldwide.

Updated Feb 13