For the first time, AI systems are independently solving mathematical problems that stumped human researchers for decades. Since Christmas 2025, 15 problems from the legendary mathematician Paul Erdős's collection have been moved from 'open' to 'solved'—and 11 of those solutions specifically credited AI models. On January 6, 2026, a combination of OpenAI's GPT-5.2 Pro and Harmonic's Aristotle theorem prover produced the first fully autonomous AI solution to an Erdős problem that hadn't already been solved in the existing literature.
The achievement marks a shift in what automated systems can contribute to mathematical research. These aren't competition problems with known solutions—they're genuine open questions that working mathematicians had failed to crack. Yet experts urge caution: Fields Medalist Terence Tao, who personally verified several of the AI-generated proofs, emphasizes these represent the 'lowest hanging fruit' from Erdős's collection—problems solvable with known techniques that simply hadn't received enough attention. The harder problems, requiring genuinely novel insights, remain out of reach.
People Involved
Terence Tao
Fields Medalist, UCLA Mathematics Professor (Actively coordinating AI-assisted mathematics research through erdosproblems.com)
Kevin Barreto
Cambridge University Mathematics Student (Pioneering amateur use of AI theorem provers)
Paul Erdős
Hungarian Mathematician (1913-1996) (Deceased; legacy problems continue to drive research)
Organizations Involved
Harmonic
AI Research Company
Status: Developer of Aristotle theorem prover, central to recent breakthroughs
AI startup developing Aristotle, a theorem prover that achieved gold-medal performance at the 2025 International Mathematical Olympiad with formally verified proofs in Lean.
Google DeepMind
AI Research Laboratory
Status: Developed AlphaProof and AlphaEvolve systems contributing to mathematical discovery
Google's primary AI research division, which achieved silver-medal performance at IMO 2024 with AlphaProof and developed AlphaEvolve for mathematical exploration.
OpenAI
Artificial Intelligence Company (Public Benefit Corporation)
Status: GPT-5.2 Pro central to multiple Erdős problem solutions
Developer of the GPT model series; GPT-5.2 Pro, released in December 2025, has demonstrated strong mathematical reasoning capabilities.
Timeline
AI Solutions Documented Across Multiple Problem Categories
Documentation
Terence Tao updates the erdosproblems.com wiki with comprehensive tracking of AI contributions across 10 categories, including solutions, partial results, literature reviews, and formalizations.
GPT-5.2 Pro Solves Problems #729 and #205
Solution
GPT-5.2 Pro combined with Aristotle produces full Lean-verified solutions to Erdős Problems #729 and #205, extending the autonomous solution approach.
AlphaProof Proves Variant of Problem #477
Solution
DeepMind's AlphaProof produces a Lean-verified proof of a variant of Erdős Problem #477.
First Fully Autonomous AI Solution to Open Erdős Problem
Breakthrough
Kevin Barreto uses GPT-5.2 Pro and Aristotle to solve Erdős Problem #728 autonomously. Unlike previous solutions, no prior human proof exists in the literature. Terence Tao verifies the result.
Claude Opus 4.5 Upgrades Partial Result to Full Solution
Solution
Anthropic's Claude Opus 4.5 and Gemini 3 Pro upgrade an existing partial result on Erdős Problem #871 to a full solution with Lean verification.
Pace of AI-Assisted Solutions Accelerates
Milestone
Beginning Christmas Day, AI-assisted solutions to Erdős problems accelerate dramatically, with 15 problems moving to solved status over the following three weeks.
OpenAI Releases GPT-5.2
Technical
OpenAI releases GPT-5.2 family including Pro variant with enhanced mathematical reasoning, which will prove central to subsequent Erdős breakthroughs.
Aristotle Autonomously Solves Problem #1026
Solution
Boris Alexeev uses Aristotle to autonomously solve Erdős Problem #1026 in Lean. The next day, a prior human solution from 2016 is discovered in the literature.
Aristotle Achieves First Erdős Partial Result
Breakthrough
Harmonic's Aristotle produces a partial solution to Erdős Problem #124, inspiring amateur mathematicians to systematically apply AI to the problem collection.
First Human-AI Collaborative Erdős Solution
Breakthrough
Boris Alexeev, Wouter van Doorn, and Terence Tao collaborate with Gemini DeepThink and Aristotle to achieve a partial solution to Erdős Problem #367.
AlphaEvolve Applied to Erdős Problems
Research
DeepMind's AlphaEvolve system is tested on multiple Erdős problems, achieving slight improvements on some constructions but not matching past results on others.
AI startup Harmonic releases paper on Aristotle, a theorem prover achieving gold-medal equivalent performance on 2025 IMO problems with Lean-verified proofs.
DeepMind's AlphaProof Achieves IMO Silver Medal Standard
Milestone
Google DeepMind announces that AlphaProof and AlphaGeometry solved 4 of 6 International Mathematical Olympiad problems, becoming the first AI system to reach silver-medal level with formally verified proofs.
Paul Erdős Dies, Leaving Hundreds of Open Problems
Background
Hungarian mathematician Paul Erdős dies at 83, leaving a legacy of over 1,500 papers and hundreds of unsolved problems with cash bounties attached.
Scenarios
1
AI Solves Major Open Problem, Transforms Mathematical Research
Discussed by: MIT Technology Review, Nature commentary; optimistic AI researchers
Within 12-24 months, AI systems solve one of the higher-bounty Erdős problems or another famous conjecture, demonstrating the ability to generate genuinely novel mathematical insights. This would trigger massive investment in AI-for-math tools and fundamentally change how research mathematics is conducted, with AI becoming a standard collaborator on major papers.
AI systems continue solving problems from the accessible tail of Erdős's collection—those solvable with existing techniques that simply hadn't received attention—but make no progress on the core difficult problems requiring novel mathematical insights. The gap between solving competition problems and research problems proves larger than anticipated. AI becomes useful for routine tasks but doesn't transform discovery.
3
Verification Crisis Undermines Trust
Discussed by: Mathematics community forums, Lean proof assistant developers
A high-profile AI solution is later found to contain subtle errors that formal verification missed, either because the problem statement was misformalized or because the proof relied on unstated axioms. The incident leads mathematicians to question all AI-generated results and to require exhaustive human review, significantly slowing the pace of AI-assisted mathematics.
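To make the two failure modes in this scenario concrete, here is a minimal Lean 4 sketch. It is a toy illustration, not drawn from any Erdős problem or from the systems named in this story: the names `truncated`, `convenient`, and `bogus` are hypothetical, and the statements were chosen only because they are small enough to read at a glance.

```lean
-- Toy examples only; `truncated`, `convenient`, and `bogus` are hypothetical names.

-- 1. Misformalization: subtraction on `Nat` truncates at zero, so a claim that
--    is false over the integers (2 - 3 = 0) is accepted as true here.
example : (2 : Nat) - 3 = 0 := rfl

-- The same pitfall in a general statement: it checks, but only because
-- `Nat` subtraction never goes negative.
theorem truncated (n : Nat) : n - (n + 1) = 0 :=
  Nat.sub_eq_zero_of_le (Nat.le_add_right n 1)

-- 2. Unstated axioms: a deliberately false axiom lets a false theorem check.
axiom convenient : ∀ n : Nat, n < 5

theorem bogus : (10 : Nat) < 5 := convenient 10

-- `#print axioms` lists every axiom a proof depends on, which exposes `convenient`.
#print axioms bogus
```

In both cases Lean accepts a proof of exactly what was written. Whether the written statement matches the intended mathematics, and whether every axiom in use is acceptable, remains a question for human reviewers.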
4
Human-AI Collaboration Becomes Standard Practice
Discussed by: Erdős problems community, AI for Math Initiative partners
Rather than autonomous solving, the dominant mode becomes tight human-AI collaboration: AI handles literature review, numerical exploration, and proof formalization while humans provide problem selection, creative insights, and strategic direction. This hybrid approach produces more results than either alone, and becomes the norm for mathematical research within five years.
Historical Context
Four Color Theorem Computer Proof (1976)
June 1976
What Happened
Kenneth Appel and Wolfgang Haken at the University of Illinois announced they had proved the four color theorem—that any map can be colored with just four colors so no adjacent regions share a color. Their proof required over 1,000 hours of computer time to verify 1,936 special cases. It was the first major mathematical theorem proved with substantial computer assistance.
Outcome
Short Term
The mathematical community reacted with 'equal parts celebration and dismay.' Many mathematicians refused to accept a proof humans couldn't verify by hand. Colleague William Tutte celebrated that they 'smote the kraken' while others despaired at computers 'encroaching on human ingenuity.'
Long Term
The proof gained grudging acceptance and established computer-assisted proof as legitimate, if controversial. It foreshadowed today's debates about AI-generated proofs and what constitutes mathematical understanding versus mere verification.
Why It's Relevant Today
The 1976 controversy over computer proofs mirrors today's debates about AI theorem provers. The key difference: modern formal verification in Lean provides stronger guarantees than 1976's case-checking, but questions about mathematical insight versus brute-force verification persist.
AlphaGo Defeats Lee Sedol (2016)
March 2016
What Happened
DeepMind's AlphaGo defeated world Go champion Lee Sedol 4-1 in a match watched by 200 million people. Game 2's 'Move 37', a play that seemed wrong to expert commentators but proved decisive, demonstrated that AI could find strategies humans had never considered in a game with more possible positions than atoms in the observable universe.
Outcome
Short Term
Lee Sedol described feeling 'speechless' and the match triggered intense global interest in AI capabilities. Top Go players began studying AI moves to improve their own play.
Long Term
AlphaGo's success demonstrated reinforcement learning could master complex domains, directly inspiring the AlphaProof architecture now being used for theorem proving. The paradigm of AI finding solutions humans couldn't see is now being tested in mathematics.
Why It's Relevant Today
Just as AlphaGo found moves that looked wrong but proved correct, AI theorem provers are now finding proof paths mathematicians hadn't explored. The question is whether mathematical insight can emerge from pattern-matching at scale, as game-playing strategy did.
Kepler Conjecture Formal Verification (2014)
August 2014
What Happened
Thomas Hales announced completion of the Flyspeck project, a 12-year effort to formally verify his 1998 computer-assisted proof of the Kepler conjecture (about optimal sphere packing). The original proof had been accepted only with 99% confidence because referees couldn't verify all computational components; Flyspeck provided complete formal verification in Isabelle and HOL Light proof assistants.
Outcome
Short Term
The verification resolved lingering doubts about the proof's correctness and demonstrated that major mathematical results could be machine-verified end-to-end.
Long Term
Flyspeck was an early large-scale demonstration of the end-to-end formal verification that Aristotle and AlphaProof now perform in Lean. It established that proof assistants (HOL Light and Isabelle in Flyspeck's case) could provide ironclad guarantees for complex mathematical arguments.
Why It's Relevant Today
The Kepler verification took 12 years of human effort. Today's AI systems can formalize proofs in hours or days. This acceleration is what makes the current moment transformative—not just that AI can find proofs, but that it can verify them at scale.