Frontier AI labs move into application security, shaking up a $14 billion industry

New Capabilities

OpenAI, Anthropic, and Google are deploying autonomous agents that find and fix software vulnerabilities — work previously done by specialized security firms

March 6th, 2026: OpenAI launches Codex Security, formerly codenamed Aardvark, in research preview

Overview

For decades, finding security flaws in software has required expensive human experts or pattern-matching tools that miss complex bugs. Within five months, all three frontier artificial intelligence labs (OpenAI, Anthropic, and Google) released autonomous agents that read code like a human researcher, discover vulnerabilities traditional scanners miss, and generate patches. On March 6, 2026, OpenAI launched Codex Security in research preview, an agent that scanned 1.2 million code commits in its first month of beta testing and discovered 14 previously unknown vulnerabilities serious enough to receive formal identifiers, in projects including OpenSSH, Chromium, and PHP.

The competitive rush into application security, a market projected to reach $35 billion by 2031, has reshaped investor expectations. When Anthropic launched its Claude Code Security tool on February 20, 2026, cybersecurity stocks lost tens of billions in market value within days, with CrowdStrike alone shedding roughly $20 billion.

Traditional security vendors like Snyk and Veracode have responded with layoffs and their own AI tools. The deeper shift is structural: AI systems are now finding real zero-day vulnerabilities in critical open-source infrastructure that human reviewers missed for decades.

Key Indicators

14: Formally cataloged vulnerabilities found by Codex Security
Codex Security discovered and helped report 14 Common Vulnerabilities and Exposures (CVEs) across major open-source projects during beta testing.

500+: Vulnerabilities found by Anthropic's Claude Code Security
Anthropic reported finding over 500 vulnerabilities in production open-source codebases, including bugs that persisted for decades.

50%+: Reduction in false positives during Codex Security beta
False positive rates dropped by over 50% and over-reported severity findings fell by more than 90% compared to initial rollout.

$20B: CrowdStrike market cap loss after Claude Code Security launch
CrowdStrike shares fell 18% in the days following Anthropic's February 20 announcement, erasing roughly $20 billion in market value.

Organizations Involved

OpenAI
Artificial intelligence company
Launched Codex Security as expansion of Codex coding platform into application security

OpenAI developed its security agent internally as Aardvark starting in mid-2025, powered by GPT-5, before rebranding it as Codex Security for its March 2026 public release.

Anthropic
AI developer
Released Claude Code Security in limited research preview, triggering cybersecurity stock sell-off

Anthropic launched Claude Code Security on February 20, 2026, reporting that Claude Opus 4.6 found over 500 vulnerabilities in production open-source codebases that had gone undetected for decades.

Google DeepMind
AI research division
Operating Big Sleep vulnerability agent; reported 20 zero-days by August 2025

Google's collaboration between Project Zero and DeepMind produced Big Sleep, the first AI agent to find and block an actively exploited zero-day vulnerability in real-world software.

AISLE
AI Security Startup
Discovered all 12 OpenSSL zero-days announced in January 2026; over 100 CVEs across 30+ projects

AISLE operates an AI-native cyber reasoning system that discovered all 12 zero-day vulnerabilities disclosed in OpenSSL's January 2026 security release, including a high-severity remote code execution flaw.

XBOW
AI Security Startup
Top-ranked autonomous hacker on HackerOne; raised $117 million total

XBOW's fully autonomous platform reached the number-one hacker ranking on HackerOne in the United States by mid-2025, submitting nearly 1,060 vulnerabilities and outperforming every human participant.

Defense Advanced Research Projects Agency (DARPA)
United States Federal Agency
Concluded two-year AI Cyber Challenge; open-sourced finalist systems

DARPA's AI Cyber Challenge (AIxCC) ran from 2023 to 2025, proving that autonomous systems could find and patch real vulnerabilities in critical open-source software at scale.

Timeline

October 2024 to March 2026
8 events; latest: March 6th, 2026
  1. OpenAI launches Codex Security, formerly codenamed Aardvark, in research preview

    Product Launch (latest)

    OpenAI released Codex Security to ChatGPT Enterprise, Business, and Edu customers, with free usage for the first month. The agent scanned over 1.2 million commits during beta, identified 792 critical and 10,561 high-severity findings, and discovered 14 CVEs in projects including OpenSSH, GnuTLS, and Chromium. False positive rates dropped by over 50% compared to initial rollout.

  2. Anthropic launches Claude Code Security; cybersecurity stocks plunge

    Product Launch / Market Event

    Anthropic released Claude Code Security in limited research preview, reporting that Claude Opus 4.6 found over 500 vulnerabilities in production open-source code. The announcement triggered a sharp sell-off across cybersecurity stocks, with CrowdStrike losing roughly $20 billion in market value over several days.

  3. AISLE's AI system responsible for all 12 OpenSSL zero-days in security release

    Vulnerability Disclosure

    The OpenSSL project disclosed 12 new zero-day vulnerabilities, and AISLE confirmed its AI cyber reasoning system had discovered all 12, including a high-severity remote code execution flaw. Some bugs had persisted undetected in OpenSSL's heavily audited codebase for decades.

  4. OpenAI announces Aardvark, a GPT-5-powered autonomous security agent

    Product Launch

    OpenAI unveiled Aardvark in private beta, an autonomous agent that uses large language model reasoning rather than traditional static analysis to find, validate, and patch vulnerabilities. The system had already discovered 10 vulnerabilities that received formal CVE identifiers.

  5. DARPA AIxCC finals show autonomous systems can find 86% of vulnerabilities

    Competition

    At DEF CON 33, DARPA's AI Cyber Challenge finalists demonstrated dramatic improvement over the 2024 semifinals, identifying 86% of synthetic vulnerabilities and patching 68%. Team Atlanta won the $4 million grand prize. All finalist systems were open-sourced.

  6. Google reports Big Sleep has found 20 zero-days in open-source projects

    Technical Milestone

    Google disclosed that Big Sleep had discovered 20 previously unknown security vulnerabilities in widely used open-source software including FFmpeg and ImageMagick, with each vulnerability found and reproduced without human intervention.

  7. XBOW raises $75 million after reaching top hacker ranking on HackerOne

    Funding / Milestone

    XBOW's fully autonomous pentesting platform reached the number-one hacker ranking on HackerOne in the United States, outperforming all human participants. The startup raised a $75 million Series B led by Altimeter, bringing total funding to $117 million.

  8. Google's Big Sleep finds first AI-discovered zero-day in real-world software

    Technical Milestone

    A collaboration between Google Project Zero and Google DeepMind, Big Sleep discovered a stack buffer underflow in SQLite before it reached an official release — the first confirmed case of an AI agent finding an exploitable memory-safety flaw in widely used production software.

Historical Context

3 moments from history that rhyme with this story — and how they unfolded.

July 2014

Google Project Zero's founding and the professionalization of vulnerability research (2014)

Google launched Project Zero, a dedicated team of elite security researchers tasked with finding zero-day vulnerabilities in any software, not just Google's. The team, led by Chris Evans, included researchers like Tavis Ormandy and Ben Hawkes who discovered critical flaws in Windows, iOS, and Flash. Their policy of disclosing vulnerabilities after 90 days — whether or not vendors had patched them — forced the industry to take response times seriously.

Then

Major vendors including Microsoft and Apple accelerated their patching cycles. Vendors who missed the 90-day deadline faced public disclosure, creating strong incentives to fix vulnerabilities faster.

Now

Project Zero established the model of a well-funded, independent team finding vulnerabilities at scale — the exact model that AI agents are now automating. The 90-day disclosure norm became an industry standard.

Why this matters now

AI security agents are essentially automating Project Zero's workflow: reading code, understanding behavior, discovering flaws, and proposing fixes. The transition from a team of roughly a dozen elite researchers to autonomous agents that can scan millions of commits represents a step change in scale, not a change in approach.

April 2014

Heartbleed and the crisis of open-source security (2014)

Security researchers discovered Heartbleed (CVE-2014-0160), a critical vulnerability in OpenSSL that allowed attackers to read sensitive memory from any server using the affected library. The bug had existed undetected for over two years in one of the internet's most fundamental cryptographic libraries, despite the code being publicly available for review. An estimated 17% of the internet's secure web servers were vulnerable.

Then

The discovery triggered a global patching emergency. Major websites including Yahoo, the Canada Revenue Agency, and Mumsnet confirmed data breaches linked to the flaw.

Now

Heartbleed led to the creation of the Core Infrastructure Initiative (later succeeded by the Open Source Security Foundation) and renewed industry focus on funding open-source security. Despite this, AISLE's AI system found 12 new zero-days in OpenSSL in January 2026, demonstrating that even heavily audited critical infrastructure retains deep, hard-to-find vulnerabilities.

Why this matters now

AISLE's discovery of 12 OpenSSL zero-days in 2026 — some present for decades — directly echoes Heartbleed's lesson: human code review, even by experts, has fundamental limits. AI agents may represent the first approach that can match the scale of modern codebases.

August 2016

DARPA Cyber Grand Challenge and early autonomous security (2016)

DARPA held the Cyber Grand Challenge at DEF CON 24, pitting seven autonomous systems against each other in a capture-the-flag competition to find, exploit, and patch software vulnerabilities in real time — with no human intervention. The winning system, Mayhem, built by ForAllSecure, competed against human teams the following day and finished in the bottom third, demonstrating both the promise and limitations of 2016-era automation.

Then

ForAllSecure commercialized Mayhem as an enterprise security product. The competition demonstrated that autonomous vulnerability discovery was technically feasible but not yet competitive with skilled humans.

Now

DARPA's follow-up AIxCC competition in 2024-2025 showed dramatic improvement over Mayhem-era systems: finalists identified 86% of the synthetic vulnerabilities planted in the challenge software. The 2016 competition planted the seed; the 2025 results confirmed that AI-powered security had reached practical effectiveness.

Why this matters now

The decade between the 2016 Cyber Grand Challenge and 2026's commercial AI security agents marks the gap between proof-of-concept and market disruption. The same trajectory — from research competition to commercial product — is now playing out at much higher speed with frontier language models.
