For two years, AI companies have trained their models on web content while paying for almost none of it. That era may be ending. Cloudflare's acquisition of Human Native, announced January 15, 2026, adds a London-based data marketplace to an infrastructure stack that already includes bot-blocking tools, pay-per-crawl billing, and the x402 machine payment protocol built with Coinbase.
The stakes are substantial: the AI training data licensing market is projected to grow from $2.1 billion in 2024 to $15.2 billion by 2033. Cloudflare, which handles traffic for roughly 20% of the internet, is positioning itself as the tollbooth operator between AI companies and the content they need. Whether creators actually get paid—or whether this becomes another layer of infrastructure extracting fees—depends on whether AI companies accept these terms or find ways around them.
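The projection above implies a steep compound growth rate. A minimal sketch of that arithmetic, assuming the two dollar figures are endpoints of a smooth nine-year curve (the function name `cagr` is illustrative and does not come from the forecast itself):

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: $2.1 billion in 2024, $15.2 billion by 2033.
rate = cagr(2.1, 15.2, 2033 - 2024)
print(f"Implied annual growth: {rate:.1%}")  # a little under 25% per year
```

Sustaining growth near 25% a year for nine years is the assumption built into the headline number.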
Key Indicators
250:1
OpenAI crawl-to-referral ratio
For every user OpenAI sends to a website, it crawls that site 250 times
75%
Zero-click searches
Share of Google queries resolved without visiting the source site
$600M
x402 annualized volume
Machine-to-machine payments processed through the x402 protocol as of January 2026
1,500+
RSL endorsements
Media organizations, brands, and tech companies backing the Really Simple Licensing standard
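The crawl-to-referral ratio listed above is a simple quotient: crawler requests divided by visits the same company sends back. The sketch below shows one way a site owner might estimate it from their own access logs. The log field names (`user_agent`, `referer`), the sample entries, and the matching rules are assumptions for illustration; the source does not say how the 250:1 figure was measured.

```python
from urllib.parse import urlparse


def crawl_to_referral_ratio(log_entries, crawler_token, referrer_host):
    """Return crawler hits per referred visit, or None if there were no referrals."""
    crawls = 0
    referrals = 0
    for entry in log_entries:
        if crawler_token in entry.get("user_agent", ""):
            crawls += 1  # request made by the AI company's crawler
        elif urlparse(entry.get("referer", "")).hostname == referrer_host:
            referrals += 1  # human visit that arrived from the AI product
    if referrals == 0:
        return None
    return crawls / referrals


# Hypothetical log: 250 crawler hits and one referred visitor.
sample_log = [{"user_agent": "GPTBot/1.0", "referer": ""}] * 250
sample_log.append({"user_agent": "Mozilla/5.0", "referer": "https://chatgpt.com/"})

ratio = crawl_to_referral_ratio(sample_log, "GPTBot", "chatgpt.com")
print(ratio)  # 250.0
```

A high ratio means the site is read by machines far more often than it receives readers in return, which is the imbalance that paid-crawl schemes aim to price.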
People Involved
Matthew Prince
Co-Founder & CEO, Cloudflare (Leading Cloudflare's AI economy strategy)
Dr. James Smith
Co-Founder & CEO, Human Native (Joining Cloudflare)
Jack Galilee
Co-Founder & CTO, Human Native (Joining Cloudflare)
Organizations Involved
Cloudflare, Inc.
Internet Infrastructure Company
Status: Building AI content monetization infrastructure
Internet infrastructure company that provides content delivery, security, and performance services for approximately 20% of websites globally.
Human Native AI
AI Data Marketplace Startup
Status: Acquired by Cloudflare
London-based startup that built a two-sided marketplace connecting content owners with AI companies seeking licensed training data.
RSL Collective
Nonprofit Standards Organization
Status: Managing Really Simple Licensing standard
Nonprofit organization developing machine-readable licensing standards for AI training data, modeled after music industry collective licensing.
Timeline
Cloudflare Acquires Human Native
Acquisition
Cloudflare announced the acquisition of Human Native, adding its AI data marketplace capabilities to Cloudflare's content monetization infrastructure. Terms were not disclosed.
RSL 1.0 Specification Finalized
Industry
The RSL Collective released the 1.0 specification, with endorsements from over 1,500 media organizations, brands, and technology companies.
Cloudflare and Coinbase Announce x402 Foundation
Partnership
Cloudflare and Coinbase announced they would co-found the x402 Foundation to establish a standard protocol for machine-to-machine payments using the HTTP 402 status code.
Cloudflare Introduces Content Signals Policy
Product
Cloudflare launched tools to help publishers update robots.txt with Content Signals, enabling fine-grained control over AI access including opt-out from AI overviews and inference.
RSL Collective Launches
Industry
RSS co-creator Eckart Walther and former Ask.com CEO Doug Leeds launched the RSL Collective, introducing Really Simple Licensing as a standard for expressing AI training data terms. Reddit, Yahoo, and Medium joined as founding supporters.
Cloudflare Begins Blocking AI Crawlers by Default
Product
Cloudflare started blocking AI crawler bots by default for customers, a significant move given the company's 20% share of internet traffic.
Meta Wins Fair Use Ruling
Legal
A California federal court granted summary judgment to Meta, ruling that its use of copyrighted books to train a generative AI tool qualified as fair use—one of the first such rulings.
US Copyright Office Rules Training Isn't Inherently Fair Use
Regulatory
The Copyright Office concluded that using copyrighted works to train generative models "does implicate copyright law" and that training is "not inherently transformative," especially when outputs compete with originals.
Cloudflare Releases AI Labyrinth
Product
Cloudflare launched AI Labyrinth, which detects unauthorized AI bots ignoring robots.txt and routes them into a maze of AI-generated decoy pages to waste their resources.
Human Native Hires Ex-Google, BBC Executives
Business
Human Native recruited Madhav Chinnappa (former Google news partnerships lead), Tim Palmer (Google product partnerships veteran), and Matt Hervey (IP law partner) to build out its team.
Cloudflare Launches AI Audit Beta
Product
Cloudflare released AI Audit, giving website owners visibility into which AI bots were crawling their content and how frequently.
OpenAI-Condé Nast Deal Announced
Business
Condé Nast licensed content from The New Yorker, Vogue, GQ, Wired, and other properties to OpenAI for use in ChatGPT and SearchGPT.
Human Native Raises £2.8M Seed Round
Funding
LocalGlobe and Mercuri led the seed investment in Human Native, just two months after the company's founding.
OpenAI Signs Deals with Vox Media and The Atlantic
Business
OpenAI announced licensing agreements with both publishers, gaining access to their content for AI training and ChatGPT integration in exchange for attribution and technology access.
Human Native Founded
Business
James Smith and Jack Galilee launched Human Native in London to build a marketplace connecting content owners with AI companies seeking licensed training data.
Reddit Announces $203M in Data Licensing Deals
Business
Days before its IPO filing, Reddit disclosed content licensing arrangements totaling $203 million over two to three years, including a $60 million annual deal with Google.
New York Times Sues OpenAI and Microsoft
Legal
The Times filed suit alleging copyright infringement, claiming its articles were used to train AI models without permission. The lawsuit seeks billions in damages and became a catalyst for industry-wide licensing negotiations.
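Several timeline entries turn on the HTTP 402 "Payment Required" status code, which pay-per-crawl billing and the x402 protocol build on. The sketch below simulates the general request, pay, retry pattern in memory with no network calls. The class, the `X-Demo-*` header names, the price, and the receipt format are all hypothetical stand-ins; the real x402 headers and payloads are defined by its own specification and are not reproduced here.

```python
from dataclasses import dataclass, field
from http import HTTPStatus

PRICE_USD = 0.01  # hypothetical per-request price set by the publisher


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


class Publisher:
    """Hypothetical origin server that charges crawlers per request."""

    def __init__(self):
        self.valid_receipts = set()

    def sell_receipt(self, amount_usd):
        # Stand-in for a real payment settlement step.
        if amount_usd < PRICE_USD:
            raise ValueError("underpayment")
        receipt = f"receipt-{len(self.valid_receipts) + 1}"
        self.valid_receipts.add(receipt)
        return receipt

    def serve(self, request_headers):
        receipt = request_headers.get("X-Demo-Receipt")  # hypothetical header
        if receipt in self.valid_receipts:
            self.valid_receipts.discard(receipt)  # each receipt works once
            return Response(HTTPStatus.OK, body="full article text")
        # No valid proof of payment: answer 402 and quote the price.
        return Response(
            HTTPStatus.PAYMENT_REQUIRED,
            headers={"X-Demo-Price-USD": str(PRICE_USD)},
            body="payment required",
        )


def crawl(publisher):
    """Crawler side: request, pay if the server answers 402, then retry."""
    first = publisher.serve({})
    if first.status != HTTPStatus.PAYMENT_REQUIRED:
        return first
    price = float(first.headers["X-Demo-Price-USD"])
    receipt = publisher.sell_receipt(price)
    return publisher.serve({"X-Demo-Receipt": receipt})


publisher = Publisher()
print(publisher.serve({}).status == HTTPStatus.PAYMENT_REQUIRED)  # True
print(crawl(publisher).body)  # full article text
```

The design point is that the price quote and the proof of payment both travel inside ordinary HTTP exchanges, so a crawler can pay without a human signing a contract first.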
Scenarios
1
Licensed Data Becomes Industry Standard
Discussed by: Analysts at Galaxy Digital and a16z, coverage in Fortune and TechCrunch
AI companies accept that web scraping without payment carries legal and reputational risk, and shift toward licensed data sources. The x402 protocol and RSL standard gain adoption, creating measurable revenue flows to content creators. This follows the Napster-to-Spotify pattern: after years of litigation, a licensing regime emerges that's convenient enough for both sides. Galaxy Digital has forecasted x402 could reach 30% of Base daily transactions in 2026, while a16z projects the protocol could capture $30 trillion in market share over five years if adoption continues.
2
AI Companies Route Around Infrastructure Tollbooths
Discussed by: ProMarket analysis, TorrentFreak coverage of EU licensing report
Major AI developers find alternatives to paying Cloudflare and similar gatekeepers—either through direct bilateral deals with large publishers, synthetic data generation that reduces dependence on web content, or infrastructure that bypasses the emerging tollbooth layer. Just as anti-piracy lawsuits in the Napster era "led to short-lived sales increases that quickly disappeared" before licensing took hold, the current enforcement mechanisms may prove similarly leaky. OpenAI and Anthropic already have direct deals with major publishers; they may prefer that model to paying infrastructure fees.
3
Legal Rulings Force Market Restructuring
Discussed by: McKool Smith AI litigation tracker, Copyright Alliance analysis, Debevoise legal review
The 50+ pending AI copyright lawsuits produce rulings that either definitively establish training as fair use (favoring AI companies) or reject it (forcing licensing). A Supreme Court decision or major appellate ruling could eliminate ambiguity that currently enables both scraping and licensing negotiations to coexist. The split rulings so far—two federal courts for fair use, one against, plus a German ruling against OpenAI—suggest the question remains genuinely open.
RSL, x402, Content Signals, and proprietary alternatives fail to converge, creating a fragmented landscape where compliance is expensive and adoption stalls. AI companies resist any standard implying payment obligations, while different publisher coalitions back competing approaches. The result is neither the open scraping of the past nor the licensed marketplace some envision, but a messy middle ground where only the largest players can navigate the complexity.
Historical Context
Napster and the Music Industry (1999-2003)
June 1999 - September 2003
What Happened
Napster launched in June 1999, enabling peer-to-peer music file sharing that reached 80 million users by 2001. The RIAA sued in December 1999, and artists including Metallica and Dr. Dre filed individual suits. Courts ordered Napster to block copyrighted material, effectively shutting it down by 2001. The company filed for bankruptcy in 2002.
Outcome
Short Term
Napster was destroyed, but file-sharing continued through successors like Kazaa, LimeWire, and BitTorrent. Music industry revenue fell from $14.6 billion in 1999 to $6.3 billion by 2009.
Long Term
iTunes (2003) and later Spotify (2008) created licensed alternatives convenient enough to compete with piracy. By 2020, streaming accounted for 83% of U.S. music revenue. The lesson: litigation alone failed; viable paid alternatives eventually succeeded.
Why It's Relevant Today
Human Native's CEO explicitly framed his company as helping AI "get out of its Napster era." The parallel is imperfect—AI training happens once rather than continuously—but the pattern of scraping-litigation-licensing may repeat.
Google Books Settlement Collapse (2005-2013)
December 2004 - November 2013
What Happened
Google began scanning millions of library books in 2004 to create a searchable index. Authors and publishers sued in 2005. Google proposed a $125 million settlement in 2008 that would have created a de facto licensing regime, with Google paying into a fund for authors. A revised settlement in 2009 would have covered "orphan works" whose rights holders couldn't be found.
Outcome
Short Term
Judge Denny Chin rejected the settlement in 2011, ruling it went too far by creating an "effective monopoly" through the orphan works provision. The case continued until 2013, when scanning-for-search was ruled fair use—but without a payment mechanism.
Long Term
No licensing regime emerged. Google won the legal battle, but the envisioned digital library marketplace never materialized. The episode shows that even willing parties could not structure a comprehensive licensing solution when it required resolving millions of rights relationships.
Why It's Relevant Today
AI training faces similar scale challenges. Human Native and the RSL Collective are attempting to create licensing infrastructure that the Google Books settlement failed to establish—this time before courts foreclose options.
ASCAP/BMI Music Licensing Formation (1914-1941)
February 1914 - 1941
What Happened
In 1914, composers and publishers created ASCAP (American Society of Composers, Authors and Publishers) to collect royalties from venues playing their music—a task impossible to manage individually. Radio's rise in the 1920s created a new battleground: broadcasters wanted to play music freely, while rights holders demanded payment. After years of litigation and congressional hearings, BMI formed in 1939 as a competing collective, and standardized blanket licensing emerged.
Outcome
Short Term
Radio stations and venues paid blanket fees covering entire catalogs rather than negotiating song-by-song. This made compliance practical for users while ensuring some compensation reached creators.
Long Term
The ASCAP/BMI model became the template for collective licensing worldwide. Music licensing generates billions annually through a system that took decades to stabilize. The infrastructure persists today.
Why It's Relevant Today
The RSL Collective is explicitly modeled on ASCAP/BMI. Its founders aim to create a similar clearinghouse for AI training data—but whether AI companies will accept collective licensing, or whether the web's scale defeats such approaches, remains unresolved.