Nvidia's generational GPU leaps reshape who controls AI infrastructure

New Capabilities
By Newzino Staff

From Hopper to Vera Rubin, each architecture cycle compresses timelines, raises performance floors, and forces competitors to redesign their strategies

Today: Vera Rubin, Rubin CPX, and NemoClaw launched at GTC 2026

Overview

Nvidia has spent four years on an annual architecture cadence that no semiconductor company has sustained before. At GTC 2026, chief executive Jensen Huang unveiled the Vera Rubin platform—a system built around a single graphics processing unit that delivers 50 petaflops of inference compute, roughly five times the performance of its Blackwell predecessor, while claiming to cut the cost of generating each AI token by a factor of ten. In the same keynote, Huang launched NemoClaw, an open-source software platform that lets any company deploy autonomous AI agents across its operations without being locked into a specific cloud provider's hardware.

Key Indicators

50 PFLOPS
Rubin GPU inference performance
Each Rubin GPU delivers 50 petaflops of NVFP4 inference, a 3.3x to 5x improvement over Blackwell Ultra.
10x
Inference cost reduction claimed
Nvidia claims a tenfold reduction in per-token inference cost for mixture-of-experts models compared to Blackwell.
~85%
Nvidia's AI accelerator market share
Nvidia holds approximately 85 percent of the AI GPU market by revenue, though custom chips from hyperscalers are growing faster.
$215.9B
Nvidia fiscal 2026 revenue
Full-year revenue for fiscal year ending January 2026, up 65 percent year-over-year, driven almost entirely by data center sales.
1 GW
Thinking Machines Lab deployment
Nvidia committed at least one gigawatt of Vera Rubin systems to Mira Murati's AI startup under a multiyear partnership.


Timeline

  1. Vera Rubin, Rubin CPX, and NemoClaw launched at GTC 2026

    Product Launch

    Jensen Huang's keynote to 39,000 attendees officially debuted six new chips including the Rubin GPU (50 petaflops inference), Rubin CPX for massive-context processing, and NemoClaw—an open-source enterprise AI agent platform that works across any hardware vendor.

  2. Nvidia and Thinking Machines Lab announce gigawatt partnership

    Partnership

    Nvidia committed at least one gigawatt of Vera Rubin systems to Thinking Machines Lab in a multiyear deal, alongside a significant equity investment in the startup.

  3. Nvidia reports $215.9 billion in fiscal 2026 revenue

    Financial

    Data center revenue reached $62.3 billion in the fourth quarter alone, up 75 percent year-over-year. The company guided for $78 billion in Q1 fiscal 2027.

  4. Thinking Machines Lab loses co-founders to OpenAI

    Personnel

    Co-founders Brett Zoph and Luke Metz returned to OpenAI, along with other senior researchers, creating a talent retention challenge for Murati's startup.

  5. Huang confirms Vera Rubin in full production at CES

    Production Milestone

    Nvidia announced that Vera Rubin NVL72 systems had entered full production, with specs upgraded to 288GB of HBM4 per GPU and 22 TB/s memory bandwidth—a 70 percent leap from earlier figures.

  6. Thinking Machines Lab raises $2 billion seed round

    Funding

    The startup closed one of the largest seed rounds in history at a $12 billion valuation, with Andreessen Horowitz, Accel, Nvidia, and AMD participating.

  7. Nvidia reveals Vera Rubin and Feynman roadmap at GTC 2025

    Roadmap

    Huang disclosed the Vera Rubin platform for 2026 and the Feynman architecture for 2028, committing to an annual cadence of generational GPU leaps.

  8. Thinking Machines Lab publicly launches

    Company Launch

    Murati announced a public benefit corporation focused on collaborative AI, quickly attracting investor interest from major venture firms and chipmakers.

  9. Mira Murati departs OpenAI

    Personnel

    OpenAI's chief technology officer left to pursue independent work, later founding Thinking Machines Lab.

  10. Blackwell architecture debuts at GTC 2024

    Product Launch

Jensen Huang launched the Blackwell GPU with 208 billion transistors across two chiplets, doubling NVLink bandwidth and adding native support for sub-8-bit data types. It was Nvidia's first multi-chip GPU design.

  11. Nvidia announces Hopper architecture

    Product Launch

    Nvidia unveiled the H100 GPU based on the Hopper architecture, introducing FP8 precision and NVLink 4.0. Demand for H100s quickly outstripped supply, with lead times stretching past a year.

Scenarios

1. Nvidia holds 80%+ share through 2027 as Rubin locks in buyers

Discussed by: Wall Street consensus (38 of 39 analysts rate Nvidia a Strong Buy); Cantor Fitzgerald and Tigress Financial have price targets implying 50-97% upside

Vera Rubin's performance leap arrives before custom ASICs from Google, Amazon, and Microsoft can scale to match. Hyperscalers continue buying Nvidia GPUs for training while using custom chips only for narrower inference workloads. NemoClaw's open-source model builds a software moat similar to CUDA, making it costly for enterprises to switch hardware vendors. Nvidia's annual architecture cadence continues to outrun competitors' design cycles.

2. Custom chips erode Nvidia's inference market below 60% by 2028

Discussed by: SemiAnalysis; analysts tracking custom ASIC growth rates (projected 44.6% shipment growth in 2026 vs. 16.1% for GPUs); Bloomberg Intelligence

Google's TPU fleet already handles over 75 percent of Gemini inference. Amazon's Trainium3 and Microsoft's Maia 200 are both shipping in volume. OpenAI has committed over $10 billion to Broadcom for its own custom silicon. If these programs scale as planned, the hyperscalers that account for most of Nvidia's revenue begin shifting inference workloads to cheaper in-house chips, pressuring Nvidia's margins even as training demand remains strong.

3. NemoClaw becomes the default enterprise AI agent framework

Discussed by: The New Stack; enterprise AI analysts covering the agentic AI market; CNBC reporting on pre-launch partnerships with Salesforce, Cisco, Google, Adobe, and CrowdStrike

If NemoClaw's open-source, vendor-neutral positioning gains traction, it could replicate CUDA's ecosystem lock-in at the software layer. Enterprises standardize on NemoClaw for AI agent deployment, creating indirect demand for Nvidia hardware even where custom chips could technically run the same workloads. This scenario transforms Nvidia from a hardware company into a platform company, with margins shifting from chip sales to ecosystem control.

4. AI infrastructure spending plateaus, delaying Rubin adoption

Discussed by: Motley Fool; market analysts noting Nvidia stock 11.5% below its October 2025 high; concerns about return on investment from hyperscaler AI capital expenditure

Combined hyperscaler capital expenditure is projected to approach $700 billion in 2026. If the revenue generated by AI products fails to justify this spending, buyers may extend the life of Blackwell systems rather than upgrade to Vera Rubin on schedule. A slowdown in AI infrastructure investment would compress Nvidia's growth rates even if market share holds steady.

Historical Context

Intel's Itanium and the x86 disruption that wasn't (2001-2012)

What Happened

Intel launched Itanium in 2001 as a clean-break replacement for x86 processors, expecting enterprise customers to migrate to the new architecture. AMD responded with x86-64, extending the existing architecture to handle 64-bit workloads without requiring customers to rewrite their software. Itanium ultimately sold fewer than one million units while x86-64 became the industry standard.

Outcome

Short Term

Intel spent billions developing Itanium while AMD captured server market share by maintaining backward compatibility.

Long Term

The episode demonstrated that software ecosystem lock-in matters more than raw hardware performance—customers chose the architecture that preserved their existing code investment.

Why It's Relevant Today

Nvidia's CUDA ecosystem and now NemoClaw follow the same logic: performance leadership matters, but the software layer that keeps developers from switching may be the more durable competitive advantage against custom ASICs.

Qualcomm's baseband modem dominance and Apple's breakaway attempt (2017-present)

What Happened

Apple spent years trying to replace Qualcomm's modem chips with in-house designs, partnering with Intel and later developing its own cellular modems. Despite a multi-billion-dollar investment, Apple repeatedly delayed its custom modem rollout due to the difficulty of matching Qualcomm's integrated performance across cellular standards. Qualcomm maintained over 50 percent market share throughout.

Outcome

Short Term

Apple's Intel-sourced modems underperformed, and the company settled a bitter patent dispute with Qualcomm in 2019.

Long Term

The case showed that even the world's most valuable company found it extremely difficult to replicate a dominant chipmaker's performance—though Apple eventually shipped its first in-house modem in 2025.

Why It's Relevant Today

Hyperscalers building custom AI chips face a similar challenge: Nvidia's integrated hardware-software stack is difficult to replicate, but well-resourced companies with enough volume may eventually succeed at narrower workloads like inference.

Amazon Web Services creates the cloud computing market (2006-2010)

What Happened

Amazon launched Elastic Compute Cloud in 2006, offering commodity server access on demand. Traditional hardware vendors like Sun Microsystems and HP initially dismissed the model, arguing enterprises would always want to own their own iron. By 2010, AWS had established a platform ecosystem—storage, databases, networking—that made switching costs high even though the underlying hardware was generic.

Outcome

Short Term

AWS grew from zero to $1 billion in revenue in roughly four years while incumbents scrambled to respond.

Long Term

The platform layer, not the hardware, became the defensible business. Sun Microsystems was acquired by Oracle in 2010 for a fraction of its peak value.

Why It's Relevant Today

NemoClaw signals that Nvidia is learning from the cloud playbook: hardware cycles come and go, but the platform layer that enterprises build their workflows on creates switching costs that outlast any single chip generation.
