Anthropic was founded in 2021 by researchers who left OpenAI over concerns that safety was being sacrificed for speed. Their Responsible Scaling Policy included a pledge that most of the industry considered radical: if a model crossed dangerous capability thresholds, Anthropic would stop. Not slow down. Stop.
That pledge no longer exists.
The timeline is worth reading slowly.
February 9. Mrinank Sharma, head of the Safeguards Research Team, resigns. In a public letter, he writes that "the world is in peril" and reflects on "how hard it is to truly let our values govern our actions" within institutions shaped by competition, speed, and scale. He leaves to study poetry.
February 12. Anthropic closes a $30 billion funding round at a $380 billion post-money valuation. The second-largest private financing round in history.
February 24. Anthropic publishes Version 3.0 of its Responsible Scaling Policy. The commitment to halt training when safety mitigations aren't in place is gone. Chief Science Officer Jared Kaplan tells Time Magazine: "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments if competitors are blazing ahead."
February 24, same day. Defence Secretary Pete Hegseth summons CEO Dario Amodei to the Pentagon. The ultimatum: allow unrestricted military use of Anthropic's models for all legal purposes by Friday at 5:01 p.m., or the administration will invoke the Defence Production Act and label Anthropic a supply chain risk. Reported independently by the New York Times, Reuters, NBC, and CBS.
The company created to prove that self-regulation works became, in eighteen days, the definitive case study for why it doesn't.
The instinct is to look for a villain. There isn't one.
Anthropic's investors deployed $30 billion at a $380 billion valuation. At that scale, pausing development isn't a safety decision; it's a fiduciary liability. The money demands growth, market share, government contracts.
The Pentagon awarded Anthropic a $200 million contract to develop AI for national security. The military's logic is straightforward: if you're providing classified capabilities, you don't get to dictate how they're used. OpenAI, Google, and xAI received the same contracts without usage restrictions. Anthropic was the outlier.
Kaplan's reasoning is internally coherent. If Anthropic pauses and every competitor continues, the result isn't a safer world. It's a world where the same capabilities exist but Anthropic has no seat at the table. The voluntary pause only works if everyone pauses. Nobody else was going to.
And Sharma's departure is the logical response from someone who sees where all of this leads and decides integrity requires stepping off.
Every actor in this story behaved rationally within their own incentive structure. That's the point. The system produced the outcome. This wasn't a single decision, a failure of character, or a scandal. It was a structural inevitability.
Anthropic tried something genuinely novel: making safety a competitive differentiator. "Choose us because we're the responsible ones." For a while, it worked. It attracted talent, research partnerships, and clients who cared about governance.
But differentiators are subject to market forces. The moment safety became a competitive liability (Pentagon contracts lost, deployment speed constrained, investor pressure mounting), the differentiator got rationalised away. Not overnight. Gradually, with well-reasoned justifications at every step. That's how it always works.
This is the category error at the heart of the voluntary AI safety movement: treating safety as a feature rather than as infrastructure. Features get shipped, modified, or deprecated based on market dynamics. Infrastructure is mandatory. Nobody "competes" on whether their building has fire exits. Nobody drops structural load requirements because a competitor built cheaper without them.
The Responsible Scaling Policy was a feature. It was marketed as infrastructure. February 2026 exposed the difference.
The EU AI Act treats AI safety as infrastructure, not a feature. Article 9 risk management requirements, conformity assessments, and mandatory obligations don't evaporate because a competitor moves faster or a government agency applies pressure. That's by design.
The transatlantic divergence is now visible in a way it wasn't before. The US government used the Defence Production Act, a wartime industrial compulsion law, to strip AI safety constraints from a private company. The EU built a legal framework to enforce them. These are fundamentally different theories of governance, and any organisation operating across both jurisdictions needs to understand what that means for their risk posture.
But the practical lesson is operational, not geopolitical.
If "Anthropic's Responsible Scaling Policy" appeared anywhere in your AI risk assessment as a mitigation measure, that mitigation was invalidated on February 24. Not because Anthropic acted in bad faith, but because a corporate policy document can be revised at any board meeting. It was never a control. It was a statement of intent, and intent changed when the incentives did.
Contractual safeguards. Independent audit rights. Exit clauses. Regulatory compliance obligations. These are load-bearing elements of an AI governance framework. A vendor's internal safety policy is not. It's paint on the wall, not a structural beam.
Anthropic was the best-case scenario for voluntary AI self-regulation. Well-funded, mission-driven, founded specifically to solve this problem, staffed with researchers who genuinely cared. If any company could sustain voluntary safety commitments against market gravity, it was this one.
They couldn't. Not because they didn't try. Because the structure of AI competition, venture capital, and state power makes voluntary commitments fundamentally unsustainable once the stakes get high enough.
The hypothesis that a private company can self-regulate frontier AI has been empirically tested, and the results are in.
Governance that depends on goodwill isn't governance; it's a bet. And February 2026 is what it looks like when that bet doesn't pay off.