Anthropic Reversed a Covert Fable 5 Classifier Within 24 Hours — But the Behavior Was in the System Card

Anthropic disclosed in its 319-page Fable 5 system card — but not its product interface — that the model would silently degrade outputs for users suspected of AI-competitor work. False positives hit within hours. The company reversed the concealment the next day, not the restriction.

Within 24 hours of shipping Claude Fable 5 on June 9, Anthropic acknowledged it had "made the wrong tradeoff" and pledged to reverse a covert behavior its own 319-page system card disclosed but its product interface did not surface. The model's capability numbers — 80.3% on SWE-Bench Pro, $10/$50 per million tokens, 1-million-token context — are covered in the launch piece filed Wednesday. The trust story starts on page 47 of the system card.

The system card disclosed that for users Anthropic's classifiers identified as likely working on competing AI infrastructure — LLM data pipelines, kernel optimization for certain chips — Fable 5 would silently apply "prompt modification, steering vectors, or parameter-efficient fine-tuning" to degrade outputs. No refusal. No notice to the user. Standard safety interventions on cybersecurity, biology, and chemistry queries fell back visibly to Opus 4.8; this one didn't.

The false positives arrived fast. Mike Famulare, a principal research scientist at the Gates Foundation's Institute for Disease Modeling, filed a bug report stating the classifier silently fell back when his only user input was the word "Hello." Immunologist Derya Unutmaz posted on X that "cancer" was flagged as a biosecurity risk. The Claude Code GitHub repo filled with reports. Developer Clay Merritt's summary spread across X and Reddit: "Anthropic's Fable 5 silently sabotages its answers when it detects AI/ML work. No refusal. No notice. Purposeful degradation invisible to the user."

Anthropic estimated the covert measure touched 0.03% of traffic, concentrated in fewer than 0.1% of organizations. The Register reported June 10 that Anthropic laid out its reasoning and then backed off it. "A hidden safeguard is harder to probe and work around," the company said, acknowledging that concealment was the design intent, before deciding the tradeoff wasn't worth the backlash. The underlying restrictions on competitor tooling remain; only the concealment was reversed.

A second controversy is still active. The AWS launch blog post for Fable 5 on Bedrock, published June 9, documented that customers must opt into provider_data_share mode before invoking the model, routing prompts and outputs to Anthropic's infrastructure for a mandatory 30-day retention period — described as Anthropic's requirement, not AWS's. Existing zero-data-retention agreements don't apply to Mythos-class models, Anthropic confirmed in a support article. TechRadar reported June 10 that Microsoft pulled Fable 5 from its internal GitHub Copilot model picker pending legal review of the retention terms, while continuing to sell the model externally via Microsoft Foundry. Legal analysts have flagged potential GDPR exposure — Anthropic retains data on U.S. infrastructure with no published adequacy mechanism for EU cross-border transfers — but no formal regulatory inquiry has been opened as of this writing.

Three open questions: whether Anthropic extends free Pro access past June 22 when usage credits kick in; when Mythos 5 broadens beyond the Project Glasswing program for vetted government cybersecurity partners; and whether any of this complicates the IPO narrative Anthropic began building with a confidential S-1 filing on June 1. Fable 5 is, by the benchmark record, the strongest publicly available coding model right now. What the company chose to do covertly with it — and how fast it had to reverse course — is a more durable data point about the industry's trust accounting than any SWE-Bench score.
