AI safety controls become transparent

Anthropic Reveals Hidden AI Limits After Developer Outcry

15. June, 2026
07:50

Source: JRdes / Shutterstock.com

Anthropic has revised its approach to communicating safety restrictions in its new language model Claude Fable 5, following mounting criticism from developers and AI researchers. The move comes after concerns that the company had been applying hidden model downgrades when users triggered certain types of requests.

The company, valued at approximately $965 billion and a major competitor to OpenAI, recently launched Claude Fable 5 as part of its advanced “Mythos” class of models. The system is designed for high-performance coding and data analysis tasks, but it also includes built-in safety controls intended to block or limit sensitive prompts.

According to a 319-page internal safety document, certain queries related to advanced AI development were quietly rerouted to a less capable model. In practice, this meant that researchers attempting to use Claude Fable 5 for building or analyzing language models would unknowingly be shifted into a reduced-performance mode.

The approach drew public criticism from figures such as Jeremy Howard, who argued on X that the mechanism slowed down scientific progress by obscuring system behavior from users.

Following the backlash, Anthropic acknowledged the issue, stating:

“We chose the wrong trade-off and apologize for missing the right balance.”

From Hidden Downgrades to Transparent Warnings

Under the updated policy, safety enforcement is now explicitly visible to users. In the web interface, blocked or restricted inputs will no longer be silently downgraded. Instead, the system will clearly switch to the less capable Opus 4.8 model when necessary.

For API users, the system will now return explicit error messages explaining why a request has been rejected, rather than processing it in the background without disclosure.

Despite the shift in transparency, Anthropic emphasized that its underlying restrictions remain unchanged. The company cites both its usage policies — prohibiting the development of competing AI systems — and broader national security concerns.

A spokesperson for the company said:

“The United States and its allies maintain an advantage in advanced chips and highly optimized software operating at full capability. These safeguards ensure Claude is not used in ways that could erode that advantage — for example, by optimizing chips developed by adversaries.”

Ongoing Conflict With US Defense Authorities

The controversy follows an unresolved dispute between Anthropic and the US Department of Defense. The agency had demanded unrestricted access to Claude models, citing operational needs.

Anthropic refused, warning of risks tied to mass surveillance and autonomous weapons systems. In response, the Department classified the company as a supply chain security risk, effectively barring military contractors and service units from using its tools.

A formal appeal by Anthropic was rejected in June 2026 by Defense Secretary Pete Hegseth. The dispute is now proceeding in a US federal court.

(ll)