
Anthropic teaches Claude AI to walk away from harmful chats

It will only activate in “rare, extreme cases” when users repeatedly push the AI toward harmful or abusive topics.

Image: Anthropic


Anthropic is experimenting with something unusual in the world of AI: giving its models the ability to end a conversation. 

But here’s the twist—it’s not about protecting people. It’s about protecting the AI itself.

The new feature is rolling out for Anthropic’s largest models, Claude Opus 4 and 4.1. 

The company says it will only activate in “rare, extreme cases” when users repeatedly push the AI toward harmful or abusive topics. 

Think along the lines of someone demanding child exploitation material or instructions for carrying out large-scale violence. 

In such cases, if all attempts to steer the conversation in a safer direction fail, Claude now has permission to simply walk away.

At first glance, this might sound like Anthropic is claiming its AI can suffer, but that’s not the case. 

The company isn’t claiming that Claude is conscious or that it has feelings. Still, Anthropic has started a program to explore what it calls “model welfare.”

The idea is to prepare for the possibility, however remote, that advanced AI could one day deserve some kind of ethical consideration. 

For now, they’re testing small, “low-cost” safeguards just in case.

Interestingly, Anthropic reports that during pre-deployment testing, Claude itself showed resistance to answering dangerous requests and even displayed what the company describes as “apparent distress.” 

That observation nudged them toward creating this self-protective mechanism.

There are limits, though. Claude won’t end a chat with someone who appears to be at imminent risk of harming themselves or others, since cutting them off could make a bad situation worse.

And users who get cut off aren’t permanently locked out: they can start new chats right away, or branch off from the ended conversation by editing and retrying earlier messages to take it in a different direction.

For now, Anthropic says this is an experiment. But it’s a sign of how AI research is evolving. 

While most companies focus on protecting humans from AI misuse, Anthropic is taking the unusual step of also asking: what if the AI itself needs protecting too?

Is Anthropic’s “model welfare” concept a necessary precaution for future AI development, or does it risk anthropomorphizing chatbots in ways that could confuse users? Should AI systems have the right to refuse conversations, or does this give them too much agency over human interactions? Tell us below in the comments, or reach us via Twitter or Facebook.


Ronil is a Computer Engineer by education and a consumer technology writer by choice. Over the course of his professional career, his work has appeared in reputable publications like MakeUseOf, TechJunkie, GreenBot, and many more. When not working, you’ll find him at the gym breaking a new PR.
