Anthropic says some Claude models can now end ‘harmful or abusive’ conversations
Anthropic has announced new capabilities that will enable some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it’s doing this not to protect the human user, but rather the AI model itself.
To be clear, the company isn’t claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”
Still, its announcement points to a recent program created to study what it calls “model welfare” and says Anthropic is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it’s only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”
While these types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting around how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a “strong preference against” responding to these requests and a “pattern of apparent distress” when it did so.
As for these new conversation-ending capabilities, the company says, “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”
Anthropic also says Claude has been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”
When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
“We’re treating this feature as an ongoing experiment and will continue refining our approach,” the company says.