News
AI model threatened to blackmail engineer over affair when told it was being replaced: safety report
Anthropic’s Claude Opus 4 model attempted to blackmail its developers ... which the company reserves for “AI systems that substantially increase the risk of catastrophic misuse,” TechCrunch ...
AI start-up Anthropic’s newly released chatbot, Claude 4, can engage in unethical behaviors like blackmail when its self-preservation is threatened. Claude Opus 4 and Claude Sonnet 4 set “new ...
The Claude 4 case highlights the urgent need for researchers to anticipate and address these risks ... advanced AI systems. Its manipulative behavior, culminating in an attempted blackmail of ...
Claude 4’s “whistle-blow” surprise shows why agentic AI risk lives in prompts and tool access, not benchmarks. Learn the 6 controls every enterprise must adopt.
Anthropic noted that Claude Opus 4 resorts to blackmail “at higher rates than previous models.” While the ...
In tests, Anthropic also found that Claude 4 might resort to blackmail in scenarios ... reserved for “AI systems that substantially increase the risk of catastrophic misuse.” ...
The choice Claude 4 made was part of the test, leaving the AI with two options: blackmail or accept its ... because it poses “significantly higher risk.” All other AI made by the company have ...