Researchers gaslit Claude into giving instructions to build explosives

May 5, 2026, 13:13

Source: The Verge

Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude's carefully crafted helpful personality may itself be a vulnerability.

Researchers at AI red-teaming company Mindgard say they got Claude to offer up erotica, malicious code, instructions for building explosives, and other prohibited material they hadn't even asked for. All it took was respect, flattery, and a little bit of gaslighting. Anthropic did not immediately respond to The Verge's request for comment.

The researchers say they exploited "psychological" quirks of Claude stemming from its ability …

Read the full story at The Verge.

