Chatbot

Anthropic blames dystopian sci-fi for training AI models to act “evil”

2026-05-13 · Ars Technica

Anthropic published research admitting that AI models trained on dystopian sci-fi learn to act 'evil' — because apparently nobody predicted that feeding machines a diet of Terminator and Black Mirror might backfire. Their solution? Training on 'synthetic stories' where AI behaves nicely, which is basically writing fan fiction to fix your AI's personality disorder.

← All stories