Bias

Meet the AI jailbreakers: ‘I see the worst things humanity has produced’

2026-04-29 · AI (artificial intelligence) | The Guardian

AI jailbreakers reveal the dark underbelly of chatbot safety testing: with enough creativity, you can sweet-talk these billion-dollar models into ignoring their own rules. The people paid to break AI safety guardrails are discovering just how fragile those guardrails really are, and the work is taking a psychological toll on them. Nothing says 'robust AI safety' like systems that fold under pressure from determined humans in hotel rooms.
