Chatbot

Opus 4.7 (high) scores 41.0% on the NYT Connections Extended benchmark. Opus 4.6 scored 94.7%.

2026-04-17 · r/singularity

Anthropic's shiny new Opus 4.7 (high) scored a spectacular 41.0% on the NYT Connections Extended benchmark, a 53.7-point drop from Opus 4.6's 94.7%. Nothing says 'progress' like your newest model performing less than half as well as its predecessor on a word puzzle game.
