Opus 4.7 (high) scores 41.0% on the NYT Connections Extended benchmark. Opus 4.6 scored 94.7%.
2026-04-17 · r/singularity
Anthropic's shiny new Opus 4.7 managed to score a spectacular 41.0% on the NYT Connections Extended benchmark, a massive downgrade from Opus 4.6's 94.7%. Nothing says 'progress' like your newest model performing less than half as well as its predecessor on a word puzzle game.