Anthropic officially released Claude Opus 4.7 yesterday, its most advanced model to date. The model introduces a new extended reasoning mode that lets it spend more internal tokens on deliberation before producing an answer, with notable gains on the most demanding technical benchmarks.
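If the extended reasoning mode is exposed the way Anthropic's existing extended-thinking interface works, a request would reserve an internal deliberation budget alongside the usual parameters. This is a minimal sketch only; the model id and the exact parameter shape are assumptions, not confirmed by Anthropic.

```python
# Hypothetical request body for the extended reasoning mode, modeled on
# Anthropic's existing extended-thinking parameter. The model id
# "claude-opus-4-7" is an assumption for illustration.
payload = {
    "model": "claude-opus-4-7",  # hypothetical id
    "max_tokens": 4096,
    # Internal deliberation budget spent before the visible answer.
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
}
print(payload["thinking"]["budget_tokens"])
```

The key design point is that the deliberation budget is separate from `max_tokens`: the caller decides how much hidden reasoning to pay for independently of the length of the visible reply.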
On SWE-bench Verified, the reference benchmark for real-world software engineering tasks, Opus 4.7 reaches 72.8%, a significant jump from version 4.6. On GPQA Diamond, which evaluates PhD-level scientific reasoning, the score climbs to 73.5%. These results put the model at the top across several categories, ahead of GPT-5 and Gemini 3 Pro.
The context window stays at 200K tokens, a deliberate choice by Anthropic to prioritize reasoning quality over raw context length. API pricing is unchanged from the previous generation: $15 per million input tokens and $75 per million output tokens.
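At those rates, per-request cost is simple arithmetic. A small helper, with the article's published prices as defaults (the function name and example token counts are illustrative):

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price: float = 15.0,
                     output_price: float = 75.0) -> float:
    """Estimate the cost of one API call in USD.

    Prices are per million tokens; defaults are the published
    Opus rates ($15 input / $75 output) cited in the article.
    """
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: a 10k-token prompt that produces a 2k-token answer.
cost = request_cost_usd(10_000, 2_000)
print(f"${cost:.2f}")  # → $0.30
```

Note that output tokens cost five times as much as input tokens, so long generations dominate the bill well before long prompts do.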