The DeepSeek team published a paper on arXiv detailing the architecture of R2, its new-generation reasoning model. The main contribution is a hybrid architecture combining Mixture-of-Experts with a compact iterative reasoning mechanism that cuts the number of internal thinking tokens by a factor of ten while achieving performance comparable to o3 or Claude Opus 4.7 on reasoning benchmarks.
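The paper's exact mechanism isn't reproduced here, but the general idea of combining sparse expert routing with latent-space iteration can be sketched as follows. Everything in this snippet is illustrative: the dimensions, the top-K routing, the shared-weight residual loop, and the function names (`moe_layer`, `iterative_reasoning`) are assumptions, not DeepSeek's actual design. The key intuition the sketch shows is that the model refines a hidden state in place over several iterations of the same block, rather than emitting one visible "thinking" token per reasoning step.

```python
import numpy as np

rng = np.random.default_rng(0)

D, E, K, H = 16, 4, 2, 32  # hidden dim, num experts, top-K routing, expert width

# Random expert and router weights, purely for illustration.
W_in = rng.normal(0, 0.1, (E, D, H))
W_out = rng.normal(0, 0.1, (E, H, D))
W_router = rng.normal(0, 0.1, (D, E))

def moe_layer(x):
    """Route the vector to its top-K experts and mix their outputs."""
    logits = x @ W_router
    top = np.argsort(logits)[-K:]                # indices of the K best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                     # softmax over the selected experts
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        out += w * (np.tanh(x @ W_in[e]) @ W_out[e])
    return out

def iterative_reasoning(x, steps=8):
    """Refine the hidden state in latent space for `steps` iterations,
    reusing the same weights, instead of generating `steps` visible
    thinking tokens."""
    for _ in range(steps):
        x = x + moe_layer(x)                     # residual update
    return x

h = rng.normal(0, 1, D)
refined = iterative_reasoning(h, steps=8)
print(refined.shape)  # (16,)
```

Under this (hypothetical) scheme, the token savings come from the loop: eight latent iterations cost eight forward passes through one sparse block but zero decoded tokens, whereas a chain-of-thought model would pay for every intermediate token at decode time.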
Concretely, on GPQA Diamond, R2 matches the scores of its American competitors while consuming roughly a tenth of the inference tokens, which translates into drastically lower operating costs. The model remains open source under the MIT license, consistent with the Chinese company's strategy.
The publication triggered a wave of analysis among Western players, with some researchers at Anthropic and OpenAI acknowledging on X that the approach could represent a genuine qualitative leap for reasoning at scale in production.