🔬 Methodology
Everything we compare, how we rank, and where our data comes from. Full transparency: we publish no number that we can't back with an official source.
📋 What we compare
Each model is documented across about fifteen dimensions, grouped into five families. All are visible in the side-by-side comparator.
Performance
Technical
Economic
Sovereignty & compliance
Perceived quality
🧮 How we rank (Podium)
The podium uses a subset of these dimensions, combined into one weighted score per category. The weights below are those actually used by the algorithm; they differ by category, according to what matters in each domain.
Generalists
Code
Vision
Multilingual
Open Source
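To make the idea of a weighted score per category concrete, here is a minimal sketch. The dimension names (benchmark, price, context) and the weights are hypothetical, chosen only for illustration; they are not the values the podium algorithm uses. The sketch also assumes each dimension has already been normalized to a 0-100 scale.

```python
from typing import Mapping


def weighted_score(normalized: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Weighted average of 0-100 normalized values.

    Weights are relative: they are divided by their sum, so they
    do not need to add up to 1 (or to 100).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * normalized[name] for name, w in weights.items()) / total


# Hypothetical weights for one category (illustration only).
EXAMPLE_WEIGHTS = {"benchmark": 5, "price": 3, "context": 2}

# Hypothetical model, with each dimension already normalized to 0-100.
example_model = {"benchmark": 80, "price": 95, "context": 70}

score = weighted_score(example_model, EXAMPLE_WEIGHTS)
print(score)  # (5*80 + 3*95 + 2*70) / 10 = 82.5
```

Dividing by the sum of the weights keeps the result on the same 0-100 scale as the inputs, which makes scores comparable across categories even when their weight sets differ.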
How values are normalized
Price — inverted: cheaper = better. Free = 100, <$1/M = 95, <$5 = 85, <$20 = 70, <$50 = 50, <$100 = 30, beyond = 15.
Context — tiered: ≥1M = 100, ≥500k = 90, ≥200k = 80, ≥128k = 70, ≥32k = 50.
Freshness — <1 month = 100, <3 months = 90, <6 months = 75, <1 year = 55, then decreases.
License — Apache/MIT = 100, BSD = 95, GPL = 85, Llama (restrictions) = 60.
Self-host — by size: ≤8B = 100 (runs on a Mac), ≤30B = 85, ≤70B = 70, beyond needs a cluster.
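The tiers above can be sketched as small lookup functions. This is an illustration under stated assumptions, not the comparator's actual code: all function names are hypothetical, "/M" is read as US dollars per million tokens, "k" and "M" are read as 1,000 and 1,000,000 tokens, and a month is approximated as 30 days. Where the text gives no value (context below 32k, age beyond one year, other licenses, models above 70B), the sketch returns None rather than guessing.

```python
from typing import Optional


def price_score(usd_per_million: float) -> int:
    """Inverted scale: cheaper is better."""
    if usd_per_million <= 0:
        return 100  # free
    for limit, score in ((1, 95), (5, 85), (20, 70), (50, 50), (100, 30)):
        if usd_per_million < limit:
            return score
    return 15  # $100 and beyond


def context_score(tokens: int) -> Optional[int]:
    """Tiered by context window (assumes k = 1,000 and M = 1,000,000)."""
    tiers = ((1_000_000, 100), (500_000, 90), (200_000, 80),
             (128_000, 70), (32_000, 50))
    for floor, score in tiers:
        if tokens >= floor:
            return score
    return None  # below 32k: no value stated in the text


def freshness_score(age_days: int) -> Optional[int]:
    """Tiered by age since release (assumes 30-day months)."""
    for limit, score in ((30, 100), (90, 90), (180, 75), (365, 55)):
        if age_days < limit:
            return score
    return None  # "then decreases": the rate is not stated


# License families named in the text; any other family is unspecified.
LICENSE_SCORES = {"apache": 100, "mit": 100, "bsd": 95, "gpl": 85, "llama": 60}


def license_score(family: str) -> Optional[int]:
    return LICENSE_SCORES.get(family.lower())


def self_host_score(params_billions: float) -> Optional[int]:
    """Tiered by parameter count, in billions."""
    for limit, score in ((8, 100), (30, 85), (70, 70)):
        if params_billions <= limit:
            return score
    return None  # beyond 70B: "needs a cluster", no score stated


print(price_score(0.5), context_score(200_000), self_host_score(7))
```

Note how the boundaries behave: a model priced at exactly $1/M falls into the "<$5" tier, because the price thresholds are strict, while the context thresholds are inclusive.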
✅ Reliability & sources
This is what sets this comparator apart from a mere table. Our data commitment:
⚖️ Acknowledged limits
Price is part of the score: a model can rank well largely because it's cheap. Our ranking reflects value for money, not raw power alone.
Freshness is rewarded: a recent model gains a few points. It's a deliberate choice, since the field moves fast — but it can over-weight novelty.
Benchmarks don't tell the whole story: a model can score well yet disappoint in practice. That's why every ranking is human-reviewed before publication.