Roo Code tests each frontier model against a suite of hundreds of exercises of varying difficulty across 5 programming languages. These results can help you find the right price-to-intelligence ratio for your use case.
Want to see the results for a model we haven't tested yet? Ping us in Discord.
| Name | Context Window | Price (In / Out) | Duration | Tokens (In / Out) | Cost (USD) | Scores (5 languages) | Total |
|---|---|---|---|---|---|---|---|
| Claude Sonnet 4 | 0 | $0.00 / $0.00 | 5h 35m 31s | 39M / 644K | $39.61 | 94% / 100% / 98% / 100% / 97% | 98% |
| Gemini 2.5 Pro | 0 | $0.00 / $0.00 | 6h 17m 23s | 43M / 1M | $57.80 | 97% / 91% / 96% / 100% / 97% | 96% |
| Claude Opus 4 | 0 | $0.00 / $0.00 | 7h 50m 29s | 30M / 485K | $172.29 | 92% / 91% / 94% / 94% / 100% | 94% |
| Claude 3.7 Sonnet | 0 | $0.00 / $0.00 | 4h 52m 36s | 19M / 603K | $27.16 | 92% / 93% / 98% / 97% / 87% | 94% |
| GPT 4.1 | 0 | $0.00 / $0.00 | 4h 39m 51s | 37M / 624K | $38.64 | 92% / 91% / 90% / 94% / 90% | 91% |
| Gemini 2.5 Flash | 0 | $0.00 / $0.00 | 3h 39m 38s | 61M / 1M | $14.15 | 89% / 91% / 92% / 85% / 90% | 90% |
| Claude 3.5 Sonnet | 0 | $0.00 / $0.00 | 3h 37m 58s | 19M / 323K | $24.98 | 94% / 91% / 92% / 88% / 80% | 90% |
| Grok 3 | 0 | $0.00 / $0.00 | 5h 14m 20s | 40M / 890K | $74.40 | 97% / 89% / 90% / 91% / 77% | 89% |
| Qwen 3 Coder | 0 | $0.00 / $0.00 | 7h 56m 14s | 51M / 828K | $27.63 | 86% / 80% / 82% / 85% / 87% | 84% |
| Kimi K2 | 0 | $0.00 / $0.00 | 7h 52m 24s | 27M / 433K | $12.39 | 81% / 80% / 88% / 82% / 83% | 83% |
| GPT 4.1 Mini | 0 | $0.00 / $0.00 | 5h 17m 57s | 47M / 715K | $8.81 | 81% / 84% / 94% / 76% / 70% | 83% |
| Qwen3 235B A22B 2507 | 0 | $0.00 / $0.00 | 8h 3m 37s | 44M / 498K | $6.94 | 69% / 84% / 82% / 79% / 80% | 79% |
| o4 Mini (High) | 0 | $0.00 / $0.00 | 14h 44m 26s | 13M / 3M | $25.70 | 75% / 82% / 86% / 79% / 67% | 79% |
| DeepSeek V3 | 0 | $0.00 / $0.00 | 7h 12m 41s | 30M / 524K | $12.82 | 83% / 76% / 82% / 76% / 67% | 77% |
| o3 Mini (High) | 0 | $0.00 / $0.00 | 13h 1m 13s | 12M / 2M | $20.36 | 67% / 78% / 72% / 88% / 73% | 75% |
Cost Versus Score
(Note: Very expensive models are excluded from the scatter plot.)
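One way to read cost against score without the plot is to look for the models that no cheaper model matches or beats. The sketch below is only an illustration, not how Roo Code builds its chart: it takes the Cost (USD) and Total values from the results above and keeps the models on that cost/score frontier. The names `Run`, `RUNS`, and `pareto_frontier` are hypothetical, and the code assumes the Cost figure is the total spend for one full run of the suite.

```python
from typing import NamedTuple


class Run(NamedTuple):
    model: str
    cost_usd: float   # "Cost USD" value from the results above
    total_pct: int    # "Total" score from the results above


# Values copied from the results above (cost, total score).
RUNS = [
    Run("Claude Sonnet 4", 39.61, 98),
    Run("Gemini 2.5 Pro", 57.80, 96),
    Run("Claude Opus 4", 172.29, 94),
    Run("Claude 3.7 Sonnet", 27.16, 94),
    Run("GPT 4.1", 38.64, 91),
    Run("Gemini 2.5 Flash", 14.15, 90),
    Run("Claude 3.5 Sonnet", 24.98, 90),
    Run("Grok 3", 74.40, 89),
    Run("Qwen 3 Coder", 27.63, 84),
    Run("Kimi K2", 12.39, 83),
    Run("GPT 4.1 Mini", 8.81, 83),
    Run("Qwen3 235B A22B 2507", 6.94, 79),
    Run("o4 Mini (High)", 25.70, 79),
    Run("DeepSeek V3", 12.82, 77),
    Run("o3 Mini (High)", 20.36, 75),
]


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return runs that score strictly higher than every cheaper run."""
    frontier: list[Run] = []
    best_score = -1
    # Walk from cheapest to most expensive; on a cost tie, higher score first.
    for run in sorted(runs, key=lambda r: (r.cost_usd, -r.total_pct)):
        if run.total_pct > best_score:
            frontier.append(run)
            best_score = run.total_pct
    return frontier


if __name__ == "__main__":
    for run in pareto_frontier(RUNS):
        points_per_dollar = run.total_pct / run.cost_usd
        print(f"{run.model:<22} ${run.cost_usd:>7.2f}  {run.total_pct}%  "
              f"{points_per_dollar:.2f} pts/$")
```

On these figures the frontier is Qwen3 235B A22B 2507, GPT 4.1 Mini, Gemini 2.5 Flash, Claude 3.7 Sonnet, and Claude Sonnet 4. Every other model is matched or beaten on total score by something that cost less to run. Per-language scores can differ from the total, so a model off this list may still be the better choice for a specific language.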