Raw Data

| Model | Tools | Accuracy | Cross-Svc Error | Avg Latency | Avg Tokens | Cost | Calls |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| claude-haiku-4-5 | 25 | 81.7% | 0.0% | 2463ms | 2886 | $0.1890 | 60 |
| claude-haiku-4-5 | 50 | 80.0% | 0.0% | 6157ms | 427 | $0.3698 | 60 |
| claude-haiku-4-5 | 75 | 78.3% | 1.7% | 8765ms | 427 | $0.5328 | 60 |
| claude-haiku-4-5 | 100 | 80.0% | 3.3% | 11473ms | 427 | $0.7015 | 60 |
| claude-haiku-4-5 | 150 | 76.7% | 6.7% | 16749ms | 427 | $1.0330 | 60 |
| claude-sonnet-4-6 | 25 | 78.3% | 0.0% | 4728ms | 427 | $0.6007 | 60 |
| claude-sonnet-4-6 | 50 | 73.3% | 1.7% | 10308ms | 427 | $1.1083 | 60 |
| claude-sonnet-4-6 | 75 | 73.3% | 3.3% | 14579ms | 427 | $1.5972 | 60 |
| claude-sonnet-4-6 | 100 | 76.7% | 5.0% | 19120ms | 427 | $2.1039 | 60 |
| claude-sonnet-4-6 | 150 | 75.0% | 3.3% | 27935ms | 427 | $3.1023 | 60 |
| gpt-4o | 25 | 81.7% | 0.0% | 1170ms | 1053 | $0.1727 | 60 |
| gpt-4o | 50 | 78.3% | 0.0% | 4035ms | 2038 | $0.3201 | 60 |
| gpt-4o | 75 | 73.3% | 1.7% | 6213ms | 3007 | $0.4652 | 60 |
| gpt-4o | 100 | 76.7% | 3.3% | 7657ms | 3969 | $0.6094 | 60 |
| gpt-4o | 150 | 0.0% | 0.0% | 0ms | 0 | $0.0000 | 60 |
| gpt-5.4-mini | 25 | 85.0% | 0.0% | 739ms | 1138 | $0.0585 | 60 |
| gpt-5.4-mini | 50 | 85.0% | 0.0% | 754ms | 2124 | $0.1032 | 60 |
| gpt-5.4-mini | 75 | 80.0% | 3.3% | 849ms | 3082 | $0.1456 | 60 |
| gpt-5.4-mini | 100 | 83.3% | 1.7% | 976ms | 4061 | $0.1901 | 60 |
| gpt-5.4-mini | 150 | 0.0% | 0.0% | 0ms | 0 | $0.0000 | 60 |
| grok-4-1-fast-reasoning | 25 | 86.7% | 0.0% | 6448ms | 1816 | $0.0229 | 60 |
| grok-4-1-fast-reasoning | 50 | 83.3% | 0.0% | 7042ms | 3283 | $0.0405 | 60 |
| grok-4-1-fast-reasoning | 75 | 80.0% | 0.0% | 6930ms | 4699 | $0.0574 | 60 |
| grok-4-1-fast-reasoning | 100 | 83.3% | 3.3% | 7349ms | 6153 | $0.0749 | 60 |
| grok-4-1-fast-reasoning | 150 | 76.7% | 5.0% | 7533ms | 9027 | $0.1094 | 60 |
| grok-4.20-0309-reasoning | 25 | 80.0% | 3.3% | 7706ms | 2139 | $0.2628 | 60 |
| grok-4.20-0309-reasoning | 50 | 78.3% | 0.0% | 7945ms | 4064 | $0.4936 | 60 |
| grok-4.20-0309-reasoning | 75 | 80.0% | 3.3% | 8133ms | 6023 | $0.7287 | 60 |
| grok-4.20-0309-reasoning | 100 | 71.7% | 6.7% | 8418ms | 7877 | $0.9512 | 60 |
| grok-4.20-0309-reasoning | 150 | 80.0% | 1.7% | 9552ms | 11718 | $1.4123 | 60 |