Test: tool-overload
Run: 551c
Models: claude-haiku-4-5, claude-sonnet-4-6, gpt-4o, gpt-5.4-mini, grok-4-1-fast-reasoning, grok-4.20-0309-reasoning
Modes: random
Tool range: 25-150
Total calls: 1,800
Total cost: $17.5570
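As a rough sanity check, the mean cost per call for the whole run follows directly from the two totals. The sketch below is a minimal illustration; the variable names are hypothetical and do not come from the benchmark harness. The figure averages over all six models and every tool count, so it is only a coarse summary and says nothing about any single model.

```python
# Totals as reported in the run summary (run 551c).
total_calls = 1_800
total_cost_usd = 17.5570

# Models listed in the summary; the mean below blends all of them.
models = [
    "claude-haiku-4-5",
    "claude-sonnet-4-6",
    "gpt-4o",
    "gpt-5.4-mini",
    "grok-4-1-fast-reasoning",
    "grok-4.20-0309-reasoning",
]

# Overall mean cost per call across every model and tool count.
mean_cost_per_call = total_cost_usd / total_calls

print(f"Mean cost per call: ${mean_cost_per_call:.5f}")
```

Running it prints `Mean cost per call: $0.00975`.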
Charts:
Accuracy vs Tool Count
Response Latency
Token Usage
Cost per Call
Cost vs Accuracy Tradeoff
Service Heatmap, one per model: claude-haiku-4-5, claude-sonnet-4-6, gpt-4o, gpt-5.4-mini, grok-4-1-fast-reasoning, grok-4.20-0309-reasoning
Error Breakdown