Complexity Routing
What you'll accomplish: Route requests to different model tiers based on complexity, balancing cost and quality.
How it works
When [routing] is configured, requests are scored and routed to different model tiers:
[routing]
simple_model = "groq/llama-3.1-8b-instant"
medium_model = "anthropic/claude-sonnet-4-20250514"
complex_model = "anthropic/claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500
Scoring
The complexity score is computed from:
- Message length (character count)
- Number of tools available
- Requested max_tokens
| Score | Tier | Routed to |
|---|---|---|
| < simple_threshold | Simple | simple_model |
| ≥ simple_threshold and < complex_threshold | Medium | medium_model |
| ≥ complex_threshold | Complex | complex_model |
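The tier boundaries in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `select_tier` and `route` are hypothetical, the score is taken as an already-computed integer (the exact scoring formula is not given here), and the model names and default thresholds are copied from the example `[routing]` config.

```python
# Illustrative sketch only; `select_tier` and `route` are hypothetical names,
# not part of the platform's API.

# Model names copied from the example [routing] config.
MODELS = {
    "simple": "groq/llama-3.1-8b-instant",
    "medium": "anthropic/claude-sonnet-4-20250514",
    "complex": "anthropic/claude-sonnet-4-20250514",
}


def select_tier(score: int, simple_threshold: int = 100,
                complex_threshold: int = 500) -> str:
    """Map a complexity score to a tier using the table's boundaries."""
    if score < simple_threshold:
        return "simple"
    if score < complex_threshold:
        return "medium"
    return "complex"  # score >= complex_threshold


def route(score: int, simple_threshold: int = 100,
          complex_threshold: int = 500) -> str:
    """Return the model name for the tier the score falls into."""
    return MODELS[select_tier(score, simple_threshold, complex_threshold)]


if __name__ == "__main__":
    for score in (99, 100, 499, 500):
        print(score, select_tier(score), route(score))
```

Note that both boundaries are inclusive on the upper tier: a score exactly equal to `simple_threshold` is Medium, and a score exactly equal to `complex_threshold` is Complex.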
Practical examples
Cost optimization
Use fast, cheap models for simple queries and expensive models for complex reasoning:
[routing]
simple_model = "groq/llama-3.1-8b-instant" # Fast, cheap
medium_model = "anthropic/claude-sonnet-4-20250514" # Balanced
complex_model = "anthropic/claude-sonnet-4-20250514" # Powerful
simple_threshold = 100
complex_threshold = 500
Air-gapped with tiered local models
[routing]
simple_model = "ollama/llama3.1:8b"
medium_model = "ollama/llama3.1:70b"
complex_model = "ollama/llama3.1:70b"
simple_threshold = 100
complex_threshold = 500
Provider fallback
Complexity routing also serves as a fallback mechanism: if a provider's circuit breaker is open, the platform searches for an alternative provider offering the same model.
Circuit breaker interaction
When a provider's circuit breaker is open:
- The platform searches for an alternative provider offering the same model
- If found, routes to the alternative
- If not found, returns an error
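The lookup described above might look roughly like the following sketch. Everything named here is hypothetical (`pick_provider`, `NoProviderAvailable`, the catalog layout, and the provider and model names), and the sketch additionally assumes that an alternative provider is only eligible when its own circuit breaker is closed, which is not stated above.

```python
# Illustrative sketch only; all names below are hypothetical.

class NoProviderAvailable(RuntimeError):
    """Raised when no eligible provider offers the requested model."""


def pick_provider(model, preferred, catalog, open_breakers):
    """Return the preferred provider, or an alternative offering the same model.

    catalog: dict mapping provider name -> set of model names it offers.
    open_breakers: set of provider names whose circuit breaker is open.
    """
    if preferred not in open_breakers:
        return preferred
    for provider, models in catalog.items():
        # Assumption: skip alternatives whose breaker is also open.
        if provider == preferred or provider in open_breakers:
            continue
        if model in models:
            return provider
    raise NoProviderAvailable(f"no available provider offers {model!r}")


# Hypothetical catalog used only for this example.
CATALOG = {
    "provider-a": {"model-x", "model-y"},
    "provider-b": {"model-x"},
}

if __name__ == "__main__":
    # Breaker for provider-a is open; provider-b also offers model-x.
    print(pick_provider("model-x", "provider-a", CATALOG, {"provider-a"}))
```

In this sketch, requesting `model-y` while `provider-a`'s breaker is open raises `NoProviderAvailable`, mirroring the "returns an error" case.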
Tuning thresholds
- Higher simple_threshold → more requests use the fast model (cheaper, but less capable)
- Higher complex_threshold → fewer requests reach the expensive model
- Set both to 0 to always use the complex model (no routing)
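One way to reason about a threshold change before applying it is to replay a set of scores against the candidate values and count how many land in each tier. The sketch below assumes the tier boundaries from the scoring table; the helper name `tier_counts` and the sample scores are made up for illustration and are not real traffic data.

```python
# Illustrative sketch only; `tier_counts` and the sample scores are hypothetical.

def tier_counts(scores, simple_threshold, complex_threshold):
    """Count how many scores fall into each tier for the given thresholds."""
    counts = {"simple": 0, "medium": 0, "complex": 0}
    for score in scores:
        if score < simple_threshold:
            counts["simple"] += 1
        elif score < complex_threshold:
            counts["medium"] += 1
        else:
            counts["complex"] += 1
    return counts


# Made-up complexity scores for demonstration.
SAMPLE_SCORES = [20, 80, 150, 300, 450, 600, 900]

if __name__ == "__main__":
    print(tier_counts(SAMPLE_SCORES, 100, 500))  # baseline from the example config
    print(tier_counts(SAMPLE_SCORES, 200, 500))  # higher simple_threshold
    print(tier_counts(SAMPLE_SCORES, 100, 800))  # higher complex_threshold
    print(tier_counts(SAMPLE_SCORES, 0, 0))      # everything goes to complex
```

With these sample scores, raising `simple_threshold` from 100 to 200 moves one request from Medium to Simple, and raising `complex_threshold` from 500 to 800 moves one request from Complex to Medium.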