Complexity Routing

What you'll accomplish: Route requests to different model tiers based on complexity, balancing cost and quality.

How it works

When [routing] is configured, requests are scored and routed to different model tiers:

[routing]
simple_model = "groq/llama-3.1-8b-instant"
medium_model = "anthropic/claude-sonnet-4-20250514"
complex_model = "anthropic/claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

Scoring

The complexity score is computed from:

Message length (character count)
Number of tools available
Requested max_tokens
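
This page lists the inputs to the score but not how they are combined. The sketch below is one plausible combination, for illustration only. The weights (`TOOL_WEIGHT`, `MAX_TOKENS_DIVISOR`), the `Request` type, and the `complexity_score` function are all hypothetical and are not the platform's actual formula.

```python
from dataclasses import dataclass

# Hypothetical weights: the docs name the inputs, not how they are weighted.
TOOL_WEIGHT = 50
MAX_TOKENS_DIVISOR = 10


@dataclass
class Request:
    """Illustrative request shape holding only the three scoring inputs."""
    message: str
    tool_count: int = 0
    max_tokens: int = 0


def complexity_score(req: Request) -> int:
    """Combine message length, tool count, and max_tokens into one score."""
    return (
        len(req.message)                          # message length in characters
        + TOOL_WEIGHT * req.tool_count            # number of tools available
        + req.max_tokens // MAX_TOKENS_DIVISOR    # requested max_tokens
    )
```

Under these assumed weights, a short message with no tools scores low, while a request with several tools or a large `max_tokens` scores higher.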

Score	Tier	Routed to
`< simple_threshold`	Simple	`simple_model`
`≥ simple_threshold` and `< complex_threshold`	Medium	`medium_model`
`≥ complex_threshold`	Complex	`complex_model`
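
The table above can be read as a two-comparison lookup. The following is a minimal sketch of that logic, assuming the score is already computed. The function names `select_tier` and `select_model` are illustrative and not part of the platform's API.

```python
def select_tier(score: int, simple_threshold: int, complex_threshold: int) -> str:
    """Map a complexity score to a tier name using the table's boundaries."""
    if score < simple_threshold:
        return "simple"
    if score < complex_threshold:
        return "medium"
    return "complex"


def select_model(score: int, routing: dict) -> str:
    """Look up the model configured for the tier that the score falls into."""
    tier = select_tier(
        score, routing["simple_threshold"], routing["complex_threshold"]
    )
    return routing[f"{tier}_model"]


# Same values as the [routing] example on this page.
ROUTING = {
    "simple_model": "groq/llama-3.1-8b-instant",
    "medium_model": "anthropic/claude-sonnet-4-20250514",
    "complex_model": "anthropic/claude-sonnet-4-20250514",
    "simple_threshold": 100,
    "complex_threshold": 500,
}
```

Note that both boundaries are inclusive on the upper tier: a score exactly equal to `simple_threshold` is Medium, and a score exactly equal to `complex_threshold` is Complex.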

Practical examples

Cost optimization

Use fast, cheap models for simple queries and expensive models for complex reasoning:

[routing]
simple_model = "groq/llama-3.1-8b-instant"            # Fast, cheap
medium_model = "anthropic/claude-sonnet-4-20250514"   # Balanced
complex_model = "anthropic/claude-sonnet-4-20250514"  # Powerful
simple_threshold = 100
complex_threshold = 500

Air-gapped with tiered local models

[routing]
simple_model = "ollama/llama3.1:8b"
medium_model = "ollama/llama3.1:70b"
complex_model = "ollama/llama3.1:70b"
simple_threshold = 100
complex_threshold = 500

Provider fallback

Complexity routing also serves as a fallback mechanism: if one provider's circuit breaker is open, the platform searches for an alternative provider that offers the same model.

Circuit breaker interaction

When a provider's circuit breaker is open:

The platform searches for an alternative provider offering the same model
If found, routes to the alternative
If not found, returns an error
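
The steps above can be sketched as a small lookup. This is an illustration under stated assumptions, not the platform's implementation: the `catalog` mapping, the `open_breakers` set, the `pick_provider` function, and the `NoProviderAvailable` error are all hypothetical names, and the sketch additionally assumes an alternative is only usable if its own circuit breaker is closed.

```python
class NoProviderAvailable(Exception):
    """Raised when no provider with a closed breaker offers the model."""


def pick_provider(model: str, preferred: str, catalog: dict, open_breakers: set) -> str:
    """Return the preferred provider, or an alternative offering the same model.

    catalog maps a provider name to the set of model names it offers.
    open_breakers holds the providers whose circuit breaker is currently open.
    """
    if preferred not in open_breakers:
        return preferred

    # Breaker is open: search for another provider offering the same model.
    for provider, models in catalog.items():
        if provider == preferred or provider in open_breakers:
            continue
        if model in models:
            return provider

    # No alternative found: surface an error to the caller.
    raise NoProviderAvailable(f"no available provider offers {model!r}")
```

For example, with `catalog = {"provider_a": {"model-x"}, "provider_b": {"model-x"}}` and `open_breakers = {"provider_a"}`, a request for `model-x` on `provider_a` would be routed to `provider_b`.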

Tuning thresholds

Higher simple_threshold → more requests use the fast model (cheaper, but less capable)
Higher complex_threshold → fewer requests reach the expensive model
Set both to 0 to always use the complex model: every score is then ≥ complex_threshold, so no routing takes place
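
To see how a threshold change shifts traffic between tiers, it can help to replay a handful of scores against candidate settings. The sketch below does this with the tier boundaries from the Scoring table. The `tier_counts` helper and the sample scores are made up for illustration; real scores would come from your own traffic.

```python
def tier_counts(scores, simple_threshold, complex_threshold):
    """Count how many scores land in each tier for the given thresholds."""
    counts = {"simple": 0, "medium": 0, "complex": 0}
    for score in scores:
        if score < simple_threshold:
            counts["simple"] += 1
        elif score < complex_threshold:
            counts["medium"] += 1
        else:
            counts["complex"] += 1
    return counts


# Illustrative scores only, not measured data.
sample_scores = [20, 80, 150, 300, 450, 600, 900]

print(tier_counts(sample_scores, 100, 500))  # {'simple': 2, 'medium': 3, 'complex': 2}
print(tier_counts(sample_scores, 200, 500))  # {'simple': 3, 'medium': 2, 'complex': 2}
print(tier_counts(sample_scores, 100, 800))  # {'simple': 2, 'medium': 4, 'complex': 1}
print(tier_counts(sample_scores, 0, 0))      # {'simple': 0, 'medium': 0, 'complex': 7}
```

The last call shows the "both set to 0" case: no score is below either threshold, so every request lands in the Complex tier.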
