Complexity Routing

What you'll accomplish: Route requests to different model tiers based on complexity, balancing cost and quality.

How it works

When [routing] is configured, requests are scored and routed to different model tiers:

[routing]
simple_model = "groq/llama-3.1-8b-instant"
medium_model = "anthropic/claude-sonnet-4-20250514"
complex_model = "anthropic/claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

Scoring

The complexity score is computed from:

  • Message length (character count)
  • Number of tools available
  • Requested max_tokens

Score                                        Tier     Routed to
< simple_threshold                           Simple   simple_model
≥ simple_threshold and < complex_threshold   Medium   medium_model
≥ complex_threshold                          Complex  complex_model
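
The tier rules can be sketched as a small function. This is an illustrative sketch, not the platform's implementation: the names RoutingConfig and select_model are hypothetical, and the score is taken as an input because the exact way the three factors are combined is not specified here.

```python
from dataclasses import dataclass


@dataclass
class RoutingConfig:
    """Hypothetical container mirroring the [routing] keys."""
    simple_model: str
    medium_model: str
    complex_model: str
    simple_threshold: int
    complex_threshold: int


def select_model(score: int, cfg: RoutingConfig) -> str:
    """Map a precomputed complexity score to a model using the tier rules."""
    if score < cfg.simple_threshold:
        return cfg.simple_model   # Simple tier
    if score < cfg.complex_threshold:
        return cfg.medium_model   # Medium tier
    return cfg.complex_model      # Complex tier


cfg = RoutingConfig(
    simple_model="groq/llama-3.1-8b-instant",
    medium_model="anthropic/claude-sonnet-4-20250514",
    complex_model="anthropic/claude-sonnet-4-20250514",
    simple_threshold=100,
    complex_threshold=500,
)
print(select_model(42, cfg))  # groq/llama-3.1-8b-instant
```

Checking the simple tier first is an assumption; it only matters if simple_threshold is set higher than complex_threshold, a case the tier rules do not define.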

Practical examples

Cost optimization

Use fast, cheap models for simple queries and expensive models for complex reasoning:

[routing]
simple_model = "groq/llama-3.1-8b-instant"            # Fast, cheap
medium_model = "anthropic/claude-sonnet-4-20250514"   # Balanced
complex_model = "anthropic/claude-sonnet-4-20250514"  # Powerful
simple_threshold = 100
complex_threshold = 500

Air-gapped with tiered local models

[routing]
simple_model = "ollama/llama3.1:8b"
medium_model = "ollama/llama3.1:70b"
complex_model = "ollama/llama3.1:70b"
simple_threshold = 100
complex_threshold = 500

Provider fallback

Complexity routing also serves as a fallback mechanism: if one provider's circuit breaker is open, the platform searches for alternatives.

Circuit breaker interaction

When a provider's circuit breaker is open:

  1. The platform searches for an alternative provider offering the same model
  2. If found, routes to the alternative
  3. If not found, returns an error
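
The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the platform's code: pick_provider, NoProviderAvailable, the catalog mapping, and the open_breakers set are all hypothetical names introduced for illustration.

```python
class NoProviderAvailable(Exception):
    """Raised when no provider with a closed circuit breaker offers the model."""


def pick_provider(model, preferred, catalog, open_breakers):
    """Return a provider for `model`, skipping providers whose breaker is open.

    catalog: dict mapping provider name -> set of model names it offers.
    open_breakers: set of provider names whose circuit breaker is open.
    """
    # Normal path: the preferred provider is healthy and offers the model.
    if preferred not in open_breakers and model in catalog.get(preferred, set()):
        return preferred
    # Step 1: search for an alternative provider offering the same model.
    for provider, models in catalog.items():
        if provider == preferred or provider in open_breakers:
            continue
        if model in models:
            return provider  # Step 2: route to the alternative.
    # Step 3: nothing found, so surface an error to the caller.
    raise NoProviderAvailable(f"no available provider for {model!r}")
```

For example, with catalog = {"a": {"m1"}, "b": {"m1"}} and open_breakers = {"a"}, a request for "m1" that prefers "a" is routed to "b"; if both breakers are open, the call raises NoProviderAvailable.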

Tuning thresholds

  • Higher simple_threshold → more requests use the fast model (cheaper, but less capable)
  • Higher complex_threshold → fewer requests reach the expensive model
  • Set both to 0 to always use the complex model (no routing)
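
To see how the thresholds shift traffic between tiers, the sketch below counts tiers for a made-up list of scores under different settings, applying the rules from the Scoring section. The helper names and the sample scores are hypothetical and chosen only for illustration.

```python
from collections import Counter


def tier_for(score, simple_threshold, complex_threshold):
    """Apply the tier rules from the Scoring section."""
    if score < simple_threshold:
        return "simple"
    if score < complex_threshold:
        return "medium"
    return "complex"


def tier_counts(scores, simple_threshold, complex_threshold):
    """Count how many scores land in each tier for the given thresholds."""
    return Counter(tier_for(s, simple_threshold, complex_threshold) for s in scores)


sample_scores = [20, 80, 150, 300, 450, 700]  # made-up scores, not real traffic

# Baseline thresholds: 2 simple, 3 medium, 1 complex.
print(tier_counts(sample_scores, 100, 500))
# Raising simple_threshold to 200 moves one request into the simple tier.
print(tier_counts(sample_scores, 200, 500))
# Raising complex_threshold to 800 keeps every request out of the complex tier.
print(tier_counts(sample_scores, 100, 800))
# Both thresholds at 0: every request is routed to the complex tier.
print(tier_counts(sample_scores, 0, 0))
```

Running a similar count over scores from real traffic, if they are logged, is one way to pick thresholds before changing the configuration.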
