Provider Routing

How requests are routed to LLM providers — model resolution, complexity-based routing, circuit breakers, and fallback chains.

Model Resolution Pipeline

When an agent needs to call an LLM, the platform resolves which provider and model to use through a multi-step pipeline. The stages are described in the sections that follow.

Default Model Substitution

When an agent has empty provider or model_id fields:

# Platform config (config.toml)
[default_model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"

The platform fills in the blanks at execution time. This allows agents to be created without specifying a model — they inherit the platform default.
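The substitution can be sketched in a few lines of Python. This is an illustration only, not the platform's code: `ModelRef` and `apply_default` are hypothetical names, and the sketch assumes each empty field is filled independently from `[default_model]`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical container for an agent's provider/model pair."""
    provider: str = ""
    model_id: str = ""


# Mirrors the [default_model] table shown above.
DEFAULT_MODEL = ModelRef(provider="anthropic", model_id="claude-sonnet-4-20250514")


def apply_default(agent_model: ModelRef, default: ModelRef = DEFAULT_MODEL) -> ModelRef:
    """Return a copy with empty fields filled from the platform default.

    Assumption: each field is filled on its own, so a non-empty value
    set on the agent is never overwritten.
    """
    return replace(
        agent_model,
        provider=agent_model.provider or default.provider,
        model_id=agent_model.model_id or default.model_id,
    )
```

An agent created with no model at all resolves to the platform default, while an agent that names both fields is left untouched.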

Complexity-Based Routing

When [routing] is configured, requests are scored and routed to different model tiers:

[routing]
simple_model = "groq/llama-3.1-8b-instant"
medium_model = "anthropic/claude-sonnet-4-20250514"
complex_model = "anthropic/claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

Scoring

The complexity score is computed from:

Message length (character count)
Number of tools available
Requested max_tokens

Score	Tier	Routed To
`< simple_threshold`	Simple	`simple_model`
`≥ simple_threshold` and `< complex_threshold`	Medium	`medium_model`
`≥ complex_threshold`	Complex	`complex_model`
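A minimal sketch of the tier selection, assuming the thresholds from the `[routing]` example. The tier boundaries follow the table above. The scoring formula is not documented here, so the weights in `complexity_score` are invented for illustration, and `RoutingConfig` and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingConfig:
    """Values copied from the [routing] example above."""
    simple_model: str = "groq/llama-3.1-8b-instant"
    medium_model: str = "anthropic/claude-sonnet-4-20250514"
    complex_model: str = "anthropic/claude-sonnet-4-20250514"
    simple_threshold: int = 100
    complex_threshold: int = 500


def complexity_score(message: str, tool_count: int, max_tokens: int) -> int:
    """Combine the three documented inputs into one score.

    Assumption: the weights (20 per tool, 1 per 100 tokens) are
    placeholders; the real formula may differ.
    """
    return len(message) + 20 * tool_count + max_tokens // 100


def pick_model(score: int, cfg: RoutingConfig) -> str:
    """Map a score to a model tier using the documented boundaries."""
    if score < cfg.simple_threshold:
        return cfg.simple_model
    if score < cfg.complex_threshold:
        return cfg.medium_model
    return cfg.complex_model
```

Note that both thresholds are inclusive on the lower bound: a score exactly equal to `simple_threshold` goes to the medium tier, and a score equal to `complex_threshold` goes to the complex tier.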

Alias Resolution

Model identifiers can be aliases (e.g., sonnet → anthropic/claude-sonnet-4-20250514). The model catalog resolves aliases before routing.
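As a rough illustration, alias resolution amounts to a dictionary lookup that falls through for identifiers that are not aliases. The `ALIASES` table and both function names are hypothetical; only the `sonnet` mapping comes from the text above.

```python
from typing import Dict, Tuple

# Hypothetical alias table; only the "sonnet" entry appears in the docs.
ALIASES: Dict[str, str] = {"sonnet": "anthropic/claude-sonnet-4-20250514"}


def resolve_alias(model: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Return the full identifier for an alias, or the input unchanged."""
    return aliases.get(model, model)


def split_model(qualified: str) -> Tuple[str, str]:
    """Split "provider/model_id" at the first slash."""
    provider, _, model_id = qualified.partition("/")
    return provider, model_id
```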

Circuit Breaker

Each provider has its own circuit breaker, which protects against cascading failures.

Configuration

[health]
failure_threshold = 5           # Consecutive failures to trip
recovery_cooldown_secs = 60     # Seconds before HalfOpen probe

Behavior by State

State	Effect
Closed	Requests flow normally
Open	Requests rejected immediately (no provider contact)
HalfOpen	Single probe request allowed through

Recording Outcomes

After every agent execution:

Success → record_success(provider_id) — resets failure count; HalfOpen → Closed
Failure → record_failure(provider_id) — increments counter; may trip to Open
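The state machine described above can be sketched as a small class. This models a single breaker; the platform keeps one per provider, keyed by `provider_id`. The class layout, the `allow_request` method, and the injectable `clock` are assumptions made for illustration. One behavior is also assumed rather than documented: a failed HalfOpen probe sends the breaker straight back to Open.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "HalfOpen"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_cooldown_secs=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_cooldown_secs = recovery_cooldown_secs
        self._clock = clock            # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = 0              # consecutive failures
        self._opened_at = 0.0
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        """Decide whether a request may reach the provider."""
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_cooldown_secs:
                return False           # still cooling down: reject immediately
            self.state = State.HALF_OPEN
            self._probe_in_flight = False
        if self.state is State.HALF_OPEN:
            if self._probe_in_flight:
                return False           # only one probe at a time
            self._probe_in_flight = True
            return True
        return True                    # Closed: requests flow normally

    def record_success(self) -> None:
        """Reset the failure count; HalfOpen goes back to Closed."""
        self.failures = 0
        self.state = State.CLOSED
        self._probe_in_flight = False

    def record_failure(self) -> None:
        """Count a failure and trip to Open at the threshold.

        Assumption: a failed HalfOpen probe reopens the breaker at once.
        """
        self.failures += 1
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
            self._probe_in_flight = False
```

Using a monotonic clock for the cooldown avoids surprises if the wall clock is adjusted while the breaker is open.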

Fallback Provider Resolution

When a provider's circuit breaker is open, the platform searches for an alternative provider.

The lookup is entirely in-memory: it is a catalog query and makes no network calls.

Provider Authentication

Providers authenticate via environment variables resolved lazily at request time:

[providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"    # Name of env var (not the key itself)
enabled = true

Auth Status Detection

Env Var State	Auth Status	Effect
Set, non-empty	`Configured`	Provider is usable
Unset, empty, or whitespace-only	`Missing`	Provider excluded from `available_models()`
N/A (no `api_key_env`)	`NotRequired`	Provider always usable (e.g., local Ollama)

Lazy Resolution

The key is read from the environment at call time, not at startup. This means:

Adding an env var while the daemon is running makes the provider available immediately
Removing one makes it unavailable on the next request
No daemon restart needed for key changes
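The status table and the lazy lookup can be sketched together. The function below reads the environment each time it is called, which is what makes key changes visible without a restart. The names `AuthStatus` and `auth_status` are illustrative, not the platform's API, and the `env` parameter exists only so the sketch can be exercised without touching the real environment.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class AuthStatus(Enum):
    CONFIGURED = "Configured"
    MISSING = "Missing"
    NOT_REQUIRED = "NotRequired"


def auth_status(api_key_env: Optional[str],
                env: Mapping[str, str] = os.environ) -> AuthStatus:
    """Classify a provider's auth state at call time.

    `api_key_env` is the NAME of the environment variable, as in the
    [providers.*] config, not the key itself.
    """
    if not api_key_env:
        return AuthStatus.NOT_REQUIRED        # e.g. a local provider
    value = env.get(api_key_env, "")          # looked up on every call
    # Unset, empty, and whitespace-only values all count as missing.
    return AuthStatus.CONFIGURED if value.strip() else AuthStatus.MISSING
```

Because `os.environ` is a live mapping, setting or deleting a variable in the running process changes the result of the next call.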

Provider Registry Structure

Key Operations

Operation	Description
`list_providers()`	Returns all registered providers with current auth status
`available_models()`	Returns models from providers with Configured/NotRequired status
`is_available(provider_id)`	Checks that the provider is registered and not disabled
`create_driver_for(provider, model)`	Resolves API key + base URL for driver instantiation

Local Provider Detection

Providers whose base_url points to localhost, a loopback address, or a private IP range (for example 127.0.0.1, 192.168.x.x, 10.x.x.x) are detected as local. For local providers:

API key requirement is relaxed (many local inference servers accept any non-empty string)
Circuit breaker behavior is the same as remote providers
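One way to implement such a check with the Python standard library is shown below. It is a sketch under assumptions: `is_local_base_url` is a hypothetical name, hostnames other than `localhost` are not resolved through DNS, and `ipaddress` treats more ranges as private (for example 172.16.0.0/12) than the three examples listed above.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_base_url(base_url: str) -> bool:
    """Return True if the URL's host is localhost, loopback, or private."""
    host = urlsplit(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name other than localhost; this sketch does not resolve it.
        return False
    return addr.is_loopback or addr.is_private
```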

LLM Call Retry (Inside Agent Loop)

Once the model is resolved and the agent loop begins, individual LLM calls have their own retry mechanism.

Fallback Model Chain

Agents can declare a fallback model list in their manifest. When the primary model fails (after retries), the next fallback is tried:

Primary: claude-sonnet-4-20250514 (3 retries)
  ↓ fails
Fallback 1: gpt-4o (3 retries)
  ↓ fails
Fallback 2: llama-3.1-70b (3 retries)
  ↓ fails
Error returned to caller
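The chain above can be sketched as two nested loops: the outer loop walks the model list, the inner loop retries the current model. The function and exception names are hypothetical, the sketch treats "3 retries" as three attempts per model, and it omits any backoff or error classification the real retry logic may apply.

```python
from typing import Callable, Optional, Sequence


class AllModelsFailed(Exception):
    """Raised when the primary model and every fallback are exhausted."""


def call_with_fallbacks(models: Sequence[str],
                        call: Callable[[str], str],
                        attempts_per_model: int = 3) -> str:
    """Try each model in order, retrying before moving to the next one.

    `models` is the primary model followed by the manifest's fallbacks.
    `call` stands in for the actual LLM request.
    """
    last_error: Optional[Exception] = None
    for model in models:
        for _attempt in range(attempts_per_model):
            try:
                return call(model)
            except Exception as exc:   # sketch: every error is retryable
                last_error = exc
    raise AllModelsFailed(f"all models failed: {list(models)}") from last_error
```

Chaining the last error with `from` keeps the underlying cause visible to the caller when the whole chain is exhausted.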