Provider Routing
How requests are routed to LLM providers — model resolution, complexity-based routing, circuit breakers, and fallback chains.
Model Resolution Pipeline
When an agent needs to call an LLM, the platform resolves which provider and model to use through a multi-step pipeline covering default model substitution, complexity-based routing, and alias resolution.
Default Model Substitution
When an agent's provider or model_id field is empty, the platform substitutes the default model from its own configuration:
```toml
# Platform config (config.toml)
[default_model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
```
The platform fills in the blanks at execution time. This allows agents to be created without specifying a model — they inherit the platform default.
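A minimal sketch of the substitution step, assuming each empty field is filled independently from `[default_model]`. The `ModelRef` type and `resolve_model` function are hypothetical names for illustration, not the platform's actual API.

```rust
/// Hypothetical pair of model fields as they might appear on an agent.
#[derive(Debug, Clone, PartialEq)]
struct ModelRef {
    provider: String,
    model_id: String,
}

/// Return `own` unless it is empty, in which case use the platform default.
fn or_default(own: &str, fallback: &str) -> String {
    if own.is_empty() { fallback.to_string() } else { own.to_string() }
}

/// Fill empty fields from the platform's [default_model] section.
/// Assumption: provider and model_id are substituted independently.
fn resolve_model(agent: &ModelRef, default: &ModelRef) -> ModelRef {
    ModelRef {
        provider: or_default(&agent.provider, &default.provider),
        model_id: or_default(&agent.model_id, &default.model_id),
    }
}

fn main() {
    let default = ModelRef {
        provider: "anthropic".to_string(),
        model_id: "claude-sonnet-4-20250514".to_string(),
    };
    // An agent created without specifying a model.
    let agent = ModelRef { provider: String::new(), model_id: String::new() };
    println!("{:?}", resolve_model(&agent, &default));
}
```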
Complexity-Based Routing
When [routing] is configured, requests are scored and routed to different model tiers:
```toml
[routing]
simple_model = "groq/llama-3.1-8b-instant"
medium_model = "anthropic/claude-sonnet-4-20250514"
complex_model = "anthropic/claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500
```
Scoring
The complexity score is computed from:
- Message length (character count)
- Number of tools available
- Requested max_tokens
| Score | Tier | Routed To |
|---|---|---|
| < simple_threshold | Simple | simple_model |
| ≥ simple_threshold and < complex_threshold | Medium | medium_model |
| ≥ complex_threshold | Complex | complex_model |
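The threshold table maps a score to a tier. A minimal sketch of that mapping follows; how the score itself is weighted from message length, tool count, and max_tokens is not specified here, so the sketch takes the score as an input. `RoutingConfig`, `Tier`, and `model_for` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Simple,
    Medium,
    Complex,
}

/// Hypothetical in-memory form of the [routing] section.
struct RoutingConfig {
    simple_model: String,
    medium_model: String,
    complex_model: String,
    simple_threshold: u32,
    complex_threshold: u32,
}

impl RoutingConfig {
    /// Lower bounds are inclusive, upper bounds exclusive, as in the table.
    fn tier_for(&self, score: u32) -> Tier {
        if score < self.simple_threshold {
            Tier::Simple
        } else if score < self.complex_threshold {
            Tier::Medium
        } else {
            Tier::Complex
        }
    }

    /// The model string configured for the tier a score falls into.
    fn model_for(&self, score: u32) -> &str {
        match self.tier_for(score) {
            Tier::Simple => self.simple_model.as_str(),
            Tier::Medium => self.medium_model.as_str(),
            Tier::Complex => self.complex_model.as_str(),
        }
    }
}

fn main() {
    let cfg = RoutingConfig {
        simple_model: "groq/llama-3.1-8b-instant".to_string(),
        medium_model: "anthropic/claude-sonnet-4-20250514".to_string(),
        complex_model: "anthropic/claude-sonnet-4-20250514".to_string(),
        simple_threshold: 100,
        complex_threshold: 500,
    };
    println!("{:?} -> {}", cfg.tier_for(42), cfg.model_for(42));
}
```

With the example thresholds, a score of exactly 100 is already Medium and a score of exactly 500 is already Complex.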
Alias Resolution
Model identifiers can be aliases (e.g., sonnet → anthropic/claude-sonnet-4-20250514). The model catalog resolves aliases before routing.
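Alias resolution can be pictured as a single map lookup that falls back to the identifier itself. This sketch assumes a flat alias table with no alias-to-alias chains; `resolve_alias` is a hypothetical helper, not the catalog's real interface.

```rust
use std::collections::HashMap;

/// Return the canonical model id for `id`, or `id` unchanged if it is not an alias.
/// Assumption: one level of lookup only; chained aliases are not followed.
fn resolve_alias<'a>(aliases: &'a HashMap<String, String>, id: &'a str) -> &'a str {
    aliases.get(id).map(String::as_str).unwrap_or(id)
}

fn main() {
    let mut aliases = HashMap::new();
    aliases.insert(
        "sonnet".to_string(),
        "anthropic/claude-sonnet-4-20250514".to_string(),
    );
    println!("{}", resolve_alias(&aliases, "sonnet"));
}
```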
Circuit Breaker
Each provider has its own circuit breaker, which protects against cascading failures by moving between three states: Closed, Open, and HalfOpen.
Configuration
```toml
[health]
failure_threshold = 5         # Consecutive failures to trip
recovery_cooldown_secs = 60   # Seconds before HalfOpen probe
```
Behavior by State
| State | Effect |
|---|---|
| Closed | Requests flow normally |
| Open | Requests rejected immediately (no provider contact) |
| HalfOpen | Single probe request allowed through |
Recording Outcomes
After every agent execution:
- Success → record_success(provider_id) resets the failure count; a HalfOpen breaker returns to Closed
- Failure → record_failure(provider_id) increments the failure count and may trip the breaker to Open
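The states and outcome rules above can be sketched as a small state machine for one provider. Time is passed in as seconds so the behavior is deterministic. Two details are assumptions not stated above: a failed HalfOpen probe sends the breaker straight back to Open, and further requests are rejected while the single probe is outstanding. The type and method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker for a single provider, driven by the [health] settings.
struct CircuitBreaker {
    failure_threshold: u32,
    recovery_cooldown_secs: u64,
    state: State,
    consecutive_failures: u32,
    opened_at: u64,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, recovery_cooldown_secs: u64) -> Self {
        CircuitBreaker {
            failure_threshold,
            recovery_cooldown_secs,
            state: State::Closed,
            consecutive_failures: 0,
            opened_at: 0,
        }
    }

    /// Whether a request may contact the provider at time `now` (seconds).
    fn allow_request(&mut self, now: u64) -> bool {
        match self.state {
            State::Closed => true,
            State::Open => {
                if now.saturating_sub(self.opened_at) >= self.recovery_cooldown_secs {
                    // Cooldown elapsed: let a single probe through.
                    self.state = State::HalfOpen;
                    true
                } else {
                    false // Rejected immediately, no provider contact.
                }
            }
            // Assumption: the probe is in flight, so reject everything else.
            State::HalfOpen => false,
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = State::Closed; // HalfOpen -> Closed; Closed stays Closed.
    }

    fn record_failure(&mut self, now: u64) {
        self.consecutive_failures += 1;
        // Assumption: a failed probe reopens the breaker immediately.
        if self.state == State::HalfOpen || self.consecutive_failures >= self.failure_threshold {
            self.state = State::Open;
            self.opened_at = now;
        }
    }
}

fn main() {
    let mut cb = CircuitBreaker::new(5, 60);
    for t in 0..5 {
        cb.record_failure(t);
    }
    println!("state after 5 failures: {:?}", cb.state);
}
```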
Fallback Provider Resolution
When a provider's circuit breaker is open, the platform searches the model catalog for an alternative provider.
The lookup is entirely in-memory (catalog query, no network calls).
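The exact selection criteria are not described here. As a rough sketch, assume the catalog is already ordered by preference and the platform takes the first entry whose provider is neither the failed one nor behind an open breaker. `CatalogEntry`, `find_fallback`, and the catalog contents are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical catalog entry: a provider and a model it serves.
#[derive(Debug)]
struct CatalogEntry {
    provider: &'static str,
    model_id: &'static str,
}

/// Pure in-memory scan: no network calls, only the catalog and breaker states.
fn find_fallback<'a>(
    catalog: &'a [CatalogEntry],
    failed: &str,
    open_breakers: &HashSet<&str>,
) -> Option<&'a CatalogEntry> {
    catalog
        .iter()
        .find(|e| e.provider != failed && !open_breakers.contains(e.provider))
}

fn main() {
    let catalog = vec![
        CatalogEntry { provider: "anthropic", model_id: "claude-sonnet-4-20250514" },
        CatalogEntry { provider: "openai", model_id: "gpt-4o" },
    ];
    let mut open = HashSet::new();
    open.insert("anthropic");
    if let Some(e) = find_fallback(&catalog, "anthropic", &open) {
        println!("fallback: {}/{}", e.provider, e.model_id);
    }
}
```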
Provider Authentication
Providers authenticate via environment variables resolved lazily at request time:
```toml
[providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"   # Name of env var (not the key itself)
enabled = true
```
Auth Status Detection
| Env Var State | Auth Status | Effect |
|---|---|---|
| Set, non-empty | Configured | Provider is usable |
| Unset or whitespace | Missing | Provider excluded from available_models() |
| N/A (no api_key_env) | NotRequired | Provider always usable (e.g., local Ollama) |
Lazy Resolution
The key is read from the environment at call time, not at startup. This means:
- Adding an env var while the daemon is running makes the provider available immediately
- Removing one makes it unavailable on the next request
- No daemon restart needed for key changes
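The detection table and the lazy lookup fit together in one small function. In this sketch the environment lookup is passed in as a closure and invoked on every call, so a key added or removed while the daemon runs is picked up on the next request; in production that closure would simply wrap `std::env::var`. `AuthStatus` and `auth_status` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthStatus {
    Configured,
    Missing,
    NotRequired,
}

/// `api_key_env` is the env var *name* from [providers.*], not the key.
/// `lookup` runs on every call, so nothing is cached at startup.
fn auth_status(api_key_env: Option<&str>, lookup: impl Fn(&str) -> Option<String>) -> AuthStatus {
    match api_key_env {
        // No api_key_env configured (e.g. a local server).
        None => AuthStatus::NotRequired,
        Some(name) => match lookup(name) {
            Some(value) if !value.trim().is_empty() => AuthStatus::Configured,
            // Unset, empty, or whitespace-only.
            _ => AuthStatus::Missing,
        },
    }
}

fn main() {
    // Read the real process environment at request time.
    let status = auth_status(Some("ANTHROPIC_API_KEY"), |name| std::env::var(name).ok());
    println!("anthropic: {status:?}");
}
```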
Provider Registry Structure
Key Operations
| Operation | Description |
|---|---|
| list_providers() | Returns all registered providers with current auth status |
| available_models() | Returns models from providers with Configured or NotRequired status |
| is_available(provider_id) | Checks that the provider is registered and not disabled |
| create_driver_for(provider, model) | Resolves API key and base URL for driver instantiation |
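The first three operations can be sketched over a simple in-memory registry. Here each provider's auth status is stored as a field for brevity, whereas the platform resolves it lazily per request; the `provider/model` output format is borrowed from the [routing] examples and is an assumption. create_driver_for is left out because its driver types are not described here. All type names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuthStatus {
    Configured,
    Missing,
    NotRequired,
}

/// Hypothetical registry record.
#[derive(Debug)]
struct Provider {
    id: String,
    enabled: bool,
    auth: AuthStatus, // stand-in for a status resolved at call time
    models: Vec<String>,
}

struct ProviderRegistry {
    providers: Vec<Provider>,
}

impl ProviderRegistry {
    /// All registered providers, whatever their status.
    fn list_providers(&self) -> &[Provider] {
        &self.providers
    }

    /// Models from providers whose auth is Configured or NotRequired.
    fn available_models(&self) -> Vec<String> {
        self.providers
            .iter()
            .filter(|p| matches!(p.auth, AuthStatus::Configured | AuthStatus::NotRequired))
            .flat_map(|p| p.models.iter().map(move |m| format!("{}/{}", p.id, m)))
            .collect()
    }

    /// Registered and not disabled.
    fn is_available(&self, provider_id: &str) -> bool {
        self.providers.iter().any(|p| p.id == provider_id && p.enabled)
    }
}

fn main() {
    let reg = ProviderRegistry {
        providers: vec![Provider {
            id: "ollama".to_string(),
            enabled: true,
            auth: AuthStatus::NotRequired,
            models: vec!["llama3".to_string()],
        }],
    };
    println!("{:?}", reg.available_models());
}
```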
Local Provider Detection
Providers whose base_url points to localhost (127.0.0.1) or a private network address (192.168.x.x, 10.x.x.x) are detected as local. For local providers:
- API key requirement is relaxed (many local inference servers accept any non-empty string)
- Circuit breaker behavior is the same as remote providers
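A sketch of the detection check using only the standard library. It handles hostnames and IPv4 addresses (bracketed IPv6 hosts are not parsed) and relies on `Ipv4Addr::is_loopback` and `Ipv4Addr::is_private`; note that `is_private` also covers 172.16.0.0/12, which may be broader than the platform's actual rule. `host_of` and `is_local` are hypothetical helpers.

```rust
use std::net::Ipv4Addr;

/// Extract the host from a base_url such as "http://10.0.0.5:8000/v1".
/// Simplified: hostname or IPv4 only; bracketed IPv6 is not handled.
fn host_of(base_url: &str) -> &str {
    let after_scheme = base_url
        .split_once("://")
        .map(|(_, rest)| rest)
        .unwrap_or(base_url);
    let authority = after_scheme.split('/').next().unwrap_or("");
    // Drop a trailing ":port" if present.
    authority.rsplit_once(':').map(|(h, _)| h).unwrap_or(authority)
}

/// True for localhost, loopback, or RFC 1918 private IPv4 addresses.
fn is_local(base_url: &str) -> bool {
    let host = host_of(base_url);
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    match host.parse::<Ipv4Addr>() {
        Ok(ip) => ip.is_loopback() || ip.is_private(),
        Err(_) => false, // a public hostname or unparsed address
    }
}

fn main() {
    for url in ["http://localhost:11434/v1", "https://api.anthropic.com/v1"] {
        println!("{url} local={}", is_local(url));
    }
}
```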
LLM Call Retry (Inside Agent Loop)
Once the model is resolved and the agent loop begins, individual LLM calls have their own retry mechanism.
Fallback Model Chain
Agents can declare a fallback model list in their manifest. When the primary model fails (after retries), the next fallback is tried:
```text
Primary: claude-sonnet-4-20250514 (3 retries)
    ↓ fails
Fallback 1: gpt-4o (3 retries)
    ↓ fails
Fallback 2: llama-3.1-70b (3 retries)
    ↓ fails
Error returned to caller
```
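The chain can be sketched as two nested loops: models in manifest order, and a bounded number of attempts per model. Whether "3 retries" counts the first attempt is not stated, so the sketch takes a generic `attempts_per_model`; backoff delays and retryable-versus-fatal error classification are deliberately omitted. `call_with_fallback` and the `call` closure are hypothetical stand-ins for the agent loop's LLM call.

```rust
/// Try each model in order, up to `attempts_per_model` times each.
/// Returns the first success, or the last error once every model is exhausted.
fn call_with_fallback<F>(models: &[&str], attempts_per_model: u32, mut call: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut last_err = String::from("no models configured");
    for &model in models {
        for _ in 0..attempts_per_model {
            match call(model) {
                Ok(response) => return Ok(response),
                Err(e) => last_err = e, // remember and retry this model
            }
        }
        // This model is exhausted; fall through to the next one.
    }
    Err(last_err) // error returned to caller
}

fn main() {
    let chain = ["claude-sonnet-4-20250514", "gpt-4o", "llama-3.1-70b"];
    let result = call_with_fallback(&chain, 3, |m| {
        if m == "gpt-4o" { Ok(format!("answer from {m}")) } else { Err(format!("{m} failed")) }
    });
    println!("{result:?}");
}
```

Returning the last error, rather than collecting all of them, is a simplification; a caller that needs per-model diagnostics would accumulate errors instead.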