Provider Routing

How requests are routed to LLM providers — model resolution, complexity-based routing, circuit breakers, and fallback chains.

Model Resolution Pipeline

When an agent needs to call an LLM, the platform resolves which provider and model to use through a multi-step pipeline. The stages are described in the sections below.

Default Model Substitution

When an agent's provider or model_id field is empty, the platform falls back to the [default_model] section of its config:

# Platform config (config.toml)
[default_model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"

The platform fills in the blanks at execution time. This allows agents to be created without specifying a model — they inherit the platform default.
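The substitution can be sketched in a few lines of Python. This is only an illustration: ModelRef and apply_default are hypothetical names, and filling each empty field independently is an assumption, since the text does not say whether the platform substitutes the fields separately or only as a pair.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical container for an agent's model settings."""
    provider: str = ""
    model_id: str = ""


# Mirrors the [default_model] table from config.toml shown above.
DEFAULT_MODEL = ModelRef(provider="anthropic", model_id="claude-sonnet-4-20250514")


def apply_default(agent_model: ModelRef, default: ModelRef = DEFAULT_MODEL) -> ModelRef:
    """Fill empty fields from the platform default.

    ASSUMPTION: each empty field is filled independently.
    """
    return replace(
        agent_model,
        provider=agent_model.provider or default.provider,
        model_id=agent_model.model_id or default.model_id,
    )
```

An agent created with no model settings resolves to the platform default, while an agent that names both fields is left untouched.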

Complexity-Based Routing

When [routing] is configured, requests are scored and routed to different model tiers:

[routing]
simple_model = "groq/llama-3.1-8b-instant"
medium_model = "anthropic/claude-sonnet-4-20250514"
complex_model = "anthropic/claude-sonnet-4-20250514"
simple_threshold = 100
complex_threshold = 500

Scoring

The complexity score is computed from:

  • Message length (character count)
  • Number of tools available
  • Requested max_tokens

| Score | Tier | Routed To |
| --- | --- | --- |
| < simple_threshold | Simple | simple_model |
| ≥ simple_threshold and < complex_threshold | Medium | medium_model |
| ≥ complex_threshold | Complex | complex_model |
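A minimal Python sketch of the tier selection follows. The threshold comparisons come straight from the table above. The scoring formula does not: the text names the three inputs but not how they are combined, so the weights in complexity_score are placeholders chosen purely for illustration, and RoutingConfig is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingConfig:
    """Mirrors the [routing] table shown above."""
    simple_model: str = "groq/llama-3.1-8b-instant"
    medium_model: str = "anthropic/claude-sonnet-4-20250514"
    complex_model: str = "anthropic/claude-sonnet-4-20250514"
    simple_threshold: int = 100
    complex_threshold: int = 500


def complexity_score(message: str, tool_count: int, max_tokens: int) -> int:
    # ASSUMPTION: these weights are illustrative only; the real formula
    # is not documented here.
    return len(message) + 20 * tool_count + max_tokens // 10


def select_model(score: int, cfg: RoutingConfig) -> str:
    """Map a score to a model using the documented threshold rules."""
    if score < cfg.simple_threshold:
        return cfg.simple_model
    if score < cfg.complex_threshold:
        return cfg.medium_model
    return cfg.complex_model
```

Note that both boundaries are inclusive on the upper tier: a score equal to simple_threshold is Medium, and a score equal to complex_threshold is Complex.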

Alias Resolution

Model identifiers can be aliases (e.g., sonnet → anthropic/claude-sonnet-4-20250514). The model catalog resolves aliases before routing.
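As a rough illustration, alias resolution can be modeled as a dictionary lookup that passes unknown identifiers through unchanged. The ALIASES table, resolve_alias, and split_model are hypothetical names, and the real catalog may hold more metadata than a plain mapping.

```python
from typing import Dict, Tuple

# Hypothetical alias table; only the "sonnet" entry comes from the text above.
ALIASES: Dict[str, str] = {
    "sonnet": "anthropic/claude-sonnet-4-20250514",
}


def resolve_alias(model: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Return the full identifier for an alias, or the input unchanged."""
    return aliases.get(model, model)


def split_model(identifier: str) -> Tuple[str, str]:
    """Split "provider/model_id" at the first slash."""
    provider, _, model_id = identifier.partition("/")
    return provider, model_id
```

Resolving before routing means later stages only ever see full provider/model_id identifiers.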

Circuit Breaker

Each provider has its own circuit breaker, which protects against cascading failures.

Configuration

[health]
failure_threshold = 5           # Consecutive failures to trip
recovery_cooldown_secs = 60     # Seconds before HalfOpen probe

Behavior by State

| State | Effect |
| --- | --- |
| Closed | Requests flow normally |
| Open | Requests rejected immediately (no provider contact) |
| HalfOpen | Single probe request allowed through |

Recording Outcomes

After every agent execution:

  • Success: record_success(provider_id) resets the failure count and moves HalfOpen → Closed
  • Failure: record_failure(provider_id) increments the counter and may trip the breaker to Open
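The state machine described above can be sketched as a small Python class. This models a single breaker; the platform keeps one per provider, which is why its record_success and record_failure take a provider_id. The allow_request method and the injectable clock are hypothetical, and re-opening the breaker when a HalfOpen probe fails is an assumption based on the usual circuit-breaker pattern rather than something the text states.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "Closed"
    OPEN = "Open"
    HALF_OPEN = "HalfOpen"


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_cooldown_secs=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_cooldown_secs = recovery_cooldown_secs
        self._clock = clock  # injectable so the cooldown can be tested
        self.state = State.CLOSED
        self.failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.recovery_cooldown_secs:
                return False  # rejected without contacting the provider
            self.state = State.HALF_OPEN
            self._probe_in_flight = False
        if self.state is State.HALF_OPEN:
            if self._probe_in_flight:
                return False  # only a single probe is allowed through
            self._probe_in_flight = True
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = State.CLOSED
        self._probe_in_flight = False

    def record_failure(self) -> None:
        self.failures += 1
        # ASSUMPTION: a failed HalfOpen probe re-opens the breaker at once.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
            self._probe_in_flight = False
```

Using a monotonic clock for the cooldown avoids surprises when the wall clock is adjusted while the daemon is running.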

Fallback Provider Resolution

When a provider's circuit breaker is open, the platform searches for an alternative provider.

The lookup is entirely in-memory (catalog query, no network calls).
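The exact selection rules are not spelled out here, so the following Python sketch is speculative. It assumes the search walks the catalog's providers in order and picks the first one that is available and whose breaker is not open. The function name and both predicate parameters are hypothetical.

```python
from typing import Callable, Iterable, Optional


def find_fallback_provider(
    failed_provider: str,
    catalog_providers: Iterable[str],
    is_open: Callable[[str], bool],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first usable alternative provider, or None.

    ASSUMPTION: catalog order decides priority. Everything here is an
    in-memory check, matching the "no network calls" property above.
    """
    for provider_id in catalog_providers:
        if provider_id == failed_provider:
            continue
        if is_available(provider_id) and not is_open(provider_id):
            return provider_id
    return None
```

Passing the breaker and availability checks in as callables keeps the search itself free of I/O, so it stays cheap enough to run on every rejected request.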

Provider Authentication

Providers authenticate via environment variables resolved lazily at request time:

[providers.anthropic]
api_key_env = "ANTHROPIC_API_KEY"    # Name of env var (not the key itself)
enabled = true

Auth Status Detection

| Env Var State | Auth Status | Effect |
| --- | --- | --- |
| Set, non-empty | Configured | Provider is usable |
| Unset or whitespace | Missing | Provider excluded from available_models() |
| N/A (no api_key_env) | NotRequired | Provider always usable (e.g., local Ollama) |

Lazy Resolution

The key is checked from the environment at call time, not at startup. This means:

  • Adding an env var while the daemon is running makes the provider available immediately
  • Removing one makes it unavailable on the next request
  • No daemon restart needed for key changes
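The status rules and the lazy lookup can be illustrated together in Python. AuthStatus and auth_status are hypothetical names. The key point is that the environment is read inside the function on every call, so nothing is cached at startup.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class AuthStatus(Enum):
    CONFIGURED = "Configured"
    MISSING = "Missing"
    NOT_REQUIRED = "NotRequired"


def auth_status(api_key_env: Optional[str],
                env: Mapping[str, str] = os.environ) -> AuthStatus:
    """Classify a provider by the current state of its key variable."""
    if not api_key_env:
        return AuthStatus.NOT_REQUIRED  # no api_key_env configured
    # os.environ is a live mapping, so this reflects the value at call time.
    value = env.get(api_key_env, "")
    return AuthStatus.CONFIGURED if value.strip() else AuthStatus.MISSING
```

Calling auth_status("ANTHROPIC_API_KEY") before and after the variable changes gives different answers with no restart, which is the behavior the list above describes.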

Provider Registry Structure

Key Operations

| Operation | Description |
| --- | --- |
| list_providers() | Returns all registered providers with current auth status |
| available_models() | Returns models from providers with Configured/NotRequired status |
| is_available(provider_id) | Checks registered + not disabled |
| create_driver_for(provider, model) | Resolves API key + base URL for driver instantiation |

Local Provider Detection

Providers with base_url pointing to localhost or private IPs (127.0.0.1, 192.168.x.x, 10.x.x.x) are detected as local. For local providers:

  • API key requirement is relaxed (many local inference servers accept any non-empty string)
  • Circuit breaker behavior is the same as remote providers
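One way to express the detection rule in Python uses the standard ipaddress and urllib.parse modules. The function name is hypothetical, and the check is an approximation: ipaddress treats more ranges as private than the examples listed above (172.16.0.0/12 and link-local addresses, for instance), and the sketch does not resolve DNS names.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_base_url(base_url: str) -> bool:
    """Return True if base_url points at localhost or a private/loopback IP."""
    host = urlsplit(base_url).hostname  # None if the URL has no scheme/host
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A non-literal hostname; this sketch does not look it up.
        return False
    return addr.is_loopback or addr.is_private
```

For example, is_local_base_url("http://127.0.0.1:11434/v1") is true, while a hosted endpoint such as "https://api.anthropic.com" is not.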

LLM Call Retry (Inside Agent Loop)

Once the model is resolved and the agent loop begins, individual LLM calls have their own retry mechanism.

Fallback Model Chain

Agents can declare a fallback model list in their manifest. When the primary model fails (after retries), the next fallback is tried:

Primary: claude-sonnet-4-20250514 (3 retries)
  ↓ fails
Fallback 1: gpt-4o (3 retries)
  ↓ fails
Fallback 2: llama-3.1-70b (3 retries)
  ↓ fails
Error returned to caller
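
The chain above can be sketched as two nested loops in Python. The names call_with_fallbacks and AllModelsFailed are hypothetical. Reading "3 retries" as three attempts per model is an assumption, and the sketch omits any delay or backoff between attempts because the text does not describe one.

```python
from typing import Callable, Optional, Sequence


class AllModelsFailed(Exception):
    """Raised when the primary model and every fallback have failed."""


def call_with_fallbacks(
    models: Sequence[str],
    call: Callable[[str], str],
    retries_per_model: int = 3,
) -> str:
    """Try each model in order; return the first successful response."""
    last_error: Optional[BaseException] = None
    for model in models:
        # ASSUMPTION: "3 retries" means three attempts in total per model.
        for _attempt in range(retries_per_model):
            try:
                return call(model)
            except Exception as exc:
                last_error = exc
    # Every model exhausted its attempts: surface the error to the caller.
    raise AllModelsFailed(f"all {len(models)} models failed") from last_error
```

The list passed as models would be the primary model followed by the fallbacks declared in the agent's manifest, and call stands in for whatever function issues the actual LLM request.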