Agent Execution Model

How a message flows from entry point to LLM response, including the tool-use loop, memory recall, session management, and error recovery.

Message Flow Overview

Pre-Execution Validation

Before the agent loop begins, several checks are enforced:

  1. Agent state — must be Running. Any other state returns InvalidState error.
  2. Message length — maximum 128,000 characters. Exceeding returns SizeLimitExceeded.
  3. Tool allowlist resolution — the equipped competency's required skills are resolved to concrete tool names at execution time (not just at equip time). This ensures changes to skills are reflected immediately.
  4. Model resolution — the platform resolves which provider and model to use:
    • Apply defaults if agent has empty provider/model_id
    • Resolve via model catalog (alias expansion)
    • Apply complexity routing (optional)
    • Check circuit breaker state
    • Find alternative provider if primary is tripped
    • Update kernel manifest with resolved model/provider

The Agent Loop

The core execution engine runs an iterative loop that alternates between LLM calls and tool execution:

Key Parameters

ParameterValueDescription
MAX_ITERATIONS50Maximum LLM call iterations per invocation
MAX_RETRIES3Retries for rate-limited/overloaded API calls
BASE_RETRY_DELAY_MS1000Base for exponential backoff
TOOL_TIMEOUT_SECS120Per-tool execution timeout
AGENT_TOOL_TIMEOUT_SECS600Timeout for inter-agent tools (agent_send, agent_spawn)
MAX_CONTINUATIONS5MaxTokens continuations before returning partial
DEFAULT_CONTEXT_WINDOW200,000Token budget for context management

Environment Overrides

VariablePurpose
HOZIRON_TOOL_TIMEOUT_SECSOverride tool timeout (0 = disable)
HOZIRON_AGENT_TOOL_TIMEOUT_SECSOverride agent delegation timeout (0 = disable)

Note: These environment variables control the execution kernel's timeout behavior. They are part of the Hoziron platform configuration namespace.

Memory Recall

Before the first LLM call, the loop recalls relevant memories using the user's message as a query:

  1. Vector similarity (preferred) — if an embedding driver is configured, the message is embedded and used for approximate nearest-neighbor search against the agent's memory store
  2. Text search (fallback) — if embedding fails or isn't configured, falls back to text-based recall

Up to 5 memory fragments are retrieved and injected into the system prompt as a structured section.

LLM Call with Retry

Each LLM call includes exponential backoff retry logic:

The retry logic handles:

  • Rate limiting (429) — backs off exponentially
  • Overloaded (529) — same backoff strategy
  • Fallback models — if configured, tries alternative models before failing

Tool Execution

When the LLM responds with ToolUse, each tool call is processed:

  1. Loop guard check — prevents infinite loops (same tool + same input repeated)

    • Allow — proceed normally
    • Warn — proceed but append warning to result
    • Block — reject this call, return error to LLM
    • CircuitBreak — abort the entire agent loop
  2. BeforeToolCall hook — plugin hook can block execution

  3. Capability enforcement — tool must be in the agent's tool_allowlist

  4. Timeout-wrapped execution — each tool runs with a configurable timeout:

    • Regular tools: 120 seconds
    • Inter-agent tools (agent_send, agent_spawn): 600 seconds
    • Timeout = 0 disables the limit (for slow local inference)
  5. AfterToolCall hook — observability hook fires post-execution

  6. Result truncation — large tool outputs are dynamically truncated based on remaining context budget

Tool Error Handling

After all tool calls complete:

  • If any returned errors, guidance is injected telling the LLM not to fabricate results
  • If approval was denied, guidance tells the LLM not to retry denied tools
  • The LLM sees both successful results and error messages in its next turn

Phantom Action Detection

A safety mechanism detects when the LLM claims to have performed an action (sent a message, posted to a channel) without actually calling any tools. When detected:

  1. The claim is captured
  2. A re-prompt is injected: "You claimed to perform an action but did not call any tools..."
  3. The loop continues, forcing the LLM to use actual tools

This prevents hallucinated completions where the agent tells the user "Message sent!" without actually sending anything.

Session Management

History Trimming

Before each LLM call, the message history is checked:

  • Default maximum: configurable per agent (overrides global default)
  • When exceeded: oldest messages are drained
  • After trimming: history is validated for tool_use/tool_result pairing

Context Overflow Recovery

A multi-stage pipeline handles context overflow:

  1. Guard — compact oversized tool results before LLM call
  2. Recovery Stage 1 — trim old messages
  3. Recovery Stage 2 — more aggressive trimming
  4. Final Error — suggest /reset or /compact

Session Persistence

After the loop completes:

  1. Final assistant message is saved to session (preserving Thinking blocks for reasoning models)
  2. Heartbeat turns are pruned (saves context budget)
  3. The interaction is remembered in the memory substrate (with embedding if available)

Silent Completions

Agents can intentionally choose not to respond by outputting NO_REPLY or [SILENT]. When detected:

  • An internal marker [no reply needed] is stored in history
  • The response is returned as empty string with silent: true
  • Channel adapters suppress message delivery

Resource Tracking

The scheduler tracks per-agent resource usage:

  • Tokens: rolling 1-hour window, checked against max_llm_tokens_per_hour
  • Tool calls: tracked per minute
  • Cost: estimated from token usage × model pricing

When a quota is exceeded, the agent's next invocation is rejected with QuotaExceeded.