Agent Execution Model
How a message flows from entry point to LLM response, including the tool-use loop, memory recall, session management, and error recovery.
Message Flow Overview
Pre-Execution Validation
Before the agent loop begins, several checks are enforced:
- Agent state — must be
Running. Any other state returnsInvalidStateerror. - Message length — maximum 128,000 characters. Exceeding returns
SizeLimitExceeded. - Tool allowlist resolution — the equipped competency's required skills are resolved to concrete tool names at execution time (not just at equip time). This ensures changes to skills are reflected immediately.
- Model resolution — the platform resolves which provider and model to use:
- Apply defaults if agent has empty provider/model_id
- Resolve via model catalog (alias expansion)
- Apply complexity routing (optional)
- Check circuit breaker state
- Find alternative provider if primary is tripped
- Update kernel manifest with resolved model/provider
The Agent Loop
The core execution engine runs an iterative loop that alternates between LLM calls and tool execution:
Key Parameters
| Parameter | Value | Description |
|---|---|---|
MAX_ITERATIONS | 50 | Maximum LLM call iterations per invocation |
MAX_RETRIES | 3 | Retries for rate-limited/overloaded API calls |
BASE_RETRY_DELAY_MS | 1000 | Base for exponential backoff |
TOOL_TIMEOUT_SECS | 120 | Per-tool execution timeout |
AGENT_TOOL_TIMEOUT_SECS | 600 | Timeout for inter-agent tools (agent_send, agent_spawn) |
MAX_CONTINUATIONS | 5 | MaxTokens continuations before returning partial |
DEFAULT_CONTEXT_WINDOW | 200,000 | Token budget for context management |
Environment Overrides
| Variable | Purpose |
|---|---|
HOZIRON_TOOL_TIMEOUT_SECS | Override tool timeout (0 = disable) |
HOZIRON_AGENT_TOOL_TIMEOUT_SECS | Override agent delegation timeout (0 = disable) |
Note: These environment variables control the execution kernel's timeout behavior. They are part of the Hoziron platform configuration namespace.
Memory Recall
Before the first LLM call, the loop recalls relevant memories using the user's message as a query:
- Vector similarity (preferred) — if an embedding driver is configured, the message is embedded and used for approximate nearest-neighbor search against the agent's memory store
- Text search (fallback) — if embedding fails or isn't configured, falls back to text-based recall
Up to 5 memory fragments are retrieved and injected into the system prompt as a structured section.
LLM Call with Retry
Each LLM call includes exponential backoff retry logic:
The retry logic handles:
- Rate limiting (429) — backs off exponentially
- Overloaded (529) — same backoff strategy
- Fallback models — if configured, tries alternative models before failing
Tool Execution
When the LLM responds with ToolUse, each tool call is processed:
-
Loop guard check — prevents infinite loops (same tool + same input repeated)
Allow— proceed normallyWarn— proceed but append warning to resultBlock— reject this call, return error to LLMCircuitBreak— abort the entire agent loop
-
BeforeToolCall hook — plugin hook can block execution
-
Capability enforcement — tool must be in the agent's
tool_allowlist -
Timeout-wrapped execution — each tool runs with a configurable timeout:
- Regular tools: 120 seconds
- Inter-agent tools (
agent_send,agent_spawn): 600 seconds - Timeout = 0 disables the limit (for slow local inference)
-
AfterToolCall hook — observability hook fires post-execution
-
Result truncation — large tool outputs are dynamically truncated based on remaining context budget
Tool Error Handling
After all tool calls complete:
- If any returned errors, guidance is injected telling the LLM not to fabricate results
- If approval was denied, guidance tells the LLM not to retry denied tools
- The LLM sees both successful results and error messages in its next turn
Phantom Action Detection
A safety mechanism detects when the LLM claims to have performed an action (sent a message, posted to a channel) without actually calling any tools. When detected:
- The claim is captured
- A re-prompt is injected: "You claimed to perform an action but did not call any tools..."
- The loop continues, forcing the LLM to use actual tools
This prevents hallucinated completions where the agent tells the user "Message sent!" without actually sending anything.
Session Management
History Trimming
Before each LLM call, the message history is checked:
- Default maximum: configurable per agent (overrides global default)
- When exceeded: oldest messages are drained
- After trimming: history is validated for tool_use/tool_result pairing
Context Overflow Recovery
A multi-stage pipeline handles context overflow:
- Guard — compact oversized tool results before LLM call
- Recovery Stage 1 — trim old messages
- Recovery Stage 2 — more aggressive trimming
- Final Error — suggest
/resetor/compact
Session Persistence
After the loop completes:
- Final assistant message is saved to session (preserving Thinking blocks for reasoning models)
- Heartbeat turns are pruned (saves context budget)
- The interaction is remembered in the memory substrate (with embedding if available)
Silent Completions
Agents can intentionally choose not to respond by outputting NO_REPLY or [SILENT]. When detected:
- An internal marker
[no reply needed]is stored in history - The response is returned as empty string with
silent: true - Channel adapters suppress message delivery
Resource Tracking
The scheduler tracks per-agent resource usage:
- Tokens: rolling 1-hour window, checked against
max_llm_tokens_per_hour - Tool calls: tracked per minute
- Cost: estimated from token usage × model pricing
When a quota is exceeded, the agent's next invocation is rejected with QuotaExceeded.