PII & Data Protection

How Hoziron protects personally identifiable information — detection, tokenization, memory isolation, audit trails, and data destruction.

Data Protection Architecture

PII Pipeline

Each agent has its own PII pipeline instance composed of three traits:

Detection

PiiDetector.detect(input, trust_policy) → Vec<PiiSpan>

Scans input text for PII spans and returns their byte offsets and categories:

[
  {"start": 45, "end": 56, "category": "ssn"},
  {"start": 120, "end": 135, "category": "policy_number"},
  {"start": 200, "end": 215, "category": "full_name"}
]
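
As an illustration of the span format above, here is a minimal sketch of a detector for one category only. It assumes SSNs always appear as ddd-dd-dddd; the `detect_ssn` function and the `PiiSpan` field layout are hypothetical, and the `trust_policy` argument from the real signature is omitted for brevity.

```rust
/// A detected PII span: byte offsets into the input plus a category label.
/// (Field names mirror the JSON example; the real type may differ.)
#[derive(Debug, Clone, PartialEq)]
pub struct PiiSpan {
    pub start: usize,
    pub end: usize, // exclusive
    pub category: String,
}

/// Returns true if `window` is exactly 11 bytes shaped like ddd-dd-dddd.
fn looks_like_ssn(window: &[u8]) -> bool {
    window.len() == 11
        && window.iter().enumerate().all(|(i, b)| match i {
            3 | 6 => *b == b'-',
            _ => b.is_ascii_digit(),
        })
}

/// Scans `input` left to right and reports the byte range of each SSN-shaped match.
pub fn detect_ssn(input: &str) -> Vec<PiiSpan> {
    let bytes = input.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i + 11 <= bytes.len() {
        if looks_like_ssn(&bytes[i..i + 11]) {
            spans.push(PiiSpan {
                start: i,
                end: i + 11,
                category: "ssn".to_string(),
            });
            i += 11; // continue after the match
        } else {
            i += 1;
        }
    }
    spans
}
```

A production detector would also check the characters around a match (for example, to reject longer digit runs) and would cover the other configured categories.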

Tokenization

PiiTokenizer.tokenize(input, spans, trust_policy) → (tokenized_text, token_map)

Replaces detected PII with opaque tokens:

  • Input: "Customer John Smith (SSN: 123-45-6789) filed claim"
  • Output: "Customer [PII_NAME_001] (SSN: [PII_SSN_001]) filed claim"
  • TokenMap: {"PII_NAME_001": "John Smith", "PII_SSN_001": "123-45-6789"}
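
The replacement step can be sketched as follows. This is an assumption-laden illustration, not the shipped tokenizer: the token naming scheme (`PII_<LABEL>_<NNN>`) is inferred from the example above, the `token_label` mapping is hypothetical, and the `trust_policy` argument is left out.

```rust
use std::collections::HashMap;

/// Byte range and category of one detected PII value.
pub struct PiiSpan {
    pub start: usize,
    pub end: usize, // exclusive
    pub category: String,
}

/// Maps a detector category to the label used inside tokens (assumed mapping).
fn token_label(category: &str) -> String {
    match category {
        "full_name" => "NAME".to_string(),
        other => other.to_uppercase(),
    }
}

/// Replaces each span with `[PII_<LABEL>_<NNN>]` and records token -> original value.
/// Spans are assumed to fall on character boundaries.
pub fn tokenize(input: &str, spans: &[PiiSpan]) -> (String, HashMap<String, String>) {
    let mut sorted: Vec<&PiiSpan> = spans.iter().collect();
    sorted.sort_by_key(|s| s.start);

    let mut out = String::new();
    let mut token_map = HashMap::new();
    let mut counters: HashMap<String, u32> = HashMap::new();
    let mut cursor = 0;

    for span in sorted {
        // Skip spans that overlap an earlier one or are malformed.
        if span.start < cursor || span.end < span.start || span.end > input.len() {
            continue;
        }
        let label = token_label(&span.category);
        let n = {
            let counter = counters.entry(label.clone()).or_insert(0);
            *counter += 1;
            *counter
        };
        let token = format!("PII_{}_{:03}", label, n);
        out.push_str(&input[cursor..span.start]);
        out.push('[');
        out.push_str(&token);
        out.push(']');
        token_map.insert(token, input[span.start..span.end].to_string());
        cursor = span.end;
    }
    out.push_str(&input[cursor..]);
    (out, token_map)
}
```

Numbering tokens per label keeps them stable and readable within one input while revealing nothing about the underlying value.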

Hydration

PiiHydrator.hydrate(tokenized_text, token_map, trust_policy) → original_text

Restores original PII values from tokens. Used only for trusted destinations.
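
A minimal sketch of hydration gated on the destination, under the assumption that tokens appear in the text as `[TOKEN]` and that an untrusted destination simply receives the tokenized text unchanged. The free function `hydrate` and its `destination` parameter are hypothetical stand-ins for the trait method and its `trust_policy` argument.

```rust
use std::collections::HashMap;

/// Replaces every known `[TOKEN]` in `tokenized` with its original value, but only
/// when `destination` is trusted; otherwise the tokenized text is returned as-is.
pub fn hydrate(
    tokenized: &str,
    token_map: &HashMap<String, String>,
    trusted_destinations: &[&str],
    destination: &str,
) -> String {
    if !trusted_destinations.contains(&destination) {
        return tokenized.to_string();
    }
    let mut out = String::with_capacity(tokenized.len());
    let mut rest = tokenized;
    while let Some(open) = rest.find('[') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find(']') {
            // A bracketed name that exists in the map is swapped for the real value.
            Some(close) if token_map.contains_key(&after[..close]) => {
                out.push_str(&token_map[&after[..close]]);
                rest = &after[close + 1..];
            }
            // Anything else in brackets is ordinary text and is copied through.
            _ => {
                out.push('[');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

Called with the example token map and the destination "claims-system-api", this returns the original sentence; with an unlisted destination it returns the tokenized sentence.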

Pipeline Composition

The three stages run in order: detection produces spans, tokenization consumes those spans to produce tokenized text and a token map, and hydration restores the original values only for trusted destinations.

Trust Policies

Each agent's PII behavior is governed by a TrustPolicy:

{
  "enabled": true,
  "detectors": ["ssn", "email", "full_name", "policy_number", "phone"],
  "trusted_destinations": ["claims-system-api", "internal-database"]
}

Field | Purpose
enabled | Master switch for PII processing on this agent
detectors | Which PII categories to scan for
trusted_destinations | Destinations that receive hydrated (real) data
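
One way to model the policy in code is sketched here. The struct mirrors the three JSON fields; the helper methods `scans_for` and `is_trusted` are hypothetical names, and treating `enabled = false` as "scan for nothing" is an assumption based on the field description.

```rust
/// In-memory form of the policy JSON (field names taken from the example).
pub struct TrustPolicy {
    pub enabled: bool,
    pub detectors: Vec<String>,
    pub trusted_destinations: Vec<String>,
}

impl TrustPolicy {
    /// True if PII processing is on and `category` is one of the configured detectors.
    pub fn scans_for(&self, category: &str) -> bool {
        self.enabled && self.detectors.iter().any(|d| d.as_str() == category)
    }

    /// True if `destination` is listed as one that may receive hydrated data.
    pub fn is_trusted(&self, destination: &str) -> bool {
        self.trusted_destinations
            .iter()
            .any(|d| d.as_str() == destination)
    }
}
```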

Current Implementation Status

Phase 1 (current): No-op pipeline — data passes through unchanged. The architecture and trait boundaries are in place, but detection is not yet active.

Phase 2 (planned): Real PII detection engine with configurable rules per industry vertical (insurance-specific patterns: policy numbers, claim numbers, NPI, etc.).
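
To make the Phase 1 behavior concrete, here is a hedged sketch of three traits with a pass-through implementation. The trait names come from this page; the method signatures are simplified (no `trust_policy` argument), and `NoOp`, `PiiPipeline`, `inbound`, and `outbound` are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

pub struct PiiSpan {
    pub start: usize,
    pub end: usize,
    pub category: String,
}
pub type TokenMap = HashMap<String, String>;

pub trait PiiDetector {
    fn detect(&self, input: &str) -> Vec<PiiSpan>;
}
pub trait PiiTokenizer {
    fn tokenize(&self, input: &str, spans: &[PiiSpan]) -> (String, TokenMap);
}
pub trait PiiHydrator {
    fn hydrate(&self, tokenized: &str, token_map: &TokenMap) -> String;
}

/// Pass-through implementation: finds nothing, replaces nothing, restores nothing.
pub struct NoOp;

impl PiiDetector for NoOp {
    fn detect(&self, _input: &str) -> Vec<PiiSpan> {
        Vec::new()
    }
}
impl PiiTokenizer for NoOp {
    fn tokenize(&self, input: &str, _spans: &[PiiSpan]) -> (String, TokenMap) {
        (input.to_string(), TokenMap::new())
    }
}
impl PiiHydrator for NoOp {
    fn hydrate(&self, tokenized: &str, _token_map: &TokenMap) -> String {
        tokenized.to_string()
    }
}

/// One pipeline instance per agent, built from three swappable stages.
pub struct PiiPipeline {
    detector: Box<dyn PiiDetector>,
    tokenizer: Box<dyn PiiTokenizer>,
    hydrator: Box<dyn PiiHydrator>,
}

impl PiiPipeline {
    pub fn no_op() -> Self {
        PiiPipeline {
            detector: Box::new(NoOp),
            tokenizer: Box::new(NoOp),
            hydrator: Box::new(NoOp),
        }
    }

    /// Detect, then tokenize.
    pub fn inbound(&self, input: &str) -> (String, TokenMap) {
        let spans = self.detector.detect(input);
        self.tokenizer.tokenize(input, &spans)
    }

    /// Hydrate for a destination that is allowed to see real values.
    pub fn outbound(&self, tokenized: &str, token_map: &TokenMap) -> String {
        self.hydrator.hydrate(tokenized, token_map)
    }
}
```

Because the stages sit behind trait objects, a Phase 2 detection engine could replace `NoOp` without changing the code that calls the pipeline.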

PII at Workflow Boundaries

Data flowing between agents in a workflow passes through the target agent's PII pipeline.

This ensures:

  • Each agent only sees PII appropriate for its trust level
  • An agent processing claims data doesn't expose raw SSNs to a downstream reporting agent
  • The token map is held by the engine (not the agents) — agents cannot reverse-tokenize

Memory Isolation

Per-Agent Scoping

Every memory operation validates the caller's identity against the owner of the memory scope being accessed.

Cross-Agent Access Denied

The ScopedMemory wrapper is the enforcement point. Even if an agent somehow obtains another agent's ID (from a workflow context, a tool result, etc.), the memory layer denies access:

{
  "error": {
    "category": "MemoryViolation",
    "message": "Cross-agent memory access denied",
    "details": {"caller": "agent-a-id", "scope_owner": "agent-b-id"}
  }
}
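
The enforcement point can be pictured with a small sketch. Only the name ScopedMemory and the error category come from this page; the constructor, the `put`/`get` methods, and the in-memory HashMap backing store are assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum MemoryError {
    /// Raised when the caller is not the owner of the scope.
    MemoryViolation { caller: String, scope_owner: String },
}

/// Wraps one agent's key-value store and checks the caller on every operation.
pub struct ScopedMemory {
    scope_owner: String,
    kv: HashMap<String, String>,
}

impl ScopedMemory {
    pub fn new(scope_owner: &str) -> Self {
        ScopedMemory {
            scope_owner: scope_owner.to_string(),
            kv: HashMap::new(),
        }
    }

    /// Single check shared by all operations, so no method can skip it.
    fn check(&self, caller: &str) -> Result<(), MemoryError> {
        if caller == self.scope_owner.as_str() {
            Ok(())
        } else {
            Err(MemoryError::MemoryViolation {
                caller: caller.to_string(),
                scope_owner: self.scope_owner.clone(),
            })
        }
    }

    pub fn put(&mut self, caller: &str, key: &str, value: &str) -> Result<(), MemoryError> {
        self.check(caller)?;
        self.kv.insert(key.to_string(), value.to_string());
        Ok(())
    }

    pub fn get(&self, caller: &str, key: &str) -> Result<Option<&String>, MemoryError> {
        self.check(caller)?;
        Ok(self.kv.get(key))
    }
}
```

Knowing another agent's ID does not help here: the check compares the authenticated caller with the scope owner, not with an ID supplied in the request.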

Agent Memory Destruction

When an agent is deleted (delete_agent), all its data is irrecoverably destroyed:

  1. Semantic memory fragments: recall and forget fragments in a loop until none remain
  2. KV scope — delete all keys in the agent's namespace
  3. Knowledge graph entities — query and remove all entities sourced from this agent
  4. Session history — removed by the kernel's registry removal

This satisfies the 5-second destruction guarantee.
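
The four steps can be sketched against an in-memory model. Everything here is hypothetical scaffolding: the `Stores` struct, the `recall`/`forget` helpers, the batch size of 2, and the "agent_id:key" namespace format are assumptions chosen to show the loop-until-empty pattern, not the real storage layers.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Stores {
    /// agent id -> semantic memory fragments
    pub semantic: HashMap<String, Vec<String>>,
    /// "agent_id:key" -> value
    pub kv: HashMap<String, String>,
    /// (entity name, source agent id)
    pub graph: Vec<(String, String)>,
    /// agent id -> session history
    pub sessions: HashMap<String, Vec<String>>,
}

impl Stores {
    /// Returns up to `limit` fragments (stand-in for a paged recall API).
    fn recall(&self, agent: &str, limit: usize) -> Vec<String> {
        self.semantic
            .get(agent)
            .map(|f| f.iter().take(limit).cloned().collect::<Vec<String>>())
            .unwrap_or_default()
    }

    fn forget(&mut self, agent: &str, fragment: &str) {
        if let Some(fragments) = self.semantic.get_mut(agent) {
            fragments.retain(|f| f.as_str() != fragment);
        }
    }

    pub fn delete_agent(&mut self, agent: &str) {
        // 1. Recall a batch, forget it, repeat until recall comes back empty.
        loop {
            let batch = self.recall(agent, 2);
            if batch.is_empty() {
                break;
            }
            for fragment in &batch {
                self.forget(agent, fragment);
            }
        }
        self.semantic.remove(agent);

        // 2. Delete every key in the agent's namespace.
        let prefix = format!("{}:", agent);
        self.kv.retain(|key, _| !key.starts_with(prefix.as_str()));

        // 3. Remove knowledge graph entities sourced from this agent.
        self.graph.retain(|(_, source)| source.as_str() != agent);

        // 4. Drop session history along with the registry entry.
        self.sessions.remove(agent);
    }
}
```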

Audit Trail

Merkle Hash Chain

Every audit entry includes a SHA-256 hash that chains to the previous entry:

hash = SHA-256(prev_hash | timestamp | identity | role | action | target | result)

If any entry is modified after the fact, its recomputed hash no longer matches the stored one, and rewriting that hash would invalidate every subsequent entry, so tampering is detectable.

What Gets Audited

Every authenticated API request is recorded:

Field | Content
timestamp | ISO 8601 when the action occurred
identity | Caller's key name or OIDC subject
role | Caller's role at time of request
action | HTTP method + path (e.g., POST /agents)
target | Resource acted upon (agent ID, key ID, etc.)
result | Outcome: success, forbidden, error
hash | SHA-256 chain hash

Verification

The verify_chain() operation streams entries and recomputes each hash:

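prev_hash = checkpoint hash (or the genesis value if nothing has been pruned)
For each entry (oldest → newest):
    expected = SHA-256(prev_hash | entry.fields...)
    if expected != entry.hash: CHAIN BROKEN at entry N
    prev_hash = entry.hash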

Verification is O(n) time, O(1) memory (streaming).
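
The streaming check can be sketched as below. To keep the example free of external crates, the hash function is passed in as a parameter, and `toy_hash` is an FNV-1a stand-in that is NOT SHA-256; a real deployment would plug in a SHA-256 implementation. The `AuditEntry` layout, the `append` helper, and the "|" field separator are assumptions based on the formula above.

```rust
#[derive(Debug, Clone, Default)]
pub struct AuditEntry {
    pub timestamp: String,
    pub identity: String,
    pub role: String,
    pub action: String,
    pub target: String,
    pub result: String,
    pub hash: String,
}

/// Builds the hashed string: prev_hash | timestamp | identity | role | action | target | result.
fn preimage(prev_hash: &str, e: &AuditEntry) -> String {
    format!(
        "{}|{}|{}|{}|{}|{}|{}",
        prev_hash, e.timestamp, e.identity, e.role, e.action, e.target, e.result
    )
}

/// Stand-in hash (FNV-1a, 64-bit). Not cryptographic; used only so the sketch runs.
pub fn toy_hash(data: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in data.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{:016x}", h)
}

/// Appends `entry`, chaining its hash to the previous entry (or to `genesis`).
pub fn append<H: Fn(&str) -> String>(
    log: &mut Vec<AuditEntry>,
    genesis: &str,
    mut entry: AuditEntry,
    hash: &H,
) {
    let prev = log
        .last()
        .map(|e| e.hash.clone())
        .unwrap_or_else(|| genesis.to_string());
    entry.hash = hash(&preimage(&prev, &entry));
    log.push(entry);
}

/// Walks entries oldest to newest, keeping only the previous hash in memory.
/// Returns Err(n) with the index of the first entry whose hash does not match.
pub fn verify_chain<'a, I, H>(entries: I, anchor: &str, hash: &H) -> Result<(), usize>
where
    I: IntoIterator<Item = &'a AuditEntry>,
    H: Fn(&str) -> String,
{
    let mut prev = anchor.to_string();
    for (n, entry) in entries.into_iter().enumerate() {
        if hash(&preimage(&prev, entry)) != entry.hash {
            return Err(n);
        }
        prev = entry.hash.clone();
    }
    Ok(())
}
```

Because `verify_chain` accepts any iterator, entries can be streamed from storage one row at a time, which is what keeps memory use constant.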

Pruning with Integrity

When entries exceed max_entries (default 100,000), oldest entries are pruned. To maintain chain verifiability:

  1. The hash of the last-pruned entry is saved as a "checkpoint" in metadata
  2. Future verification anchors from this checkpoint instead of genesis
  3. Entries before the checkpoint are gone but the chain after it remains verifiable
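
The checkpoint mechanism can be illustrated with a reduced model in which each entry is a single payload string. As a labeled simplification, `toy_hash` is an FNV-1a stand-in rather than SHA-256, and the `AuditLog` struct, its field names, and the `GENESIS` constant are hypothetical.

```rust
use std::collections::VecDeque;

/// Stand-in hash (FNV-1a, 64-bit). Not SHA-256; used only so the sketch runs.
fn toy_hash(data: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in data.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{:016x}", h)
}

/// Assumed anchor value for a chain that has never been pruned.
const GENESIS: &str = "genesis";

pub struct AuditLog {
    /// (payload, chain hash), oldest first
    pub entries: VecDeque<(String, String)>,
    /// Hash of the last-pruned entry, if any pruning has happened
    pub checkpoint: Option<String>,
    pub max_entries: usize,
}

impl AuditLog {
    pub fn new(max_entries: usize) -> Self {
        AuditLog {
            entries: VecDeque::new(),
            checkpoint: None,
            max_entries,
        }
    }

    /// The value the oldest retained entry chains from.
    fn anchor(&self) -> String {
        self.checkpoint
            .clone()
            .unwrap_or_else(|| GENESIS.to_string())
    }

    pub fn append(&mut self, payload: &str) {
        let prev = self
            .entries
            .back()
            .map(|(_, h)| h.clone())
            .unwrap_or_else(|| self.anchor());
        let hash = toy_hash(&format!("{}|{}", prev, payload));
        self.entries.push_back((payload.to_string(), hash));

        // Prune oldest entries, remembering the hash of the last one removed.
        while self.entries.len() > self.max_entries {
            if let Some((_, pruned_hash)) = self.entries.pop_front() {
                self.checkpoint = Some(pruned_hash);
            }
        }
    }

    /// Verifies the retained entries starting from the checkpoint (or genesis).
    pub fn verify(&self) -> bool {
        let mut prev = self.anchor();
        for (payload, hash) in &self.entries {
            if toy_hash(&format!("{}|{}", prev, payload)) != *hash {
                return false;
            }
            prev = hash.clone();
        }
        true
    }
}
```

The checkpoint works because the oldest retained entry was hashed against exactly that value, so verification can resume there without the pruned history.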

Configuration

[audit]
enabled = true
max_entries = 100000

Storage

  • SQLite with WAL mode (concurrent reads without blocking writes)
  • Indexed on timestamp and identity for efficient queries
  • Audit writes are non-blocking — spawned as background tasks to avoid adding latency to API responses
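
The non-blocking write pattern can be sketched with a channel and a background worker. This substitutes a plain std thread for whatever task runtime the engine actually uses, and a Vec for the SQLite insert; `AuditWriter`, `record`, and `shutdown` are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hands audit entries to a background worker so request handlers never wait on storage.
pub struct AuditWriter {
    tx: Sender<String>,
    worker: JoinHandle<Vec<String>>,
}

impl AuditWriter {
    pub fn start() -> Self {
        let (tx, rx) = mpsc::channel::<String>();
        let worker = thread::spawn(move || {
            // Stand-in for the database: entries are collected in arrival order.
            let mut stored = Vec::new();
            for entry in rx {
                stored.push(entry);
            }
            stored
        });
        AuditWriter { tx, worker }
    }

    /// Queues an entry and returns immediately; the worker persists it later.
    pub fn record(&self, entry: &str) {
        // A send error only means the worker has stopped; it is ignored in this sketch.
        let _ = self.tx.send(entry.to_string());
    }

    /// Closes the queue, waits for the worker to drain it, and returns what was stored.
    pub fn shutdown(self) -> Vec<String> {
        let AuditWriter { tx, worker } = self;
        drop(tx); // closing the sender ends the worker's loop
        worker.join().unwrap_or_default()
    }
}
```

The trade-off of this design is that an entry is acknowledged before it is durable, so a graceful shutdown needs to drain the queue, as `shutdown` does here.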

Secret Handling Guarantees

Guarantee | How It's Enforced
API keys never stored in plaintext | Only argon2id hashes in the database
Provider keys never in config files | api_key_env references env var name, not value
Keys never in error messages | Error reports env var name only
Keys never in logs | No logging of resolved key values
Key store file permissions | 0600 on Unix (owner read/write only)
Key shown only once | secret field returned only at creation time
Constant-time validation | All keys scanned regardless of match position
Memory wiped on delete | All agent data destroyed across all layers
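
The "all keys scanned regardless of match position" idea can be sketched as follows. This is a simplification: it compares precomputed digests byte by byte, whereas verifying against salted argon2id hashes would mean running the verifier once per stored key. The `ct_eq` and `find_key` names are hypothetical, and production code would normally rely on a vetted constant-time comparison crate rather than a hand-written loop.

```rust
/// Compares two byte slices without stopping at the first differing byte.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulates any difference
    }
    diff == 0
}

/// Checks the presented digest against every stored key, continuing after a match
/// so that the number of comparisons does not depend on where the match sits.
pub fn find_key(stored: &[(String, Vec<u8>)], presented_digest: &[u8]) -> Option<String> {
    let mut found = None;
    for (name, digest) in stored {
        let matches = ct_eq(digest, presented_digest);
        if matches && found.is_none() {
            found = Some(name.clone());
        }
    }
    found
}
```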

Data Sovereignty

For regulated industries (insurance, healthcare, finance):

Control | Mechanism
Data stays on-premise | Self-hosted deployment, local models (Ollama/vLLM)
No data sent to cloud | Air-gapped mode with enabled = false on cloud providers
Per-agent PII policy | Trust policies control what each agent can see
Cross-agent isolation | ScopedMemory prevents any data leakage between agents
Audit trail | Complete, tamper-evident record of all API operations
Data deletion | Irrecoverable destruction on agent delete
Network control | IP allowlist + TLS + CORS restrict who connects