`/a1` — Agent System (Phase 2)

The /a1/agents/... surface is an agent system built on rig-core and layered on top of /v1/chat/completions. Agents are configured via YAML files; they can use stack-side tools, retrieve from per-tenant RAG indices, and persist multi-turn sessions.

Architecture (ADR 0011)

Caller → POST /a1/agents/<name>/chat
       │   auth-middleware re-resolves Identity
       │   build rig::Agent from agents/<name>.yaml
       │   rig sub-call …
       └─▶ POST http://localhost:3000/v1/chat/completions  (self-call)
              ├─ SPASS-Augment: server-tools=off, system-prompt=off, memory=off
              ├─ SPASS-Caller-Depth: <incremented>
              ├─ SPASS-Parent-Request-Id: <outer /a1 request-id>
              └─ Bearer: <forwarded caller-bearer>
                  → full /v1 pipeline (Cost-V2, audit, Layer-1, leak-checks)

Path Y ("rig speaks to our own /v1"): every stack invariant (Cost-Pipeline V2, audit, Layer-1 defense, ADR-0009 leak sanitization, SPASS-Augment) applies transparently to every rig sub-call, because each sub-call is an ordinary /v1 request.

Endpoints

Method	Path	Description
GET	`/a1/agents`	List all configured agents (operator-authored YAML)
GET	`/a1/agents/<name>`	Full agent config (system_prompt, model, tools, rag_index)
POST	`/a1/agents/<name>/chat`	Single-shot prompt → completion
POST	`/a1/agents/<name>/sessions`	Create persistent multi-turn session
GET	`/a1/agents/<name>/sessions`	List the caller's sessions for that agent
GET	`/a1/agents/<name>/sessions/<sid>`	Session metadata + msg-count
DELETE	`/a1/agents/<name>/sessions/<sid>`	Drop session + all messages
GET	`/a1/agents/<name>/sessions/<sid>/messages`	Full conversation history
POST	`/a1/agents/<name>/sessions/<sid>/messages`	Append turn (uses persisted history); 308 → successor if archived
POST	`/a1/agents/<name>/sessions/<sid>/compact`	Compact older turns into a summary, archive source, create successor (Cut 2.12)
GET	`/a1/agents/<name>/sessions/<sid>/lineage`	Forward + backward lineage chain + summary records (Cut 2.12)

Agent YAML config

Agent configs are operator-authored and live in data/agents/*.yaml on the host. They are loaded once at process boot, so config changes need a restart (hot-reload is on the backlog).

name: berlin-rag                 # URL-safe [a-zA-Z0-9_-]+
model: claude-opus-4.7           # must be in ALLOWED_MODELS + per-token allowlist
description: Berlin RAG persona  # shown in /a1/agents listing
system_prompt: |                 # "You answer questions about Berlin in German..."
  Du beantwortest Fragen zu Berlin auf Deutsch...
max_tokens: 512                  # optional
temperature: 0.3                 # optional
tools:                           # optional — stack tools available to agent
  - calculator
  - current_datetime
  - rag_query                    # active-RAG tool, see /docs/rag
rag_index: berlin-info           # optional — passive RAG context-injection
rag_top_k: 3                     # optional, default 5
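Because configs are only read at boot, a bad `name:` surfaces late. A minimal sketch of a pre-flight check an operator could run before dropping a file into data/agents/, assuming only the documented rule (one or more of `[a-zA-Z0-9_-]`) and an unquoted top-level `name:` scalar. Both `valid_agent_name` and `agent_name_from_yaml` are hypothetical helpers; the server's own validation may differ.

```sh
#!/bin/sh
# Hypothetical pre-flight check for an agent `name:` value.
# Mirrors the documented rule: one or more characters from [a-zA-Z0-9_-].
valid_agent_name() {
  case $1 in
    '' | *[!a-zA-Z0-9_-]*) return 1 ;;  # empty, or contains a disallowed character
    *) return 0 ;;
  esac
}

# Print the first top-level `name:` value of an agent YAML file.
# Assumes an unquoted scalar, optionally followed by a `# comment`.
agent_name_from_yaml() {
  sed -n 's/^name:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1
}
```

For example: `valid_agent_name "$(agent_name_from_yaml data/agents/berlin-rag.yaml)" || echo 'invalid agent name'`.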

Bundled agents (out-of-the-box examples)

Agent	Model	Tools	RAG	Use-case
`concise-de`	llama-4-scout (local)	—	—	Bare LLM, terse German
`berlin-tour`	llama-4-scout (local)	—	—	Berlin-tour persona
`berlin-rag`	llama-4-scout (local)	—	berlin-info	Passive RAG (rig dynamic_context)
`researcher`	claude-opus-4.7	calculator, current_datetime, wikipedia_search, wikipedia_summary	—	Tool-using research assistant
`researcher-rag`	claude-opus-4.7	calculator, current_datetime, wikipedia_search	matrix-rag-idx	Tools + RAG combined
`librarian`	claude-opus-4.7	rag_query	—	Active-RAG via tool-call

Tool-calling note: the local Llama-4-Scout-FP4 (served with --tool-call-parser=pythonic) currently emits tool calls as plain-text JSON, which our /v1 tool loop cannot parse. Agents that need tools should pin model: claude-opus-4.7 for now. Switching to the llama4_json parser is on the backlog.

Multi-turn sessions

Sessions persist in a tenant-scoped SQLite database at data/sqlite/agent_sessions/<tenant>.sqlite. Each session is pinned to the agent it was created with; addressing it under a different agent name returns 400 session_agent_mismatch.

Sessions are user-scoped (Cut 2.23c, ADR 0016). All session endpoints require the SPASS-User-Id header, so different end-users sharing the same bearer cannot see or mutate each other's sessions. Sessions created before Cut 2.23c (stored with the default user_id '') are invisible to all end-users; the operator can still read them via SQL.

Per-session compaction config (Cut 2.12 + 2.13): the session-create request accepts the optional body fields below. Missing fields fall through to the per-tenant cascade default (process → yaml → DB), so an operator can change the tenant-wide default without touching every session-create call.

Field	Type	Default source	Notes
`compact_strategy`	`auto`, `manual` or `off`	tenant cascade (`auto`)	`auto` enables the auto-trigger gate (see below); `manual` keeps only the explicit `POST /compact`; `off` disables compaction entirely
`compact_keep_last_n`	`0..200`	tenant cascade (`10`)	live messages preserved verbatim when compaction runs; everything older becomes the summary
`compact_observation_mask`	bool	tenant cascade (`true`)	`true` instructs the summary-model to drop tool-noise / acknowledgements; `false` passes raw turns
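The three fields above can be assembled and range-checked client-side before calling session-create. A minimal sketch, assuming the fields are sent as a flat JSON object in the create body; `compaction_body` is a hypothetical helper, and any argument passed as '' is left out of the JSON so the tenant cascade supplies that value.

```sh
#!/bin/sh
# Hypothetical helper: build the optional compaction fields for
# POST /a1/agents/<name>/sessions. Pass '' to omit a field; omitted
# fields fall through to the tenant cascade default (process -> yaml -> DB).
compaction_body() {
  local strategy="$1" keep="$2" mask="$3" body=''

  if [ -n "$strategy" ]; then
    case $strategy in
      auto | manual | off) body="\"compact_strategy\":\"$strategy\"" ;;
      *) echo "compact_strategy must be auto, manual or off" >&2; return 1 ;;
    esac
  fi

  if [ -n "$keep" ]; then
    # Plain integer without leading zeros (keeps the JSON valid), then the 0..200 range.
    case $keep in
      *[!0-9]* | 0?*) echo "compact_keep_last_n must be an integer in 0..200" >&2; return 1 ;;
    esac
    if [ "$keep" -gt 200 ]; then
      echo "compact_keep_last_n must be an integer in 0..200" >&2; return 1
    fi
    body="${body:+$body,}\"compact_keep_last_n\":$keep"
  fi

  if [ -n "$mask" ]; then
    case $mask in
      true | false) body="${body:+$body,}\"compact_observation_mask\":$mask" ;;
      *) echo "compact_observation_mask must be true or false" >&2; return 1 ;;
    esac
  fi

  printf '{%s}\n' "$body"
}
```

The output can be passed to the session-create curl as `-d "$(compaction_body manual 20 '')"` together with `-H 'Content-Type: application/json'`; calling it with three empty arguments yields `{}`, i.e. pure tenant defaults.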

TOKEN="$(grep '^RUST_API_BEARER=' /home/dietmar/dgx-llm/.env | cut -d= -f2-)"  # -f2- keeps any '=' inside the token
USER_ID=demo-user  # hypothetical end-user id; every session endpoint requires SPASS-User-Id

SID=$(curl -s -X POST http://localhost:3000/a1/agents/concise-de/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "SPASS-User-Id: $USER_ID" | jq -r .id)

# "My favourite sport is tennis."
curl -s -X POST http://localhost:3000/a1/agents/concise-de/sessions/$SID/messages \
  -H "Authorization: Bearer $TOKEN" \
  -H "SPASS-User-Id: $USER_ID" \
  -H 'Content-Type: application/json' \
  -d '{"message": "Mein Lieblingssport ist Tennis."}' | jq

# "Which sport is my favourite?"
curl -s -X POST http://localhost:3000/a1/agents/concise-de/sessions/$SID/messages \
  -H "Authorization: Bearer $TOKEN" \
  -H "SPASS-User-Id: $USER_ID" \
  -H 'Content-Type: application/json' \
  -d '{"message": "Welcher Sport ist mein Liebling?"}' | jq
# → "Dein Lieblingssport ist Tennis." ("Your favourite sport is tennis.")

Compaction (Cut 2.12) — Z-chained lineage + C4-hybrid

When a session grows past the model's effective context budget (see Auto-trigger below), or when POST /a1/agents/<name>/sessions/<sid>/compact is called explicitly, compaction proceeds as follows:

The most-recent compact_keep_last_n (default 10) messages are preserved verbatim.
Everything older is summarised by the tenant's effective compact_summary_model, resolved through the 3-level cascade process → yaml → DB. The default is llama-4-scout for data-protection reasons (it runs locally); a tenant can override it via tokens.yaml::tenants[].defaults.compact_summary_model or at runtime via PUT /v1/tenant/config. See Per-tenant config and ADR 0013.
The summarised rows are soft-deleted (compacted_at = NOW); a session_summaries row is persisted.
A new successor session is created with parent_session_id → source.
The summary is bootstrapped into the successor as a single synthetic assistant-role message ("[compaction summary from session X] …"); the kept-last-N messages are copied verbatim.
The source session is marked archived (archived_at = NOW) and its successor_session_id is set to the new session's id.
Subsequent chat attempts on the archived session id return HTTP 308 with Location: /a1/agents/<name>/sessions/<successor>/messages. Because 308 preserves the request method and body, clients that follow redirects (curl only does so with -L) re-send the POST to the successor with the body intact.
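Clients that do not follow redirects can still recover the successor id from the 308's Location header. A minimal sketch; `successor_from_location` is a hypothetical helper, and it assumes the Location value has exactly the documented shape /a1/agents/<name>/sessions/<successor>/messages, optionally prefixed with scheme and host.

```sh
#!/bin/sh
# Hypothetical helper: extract <successor> from a 308 Location value shaped like
#   [scheme://host]/a1/agents/<name>/sessions/<successor>/messages
# Prints the id, or returns 1 if the value does not have that shape.
successor_from_location() {
  local sid
  # curl -D header dumps keep the CR that terminates each header line; drop it.
  sid=$(printf '%s\n' "$1" | tr -d '\r' |
    sed -n 's#.*/a1/agents/[^/][^/]*/sessions/\([^/][^/]*\)/messages$#\1#p')
  [ -n "$sid" ] || return 1
  printf '%s\n' "$sid"
}
```

With a header dump from `curl -D headers.txt`, the value can be pulled via `awk 'tolower($1) == "location:" {print $2}' headers.txt` and passed in.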

# Compact a session manually (NAME is the agent, SID the session id;
# SPASS-User-Id must identify the session's owner, demo-user is a hypothetical value)
NAME=concise-de
curl -s -X POST "http://localhost:3000/a1/agents/$NAME/sessions/$SID/compact" \
  -H "Authorization: Bearer $TOKEN" \
  -H "SPASS-User-Id: demo-user" | jq
# → { source_session_id, successor_session_id, summary_id, summary_text, ... }

# Inspect the lineage chain
curl -s "http://localhost:3000/a1/agents/$NAME/sessions/$SID/lineage" \
  -H "Authorization: Bearer $TOKEN" \
  -H "SPASS-User-Id: demo-user" | jq
# → { backward: [...], forward: [...], summaries: [...] }

compact_observation_mask controls the summary model's instruction template. With true (the default), the model is told to drop greetings, acknowledgements, raw tool output, and error traces while preserving named entities, numbers, decisions, and unresolved questions. With false, raw turns are passed verbatim, which is useful when tool output is itself the load-bearing context (e.g. RAG-retrieval pipelines).

Failure modes:

409 session_compact_conflict: fewer than compact_keep_last_n live messages, or the session is already archived. The lineage view (GET .../lineage) shows the live successor.
The summary-model self-call goes through our own /v1/chat/completions, so audit, rate-limiting and the cost pipeline all apply uniformly. Cost is attributed to the caller's tenant via the bearer-forward pattern.

Auto-trigger

When compact_strategy is auto (the default), POST .../sessions/<sid>/messages checks the following conditions before the chat loop runs:

Condition	Source
`compact_strategy == "auto"`	session row
live-message count > `compact_keep_last_n`	`messages` table where `compacted_at IS NULL`
estimated tokens > 80 % of the model's `context_window`	char-based heuristic (~4 chars/token) over live messages; `context_window` via catalog lookup of `model`

If all three hold, the handler compacts the session FIRST, then runs the chat against the brand-new successor. The response carries:

SPASS-Session-Compacted: <new-sid> — successor id; persist this for the next turn (the original sid will return 308 from now on).
SPASS-Session-Compacted-Tokens: <n> — pre-compact token-estimate that triggered the compaction; useful for tuning compact_keep_last_n if compactions are firing too often or too rarely.
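A client that tracks its own session id only has to watch for SPASS-Session-Compacted after each turn. A minimal sketch, assuming the response headers were saved with `curl -D <file>`; `next_sid` is a hypothetical helper that prints the successor id when the header is present and the current id otherwise.

```sh
#!/bin/sh
# Hypothetical helper: given a `curl -D` header dump ($1) and the current
# session id ($2), print the session id to use for the next turn.
next_sid() {
  local new_sid
  # Header names are case-insensitive; strip the CR that ends each header line.
  # tail keeps the last match in case the dump holds several responses (e.g. curl -L).
  new_sid=$(awk 'tolower($1) == "spass-session-compacted:" { sub(/\r$/, "", $2); print $2 }' "$1" | tail -n 1)
  printf '%s\n' "${new_sid:-$2}"
}
```

A typical turn then ends with `SID=$(next_sid /tmp/headers "$SID")` after a `curl -s -D /tmp/headers ...` call. Note that the exact-match on the header name keeps SPASS-Session-Compacted-Tokens from being picked up by mistake.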

Auto-trigger failures are logged but do NOT bubble up: the chat continues against the original (still uncompacted) session, and the upstream may surface its own context-overflow error if the prompt really exceeds the context window. This is intentional, since a transient summary-model issue should not break user-facing chat. Set compact_strategy: manual to disable the auto-trigger entirely while keeping the explicit POST /compact endpoint available.

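Token-estimate caveat: the gate uses a chars / 4 heuristic, not real token counts. It tends to over-count English-heavy prose (real ratio ≈ 4.3 chars/token) and under-count very token-dense content (long URLs, code, structured output). The 80 % threshold leaves about 20 % of the window as headroom: even if the estimate is 20 % too low, live history that has not yet tripped the gate still fits within context_window. That margin does not also cover the new turn or the reply's max_tokens, so heavy under-counts on dense content can still overflow the upstream.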
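The gate conditions and the chars / 4 estimate can be restated as a small shell function, handy for checking by hand whether a session is about to compact. This is a sketch of the documented rule, not the server's code; `should_auto_compact` and its argument order are hypothetical, and integer division stands in for whatever rounding the server applies.

```sh
#!/bin/sh
# Hypothetical restatement of the documented auto-trigger gate.
# Returns 0 ("compact first") only if all three conditions hold:
#   1. compact_strategy is "auto"
#   2. live (non-compacted) messages > compact_keep_last_n
#   3. estimated tokens (chars / 4, integer division) > 80 % of context_window
should_auto_compact() {
  local strategy="$1" live_msgs="$2" keep_last_n="$3" live_chars="$4" context_window="$5"
  local est_tokens=$((live_chars / 4))

  [ "$strategy" = auto ] || return 1
  [ "$live_msgs" -gt "$keep_last_n" ] || return 1
  # est > 0.8 * window, compared in integers: est * 100 > window * 80
  [ $((est_tokens * 100)) -gt $((context_window * 80)) ]
}
```

For instance, with a 128000-token window the threshold is 102400 estimated tokens: 420000 live characters (≈ 105000 tokens) trip the gate, 400000 characters (≈ 100000 tokens) do not.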

Recursion-protection

Every /a1 call propagates SPASS-Caller-Depth (default 0), and the handler increments it before issuing the rig self-call. A depth ≥ 3 fails hard with recursion_depth_exceeded, a defensive guard against any future flow that could form an /a1 → /v1 → /a1 chain.
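The depth rule amounts to a counter check before every self-call. A minimal sketch of that check, assuming a missing header counts as 0 and that the limit is compared against the incoming value before incrementing (the text above does not say which side of the increment is compared); `next_caller_depth` and `MAX_CALLER_DEPTH` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical recursion guard for SPASS-Caller-Depth.
# Assumption: the limit is checked on the incoming value, before incrementing.
MAX_CALLER_DEPTH=3

next_caller_depth() {
  local incoming="${1:-0}"   # header absent -> depth 0
  case $incoming in
    *[!0-9]* | 0?*)
      echo "invalid SPASS-Caller-Depth: $incoming" >&2; return 2 ;;
  esac
  if [ "$incoming" -ge "$MAX_CALLER_DEPTH" ]; then
    echo "recursion_depth_exceeded" >&2; return 1
  fi
  # Value to send as SPASS-Caller-Depth on the /v1 self-call.
  echo $((incoming + 1))
}
```

Under this assumption an outer call at depth 2 may still issue one self-call (sent with depth 3), and anything arriving at depth 3 is rejected.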

Audit-trail correlation

Every /v1 sub-call carries SPASS-Parent-Request-Id set to the outer /a1 request_id, which is recorded in audit.jsonl as fields.parent_request_id. To reconstruct a call tree:

jq -c --arg rid "<outer-rid>" 'select(.fields.parent_request_id == $rid)' \
  /home/dietmar/dgx-llm/data/audit/audit.jsonl.YYYY-MM-DD

Cost aggregation

The outer /a1 response carries SPASS-Cost-{Eur,Usd,Source,...} headers aggregated across all sub-calls (Cut 2.7), with the EUR and USD amounts summed. A debug header, SPASS-Cost-Sub-Calls, reports how many sub-calls were included.

HTTP/1.1 200 OK
spass-cost-eur: 0.02
spass-cost-usd: 0.02
spass-cost-source: upstream
spass-cost-sub-calls: 1

See /docs/response-headers for the full Cost-V2 spec.
