/a1 — Agent System (Phase 2)
The /a1/agents/... surface is a rig-core powered agent system on top of /v1/chat/completions. Agents are configured via YAML files, can use stack-side tools, retrieve from per-tenant RAG indices, and persist multi-turn sessions.
Architecture (ADR 0011)
Caller → POST /a1/agents/<name>/chat
│ auth-middleware re-resolves Identity
│ build rig::Agent from agents/<name>.yaml
│ rig sub-call …
└─▶ POST http://localhost:3000/v1/chat/completions (self-call)
├─ SPASS-Augment: server-tools=off, system-prompt=off, memory=off
├─ SPASS-Caller-Depth: <incremented>
├─ SPASS-Parent-Request-Id: <outer /a1 request-id>
└─ Bearer: <forwarded caller-bearer>
→ full /v1 pipeline (Cost-V2, audit, Layer-1, leak-checks)
Path Y = "rig speaks to our own /v1" — every Stack-invariant (Cost-Pipeline V2, Audit, Layer-1 defense, ADR-0009 leak-sanitization, SPASS-Augment) applies transparently to every rig sub-call.
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /a1/agents | List all configured agents (operator-authored YAML) |
| GET | /a1/agents/<name> | Full agent config (system_prompt, model, tools, rag_index) |
| POST | /a1/agents/<name>/chat | Single-shot prompt → completion |
| POST | /a1/agents/<name>/sessions | Create persistent multi-turn session |
| GET | /a1/agents/<name>/sessions | List the caller's sessions for that agent |
| GET | /a1/agents/<name>/sessions/<sid> | Session metadata + msg-count |
| DELETE | /a1/agents/<name>/sessions/<sid> | Drop session + all messages |
| GET | /a1/agents/<name>/sessions/<sid>/messages | Full conversation history |
| POST | /a1/agents/<name>/sessions/<sid>/messages | Append turn (uses persisted history); 308 → successor if archived |
| POST | /a1/agents/<name>/sessions/<sid>/compact | Compact older turns into a summary, archive source, create successor (Cut 2.12) |
| GET | /a1/agents/<name>/sessions/<sid>/lineage | Forward + backward lineage chain + summary records (Cut 2.12) |
Agent YAML config
Operator-authored, lives in data/agents/*.yaml on the host. Loaded once at process boot — restart needed for config changes (hot-reload is on the backlog).
name: berlin-rag # URL-safe [a-zA-Z0-9_-]+
model: claude-opus-4.7 # must be in ALLOWED_MODELS + per-token allowlist
description: Berlin RAG persona # shown in /a1/agents listing
system_prompt: |
Du beantwortest Fragen zu Berlin auf Deutsch...
max_tokens: 512 # optional
temperature: 0.3 # optional
tools: # optional — stack tools available to agent
- calculator
- current_datetime
- rag_query # active-RAG tool, see /docs/rag
rag_index: berlin-info # optional — passive RAG context-injection
rag_top_k: 3 # optional, default 5
Bundled agents (out-of-the-box examples)
| Agent | Model | Tools | RAG | Use-case |
|---|---|---|---|---|
concise-de | llama-4-scout (local) | — | — | Bare LLM, terse German |
berlin-tour | llama-4-scout (local) | — | — | Berlin-tour persona |
berlin-rag | llama-4-scout (local) | — | berlin-info | Passive RAG (rig dynamic_context) |
researcher | claude-opus-4.7 | calculator, current_datetime, wikipedia_search, wikipedia_summary | — | Tool-using research assistant |
researcher-rag | claude-opus-4.7 | calculator, current_datetime, wikipedia_search | matrix-rag-idx | Tools + RAG combined |
librarian | claude-opus-4.7 | rag_query | — | Active-RAG via tool-call |
Tool-calling note: local Llama-4-Scout-FP4 (
--tool-call-parser=pythonic) currently emits tool-calls as plain-text JSON, which our /v1 tool-loop cannot parse. Agents that need tools should pinmodel: claude-opus-4.7for now. See backlog forllama4_jsonparser switch.
Multi-turn sessions
Sessions persist in a tenant-scoped SQLite at data/sqlite/agent_sessions/<tenant>.sqlite. Each session is agent-pinned at creation — switching the agent mid-session returns 400 session_agent_mismatch.
Sessions are user-scoped (Cut 2.23c, ADR 0016). All session-endpoints require SPASS-User-Id header — different end-users sharing the same bearer cannot see or mutate each other's sessions. Pre-Cut-2.23c sessions (DEFAULT user_id '') are effectively invisible to all end-users; operator can read them via SQL.
Per-session compaction config (Cut 2.12 + 2.13) — optional body fields on create. Missing fields fall through to the per-tenant cascade default (process → yaml → DB), so an operator can change the tenant-wide default without touching every session-create call.
| Field | Type | Default source | Notes |
|---|---|---|---|
compact_strategy | auto|manual|off | tenant cascade (auto) | auto enables the auto-trigger gate (see below); manual keeps only the explicit POST /compact; off disables compaction entirely |
compact_keep_last_n | 0..200 | tenant cascade (10) | live messages preserved verbatim when compaction runs; everything older becomes the summary |
compact_observation_mask | bool | tenant cascade (true) | true instructs the summary-model to drop tool-noise / acknowledgements; false passes raw turns |
TOKEN="$(grep '^RUST_API_BEARER=' /home/dietmar/dgx-llm/.env | cut -d= -f2)"
SID=$(curl -s -X POST http://localhost:3000/a1/agents/concise-de/sessions \
-H "Authorization: Bearer $TOKEN" | jq -r .id)
curl -s -X POST http://localhost:3000/a1/agents/concise-de/sessions/$SID/messages \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"message": "Mein Lieblingssport ist Tennis."}' | jq
curl -s -X POST http://localhost:3000/a1/agents/concise-de/sessions/$SID/messages \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"message": "Welcher Sport ist mein Liebling?"}' | jq
# → "Dein Lieblingssport ist Tennis."
Compaction (Cut 2.12) — Z-chained lineage + C4-hybrid
When a session grows past the model's effective context-budget, or the operator triggers POST /a1/agents/<n>/sessions/<sid>/compact manually:
- The most-recent
compact_keep_last_n(default 10) messages are preserved verbatim. - Everything older is summarised by the per-tenant effective
compact_summary_model— pulled from the 3-level cascadeprocess → yaml → DB(defaultllama-4-scoutfor Datenschutz; tenant can override viatokens.yaml::tenants[].defaults.compact_summary_modelor runtimePUT /v1/tenant/config). See Per-tenant config and ADR 0013. - The summarised rows are soft-deleted (
compacted_at = NOW); asession_summariesrow is persisted. - A new successor session is created with
parent_session_id→ source. - The summary is bootstrapped into the successor as a single synthetic
assistant-role message ("[compaction summary from session X] …"); the kept-last-N messages are copied verbatim. - The source session is
archived_at = NOW,successor_session_id= new. - Subsequent chat-attempts on the archived session id return HTTP 308 with
Location: /a1/agents/<n>/sessions/<successor>/messages— standard HTTP-clients follow this transparently and the chat lands on the successor with the body intact.
# Compact a session manually
curl -s -X POST "http://localhost:3000/a1/agents/$NAME/sessions/$SID/compact" \
-H "Authorization: Bearer $TOKEN" | jq
# → { source_session_id, successor_session_id, summary_id, summary_text, ... }
# Inspect the lineage chain
curl -s "http://localhost:3000/a1/agents/$NAME/sessions/$SID/lineage" \
-H "Authorization: Bearer $TOKEN" | jq
# → { backward: [...], forward: [...], summaries: [...] }
observation_mask controls the summary-model's instruction template. With true (default), the model is told to drop greetings, acknowledgements, raw tool-output, and error-traces while preserving named entities, numbers, decisions, and unresolved questions. With false, raw turns are passed verbatim — useful when tool-output is itself the load-bearing context (e.g. RAG-retrieval pipelines).
Failure modes:
409 session_compact_conflict— fewer thankeep_last_nlive messages, or session already archived. Lineage view (GET .../lineage) shows the live successor.- The summary-model self-call goes through our own
/v1/chat/completionsso audit + rate-limit + cost-pipeline all apply uniformly. Cost is attributed to the caller's tenant via the bearer-forward pattern.
Auto-trigger
When compact_strategy: auto (the default), POST .../sessions/<sid>/messages checks BEFORE the chat-loop runs:
| Condition | Source |
|---|---|
compact_strategy == "auto" | session row |
live-message count > compact_keep_last_n | messages table where compacted_at IS NULL |
estimated tokens > 80 % of model context_window | char-based heuristic (~4 chars/token) over live-messages × catalog-lookup of model |
If all three hold, the handler compacts the session FIRST, then runs the chat against the brand-new successor. The response carries:
SPASS-Session-Compacted: <new-sid>— successor id; persist this for the next turn (the originalsidwill return 308 from now on).SPASS-Session-Compacted-Tokens: <n>— pre-compact token-estimate that triggered the compaction; useful for tuningcompact_keep_last_nif compactions are firing too often or too rarely.
Auto-trigger failures are logged but do NOT bubble up — the chat continues against the original (now-uncompacted) session and the upstream may surface its own context-overflow error if it really busts. This is intentional: a transient summary-model issue should not break user-facing chat. Switch to compact_strategy: manual to disable the auto-trigger entirely while keeping the explicit POST /compact endpoint available.
Token-estimate caveat: the gate uses a conservative chars / 4 heuristic, not real tokens. It tends to over-count for English-heavy prose (real ratio ≈ 4.3) and under-count for very token-dense content (long URLs, code, structured-output). The 80 % threshold leaves enough headroom that mis-estimates of ±20 % do not bust the upstream.
Recursion-protection
Every /a1 call propagates SPASS-Caller-Depth (default 0). The handler increments before issuing the rig self-call. Depth ≥ 3 hard-fails with recursion_depth_exceeded — defensive guard against any future flow that could form an /a1 → /v1 → /a1 chain.
Audit-trail correlation
Every /v1 sub-call carries SPASS-Parent-Request-Id = outer /a1 request_id in audit.jsonl. Reconstruct a call-tree:
jq -c "select(.fields.parent_request_id == \"<outer-rid>\")" \
/home/dietmar/dgx-llm/data/audit/audit.jsonl.YYYY-MM-DD
Cost aggregation
The /a1-outer response carries summed SPASS-Cost-{Eur,Usd,Source,...} headers across all sub-calls (Cut 2.7). A debug header SPASS-Cost-Sub-Calls reports the count.
HTTP/1.1 200 OK
spass-cost-eur: 0.02
spass-cost-usd: 0.02
spass-cost-source: upstream
spass-cost-sub-calls: 1
See /docs/response-headers for the full Cost-V2 spec.