Conversation persistence (/c1)
`/c1/chat` differs from `/v1/chat/completions` in three ways:
- Server keeps history. Pass `conversation_id` and the server prepends prior turns automatically.
- Different request schema. A single `message` field (string or content-parts array) replaces the full `messages` array. `tools`, `tool_choice`, and `response_format` are required (use `[]`, `"auto"`, and `{"type": "text"}` for the no-op defaults).
- `provider` selector. Switch backend without changing the `model` slug: `"provider": "ollama"` resolves to `<model>-ollama`.
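As a minimal sketch of the request schema described above, the following helper prints a `/c1/chat` body with the three required fields set to their no-op defaults. The function name `c1_body` is hypothetical, and the helper assumes that the model and message contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper (not part of the API): print a minimal /c1/chat body.
# Assumption: $1 (model) and $2 (message) contain no double quotes or
# backslashes, because no JSON escaping is performed here.
c1_body() {
  printf '{"model": "%s", "message": "%s", "tools": [], "tool_choice": "auto", "response_format": {"type": "text"}}\n' \
    "$1" "$2"
}

c1_body "llama-4-scout" "Hello"
```

For messages with arbitrary content, building the body with a JSON-aware tool such as `jq` is the safer choice.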
Two-turn memory
```shell
# Turn 1: let the server allocate a fresh conversation
RESP1=$(curl -is -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "My favourite colour is teal. Remember that.",
    "tools": [],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 60
  }' \
  https://dgx.spass.fun/c1/chat)

# Pull the conversation id from the response header
CONV=$(echo "$RESP1" | tr -d '\r' | grep -i "^x-conversation-id:" | awk '{print $2}')
echo "conversation: $CONV"

# Turn 2: recall on the same conversation
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llama-4-scout\",
    \"conversation_id\": \"$CONV\",
    \"message\": \"What is my favourite colour?\",
    \"tools\": [],
    \"tool_choice\": \"auto\",
    \"response_format\": {\"type\": \"text\"},
    \"stream\": false,
    \"max_tokens\": 40
  }" \
  https://dgx.spass.fun/c1/chat \
  | jq -r '.choices[0].message.content'
```
The model answers `teal`: the server reloaded turn 1 from SQLite and prepended it to the prompt automatically. LMCache catches the shared prefix on the upstream side, so on a cache hit the second turn typically returns in tens of milliseconds.
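The header extraction in turn 1 can be made a little more defensive. The sketch below is a hypothetical helper (`conversation_id_from_response` is not part of any tool) that scans only the header section of a raw `curl -i` response, so a body that happens to contain the same text cannot be mistaken for the header. The sample response is invented for illustration.

```shell
# Hypothetical helper: read a raw `curl -i` response on stdin and print the
# value of the x-conversation-id header (case-insensitive, CR stripped).
# Scanning stops at the first blank line, i.e. the end of the header section.
conversation_id_from_response() {
  tr -d '\r' | awk '
    stop { next }                       # ignore everything after we are done
    /^$/ { stop = 1; next }             # blank line: headers are over
    tolower($1) == "x-conversation-id:" { print $2; stop = 1 }
  '
}

# Sample response text, invented for illustration only.
sample=$(printf 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nX-Conversation-Id: abc123\r\n\r\n{"choices": []}')
printf '%s\n' "$sample" | conversation_id_from_response
```

With a live call this would be used as `CONV=$(printf '%s\n' "$RESP1" | conversation_id_from_response)`.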
Listing and deleting conversations
```shell
# Paginated list (newest first)
curl -s -H "Authorization: Bearer $BEARER" \
  "https://dgx.spass.fun/c1/conversations?limit=10&offset=0" \
  | jq '.items[] | {id, message_count, model, last_used_at}'

# Pull one conversation in full
curl -s -H "Authorization: Bearer $BEARER" \
  "https://dgx.spass.fun/c1/conversations/$CONV" \
  | jq

# Delete (soft delete of both user and assistant turns)
curl -X DELETE -H "Authorization: Bearer $BEARER" \
  "https://dgx.spass.fun/c1/conversations/$CONV"
```
Provider switch via provider enum
If you want to test the same model across backends:
```shell
# Force Ollama Cloud (skip local vLLM, skip OpenRouter)
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "provider": "ollama",
    "message": "Was ist 2+2?",
    "tools": [], "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false, "max_tokens": 40
  }' \
  https://dgx.spass.fun/c1/chat
```
`provider` resolves to `<model>-<provider>` (e.g. `llama-4-scout-ollama`).
If both `model` and `provider` are set and they disagree, `model` wins: an
explicit slug beats the convenience shortcut.
System prompt hybrid schema (Cut 2.33, CR-0003)
Recommendation for the GoCreate-style "global tenant prompt" use case: do NOT send a `system_prompt_ref` field with every request. Instead, set the prompt once via `POST /v1/system-prompts/tenant`. The auto-inject pipeline then runs automatically for all subsequent calls, with no frontend action required. The three body fields described here are primarily for per-conversation overrides when the global prompt does not fit.
Cut 2.33 extends `POST /c1/chat` with three mutually exclusive system prompt fields (only one may be set; otherwise the server returns HTTP 400 `invalid_field`):
| Field | Behavior |
|---|---|
| `system_prompt` | Inline text. Persisted on the first turn as a `role=system` row. |
| `system_prompt_ref` | Reference to a named per-tenant prompt (`POST /v1/system-prompts/<name>` at scope level). The server resolves it and persists the text inline. |
| `additional_system_prompt` | Append-style (the per-run instructions pattern of the OpenAI Assistants API). Persisted with the prefix `"Zusätzliche Hinweise:\n…"` (German for "Additional notes"). Tenant default prompts are still injected per request. |
```shell
curl -s -H "Authorization: Bearer $BEARER" -H "SPASS-User-Id: $USER_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "system_prompt": "You answer concisely in German.",
    "message": "Explain GPUs in one sentence.",
    "tools": [],
    "tool_choice": "auto",
    "response_format": {"type": "text"}
  }' \
  https://dgx.spass.fun/c1/chat
```
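A client can catch the mutual-exclusion rule before sending the request. The following is a minimal sketch of such a pre-check; the function name `system_prompt_fields_ok` is hypothetical and the check is client-side only. It treats "none set" as valid, on the assumption that the fields are optional on `/c1/chat` (the earlier examples omit all three).

```shell
# Hypothetical client-side pre-check, not part of the API.
# Arguments: system_prompt, system_prompt_ref, additional_system_prompt
# (pass an empty string for a field that is not set).
# Returns 0 when at most one of the three is non-empty, 1 otherwise.
system_prompt_fields_ok() {
  count=0
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      count=$((count + 1))
    fi
  done
  [ "$count" -le 1 ]
}

if system_prompt_fields_ok "You answer concisely in German." "" ""; then
  echo "ok"
else
  echo "conflict: the server would answer HTTP 400 invalid_field"
fi
```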
Multi-turn behavior: if a system prompt field is set again on a later turn although the conversation already has a `role=system` row, the field is ignored and the server emits the response header `spass-system-prompt-ignored: already_present` as an explicit signal (instead of ignoring it silently).
Storage: `role=system` rows appear in `messages[]` of `GET /c1/conversations/{id}`. A frontend can filter on `m.role !== "system"` to show a pure user view, or deliberately render the system row as a settings icon.
Retroactive injection: `POST /c1/conversations/{id}/system-prompt` with the same body shape (exactly one of the three fields) appends an additional `role=system` row. It does NOT replace the existing row, which keeps the audit trail intact.
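Because an ignored system prompt is signalled only through a response header, a client has to inspect the headers to notice it. The sketch below shows one way to do that; `system_prompt_was_ignored` is a hypothetical helper that expects the response headers on stdin (for example captured with `curl -D`), and the sample headers are invented for illustration.

```shell
# Hypothetical helper: read response headers on stdin and succeed (exit 0)
# if the server reported that a system prompt field was ignored.
system_prompt_was_ignored() {
  tr -d '\r' | awk '
    tolower($1) == "spass-system-prompt-ignored:" && $2 == "already_present" { found = 1 }
    END { exit !found }
  '
}

# Sample headers, invented for illustration only.
headers=$(printf 'HTTP/1.1 200 OK\r\nspass-system-prompt-ignored: already_present\r\n')
if printf '%s\n' "$headers" | system_prompt_was_ignored; then
  echo "system prompt ignored: conversation already has a role=system row"
fi
```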
Native chat summary (Cut 2.33, CR-0002; extended in Cut 2.46, CR-0013)
Cut 2.33 generates an LLM-based topic title plus a 1-2 sentence summary after the first assistant turn (fire-and-forget, roughly 500 ms to 1 s lag). Cut 2.46 (CR-0013) adds three things:
- Locale-aware: title and summary are generated in the language of the `Accept-Language` header (first tag, e.g. `en`, `fr`, `cs`). Without the header, the stack default applies (env `DEFAULT_LOCALE`, default `de`). Previously the language was hard-coded to German.
- Tenant model: the summary model now comes from the tenant setting `compact_summary_model` (cascade DB → yaml → default `llama-4-scout`) instead of the hard-coded `llama-4-scout-local`. The env variable `SUMMARY_MODEL` remains as a global override (it wins). See Tenant-Config.
- Re-summary on growth: the auto-trigger fires again once the conversation has grown by ≥ 10 messages since the last generation, so the title of a long chat stays current instead of freezing at the first turn.

The endpoints expose the result as follows:
- `GET /c1/conversations/{id}` returns an additive `meta: { title, summary, model, updated_at }` field. It is `null` until the auto-trigger has completed.
- `GET /c1/conversations` enriches `items[].title` with the 3-5 word LLM title (overriding the 80-character prefix from Cut 2.25) and adds `items[].summary`.
- `POST /c1/conversations/{id}/summary` returns the cached meta (default) or, with `?refresh=true`, runs a synchronous fresh LLM call. `Accept-Language` controls the language here as well.
```shell
# Auto-trigger after the first turn: meta materialises ~700 ms later.
# Accept-Language controls the language (here: de).
curl -s "https://dgx.spass.fun/c1/conversations/$CONV" \
  -H "Authorization: Bearer $BEARER" -H "SPASS-User-Id: $USER_ID" \
  -H "Accept-Language: de" \
  | jq '.meta'
# → { "title": "Hauptstadt von Deutschland",
#     "summary": "Die Hauptstadt von Deutschland ist Berlin.",
#     "model": "llama-4-scout",
#     "updated_at": "2026-06-21T..Z" }

# Force-regenerate (e.g. after a longer chat): English title/summary via the header.
curl -s -X POST "https://dgx.spass.fun/c1/conversations/$CONV/summary?refresh=true" \
  -H "Authorization: Bearer $BEARER" -H "SPASS-User-Id: $USER_ID" \
  -H "Accept-Language: en"
# → { "title": "...", "summary": "...", "cached": false, ... }
```
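The locale and re-summary rules can be illustrated with two small functions. This is only a sketch that mirrors the documented behavior, not the server code: `summary_locale` and `needs_resummary` are hypothetical names, and whether the server keeps or trims a region subtag such as `en-US` is not stated, so the sketch returns the first tag unchanged.

```shell
# Illustrative only: mirrors the documented rules, not the server implementation.

# Print the first language tag of an Accept-Language value, or the default
# locale (env DEFAULT_LOCALE, falling back to "de") when the value is empty.
# Quality values (";q=...") and spaces are dropped.
summary_locale() {
  first=$(printf '%s' "$1" | cut -d',' -f1 | cut -d';' -f1 | tr -d ' ')
  if [ -n "$first" ]; then
    printf '%s\n' "$first"
  else
    printf '%s\n' "${DEFAULT_LOCALE:-de}"
  fi
}

# Succeed when the conversation has grown by at least 10 messages since the
# last summary. Arguments: message count at last summary, current count.
needs_resummary() {
  [ $(($2 - $1)) -ge 10 ]
}

summary_locale "en,fr;q=0.8"
if needs_resummary 2 12; then echo "regenerate"; fi
```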
ephemeral mode
`{"ephemeral": true}` runs a turn through `/c1` for convenience but persists
nothing: neither the user message nor the assistant reply lands in SQLite.
This is useful for one-off tools that benefit from the `provider` selector but
should not pollute the conversation history.
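A request in this mode might look like the sketch below. Only the `ephemeral` field is specific to this mode; the other fields follow the earlier examples. The variable `EPHEMERAL_BODY` and the wrapper `ephemeral_chat` are hypothetical names, and the wrapper assumes `BEARER` is set as in the other examples.

```shell
# Request body for an ephemeral turn. "ephemeral": true is the only field
# specific to this mode; the remaining fields mirror the other examples.
EPHEMERAL_BODY='{
  "model": "llama-4-scout",
  "provider": "ollama",
  "message": "Summarise in one line: GPUs accelerate parallel workloads.",
  "ephemeral": true,
  "tools": [],
  "tool_choice": "auto",
  "response_format": {"type": "text"},
  "stream": false
}'

# Hypothetical wrapper; assumes BEARER holds a valid token.
ephemeral_chat() {
  curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
    -d "$EPHEMERAL_BODY" \
    https://dgx.spass.fun/c1/chat
}
```

Calling `ephemeral_chat` sends the request. Listing conversations afterwards should show no new entry for this turn.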
User isolation (Header-only, ADR 0016 / Cut 2.23c)
The `SPASS-User-Id` header scopes conversations to that user. Different end-users
sharing the same bearer token cannot read or mutate each other's
conversations. Requesting somebody else's `conversation_id` returns 404 with
`code: conversation_not_found` (rather than 403, to avoid leaking existence).
A `user_id` in the body or query string is forbidden (HTTP 400 `invalid_field` with
`param: body.user_id` or `query.user_id`). Use the header.
```shell
# Alice's conversation
curl -H "SPASS-User-Id: alice" -d '{"message": "...", ...}' .../c1/chat

# Bob trying to access Alice's conversation_id → 404 (existence-leak protected)
curl -H "SPASS-User-Id: bob" -d '{"conversation_id": "{conversation_id}", "message": "...", ...}' .../c1/chat

# Old shape: DO NOT use, returns 400 invalid_field, param=body.user_id
curl -H "SPASS-User-Id: alice" -d '{"user_id": "alice", "message": "..."}' .../c1/chat
```
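When migrating older clients, a simple guard can flag bodies that still carry the forbidden `user_id` field before they reach the server. The sketch below is a hypothetical client-side check (`body_has_user_id` is not part of the API). It is a plain text match rather than a JSON parser, so it can also match the key inside nested objects or string values.

```shell
# Hypothetical client-side guard: succeed (exit 0) if the given JSON text
# contains a "user_id" key. Plain text match, not a JSON parser.
body_has_user_id() {
  printf '%s' "$1" | grep -Eq '"user_id"[[:space:]]*:'
}

if body_has_user_id '{"user_id": "alice", "message": "hi"}'; then
  echo "remove user_id from the body and send the SPASS-User-Id header instead"
fi
```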