Conversation persistence (/c1)
/c1/chat differs from /v1/chat/completions in three ways:
- Server keeps history. Pass conversation_id and the server prepends prior turns automatically.
- Different request schema. A single message field (string or content-parts array) replaces the full messages array. tools, tool_choice, and response_format are required (use [], "auto", and {"type": "text"} as the no-op defaults).
- provider selector. Switch backend without changing the model slug: "provider": "ollama" resolves to <model>-ollama.
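Because the three required fields must be present even when unused, it can help to generate the request body rather than hand-escape it each time. The following is a minimal sketch, not part of the API: the helper names json_escape and c1_payload are hypothetical, and "stream": false is included only because the examples in this section send it.

```shell
# json_escape STRING
# Escape backslashes and double quotes for use inside a JSON string.
# Sketch only: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# c1_payload MODEL MESSAGE [CONVERSATION_ID]
# Print a /c1/chat request body with the required no-op defaults filled in.
# The body runs in a subshell ( ... ) so its variables stay local.
c1_payload() (
  model=$(json_escape "$1")
  message=$(json_escape "$2")
  conv=
  # Only include conversation_id when one was passed (follow-up turns).
  if [ -n "${3:-}" ]; then
    conv=$(printf '"conversation_id": "%s", ' "$(json_escape "$3")")
  fi
  printf '{"model": "%s", %s"message": "%s", "tools": [], "tool_choice": "auto", "response_format": {"type": "text"}, "stream": false}\n' \
    "$model" "$conv" "$message"
)
```

The output can be passed straight to curl, for example -d "$(c1_payload llama-4-scout 'What is my favourite colour?' "$CONV")".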
Two-turn memory
# Turn 1 — let the server allocate a fresh conversation:
RESP1=$(curl -is -H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-4-scout",
"message": "My favourite colour is teal. Remember that.",
"tools": [],
"tool_choice": "auto",
"response_format": {"type": "text"},
"stream": false,
"max_tokens": 60
}' \
https://dgx-spark-4236.spass.fun/c1/chat)
# Pull the conversation id from the response header
CONV=$(echo "$RESP1" | tr -d '\r' | grep -i "^x-conversation-id:" | awk '{print $2}')
echo "conversation: $CONV"
# Turn 2 — recall on the same conversation
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
-d "{
\"model\": \"llama-4-scout\",
\"conversation_id\": \"$CONV\",
\"message\": \"What is my favourite colour?\",
\"tools\": [],
\"tool_choice\": \"auto\",
\"response_format\": {\"type\": \"text\"},
\"stream\": false,
\"max_tokens\": 40
}" \
https://dgx-spark-4236.spass.fun/c1/chat \
| jq -r '.choices[0].message.content'
The model answers "teal": the server reloaded turn 1 from SQLite and prepended it to the prompt automatically. LMCache catches the shared prefix on the upstream side, so on a cache hit the second turn returns in tens of milliseconds.
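Since turn 1 is fetched with curl -is, the response holds both headers and body. The sketch below splits the two without depending on header case. The function names extract_conv_id and extract_body are hypothetical, and the code assumes a single header block (no interim 100 Continue response and no redirects).

```shell
# extract_conv_id: read a raw `curl -i` response on stdin and print the value
# of the first x-conversation-id header (header names are case-insensitive).
extract_conv_id() {
  tr -d '\r' | awk '
    tolower($0) ~ /^x-conversation-id:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading whitespace
      print
      exit
    }'
}

# extract_body: print everything after the first blank line, i.e. the body.
extract_body() {
  tr -d '\r' | awk 'in_body { print; next } /^$/ { in_body = 1 }'
}
```

With these, CONV=$(printf '%s\n' "$RESP1" | extract_conv_id) replaces the grep/awk pipeline, and printf '%s\n' "$RESP1" | extract_body | jq -r '.choices[0].message.content' shows the first reply.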
Listing and deleting conversations
# Paginated list (newest first)
curl -s -H "Authorization: Bearer $BEARER" \
"https://dgx-spark-4236.spass.fun/c1/conversations?limit=10&offset=0" \
| jq '.items[] | {id, message_count, model, last_used_at}'
# Pull one conversation in full
curl -s -H "Authorization: Bearer $BEARER" \
"https://dgx-spark-4236.spass.fun/c1/conversations/$CONV" \
| jq
# Delete (soft — both user and assistant turns)
curl -X DELETE -H "Authorization: Bearer $BEARER" \
"https://dgx-spark-4236.spass.fun/c1/conversations/$CONV"
Provider switch via provider enum
If you want to test the same model across backends:
# Force Ollama Cloud (skip local vLLM, skip OpenRouter)
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
-d '{
"model": "llama-4-scout",
"provider": "ollama",
"message": "What is 2+2?",
"tools": [], "tool_choice": "auto",
"response_format": {"type": "text"},
"stream": false, "max_tokens": 40
}' \
https://dgx-spark-4236.spass.fun/c1/chat
provider resolves to <model>-<provider> (e.g. llama-4-scout-ollama).
If model already names a backend-specific slug and provider is also set,
model wins: the explicit slug beats the convenience shortcut.
system_prompt (only on first turn)
Set once when creating a fresh conversation; ignored on follow-ups:
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
-d '{
"model": "llama-4-scout",
"system_prompt": "You answer concisely in German.",
"message": "Explain GPUs in one sentence.",
"tools": [], "tool_choice": "auto",
"response_format": {"type": "text"},
"stream": false, "max_tokens": 80
}' \
https://dgx-spark-4236.spass.fun/c1/chat
Subsequent turns on the same conversation_id keep the original system prompt.
ephemeral mode
{"ephemeral": true} runs a turn through /c1 for convenience but persists
nothing: neither the user message nor the assistant reply lands in SQLite.
Useful for one-off tools that benefit from the provider selector but should
not pollute history.
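A one-off call of this kind might look like the sketch below, which combines ephemeral with the provider selector. The function name c1_oneoff, the C1_URL override, and the max_tokens value are illustrative assumptions. The remaining fields mirror the other requests in this section, and $BEARER is expected to hold the token.

```shell
# c1_oneoff PROVIDER MESSAGE
# Send one non-persisted turn to /c1/chat and print the raw JSON response.
# The body runs in a subshell ( ... ) so its variables stay local.
c1_oneoff() (
  url=${C1_URL:-https://dgx-spark-4236.spass.fun/c1/chat}
  provider=$1
  # Escape backslashes and double quotes so the message is a valid JSON
  # string (sketch only: newlines and control characters are not handled).
  message=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
    -d "{
      \"model\": \"llama-4-scout\",
      \"provider\": \"$provider\",
      \"message\": \"$message\",
      \"ephemeral\": true,
      \"tools\": [], \"tool_choice\": \"auto\",
      \"response_format\": {\"type\": \"text\"},
      \"stream\": false, \"max_tokens\": 80
    }" \
    "$url"
)
```

For example, c1_oneoff ollama 'Summarise this log line' | jq -r '.choices[0].message.content' prints the reply without adding anything to the conversation list.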
User isolation
Optional user_id field scopes conversations to that user. Different
user_ids sharing the same bearer token cannot read or mutate each other's
conversations. Requesting somebody else's conversation_id returns 404 with
code: conversation_not_found (rather than 403, to avoid leaking existence).
# Alice's conversation
curl ... -d '{"user_id": "alice", "message": "...", ...}' .../c1/chat
# Bob trying to access alice's conversation_id → 404
curl ... -d '{"user_id": "bob", "conversation_id": "<alice-id>", "message": "...", ...}' .../c1/chat
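Because a foreign conversation_id is reported exactly like a missing one, a client only needs a single check for both cases. The sketch below is one way to do it. The function name is_conversation_missing is hypothetical, and since the exact error body layout is not shown here, it only searches the body for the string conversation_not_found.

```shell
# is_conversation_missing STATUS BODY
# Succeed (exit 0) when the server reported an unknown or foreign conversation:
# HTTP status 404 and a body mentioning conversation_not_found.
is_conversation_missing() {
  [ "$1" = "404" ] && printf '%s' "$2" | grep -q 'conversation_not_found'
}

# Usage sketch (request arguments elided as in the examples above):
#   STATUS=$(curl -s -o body.json -w '%{http_code}' ... .../c1/chat)
#   if is_conversation_missing "$STATUS" "$(cat body.json)"; then
#     echo "conversation not found for this user_id; starting a new one" >&2
#   fi
```

Checking the status code together with the error code keeps an unrelated 404 (for example a mistyped path) from being mistaken for a missing conversation.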