Streaming (SSE)
Both /v1/chat/completions and /c1/chat stream tokens via Server-Sent
Events when you set "stream": true.
Cut 2.23c (ADR 0016): the SPASS-User-Id header is mandatory on user-scoped endpoints. Add -H "SPASS-User-Id: $USER_ID" to every curl call below in addition to the bearer. A user_id in the body or query string is rejected with HTTP 400 invalid_field.
Stream-mode tool loop (Cut 2.21+) is now server-side: /c1/chat and /v1/chat/completions both run tools internally and emit a synth-SSE stream, including an event: spass.cost trailer (Cut 2.21b) before [DONE].
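The header requirement above applies to any client, not just curl. As a minimal sketch, a client could build its headers in one place so the user id never ends up in the body or query string. The helper name build_headers and its parameters are hypothetical and not part of the API; only the header names come from this page.

```python
def build_headers(bearer: str, user_id: str) -> dict[str, str]:
    """Return request headers for user-scoped endpoints (hypothetical helper)."""
    if not user_id:
        # user_id in body or query is rejected, so the header is the only place for it.
        raise ValueError("user_id is required on user-scoped endpoints")
    return {
        "Authorization": f"Bearer {bearer}",
        "SPASS-User-Id": user_id,
        "Content-Type": "application/json",
    }
```

The returned dict can be passed as headers= to any of the httpx calls on this page.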
The wire format is the OpenAI streaming shape:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"}}]}
data: [DONE]
Each event is a single line prefixed with data: and terminated by a blank
line. The final event payload [DONE] is the sentinel for stream end.
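The framing rules above can be sketched as a small generator that works on any iterable of lines, with no network involved. The function name iter_data_payloads and the sample list are illustrative assumptions; the payloads mirror the wire format shown above with the id and object fields omitted for brevity.

```python
import json
from typing import Iterable, Iterator


def iter_data_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators and non-data fields are skipped
        data = line[len("data:"):]
        if data.startswith(" "):
            data = data[1:]  # SSE drops a single leading space after the colon
        if data == "[DONE]":
            return
        yield json.loads(data)


sample = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    "",
    "data: [DONE]",
    "",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in iter_data_payloads(sample)
)
print(text)  # prints: Hello world
```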
curl streaming demo
curl -N -s https://dgx.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "Count slowly to ten."}],
    "stream": true,
    "max_tokens": 200
  }'
-N (--no-buffer) disables curl's output buffering. Without it you'd see the full
output in one chunk at the end.
Python SSE consumer
import json
import os

import httpx

BEARER = os.environ["BEARER"]  # same token as $BEARER in the curl demo

with httpx.stream(
    "POST",
    "https://dgx.spass.fun/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {BEARER}",
        "Content-Type": "application/json",
    },
    json={
        "model": "llama-4-scout",
        "messages": [{"role": "user", "content": "Tell me a one-paragraph story."}],
        "stream": True,
        "max_tokens": 400,
    },
    timeout=120.0,
) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        # Skip blank separator lines and anything that is not a data: field.
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if (text := delta.get("content")):
            print(text, end="", flush=True)
print()
Tool calls in streaming
When the model decides to call a tool, the delta switches to tool_calls
chunks. Concatenate tool_calls[*].function.arguments across chunks to
reconstruct the final argument JSON:
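import json

# Single-call buffer: assumes the model emits at most one tool call.
buf = {"name": None, "args": ""}
for line in r.iter_lines():  # r is the open streaming response, as above
    if not line.startswith("data: "):
        continue
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if (tcs := delta.get("tool_calls")):
        for tc in tcs:
            if (fn := tc.get("function", {}).get("name")):
                buf["name"] = fn
            if (a := tc.get("function", {}).get("arguments")):
                buf["args"] += a

# Once the stream ends, the concatenated fragments form one JSON document:
call_args = json.loads(buf["args"])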
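When a response carries more than one tool call, a single buffer would mix their argument fragments together. The sketch below keys the buffers by the index field that the OpenAI streaming shape attaches to each tool_calls entry. Whether this server always sends index is an assumption, so the code falls back to 0. The tool names get_weather and get_time and the sample chunks are made up for illustration.

```python
import json


def accumulate_tool_calls(chunks):
    """Merge streamed tool_calls deltas into complete calls, keyed by index."""
    calls = {}  # index -> {"name": str | None, "args": str}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for tc in delta.get("tool_calls") or []:
            slot = calls.setdefault(tc.get("index", 0), {"name": None, "args": ""})
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            if fn.get("arguments"):
                slot["args"] += fn["arguments"]
    # Parse only after the stream ends; partial argument strings are not valid JSON.
    return [
        {"name": c["name"], "arguments": json.loads(c["args"] or "{}")}
        for _, c in sorted(calls.items())
    ]


# Hypothetical chunks: the first call's arguments arrive split across two deltas.
chunks = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"name": "get_weather", "arguments": '{"ci'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": 'ty": "Berlin"}'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 1, "function": {"name": "get_time", "arguments": "{}"}}]}}]},
]
result = accumulate_tool_calls(chunks)
```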
Named SSE events (Cut 2.32, CR-0001)
In addition to the OpenAI-spec data: … lines, the server has emitted two named SSE events since Cut 2.32 (later cuts added more, as annotated below). They appear before the final content and [DONE] when the tool loop had to take a synthesis path:
event: spass.cost
data: {"prompt_tokens":1234,"completion_tokens":567,"total_tokens":1801}
event: spass.tool-cap ← MAX_TOOL_ITERATIONS=10 reached
data: {"code":"tool_loop_max_iterations","iterations":10,"synth_called":true}
event: spass.tool-stripped ← Cut 2.36b: Llama hallucination stripped from content (CR-0007 stream fix)
data: {"code":"hallucinated_tool_stripped_all_unknown","tool_names":["translate_text"]}
event: spass.tool-feedback-recovery ← Cut 2.39: multi-turn feedback restored a clean response
data: {"code":"tool_feedback_recovery_after_all_unknown","retries_used":1,"stripped_names":["translate_text"]}
data: {"choices":[{"delta":{"content":"... narrative answer ..."}}]}
data: [DONE]
On anti-loop detection, event: spass.tool-anti-loop is emitted instead of spass.tool-cap. The payload shape is identical. Callers that do not parse named events still see only the data: stream; the named events are a purely diagnostic layer and do not break any existing SSE consumer pattern. See errors.md for the associated dgx_code values and examples-tools.md for the tool-loop semantics.
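A consumer that wants the diagnostics has to track event: lines as well as data: lines. The following is a minimal sketch of such a parser, fed from a list of lines so it runs without a server. The function name parse_sse and the sample stream are assumptions. It also assumes that events are separated by blank lines as in standard SSE, and that the arrow annotations in the listing above are documentation only and do not appear on the wire.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload) pairs; event_name is None for plain data: events."""
    event, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data_lines.append(value[1:] if value.startswith(" ") else value)
    if data_lines:  # tolerate a stream that ends without a trailing blank line
        yield event, "\n".join(data_lines)


stream = [
    "event: spass.cost",
    'data: {"prompt_tokens":1234,"completion_tokens":567,"total_tokens":1801}',
    "",
    'data: {"choices":[{"delta":{"content":"answer"}}]}',
    "",
    "data: [DONE]",
    "",
]

diagnostics, text = {}, ""
for name, payload in parse_sse(stream):
    if payload == "[DONE]":
        break
    body = json.loads(payload)
    if name is not None and name.startswith("spass."):
        diagnostics[name] = body  # diagnostic layer, kept apart from content
    else:
        text += body["choices"][0]["delta"].get("content") or ""
```

Routing on the event name keeps the content path identical to a consumer that ignores named events entirely.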
Streaming on /c1/chat
The same SSE shape is emitted by /c1. The conversation id is not in
the SSE body — it's in the response headers (x-conversation-id).
Capture it before consuming the stream:
with httpx.stream("POST", ".../c1/chat", json=payload, headers=hdrs) as r:
    conv_id = r.headers.get("x-conversation-id")
    for line in r.iter_lines():
        ...
What does not stream
- Image-generation aliases (nano-banana, gpt-image, image-gen). The image is delivered atomically. Set "stream": false (or omit; it's the default for these aliases).
- /v1/info, /v1/models, /c1/conversations*: discovery endpoints return JSON in one shot.
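A client that shares one code path for text and image models may want to guard the stream flag. This is a small sketch under that assumption: build_payload is a hypothetical helper, and the alias set is copied from the list above and would need updating if aliases change.

```python
IMAGE_ALIASES = {"nano-banana", "gpt-image", "image-gen"}


def build_payload(model: str, messages: list[dict], want_stream: bool = True) -> dict:
    """Build a request body, forcing stream off for image-generation aliases."""
    stream = want_stream and model not in IMAGE_ALIASES
    return {"model": model, "messages": messages, "stream": stream}
```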
Backend signaling
The x-rust-api-applied header is only set on the final HTTP response. Since
SSE is a single response with a chunked body, you'll see the header
once at the start of the stream.
If something fails mid-stream (e.g. upstream connection drop), the error arrives outside the OpenAI envelope as a normal HTTP 502 with the standard error body — clients should handle both an SSE event stream and a non-SSE error JSON.
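One way to handle both cases is to decide from the status code and Content-Type before entering the line loop. The sketch below assumes that successful streams are served as text/event-stream (the standard SSE media type) and that the error body is JSON, as described above. The function name classify_response is hypothetical, and the error body's fields are not specified here; see errors.md.

```python
import json


def classify_response(status_code: int, content_type: str, body: bytes = b""):
    """Return ("stream", None) for SSE, or ("error", parsed_body) otherwise."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status_code == 200 and media_type == "text/event-stream":
        return "stream", None
    try:
        detail = json.loads(body) if body else None
    except ValueError:
        # Not JSON after all: keep the raw text for logging.
        detail = {"raw": body.decode("utf-8", "replace")}
    return "error", detail
```

With httpx, the inputs would come from r.status_code, r.headers.get("content-type", "") and, for the error branch, r.read().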