Streaming (SSE)
Both /v1/chat/completions and /c1/chat stream tokens via Server-Sent
Events when you set "stream": true.
The wire format is the OpenAI streaming shape:
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"delta":{"content":" world"}}]}
data: [DONE]
Each event is a single line prefixed with data: and terminated by a blank
line. The final event payload [DONE] is the sentinel for stream end.
curl streaming demo
curl -N -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "Count slowly to ten."}],
    "stream": true,
    "max_tokens": 200
  }'
-N disables curl's output buffering; without it you'd see the full
output in one chunk at the end.
Python SSE consumer
import json
import os

import httpx

# Assumes the bearer token is exported as $BEARER, as in the curl demo.
BEARER = os.environ["BEARER"]

with httpx.stream(
    "POST",
    "https://dgx-spark-4236.spass.fun/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {BEARER}",
        "Content-Type": "application/json",
    },
    json={
        "model": "llama-4-scout",
        "messages": [{"role": "user", "content": "Tell me a one-paragraph story."}],
        "stream": True,
        "max_tokens": 400,
    },
    timeout=120.0,
) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        # SSE framing: only "data: "-prefixed lines carry payloads.
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if (text := delta.get("content")):
            print(text, end="", flush=True)

print()
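In the OpenAI chunk shape, each choice also carries a finish_reason on its closing chunk ("stop" for a normal completion, "tool_calls" when the model is invoking a tool). Assuming this backend follows that convention, a small extension of the loop records it, which is how you detect the tool-call case covered next:

finish_reason = None
for line in r.iter_lines():
    if not line.startswith("data: "):
        continue
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    choice = json.loads(data)["choices"][0]
    # The last non-sentinel chunk normally carries finish_reason.
    finish_reason = choice.get("finish_reason") or finish_reason
    if (text := choice["delta"].get("content")):
        print(text, end="", flush=True)

print(f"\nfinish_reason={finish_reason}")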
Tool calls in streaming
When the model decides to call a tool, the delta switches to tool_calls
chunks. Concatenate tool_calls[*].function.arguments across chunks to
reconstruct the final argument JSON:
buf = {"name": None, "args": ""}
for line in r.iter_lines():  # same SSE loop as above
    if not line.startswith("data: "):
        continue
    data = line[len("data: "):]
    if data == "[DONE]":
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if (tcs := delta.get("tool_calls")):
        for tc in tcs:
            fn = tc.get("function", {})
            if fn.get("name"):       # the name arrives once, in the first chunk
                buf["name"] = fn["name"]
            if fn.get("arguments"):  # arguments arrive as JSON fragments
                buf["args"] += fn["arguments"]

# Once the stream ends, the fragments concatenate into valid JSON:
call_args = json.loads(buf["args"])
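If the backend emits parallel tool calls, the OpenAI chunk shape distinguishes them with a tool_calls[*].index field on each fragment. Assuming that field is populated here, buffer per index instead of into a single slot:

calls = {}  # index -> {"name": ..., "args": ""}
for tc in tcs:  # inside the same chunk loop as above
    slot = calls.setdefault(tc.get("index", 0), {"name": None, "args": ""})
    fn = tc.get("function", {})
    if fn.get("name"):
        slot["name"] = fn["name"]
    slot["args"] += fn.get("arguments") or ""

# After [DONE]:
parsed = {i: json.loads(c["args"]) for i, c in calls.items()}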
Streaming on /c1/chat
The same SSE shape is emitted by /c1. The conversation id is not in
the SSE body — it's in the response headers (x-conversation-id).
Capture it before consuming the stream:
with httpx.stream("POST", ".../c1/chat", json=payload, headers=hdrs) as r:
    conv_id = r.headers.get("x-conversation-id")
    for line in r.iter_lines():
        ...
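A fuller sketch, assuming /c1/chat accepts the same chat payload as /v1 (the section above only guarantees that the SSE shape matches):

import json
import httpx

hdrs = {"Authorization": f"Bearer {BEARER}", "Content-Type": "application/json"}
payload = {
    "model": "llama-4-scout",  # assumption: /c1 takes the same fields as /v1
    "messages": [{"role": "user", "content": "Hi there."}],
    "stream": True,
}

with httpx.stream(
    "POST", "https://dgx-spark-4236.spass.fun/c1/chat",
    json=payload, headers=hdrs, timeout=120.0,
) as r:
    conv_id = r.headers.get("x-conversation-id")  # header, not SSE body
    for line in r.iter_lines():
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if (text := delta.get("content")):
            print(text, end="", flush=True)

print(f"\nconversation: {conv_id}")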
What does not stream
- Image-generation aliases (nano-banana, gpt-image, image-gen). The image is delivered atomically. Set "stream": false (or omit it; false is the default for these aliases).
- /v1/info, /v1/models, /c1/conversations*: these discovery endpoints return JSON in one shot.
Backend signaling
The x-rust-api-applied header is set once, on the HTTP response itself.
Since an SSE stream is a single response with a chunked body, the header
arrives with the initial response headers, at the start of the stream.
If something fails mid-stream (e.g. an upstream connection drop), the error arrives outside the OpenAI envelope as a normal HTTP 502 with the standard error body. Clients should therefore be prepared to handle both an SSE event stream and a non-SSE error JSON.
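One way to satisfy that, assuming the server labels streams with the standard text/event-stream content type (the section above doesn't pin this down): check Content-Type before iterating, and read the error body whole when it isn't a stream.

with httpx.stream("POST", url, headers=hdrs, json=payload, timeout=120.0) as r:
    if "text/event-stream" not in r.headers.get("content-type", ""):
        r.read()  # small non-SSE error body, e.g. the 502 case above
        raise RuntimeError(f"upstream error {r.status_code}: {r.text}")
    backend = r.headers.get("x-rust-api-applied")  # arrives once, up front
    for line in r.iter_lines():
        ...  # SSE loop as in the consumer above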