Quickstart
The gateway exposes two parallel APIs at the same host:
- https://dgx-spark-4236.spass.fun/v1/chat/completions — OpenAI-compatible passthrough
- https://dgx-spark-4236.spass.fun/c1/chat — domain endpoint with conversation persistence
Authentication is the same for both: a single Bearer token configured server-side
as RUST_API_BEARER. See Authentication for details.
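Since both endpoints share the same Bearer scheme, a client can build its headers once. A minimal sketch, assuming the token is read from an environment variable named after the server-side setting:

```python
import os


def auth_headers(token=None):
    """Build the headers both endpoints expect.

    Sketch only: reading the token from RUST_API_BEARER on the client
    side is an assumption; any env var (or secret store) works.
    """
    token = token or os.environ.get("RUST_API_BEARER", "")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

The same dict works for `/v1/chat/completions` and `/c1/chat`, so it can be shared across a client's request helpers.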
30-second test
export BEARER="<your RUST_API_BEARER>"
curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
-H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-4-scout",
"messages": [{"role": "user", "content": "In one word: capital of Italy?"}],
"max_tokens": 30
}'
Expected response (shape):
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1777411833,
"model": "llama-4-scout",
"choices": [{
"index": 0,
"message": { "role": "assistant", "content": "Rome." },
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14 }
}
Using /c1/chat for memory
/c1 keeps a conversation alive in SQLite. Pass conversation_id on every
follow-up turn and the server prepends history automatically — clients don't
need to track it.
Turn 1 (no conversation_id → server creates one and returns it via the
x-conversation-id response header):
curl -i -s https://dgx-spark-4236.spass.fun/c1/chat \
-H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-4-scout",
"message": "My favourite colour is teal. Remember that.",
"tools": [],
"tool_choice": "auto",
"response_format": {"type": "text"},
"stream": false,
"max_tokens": 60
}' | grep -E "x-conversation-id|content"
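Instead of grepping the raw output, a client can read the `x-conversation-id` header directly from whatever header mapping its HTTP library returns. A small helper sketch (header names are matched case-insensitively, since libraries differ on casing):

```python
def conversation_id_from(headers):
    """Extract x-conversation-id from a response-header dict, any casing."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-conversation-id")
```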
Turn 2 (reuse the conversation id):
CONV="<the id from turn 1>"
curl -s https://dgx-spark-4236.spass.fun/c1/chat \
-H "Authorization: Bearer $BEARER" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"llama-4-scout\",
\"conversation_id\": \"$CONV\",
\"message\": \"What is my favourite colour?\",
\"tools\": [],
\"tool_choice\": \"auto\",
\"response_format\": {\"type\": \"text\"},
\"stream\": false,
\"max_tokens\": 40
}" | jq '.choices[0].message.content'
The model recalls "teal" because the server reloads the prior turn from SQLite before calling the model; LMCache reusing the shared prompt prefix keeps the follow-up turn fast.
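The two curl turns above differ only in whether `conversation_id` is present. A payload-builder sketch that captures that pattern (field names mirror the curl examples; defaults are assumptions):

```python
def c1_payload(message, conversation_id=None, max_tokens=60):
    """Build a /c1/chat request body; omit conversation_id on the first turn."""
    body = {
        "model": "llama-4-scout",
        "message": message,
        "tools": [],
        "tool_choice": "auto",
        "response_format": {"type": "text"},
        "stream": False,
        "max_tokens": max_tokens,
    }
    if conversation_id is not None:
        # Follow-up turn: the server prepends stored history for this id.
        body["conversation_id"] = conversation_id
    return body
```

Turn 1 calls `c1_payload("My favourite colour is teal. Remember that.")`; turn 2 passes the id returned in the `x-conversation-id` header.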
What can fail
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 401 | Missing or wrong Bearer | Set RUST_API_BEARER matching the server config |
| HTTP 400 with code: model_not_in_allowlist | Model alias not whitelisted | List allowed models with GET /v1/info |
| HTTP 400 with code: image_url_not_supported | You sent an image_url with an https:// URL | Inline as base64 data URI — see Vision examples |
| Empty content field | Reasoning model with too-low max_tokens | Use max_tokens >= 200 for gpt-5.5-pro, gemini-3.1-pro, flagship |
| Slow / timed-out | gpt-image takes 100-180 s | Set client timeout ≥ 240 s for image-gen aliases |
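The table above can be folded into a client-side triage helper. A sketch, assuming the gateway returns the stable `code` string in the error body (the hint wording here is illustrative, not the server's):

```python
def triage(status, code=None):
    """Map a gateway failure to a remediation hint from the table above."""
    if status == 401:
        return "check RUST_API_BEARER matches the server config"
    if status == 400 and code == "model_not_in_allowlist":
        return "list allowed models via GET /v1/info"
    if status == 400 and code == "image_url_not_supported":
        return "inline the image as a base64 data URI"
    return "see the error catalog"
```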
The full Error catalog lists every stable code with remediation.
Next steps
- Learn the auth model and rate limits → Authentication
- See per-model behaviour quirks → Models & constraints
- Try the interactive Playground at /playground — modal-cards with live request preview, streaming, drag-and-drop image upload.