DGX LLM Chat Gateway

Quickstart

The gateway exposes two parallel APIs on the same host:

/v1/chat/completions: stateless, OpenAI-style chat completions.
/c1/chat: stateful chat with server-side conversation memory, stored in SQLite.

Authentication is the same for both: a single Bearer token configured server-side as RUST_API_BEARER. See Authentication for details.
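
A quick sanity check: GET /v1/info (also used later in this quickstart to list the model allowlist) should answer with a valid token and reject requests without one. The snippet below assumes BEARER is already exported, as in the 30-second test that follows; the response shape is not documented here.

curl -s https://dgx-spark-4236.spass.fun/v1/info \
  -H "Authorization: Bearer $BEARER"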

30-second test

export BEARER="<your RUST_API_BEARER>"

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "In one word: capital of Italy?"}],
    "max_tokens": 30
  }'

Expected response (shape):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1777411833,
  "model": "llama-4-scout",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Rome." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14 }
}
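
If you only want the assistant text, pipe the same request through jq (jq is also used later in this quickstart):

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "In one word: capital of Italy?"}],
    "max_tokens": 30
  }' | jq -r '.choices[0].message.content'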

Using /c1/chat for memory

/c1 keeps a conversation alive in SQLite. Pass conversation_id on every follow-up turn and the server prepends history automatically — clients don't need to track it.

Turn 1 (no conversation_id → server creates one and returns it via the x-conversation-id response header):

curl -i -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "My favourite colour is teal. Remember that.",
    "tools": [],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 60
  }' | grep -E "x-conversation-id|content"
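
If you'd rather not copy the id by hand, you can capture the x-conversation-id header straight into a shell variable. This is a sketch: it repeats the Turn 1 request (so it starts a new conversation) and parses curl's -i output; adjust the parsing if your proxy changes header casing.

CONV=$(curl -i -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "My favourite colour is teal. Remember that.",
    "tools": [],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 60
  }' | grep -i '^x-conversation-id:' | awk '{print $2}' | tr -d '\r')

echo "$CONV"

Either way, make sure CONV is set before running Turn 2 below.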

Turn 2 (reuse the conversation id):

CONV="<the id from turn 1>"

curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llama-4-scout\",
    \"conversation_id\": \"$CONV\",
    \"message\": \"What is my favourite colour?\",
    \"tools\": [],
    \"tool_choice\": \"auto\",
    \"response_format\": {\"type\": \"text\"},
    \"stream\": false,
    \"max_tokens\": 40
  }" | jq '.choices[0].message.content'

The model recalls teal because the server reloads the earlier turns from SQLite and prepends them to the prompt; LMCache then reuses the cached prefix, so replaying that history stays cheap.

What can fail

| Symptom | Cause | Fix |
| --- | --- | --- |
| HTTP 401 | Missing or wrong Bearer token | Send the Bearer token that matches the server's RUST_API_BEARER |
| HTTP 400 with code: model_not_in_allowlist | Model alias not whitelisted | List allowed models with GET /v1/info |
| HTTP 400 with code: image_url_not_supported | You sent an image_url with an https:// URL | Inline the image as a base64 data URI (see Vision examples) |
| Empty content field | Reasoning model with too-low max_tokens | Use max_tokens >= 200 for gpt-5.5-pro, gemini-3.1-pro, flagship |
| Slow / timed out | gpt-image takes 100-180 s | Set the client timeout to at least 240 s for image-gen aliases |
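
For the image_url_not_supported row: remote https:// image URLs are rejected, so inline the image bytes as a base64 data URI inside OpenAI-style content parts. A minimal sketch, assuming a local photo.png, a vision-capable model alias (llama-4-scout is only a placeholder here; check GET /v1/info), and GNU base64 (-w0 disables line wrapping; plain base64 on macOS already emits a single line):

# Keep the image small: the whole data URI is passed on the command line.
IMG="data:image/png;base64,$(base64 -w0 photo.png)"

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llama-4-scout\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"$IMG\"}}
      ]
    }],
    \"max_tokens\": 100
  }" | jq -r '.choices[0].message.content'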

The full Error catalog lists every stable code with remediation.

Next steps