DGX LLM Chat Gateway

Quickstart

The gateway exposes two parallel APIs on the same host:

/v1/chat/completions: stateless, OpenAI-style chat completions.
/c1/chat: stateful chat with server-side conversation memory, stored in SQLite.

Authentication is the same for both: a single Bearer token configured server-side as RUST_API_BEARER. See Authentication for details.
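
A quick sanity check: GET /v1/info (also used later in this quickstart to list the model allowlist) should answer with a valid token and reject requests without one. The snippet below assumes BEARER is already exported, as in the 30-second test that follows; the response shape is not documented here.

curl -s https://dgx-spark-4236.spass.fun/v1/info \
  -H "Authorization: Bearer $BEARER"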

30-second test

export BEARER="<your RUST_API_BEARER>"

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "In one word: capital of Italy?"}],
    "max_tokens": 30
  }'

Expected response (shape):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1777411833,
  "model": "llama-4-scout",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Rome." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14 }
}
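
If you only want the assistant text, pipe the same request through jq (jq is also used later in this quickstart):

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "In one word: capital of Italy?"}],
    "max_tokens": 30
  }' | jq -r '.choices[0].message.content'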

Using /c1/chat for memory

/c1 keeps a conversation alive in SQLite. Pass conversation_id on every follow-up turn and the server prepends history automatically — clients don't need to track it.

Turn 1 (no conversation_id → server creates one and returns it via the x-conversation-id response header):

curl -i -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "My favourite colour is teal. Remember that.",
    "tools": [],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 60
  }' | grep -E "x-conversation-id|content"
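
If you'd rather not copy the id by hand, you can capture the x-conversation-id header straight into a shell variable. This is a sketch: it repeats the Turn 1 request (so it starts a new conversation) and parses curl's -i output; adjust the parsing if your proxy changes header casing.

CONV=$(curl -i -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "My favourite colour is teal. Remember that.",
    "tools": [],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 60
  }' | grep -i '^x-conversation-id:' | awk '{print $2}' | tr -d '\r')

echo "$CONV"

Either way, make sure CONV is set before running Turn 2 below.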

Turn 2 (reuse the conversation id):

CONV="<the id from turn 1>"

curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llama-4-scout\",
    \"conversation_id\": \"$CONV\",
    \"message\": \"What is my favourite colour?\",
    \"tools\": [],
    \"tool_choice\": \"auto\",
    \"response_format\": {\"type\": \"text\"},
    \"stream\": false,
    \"max_tokens\": 40
  }" | jq '.choices[0].message.content'

The model recalls teal because the server reloads the earlier turns from SQLite and prepends them to the prompt; LMCache then reuses the cached prefix, so replaying that history stays cheap.

What can fail

| Symptom | Cause | Fix |
| --- | --- | --- |
| HTTP 401 | Missing or wrong Bearer token | Send the Bearer token that matches the server's RUST_API_BEARER |
| HTTP 400 with code: model_not_in_allowlist | Model alias not whitelisted | List allowed models with GET /v1/info |
| HTTP 400 with code: image_url_not_supported | You sent an image_url with an https:// URL | Inline the image as a base64 data URI (see Vision examples) |
| Empty content field | Reasoning model with too-low max_tokens | Use max_tokens >= 200 for gpt-5.5-pro, gemini-3.1-pro, flagship |
| Slow / timed out | gpt-image takes 100-180 s | Set the client timeout to at least 240 s for image-gen aliases |
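
For the image_url_not_supported row: remote https:// image URLs are rejected, so inline the image bytes as a base64 data URI inside OpenAI-style content parts. A minimal sketch, assuming a local photo.png, a vision-capable model alias (llama-4-scout is only a placeholder here; check GET /v1/info), and GNU base64 (-w0 disables line wrapping; plain base64 on macOS already emits a single line):

# Keep the image small: the whole data URI is passed on the command line.
IMG="data:image/png;base64,$(base64 -w0 photo.png)"

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llama-4-scout\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"$IMG\"}}
      ]
    }],
    \"max_tokens\": 100
  }" | jq -r '.choices[0].message.content'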

The full Error catalog lists every stable code with remediation.

Next steps