DGX LLM Chat Gateway

Error catalog

All error responses follow a stable envelope. The same shape is used by both /v1/* and /c1/* endpoints.

{
  "error": {
    "type":    "<category>",
    "code":    "<stable-code>",
    "message": "<human-readable>",
    "param":   "<json-pointer>"
  }
}
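
A minimal sketch of how a client might turn this envelope into a typed value. The class and function names (GatewayErrorInfo, parse_error_envelope) are hypothetical, and treating every field as optional is a defensive assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewayErrorInfo:
    type: str
    code: str
    message: str
    param: Optional[str] = None  # assumed optional; not every error points at a field


def parse_error_envelope(body: dict) -> GatewayErrorInfo:
    """Extract the envelope fields, tolerating a missing or partial "error" object."""
    err = body.get("error") or {}
    return GatewayErrorInfo(
        type=err.get("type", "unknown"),
        code=err.get("code", "unknown"),
        message=err.get("message", ""),
        param=err.get("param"),
    )
```

Branching on code rather than on message keeps client logic independent of the human-readable text.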

Machine-readable catalog

Pull the full catalog as JSON from GET /errors (no auth):

curl -s https://dgx.spass.fun/errors | jq '.entries[0]'
{
  "code": "missing_authorization",
  "type": "authentication_error",
  "http_status": 401,
  "title": "Authorization header is missing",
  "description": "All `/v1/*` and `/c1/*` endpoints require a Bearer token.",
  "remediation": "Add `Authorization: Bearer <RUST_API_BEARER>` to the request.",
  "typical_param": "headers.authorization"
}

Use this to generate stable error-handling code in your client without hand-typing constants.
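
One way to generate such constants is sketched below. It assumes only what the sample above shows: a top-level entries array whose items carry code and http_status. The function names and the output format are illustrative, not part of the gateway.

```python
import json
import urllib.request

CATALOG_URL = "https://dgx.spass.fun/errors"


def fetch_catalog(url: str = CATALOG_URL) -> dict:
    """Download the catalog (no auth required according to the docs)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def render_constants(catalog: dict) -> str:
    """Render one NAME = "code" line per entry, sorted for stable diffs."""
    lines = ["# Generated from GET /errors; do not edit by hand."]
    for entry in sorted(catalog["entries"], key=lambda e: e["code"]):
        name = entry["code"].upper()
        lines.append(f'{name} = "{entry["code"]}"  # HTTP {entry["http_status"]}')
    return "\n".join(lines)
```

Running render_constants(fetch_catalog()) in a build step and writing the result to a module keeps the client's constants in sync with the server.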

Cross-link from JSON to docs

Each entry below has a stable HTML-anchor matching its code. The machine-readable JSON at /errors can be cross-linked into this page via /docs/errors#<code>. Example: a failed call returning {"code":"image_base64_invalid"} jumps to /docs/errors#image_base64_invalid. Anchors do not change once published.
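
As a small illustration, a client could attach the documentation link to its own error messages. That the docs are served from the same host as the API is an assumption here; the helper name is hypothetical.

```python
from urllib.parse import quote

# Assumption: the docs page lives on the API host.
DOCS_BASE = "https://dgx.spass.fun/docs/errors"


def docs_url(code: str) -> str:
    """Build the anchor link for an error code."""
    return f"{DOCS_BASE}#{quote(code, safe='')}"
```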

Authentication errors (401)

Code | Cause | Fix
missing_authorization | No Authorization header | Add Authorization: Bearer <token>
invalid_authorization | Token mismatch | Verify RUST_API_BEARER matches server

missing_authorization

Add Authorization: Bearer <RUST_API_BEARER> to the request.

invalid_authorization

The Bearer token did not match any token in the multi-tenant catalog. The comparison is constant-time. The same code is also returned when an X-Tenant-Id header is presented that the matched token is not bound to.

Rate-limit / Quota (429)

Code | Cause | Fix
rate_limit_exceeded | Per-token bucket empty | Back off and retry; tune RATE_LIMIT_* server-side
model_quota_exhausted | Model temporarily out of quota / credits / daily budget | Pick a different model alias or wait for the budget window to reset
openrouter_daily_quota_exhausted | OpenRouter-specific daily limit hit on a tenant alias | Use a fallback alias (Claude/Gemini); response carries Retry-After until next 00:00 UTC

rate_limit_exceeded

Rate limits are enforced per token (default 1 request/sec, burst of 30). The bucket is keyed on the SHA-256 of the bearer token, not on the client IP.
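
The following is a generic token-bucket sketch that illustrates the behavior described above; it is not the server's implementation. The class names and the injectable now parameter are hypothetical; only the defaults (1 request/sec, burst 30) and the SHA-256 keying come from the text.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bucket:
    tokens: float
    updated: float


@dataclass
class RateLimiter:
    rate: float = 1.0    # tokens refilled per second (documented default)
    burst: float = 30.0  # bucket capacity (documented default)
    buckets: dict = field(default_factory=dict)

    def allow(self, bearer: str, now: Optional[float] = None) -> bool:
        """Consume one token for this bearer; False means a 429 would be returned."""
        now = time.monotonic() if now is None else now
        key = hashlib.sha256(bearer.encode()).hexdigest()  # keyed on token hash, not IP
        b = self.buckets.setdefault(key, Bucket(self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        b.tokens = min(self.burst, b.tokens + (now - b.updated) * self.rate)
        b.updated = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True
        return False
```

With these defaults a client can send 30 requests at once, after which it is limited to roughly one request per second until the bucket refills.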

model_quota_exhausted

Cut 2.23d (2026-05-04). The chosen model is temporarily unavailable for this tenant because an upstream budget (token quota, daily credit limit, or rate window) has been used up. The body is intentionally provider-agnostic per ADR 0006 v2: no upstream name, no token counts, no billing URLs. The caller-facing shape is identical to other rate-limit responses.

Fix options for the caller: pick a different model alias, or wait for the budget window to reset.

Operator action: raise the per-tenant budget in tokens.yaml if affordable, or top up with the upstream provider directly.

openrouter_daily_quota_exhausted

Cut 2.32 (2026-05-16, CR-0005). A tenant alias (e.g. godelmann-gocreate-premium-gpt-text-premium) has exhausted its OpenRouter daily limit. Unlike the generic model_quota_exhausted (Cut 2.23d), the provider is named explicitly here because the alias is pinned directly to OpenRouter (no cross-vendor fallback in the stack path).

Response shape:

HTTP/2 429
Retry-After: 32400
content-type: application/json
{
  "error": {
    "type": "rate_limit_error",
    "code": "openrouter_daily_quota_exhausted",
    "message": "GPT-Tageslimit über OpenRouter erschöpft. Reset um 00:00 UTC. Verfügbare Fallback-Modelle: …"
  },
  "dgx_code": "openrouter_daily_quota_exhausted",
  "available_fallbacks": ["godelmann-gocreate-premium-claude-text-premium", "godelmann-gocreate-premium-gemini-text-premium"]
}

Caller pattern: match on dgx_code: "openrouter_daily_quota_exhausted", show the user a message such as "GPT daily limit reached, try Claude or Gemini", and use Retry-After to schedule an automatic retry after midnight UTC.
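
The caller pattern could be sketched as follows. The function name and return shape are hypothetical, picking the first listed fallback is an arbitrary choice, and Retry-After is assumed to be given in seconds as in the sample response.

```python
from typing import Optional, Tuple


def plan_quota_fallback(status: int, headers: dict, body: dict) -> Optional[Tuple[str, int]]:
    """Return (fallback_alias, retry_after_seconds), or None if this is a different error."""
    if status != 429 or body.get("dgx_code") != "openrouter_daily_quota_exhausted":
        return None
    fallbacks = body.get("available_fallbacks") or []
    if not fallbacks:
        return None
    # Header names are case-insensitive; normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = str(lowered.get("retry-after", "")).strip()
    retry_after = int(raw) if raw.isdigit() else 0
    return fallbacks[0], retry_after
```

A caller would resend the request with model set to the returned alias and schedule a retry of the original alias once the returned delay has elapsed.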

Tool-loop status codes (HTTP 200)

These codes appear in the body as a dgx_code field (NOT as an HTTP error) when the server-side tool loop had to take a special synthesis path. The HTTP status is 200: the caller receives a complete narrative answer, and dgx_code is only a diagnostic marker.

Code | Meaning | finish_reason | content
tool_loop_max_iterations | Tool loop reached MAX_TOOL_ITERATIONS=10 | stop | Narrative synthesis of the tool results collected so far
tool_loop_anti_loop_synthesised | Identical tool-call signature detected twice in a row | stop | Narrative synthesis instead of continuing the loop

In stream mode (stream: true) the server additionally emits a named SSE event, event: spass.tool-cap or event: spass.tool-anti-loop respectively, with the JSON payload {code, iterations, synth_called}, before the final content and [DONE].
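
A minimal sketch of how a streaming client might pick these named events out of the SSE stream. The parser handles only event: and data: lines, and the value types of the payload fields in the usage example are assumptions.

```python
import json
from typing import Iterable, Iterator, List, Tuple

TOOL_LOOP_EVENTS = {"spass.tool-cap", "spass.tool-anti-loop"}


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs; events without a name are reported as 'message'."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip(" "))
    if data:
        yield event, "\n".join(data)


def tool_loop_markers(lines: Iterable[str]) -> List[dict]:
    """Collect the decoded payloads of the tool-loop diagnostic events."""
    return [json.loads(d) for name, d in iter_sse_events(lines) if name in TOOL_LOOP_EVENTS]
```

Ordinary content chunks arrive as unnamed events, so a client that ignores the named ones keeps working unchanged.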

tool_loop_max_iterations

Cut 2.32 (2026-05-16, CR-0001). A request with heavy multi-tool use reached the per-request tool-iteration cap of 10 without a final narrative stop. Instead of returning a raw tool_calls response (which historically left frontend UIs hanging because no content arrived), the server makes one extra synthesis LLM call with tools: [] and a prompt containing the tool results collected so far. The result is a guaranteed complete narrative answer.

tool_loop_anti_loop_synthesised

Cut 2.32 (2026-05-16, CR-0001). The server detected that the LLM issued the same tool call with an identical signature (name + args) twice in a row, which is typical Llama loop behavior when a tool requirement is misunderstood. A synthesis LLM call with tools: [] breaks the loop and forces an answer from the tool outputs already available.
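
The detection idea can be illustrated with a short sketch. It is not the server's code: whether the gateway normalises argument key order before comparing is unknown, so the canonicalisation below is an assumption, and both function names are hypothetical.

```python
import json


def call_signature(name: str, arguments: str) -> str:
    """Signature of a tool call: name plus key-sorted, compact JSON arguments."""
    try:
        canonical = json.dumps(json.loads(arguments), sort_keys=True, separators=(",", ":"))
    except json.JSONDecodeError:
        canonical = arguments  # fall back to the raw string if it is not valid JSON
    return f"{name}:{canonical}"


def is_repeated_call(history: list, name: str, arguments: str) -> bool:
    """True if this call has the same signature as the immediately preceding one."""
    sig = call_signature(name, arguments)
    repeated = bool(history) and history[-1] == sig
    history.append(sig)
    return repeated
```

A loop driver would switch to a synthesis call with tools: [] as soon as is_repeated_call returns True.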

Invalid request (4xx)

Code | Status | Cause
body_too_large | 413 | Body exceeds MAX_BODY_BYTES (default 32 MB)
invalid_json | 400 | Body unparseable / schema mismatch
missing_field | 400 | Required field absent
invalid_field | 400 | Field value bad type / range / enum
model_not_in_allowlist | 400 | model slug not whitelisted
max_tokens_below_minimum | 400 | Even after auto-floor, value still rejected upstream
image_url_not_supported | 400 | image_url.url is http(s)://; must be a base64 data URI
image_decode_error | 400 | Data URI malformed
image_base64_invalid | 400 | Inline base64 payload not parseable (S-series Item 2)
tool_call_arguments_invalid | 400 | tool_calls[].function.arguments is a non-parseable JSON string (Layer 1)
upstream_bad_request | 400 | Upstream rejected as 4xx; propagated verbatim instead of opaque 502

body_too_large

Bodies cap at MAX_BODY_BYTES (default 32 MB). Most often hit with very large base64 image payloads.

invalid_json

The body could not be parsed as JSON. Validate against /openapi.json.

missing_field

Required field absent. The param field of the error envelope shows which one.

invalid_field

A field's value did not match expected type/range/enum. See param for which field.

model_not_in_allowlist

The model slug isn't whitelisted by the stack. Use one of the slugs from /v1/info or /v1/models.

max_tokens_below_minimum

Some upstream cloud paths reject low values (they require max_output_tokens >= 16), and reasoning models need >= 200 to leave room for hidden reasoning tokens before any visible content. The rust-api silently floors max_tokens to the model's documented minimum and reports the adjustment via a response header:

spass-applied: max_tokens_floored=200

Only when even the floor would still be invalid does the error surface. Read the per-model constraints.min_max_tokens from /v1/info:

curl -s -H "Authorization: Bearer $BEARER" https://dgx.spass.fun/v1/info \
  | jq '.models[] | {alias, min_max_tokens: .constraints.min_max_tokens}'
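
A client that wants to react to the silent floor can parse the spass-applied header value. The sketch below assumes the key=value form shown above; support for several comma-separated adjustments in one header is a guess, and the function name is hypothetical.

```python
def parse_applied(header_value: str) -> dict:
    """Parse 'key=value' pairs from a spass-applied header value.

    Assumption: multiple adjustments, if any, are comma-separated.
    """
    out = {}
    for part in header_value.split(","):
        key, sep, value = part.strip().partition("=")
        if key:
            out[key] = value if sep else ""
    return out
```

For the documented example, parse_applied("max_tokens_floored=200") yields a mapping from max_tokens_floored to the string "200".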

image_url_not_supported

Cloud providers refuse to fetch arbitrary URLs server-side, and local inference does not fetch them either. The rust-api validates against constraints.accepts_image_url before forwarding and rejects the request up front, so you get a clear param pointer instead of an opaque 400 from the upstream.

Encode your image as a base64 data URI:

B64=$(base64 -w 0 image.jpg)
curl -s -H "Authorization: Bearer $BEARER" \
     -H "Content-Type: application/json" \
     -d "{
       \"model\": \"llama-4-scout\",
       \"messages\": [{
         \"role\": \"user\",
         \"content\": [
           {\"type\": \"text\", \"text\": \"Describe this\"},
           {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64\"}}
         ]
       }]
     }" \
     https://dgx.spass.fun/v1/chat/completions

image_decode_error

The base64-encoded data URI could not be parsed, the MIME type was missing, or the decoded bytes were not a valid image. Verify format data:image/<jpeg|png|webp|gif>;base64,<data>. Re-encode with base64 -w 0 (no line wrapping).

image_base64_invalid

Pre-flight check: rust-api decodes every data:image/...;base64,<payload> and rejects non-parseable base64 before forwarding upstream. Common JS bug: passing a UTF-8 string through btoa corrupts non-ASCII bytes — read as Uint8Array first. Output must be [A-Za-z0-9+/]+={0,2} only.
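
For Python callers, the equivalent of "read as bytes first" is to base64-encode the raw bytes directly. The sketch below also mirrors the pre-flight check on the client side; the helper names are hypothetical and the check is an approximation of what the server does, not a copy of it.

```python
import base64
import re

_B64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes (never a decoded text string) as a data URI."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{payload}"


def is_valid_payload(data_uri: str) -> bool:
    """Client-side approximation of the pre-flight check."""
    head, sep, payload = data_uri.partition(";base64,")
    if not sep or not head.startswith("data:image/"):
        return False
    if not _B64_RE.fullmatch(payload):  # rejects newlines, spaces, URL-safe alphabet
        return False
    try:
        base64.b64decode(payload, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True
```

Checking the payload before sending avoids a round trip that would end in image_base64_invalid.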

tool_call_arguments_invalid

OpenAI's tool-calling spec encodes function.arguments as a JSON-string (e.g. "arguments":"{\"key\":\"value\"}"). rust-api parses each argument-string with serde_json before forwarding upstream — non-parseable JSON is caught here so the caller gets a clear diagnostic instead of an opaque upstream Pydantic validation error. Cockpit-followup-6 (S.5) defense-in-depth Layer 1.
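
On the client side, the simplest way to avoid this error is to produce the arguments string with a JSON serialiser instead of string concatenation. The sketch below shows that, plus a local check similar in spirit to the server's; the helper names and the sample tool are hypothetical.

```python
import json


def make_tool_call(call_id: str, name: str, arguments: dict) -> dict:
    """Build a tool_calls entry whose arguments field is a JSON-encoded string."""
    return {
        "id": call_id,
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(arguments)},
    }


def invalid_argument_indexes(tool_calls: list) -> list:
    """Return the indexes of tool calls whose arguments string is not parseable JSON."""
    bad = []
    for i, call in enumerate(tool_calls):
        try:
            json.loads(call["function"]["arguments"])
        except (KeyError, TypeError, json.JSONDecodeError):
            bad.append(i)
    return bad
```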

upstream_bad_request

Upstream returned a 4xx (often 400 BadRequest) — typically caused by malformed payloads pre-flight didn't catch (corrupt base64, schema-validation error, content moderation flag). rust-api propagates the upstream status-code 1:1 instead of opaque 502, so the caller can distinguish "client-side-fixable" from "upstream-outage".
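
Because the status is propagated, a client can triage failures coarsely by status and code. The bucket names below are hypothetical, and the mapping is one reasonable reading of this catalog rather than a prescribed policy.

```python
def classify(status: int, code: str) -> str:
    """Coarse triage: retry_later, fix_request, upstream_outage, or server_fault."""
    if status == 429:
        return "retry_later"       # rate limit or quota: back off or switch model
    if 400 <= status < 500:
        return "fix_request"       # includes upstream_bad_request
    if code in {"upstream_error", "upstream_timeout", "upstream_unavailable"}:
        return "upstream_outage"
    return "server_fault"
```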

Not found (404)

Code | Cause | Fix
conversation_not_found | /c1: conversation doesn't exist or belongs to another user_id | Omit conversation_id to start fresh; or list with GET /c1/conversations
route_not_found | Path/method combination unknown | Check /openapi.json

conversation_not_found

The supplied conversation_id was never persisted (or was deleted, or belongs to a different user_id).

route_not_found

Path/method combination does not exist on this server.

Upstream (5xx)

Code | Status | Cause
upstream_error | 502 | An upstream provider answered non-2xx; body inlined for debugging
upstream_timeout | 504 | Hit HTTP_TOTAL_TIMEOUT_SECS; most often gpt-image (100-180 s)
upstream_unavailable | 503 | TCP/TLS to gateway/local-inference failed; check /readyz and docker ps

upstream_error

A non-recoverable upstream 5xx. The server includes the (sanitised) upstream message inline in message. Common causes: model rejected oversize prompt, content moderation flag, provider-side outage.

upstream_timeout

Hit HTTP_TOTAL_TIMEOUT_SECS (default 600 s). Most often a slow image-generation model (gpt-image regularly 100-180 s). Increase your client timeout; check constraints.typical_response_seconds per model in /v1/info. For gpt-image, set client timeout ≥ 240 s.

upstream_unavailable

Could not establish a TCP/TLS connection to the routing gateway or local inference backend. Check /readyz and docker ps / docker logs.

Internal (500)

Code | Cause | Fix
internal_error | Server-side bug or panic | Retry; check server logs with x-request-id
storage_error | SQLite read/write failed | Server-side: chown -R 65532:65532 data/sqlite && docker restart dgx-rust-api

Recommended client pattern

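import logging
import os
import time

import httpx

log = logging.getLogger(__name__)
BEARER = os.environ["RUST_API_BEARER"]  # assumption: token supplied via environment


class GatewayError(Exception):
    """Raised for any error envelope the client does not handle itself."""

    def __init__(self, code, message, param=None):
        super().__init__(f"{code}: {message}")
        self.code, self.message, self.param = code, message, param


def call_gateway(payload: dict, max_retries: int = 3) -> dict:
    for attempt in range(max_retries + 1):
        r = httpx.post(
            "https://dgx.spass.fun/v1/chat/completions",
            headers={"Authorization": f"Bearer {BEARER}"},
            json=payload,
            timeout=240,  # cover gpt-image worst case
        )
        if not r.is_error:
            break
        body = r.json().get("error", {})
        code = body.get("code", "unknown")
        if code == "rate_limit_exceeded" and attempt < max_retries:
            time.sleep(2 * (attempt + 1))  # back off, then retry (bounded, no recursion)
            continue
        if code == "image_url_not_supported":
            # rewrite image_url to base64 and retry
            ...
        raise GatewayError(code, body.get("message"), body.get("param"))
    # honour silent adjustments (header name as documented under max_tokens_below_minimum)
    if applied := r.headers.get("spass-applied"):
        log.info("server floored: %s", applied)
    return r.json()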