DGX LLM Chat Gateway

Response-Header Reference

Every /v1/chat/completions response carries a set of SPASS-* headers that expose stack-internal state in a machine-readable way. They follow the convention from ADR 0006: proper-cased on the wire (Spass-Augment-Applied), case-insensitive per RFC 9110, no X- prefix per RFC 6648, structured field values per RFC 8941.

The headers fall into two categories: request-side (what the caller can send) and response-side (what the server emits).


Request-side headers

| Header | Type | Purpose |
|---|---|---|
| SPASS-Augment | RFC 8941 dictionary | Caller-controlled server-side augmentation. Format: system-prompt=off, server-tools=off, memory=off. Each key is either default (= active injection) or off. Default behavior without the header: all three active. |
| SPASS-Tenant-Id | string | Multi-tenant override (only valid for unbound tokens). |
| SPASS-Scope-Id | string | Memory + system-prompt scope binding. |
| SPASS-User-Id | string | Memory + system-prompt user binding. |
| SPASS-Stt-Mode | enum: chunked \| continuous | STT file-upload mode (ADR 0005 v2). Only on /v1/audio/transcriptions. |
| SPASS-Request-Id | UUID v4 | Caller-supplied correlation id. Server echoes it back in the response. Dual-accept with X-Request-Id. |
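
The SPASS-Augment format from the table above can be assembled with a small helper. This is a minimal sketch, not part of the gateway: the function name buildAugmentHeader and the AugmentKey type are hypothetical, and the sketch assumes a caller only ever needs to emit key=off members, since omitting a key leaves the default (active) behavior in place.

```typescript
// Keys accepted by SPASS-Augment, taken from the request-side table.
type AugmentKey = "system-prompt" | "server-tools" | "memory";

// Hypothetical helper: builds the dictionary value for the augmentations the
// caller wants switched off. Returns null when nothing is disabled, in which
// case the header can simply be omitted and the server default applies.
function buildAugmentHeader(disabled: readonly AugmentKey[]): string | null {
  // Emit members in a fixed order so the value is stable across calls.
  const order: AugmentKey[] = ["system-prompt", "server-tools", "memory"];
  const members = order
    .filter((key) => disabled.indexOf(key) !== -1)
    .map((key) => `${key}=off`);
  return members.length > 0 ? members.join(", ") : null;
}
```

For example, buildAugmentHeader(["memory", "system-prompt"]) yields system-prompt=off, memory=off, which a caller would send as the SPASS-Augment value.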

Response-side headers

Request-correlation

| Header | Format | Notes |
|---|---|---|
| SPASS-Request-Id | UUID v4 | Internal SPASS form. Always set. Cross-layer tracing: propagates as parent_request_id into the audit events of /a1 sub-calls and appears as a top-level field in every audit.jsonl record. Callers can echo the header (bearer-trace correlation). |
| X-Request-Id | UUID v4 | RFC-de-facto compat alias, same value as SPASS-Request-Id. |

Caller tip (Cut 2.23c+): If you send an X-Request-Id header to the server, the server adopts that value for audit.jsonl, so your own trace identifier can be found in our logs for post-mortem correlation. On 5xx errors, the request_id is returned in the dgx_code field of the error response (see errors.md).
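
On the caller side, the echo behavior can be checked with a sketch like the following. The names HeaderReader, serverRequestId and wasEchoed are hypothetical; the structural HeaderReader type is an assumption chosen so the code works with fetch's Headers object or any object that offers a case-insensitive get.

```typescript
// Minimal structural type: satisfied by fetch's Headers and by test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

// Loose UUID v4 shape check (version nibble 4, variant 8, 9, a or b).
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Reads the id the server reports, preferring the SPASS form over the alias.
function serverRequestId(headers: HeaderReader): string | null {
  return headers.get("spass-request-id") ?? headers.get("x-request-id");
}

// True when the caller-supplied id is a valid UUID v4 and came back unchanged.
function wasEchoed(sentId: string, headers: HeaderReader): boolean {
  const received = serverRequestId(headers);
  return (
    received !== null &&
    UUID_V4.test(sentId) &&
    received.toLowerCase() === sentId.toLowerCase()
  );
}
```

A caller would generate the id once, send it as X-Request-Id (or SPASS-Request-Id), and log it alongside the result of wasEchoed so that a mismatch is visible before anyone searches audit.jsonl.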

Augmentation visibility (ADR 0006)

| Header | Format | Notes |
|---|---|---|
| SPASS-Augment-Applied | RFC 8941 dictionary | system-prompt=Nitems, server-tools=<comma-list-or-off>, memory=Nitems, stream-usage=injected (when F.1 default-injected include_usage). Lets the caller audit what the server actually did. |
| SPASS-Applied | comma-list | Silent adjustments applied to the request (e.g. max_tokens_floored=200, response_format_stripped=empty). |
| SPASS-Tools-Executed | comma-list name:iter,… | Server-side tool-loop trace: which tools fired in which iteration. |
| SPASS-Stt-Model | string | Cut 2.50: the STT model actually used on /v1/audio/transcriptions (resolved slug, never auto). dgx-internal/debug only, because the downstream caller proxy discards upstream response headers. For convert=1, the JSON body field model is authoritative. |
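
The name:iter list in SPASS-Tools-Executed can be turned into structured data with a small parser. The following is a sketch under assumptions: iter is taken to be a decimal integer, malformed entries are skipped rather than rejected, and the names ToolExecution and parseToolsExecuted as well as the tool names in the usage note are hypothetical.

```typescript
interface ToolExecution {
  name: string;
  iteration: number;
}

// Parses a value such as "tool_a:1, tool_b:2" into structured entries.
// Assumption: entries that do not match name:integer are silently dropped.
function parseToolsExecuted(value: string | null): ToolExecution[] {
  if (value === null || value.trim() === "") return [];
  const result: ToolExecution[] = [];
  for (const item of value.split(",")) {
    const entry = item.trim();
    // Split on the last colon so tool names containing ":" still parse.
    const sep = entry.lastIndexOf(":");
    if (sep <= 0) continue;
    const iteration = Number.parseInt(entry.slice(sep + 1), 10);
    if (Number.isNaN(iteration)) continue;
    result.push({ name: entry.slice(0, sep), iteration });
  }
  return result;
}
```

With a header value of web_search:1, calculator:2 (illustrative tool names), the parser returns two entries that can be grouped by iteration to see how the tool loop progressed.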

Resolution + routing (ADR 0007 + 0008)

| Header | Values | Notes |
|---|---|---|
| SPASS-Resolved-Model | canonical alias | Non-streaming: reliable. Streaming: best-effort. |
| SPASS-Resolved-Backend | local-fp4 \| local-fp8 \| local-bf16 \| cloud-1 \| cloud-2 \| cloud-3 | Generic backend slot, ADR 0006 v2 compliant; no implementation names. |
| SPASS-Resolved-Reason | primary-up \| primary-down-fallback \| quant-pin-explicit \| cloud-pin-explicit \| hardware-pin-explicit \| legacy-alias \| unknown | Explains why the resolved backend is what it is. Cockpit-followup-7 wishlist. |
| SPASS-Fallback-Used | true \| false \| unknown | Derived from actual != primary. unknown for streaming or for an unknown model. |
| SPASS-Cache-Hit | true \| false | Stack-cache (Redis) hit. |
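
Because SPASS-Fallback-Used has three states, a plain string comparison against "true" or "false" can silently fold unknown into one of the other two. A minimal sketch of a tri-state reader follows; the function names are hypothetical, and mapping a missing header to the unknown state is an assumption.

```typescript
// Maps the header to true, false, or null (null = unknown).
// Assumption: a missing or unexpected value is treated like "unknown".
function parseFallbackUsed(value: string | null): boolean | null {
  if (value === "true") return true;
  if (value === "false") return false;
  return null;
}

// Only a definite "true" should count as a fallback; the unknown state that
// streaming responses report must not raise an alert on its own.
function shouldAlertOnFallback(value: string | null): boolean {
  return parseFallbackUsed(value) === true;
}
```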

Cost-Pipeline V2 (ADR 0010)

| Header | Format | Notes |
|---|---|---|
| SPASS-Cost-Eur | 0.01 (2 decimals, ceil_to_cent) | Final EUR amount including all markups. Authoritative for tenant billing. |
| SPASS-Cost-Usd | 0.01 (2 decimals, round_to_cent) | USD = EUR ÷ ECB rate (without markups). Display-only. |
| SPASS-Cost-Available | true \| false | Whether a cost value could be derived. false only for an unknown source. |
| SPASS-Cost-Source | zero \| free \| upstream \| unknown | Where the value comes from. |
| SPASS-Cost-Exchange-Rate | 0.9300 (4 decimals) | ECB rate (without markup) for caller verification. |
| SPASS-Cost-Exchange-Rate-Source | ecb-YYYY-MM-DD \| fallback-30d-max-... \| fallback-hardcoded | Lookup-hierarchy indicator. |
| SPASS-Cost-Sub-Calls | integer (≥0) | /a1 only. How many internal /v1 sub-calls contributed to the aggregated cost. ≥1 for normal completions; can be larger for tool-loops. |

On /a1/agents/<name>/chat and /a1/agents/<name>/sessions/<sid>/messages, all of the above are summed across the rig agent's internal /v1 sub-calls (Cut 2.7). The dominant Source wins (Upstream > Free > Zero > Unknown); Available is false if ANY sub-call was Unknown.
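
The aggregation rule described above (sum the costs, let the dominant source win, mark the result unavailable if any sub-call was unknown) can be sketched as follows. This is an illustration of the stated rule, not the gateway's implementation. The type and function names are hypothetical, amounts are held as integer cents to avoid floating-point drift, and treating an empty sub-call list as unavailable is an assumption.

```typescript
type CostSource = "upstream" | "free" | "zero" | "unknown";

interface SubCallCost {
  eurCents: number; // assumption: per-call cost already rounded to cents
  source: CostSource;
}

interface AggregatedCost {
  eurCents: number;
  source: CostSource;
  available: boolean;
  subCalls: number;
}

// Precedence from the text: Upstream > Free > Zero > Unknown.
const SOURCE_PRECEDENCE: readonly CostSource[] = ["upstream", "free", "zero", "unknown"];

function aggregateCosts(calls: readonly SubCallCost[]): AggregatedCost {
  let eurCents = 0;
  let source: CostSource = "unknown";
  // Assumption: with no sub-calls at all there is nothing to derive a cost from.
  let available = calls.length > 0;
  for (const call of calls) {
    eurCents += call.eurCents;
    // A lower index means a more dominant source.
    if (SOURCE_PRECEDENCE.indexOf(call.source) < SOURCE_PRECEDENCE.indexOf(source)) {
      source = call.source;
    }
    if (call.source === "unknown") available = false;
  }
  return { eurCents, source, available, subCalls: calls.length };
}
```

Note the consequence of the two rules taken together: a mix of upstream and unknown sub-calls reports source upstream while available is false, so callers should check availability before trusting the sum.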

Recursion + audit-correlation (Cut 2.3)

| Header | Format | Notes |
|---|---|---|
| SPASS-Caller-Depth | integer (in-only) | Optional incoming header. The /a1 handler refuses requests at ≥ 3 (recursion_depth_exceeded). The handler propagates incoming + 1 on its rig sub-call. |
| SPASS-Parent-Request-Id | UUID v4 (in-only) | Set by /a1 on the rig sub-call so /v1's audit-event carries parent_request_id = outer-rid. Lets jq reconstruct call-trees from audit.jsonl. |
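
The depth rule for SPASS-Caller-Depth (refuse at ≥ 3, otherwise propagate incoming + 1) can be sketched as a pure function. The names below are hypothetical, and treating a missing or malformed header as depth 0 is an assumption about behavior the table does not specify.

```typescript
// From the table: requests arriving at depth >= 3 are refused.
const MAX_CALLER_DEPTH = 3;

interface DepthDecision {
  allowed: boolean;
  // Depth to send on the rig sub-call; null when the request is refused.
  outgoingDepth: number | null;
}

function checkCallerDepth(incoming: string | null): DepthDecision {
  const parsed = incoming === null ? 0 : Number.parseInt(incoming, 10);
  // Assumption: absent, non-numeric or negative values count as depth 0.
  const depth = Number.isNaN(parsed) || parsed < 0 ? 0 : parsed;
  if (depth >= MAX_CALLER_DEPTH) {
    // The handler would answer with recursion_depth_exceeded here.
    return { allowed: false, outgoingDepth: null };
  }
  return { allowed: true, outgoingDepth: depth + 1 };
}
```

Under this sketch a top-level call without the header goes out at depth 1, and a request arriving with depth 2 is the last one allowed to issue a further sub-call.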

/c1/chat only

| Header | Format | Notes |
|---|---|---|
| SPASS-Conversation-Id | UUID v4 | Persistent conversation ID. Caller submits it on follow-up turns to load history. |
| spass-system-prompt-ignored | already_present (Cut 2.33, CR-0003) | Set when the request carries a system-prompt field (system_prompt / system_prompt_ref / additional_system_prompt) but the conversation already has a role=system-row from an earlier turn. Server ignores the new value; this header is the explicit signal so the caller can debug why their late system-prompt change didn't take effect. Use POST /c1/conversations/{id}/system-prompt to append a new system-row deliberately. |

Quota signaling (Cut 2.32, CR-0005)

| Header | Format | Notes |
|---|---|---|
| Retry-After | integer (seconds) | Set on HTTP 429 responses with dgx_code: "openrouter_daily_quota_exhausted". Value = seconds until next 00:00 UTC. Standard RFC 9110 retry-hint. |
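
The Retry-After value is plain date arithmetic, and a caller can reproduce it to sanity-check the hint or to choose a delay. The sketch below uses hypothetical function names; rounding up to whole seconds and falling back to a caller-chosen delay when the header is absent are assumptions.

```typescript
// Seconds from `now` until the next 00:00 UTC. Date.UTC normalizes an
// out-of-range day (e.g. January 32) into the following month.
function secondsUntilNextUtcMidnight(now: Date): number {
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1,
  );
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}

// Converts the header into a delay in milliseconds. Only the integer-seconds
// form described in the table is handled; anything else uses the fallback.
function retryDelayMs(retryAfter: string | null, fallbackMs: number): number {
  const seconds = retryAfter === null ? NaN : Number.parseInt(retryAfter, 10);
  return Number.isNaN(seconds) || seconds < 0 ? fallbackMs : seconds * 1000;
}
```

Since the quota resets at a fixed wall-clock time, a client that receives this 429 late in the UTC day waits only briefly, while one that receives it just after midnight waits close to 24 hours; queueing the work is usually more sensible than sleeping in-process.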

Caller-pattern: cost-tracking

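// Returns the billing-authoritative EUR cost, or null when the server
// reported that no cost could be derived or the value does not parse.
function requestCostEur(headers: Headers): number | null {
  if (headers.get("spass-cost-available") !== "true") return null;
  const eur = Number.parseFloat(headers.get("spass-cost-eur") ?? "");
  return Number.isNaN(eur) ? null : eur;
}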

Local-zero responses explicitly set Spass-Cost-Eur: 0.00 and Spass-Cost-Source: zero, so the caller doesn't need to special-case missing headers.

Caller-pattern: routing-diagnose

const backend = headers.get("spass-resolved-backend");
const reason = headers.get("spass-resolved-reason");
if (reason === "primary-down-fallback") {
  // T1 unprefixed alias fell to cloud — log + maybe retry differently
}

Spass-Resolved-Reason removes the need to compare Spass-Resolved-Backend to the catalog manually — the server already did that.


Cross-references