DGX LLM Chat Gateway

Tool calling

All chat-capable models in the catalog (everything except the image-only ones) support OpenAI-style function calling. Pass a tools array in the request; when the model decides to call a function, the response carries a tool_calls array instead of plain content.

Cut 2.23c (ADR 0016): the SPASS-User-Id header is mandatory on user-scoped endpoints. Add -H "SPASS-User-Id: $USER_ID" to every curl call below, in addition to the bearer token. A user_id passed in the request body or query string is rejected with HTTP 400 invalid_field.
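For clients that are not plain curl, the header rule can be captured in a small helper. The following is a minimal sketch, not part of the gateway: build_headers and check_body are hypothetical names, and the client-side guard only mirrors the server-side rejection described above.

```python
def build_headers(bearer: str, user_id: str) -> dict[str, str]:
    """Headers for a user-scoped endpoint: bearer token plus SPASS-User-Id."""
    return {
        "Authorization": f"Bearer {bearer}",
        "Content-Type": "application/json",
        "SPASS-User-Id": user_id,
    }


def check_body(body: dict) -> dict:
    """Hypothetical client-side guard: fail early instead of waiting for HTTP 400."""
    if "user_id" in body:
        raise ValueError("user_id belongs in the SPASS-User-Id header, not in the body")
    return body
```

Failing locally keeps the error close to the code that built the request; the server remains the authority on what it accepts.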

Single call

curl -s https://dgx.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "Wie ist das Wetter in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }],
    "max_tokens": 200
  }'

Response (only the relevant field):

{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{\"city\":\"Berlin\"}" }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
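Note that function.arguments is a JSON-encoded string, not an object, so it needs a second decode before use. A minimal Python sketch of reading the response above (extract_tool_calls is a hypothetical helper name, and only the first choice is inspected):

```python
import json


def extract_tool_calls(response: dict) -> list[dict]:
    """Return id, name and decoded arguments for each tool call in the first choice."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            # arguments arrives as a JSON string, so decode it a second time
            "arguments": json.loads(fn["arguments"]),
        })
    return calls


response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_abc",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\":\"Berlin\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}
tool_calls = extract_tool_calls(response)
```

A response without tool_calls yields an empty list, which is the signal to read message.content instead.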

Multi-turn loop (call → execute → respond)

# Turn 1: model emits tool_call (REQ1 is assumed to hold the JSON request body from "Single call")
RESP1=$(curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d "$REQ1" https://dgx.spass.fun/v1/chat/completions)

# Extract tool_call_id and arguments, run your function
CALL_ID=$(echo "$RESP1" | jq -r '.choices[0].message.tool_calls[0].id')
ARGS=$(echo "$RESP1"   | jq -r '.choices[0].message.tool_calls[0].function.arguments')
RESULT=$(./my_get_weather.sh "$ARGS")  # your code

# Turn 2: send back the tool result. jq builds the body so that $RESULT is
# always encoded as a valid JSON string, whatever characters it contains.
REQ2=$(jq -n \
  --argjson assistant "$(echo "$RESP1" | jq '.choices[0].message')" \
  --arg call_id "$CALL_ID" \
  --arg result "$RESULT" \
  '{
    model: "llama-4-scout",
    messages: [
      {role: "user", content: "Wie ist das Wetter in Berlin?"},
      $assistant,
      {role: "tool", tool_call_id: $call_id, content: $result}
    ],
    max_tokens: 200
  }')
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d "$REQ2" https://dgx.spass.fun/v1/chat/completions

The second turn returns a regular content field containing the model's natural-language answer, based on the tool output.
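The second-turn message list can also be assembled in client code. This is a sketch under assumptions: build_followup_messages is a hypothetical helper, results is assumed to map each tool call id to whatever your function returned, and unlike the shell loop above it answers every tool call in the assistant message rather than only the first.

```python
import json


def build_followup_messages(user_prompt: str, response: dict, results: dict) -> list[dict]:
    """Original user turn, the assistant's tool-call message, then one tool message per call."""
    assistant = response["choices"][0]["message"]
    messages = [{"role": "user", "content": user_prompt}, assistant]
    for call in assistant.get("tool_calls") or []:
        output = results[call["id"]]
        # tool content is sent as a string; serialize anything that is not one already
        content = output if isinstance(output, str) else json.dumps(output)
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return messages


turn1 = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_abc",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\":\"Berlin\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}
followup = build_followup_messages(
    "Wie ist das Wetter in Berlin?", turn1, {"call_abc": {"temp_c": 21}}
)
```

The resulting list goes into the messages field of the second request, alongside model and max_tokens.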

Tool calling on /c1/chat

/c1 accepts the same tools and tool_choice fields as /v1. The benefit is that intermediate tool-call and tool-result messages are persisted in SQLite, so a follow-up turn does not have to resend the whole history.

curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "Wie ist das Wetter in Berlin?",
    "tools": [ { "type": "function", "function": { "name": "get_weather", ... } } ],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 200
  }' \
  https://dgx.spass.fun/c1/chat

Models with verified tool support

These were tested end-to-end on 2026-04-29:

Model                              Latency (cold)   Notes
llama-4-scout (local)              1 s              Strong at structured arguments
mistral-small-4                    1 s
qwen3-vl-30b-{thinking,instruct}   1-7 s            Thinking variant slower, often higher quality
gemma-4-31b                        1 s
claude-opus-4.7                    1 s              Best at multi-step planning
gpt-5.5-pro                        5 s              Reasoning model: max_tokens >= 200
gemini-3.1-pro                     2 s              Reasoning model: max_tokens >= 200
grok-4.20                          2 s

The image-only aliases (nano-banana, gpt-image, image-gen) do not support tools; their constraints.tools is false.

Disabling or restricting tool use (per request)

Three orthogonal mechanisms let the frontend decide per request whether tools are used. Full documentation, the combination matrix, and curl examples are at /docs/tools, section "Tool-Use steuern" (controlling tool use). Quick reference:

// Option 1: privacy mode via the SPASS-Augment header
SPASS-Augment: server-tools=off

// Option 2: OpenAI-spec body fields
{"tools": [], "tool_choice": "none"}

// Option 3 (recommended for strict "no tools"): both combined
SPASS-Augment: server-tools=off
{"tools": [], "tool_choice": "none"}
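In client code, the combined form can be applied as a final step just before sending. A minimal sketch, assuming headers and body are plain dicts (disable_tools is a hypothetical helper name; the header and field values are the ones listed above):

```python
def disable_tools(headers: dict, body: dict) -> tuple[dict, dict]:
    """Return copies of headers and body with both tool switches turned off."""
    new_headers = {**headers, "SPASS-Augment": "server-tools=off"}
    # overrides any tools / tool_choice the caller had set earlier
    new_body = {**body, "tools": [], "tool_choice": "none"}
    return new_headers, new_body


headers, body = disable_tools(
    {"Content-Type": "application/json"},
    {"model": "llama-4-scout", "tools": [{"type": "function"}], "tool_choice": "auto"},
)
```

Returning copies rather than mutating the inputs lets one base request be reused for both tool-enabled and tool-free calls.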

Cut 2.36 (CR-0007) adds defense in depth: if Llama-FP8 emits a hallucinated tool call in content even though tools are disabled, the JSON block is stripped automatically (see /docs/changelog, Cut 2.36).

Server-side Tool-Loop Resilience (Cut 2.32, CR-0001)

On /v1/chat/completions and /c1/chat, the server executes known stack tools internally and performs up to MAX_TOOL_ITERATIONS = 10 round trips of tool call → tool execution → LLM reflection (Cut 2.32 raised this from 5, citing the Anthropic industry standard "start with 10-30"). If a multi-tool use case exhausts the 10 iterations OR the LLM emits the same name+args signature twice in a row (anti-loop), the server automatically makes a synthesis LLM call with tools: [] plus the tool results collected so far in the prompt. This forces a narrative answer instead of a hanging tool_calls path.
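The control flow described above can be modeled in a few lines. This is an illustrative sketch only, not the server's implementation: run_tool_loop, call_llm and execute_tool are hypothetical names, and comparing arguments after key-sorted re-serialization is an assumption about how "same signature" might be decided.

```python
import json

MAX_TOOL_ITERATIONS = 10  # value stated in the text above


def call_signature(calls: list[dict]) -> list[tuple[str, str]]:
    """Name plus canonicalized arguments for each tool call in one assistant turn."""
    return [
        (c["function"]["name"],
         json.dumps(json.loads(c["function"]["arguments"]), sort_keys=True))
        for c in calls
    ]


def run_tool_loop(call_llm, execute_tool, messages: list[dict]) -> str:
    """call_llm(messages, tools_enabled) -> assistant message dict.
    execute_tool(name, args) -> result string."""
    messages = list(messages)
    last_signature = None
    for _ in range(MAX_TOOL_ITERATIONS):
        reply = call_llm(messages, True)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]  # the model answered in natural language
        signature = call_signature(calls)
        if signature == last_signature:
            break  # anti-loop: same name+args twice in a row
        last_signature = signature
        messages.append(reply)
        for c in calls:
            args = json.loads(c["function"]["arguments"])
            result = execute_tool(c["function"]["name"], args)
            messages.append({"role": "tool", "tool_call_id": c["id"], "content": result})
    # Budget exhausted or loop detected: synthesis call with tools disabled,
    # carrying the tool results collected so far.
    return call_llm(messages, False)["content"]
```

In this model the repeated tool call is not appended to the history, so the synthesis call never sees an unanswered tool_calls message.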

The caller sees:

In addition, Cut 2.32 extends tool_normalize.rs (internal) with a multi-document JSON parser (Llama-FP8 sometimes emits newline-separated JSON objects instead of an array), a partial-match policy (for a mix of known and unknown tools, only the known ones are normalized), and schema tolerance (name|function|tool are accepted as the name key, and parameters|arguments|args|input as the payload key). This is transparent to callers.
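To make the three normalization behaviors concrete, here is a Python sketch of the same ideas. It is not a port of tool_normalize.rs, whose internals are not shown here; the function names, the output shape, and the order in which keys are tried are all assumptions.

```python
import json

NAME_KEYS = ("name", "function", "tool")
PAYLOAD_KEYS = ("parameters", "arguments", "args", "input")


def parse_json_documents(text: str) -> list:
    """Parse one or more whitespace-separated JSON documents from a single string."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace itself
            continue
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs


def normalize_tool_calls(text: str, known_tools: set) -> list[dict]:
    """Tolerant parse: accepts an array or several objects, keeps only known tools."""
    calls = []
    for doc in parse_json_documents(text):
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not isinstance(item, dict):
                continue
            # schema tolerance: the first string-valued name key wins
            name = next((item[k] for k in NAME_KEYS if isinstance(item.get(k), str)), None)
            if name not in known_tools:
                continue  # partial-match policy: unknown entries are dropped
            payload = next((item[k] for k in PAYLOAD_KEYS if k in item), {})
            if isinstance(payload, str):
                payload = json.loads(payload)  # OpenAI-style string-encoded arguments
            calls.append({"name": name, "arguments": payload})
    return calls


raw = (
    '{"name": "get_weather", "parameters": {"city": "Berlin"}}\n'
    '{"tool": "unknown_tool", "args": {}}\n'
    '{"function": "get_time", "input": {"tz": "UTC"}}'
)
normalized = normalize_tool_calls(raw, {"get_weather", "get_time"})
```

Here the second object is dropped because its tool is unknown, while the first and third are mapped to a common name/arguments shape despite using different keys.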