Tool calling
All chat-capable models in the catalog (everything except the image-only ones)
support OpenAI-style function calling. Pass tools in the request, and the model
returns a tool_calls array instead of plain content when it decides to call a tool.
Cut 2.23c (ADR 0016): the SPASS-User-Id header is mandatory on user-scoped endpoints. Add -H "SPASS-User-Id: $USER_ID" to every curl call below, in addition to the bearer token. A user_id passed in the body or the query string is rejected with HTTP 400 invalid_field.
Single call
curl -s https://dgx.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "Wie ist das Wetter in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }],
    "max_tokens": 200
  }'
Response (only the relevant fields):
{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{\"city\":\"Berlin\"}" }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
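Note that function.arguments in the response above is a JSON-encoded string rather than an object, so it needs a second decode before it can be passed to your function. A minimal Python sketch of that step, assuming the response shape shown above; extract_tool_calls is a hypothetical helper name, not part of the API:

```python
import json


def extract_tool_calls(response: dict) -> list:
    """Return (call_id, function_name, decoded_arguments) for each tool call."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # "arguments" arrives as a JSON string and needs its own json.loads.
        calls.append((call["id"], fn["name"], json.loads(fn["arguments"])))
    return calls


# Sample response copied from the example above.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_abc",
                "type": "function",
                "function": {"name": "get_weather", "arguments": '{"city":"Berlin"}'},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

print(extract_tool_calls(sample))
# [('call_abc', 'get_weather', {'city': 'Berlin'})]
```

A message without tool_calls yields an empty list, so the same helper can be used to decide whether a tool round is needed at all.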
Multi-turn loop (call → execute → respond)
# Turn 1: model emits a tool call ($REQ1 holds the JSON body from "Single call")
RESP1=$(curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d "$REQ1" https://dgx.spass.fun/v1/chat/completions)
# Extract tool_call_id and arguments, then run your function
CALL_ID=$(echo "$RESP1" | jq -r '.choices[0].message.tool_calls[0].id')
ARGS=$(echo "$RESP1" | jq -r '.choices[0].message.tool_calls[0].function.arguments')
RESULT=$(./my_get_weather.sh "$ARGS") # your code
# Turn 2: send back the tool result; jq encodes $RESULT as a JSON string,
# so quotes or newlines in the tool output cannot break the request body
REQ2=$(jq -n \
  --argjson assistant "$(echo "$RESP1" | jq '.choices[0].message')" \
  --arg call_id "$CALL_ID" \
  --arg result "$RESULT" '{
    model: "llama-4-scout",
    messages: [
      {role: "user", content: "Wie ist das Wetter in Berlin?"},
      $assistant,
      {role: "tool", tool_call_id: $call_id, content: $result}
    ],
    max_tokens: 200
  }')
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d "$REQ2" https://dgx.spass.fun/v1/chat/completions
The second turn returns regular content: the model's natural-language
answer based on the tool output.
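The message ordering of the second turn (user prompt, the assistant message with its tool_calls, then one tool message per call) can also be sketched in Python. This is an illustration only: build_followup_messages is a hypothetical helper, the temp_c result is made-up sample data, and serializing non-string results with json.dumps is an assumption based on the usual OpenAI-style convention that tool content is a string.

```python
import json


def build_followup_messages(user_prompt, assistant_message, tool_results):
    """Assemble the turn-2 messages: user, assistant tool call, tool results.

    tool_results maps each tool_call id to the value your function returned.
    """
    messages = [{"role": "user", "content": user_prompt}, assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        result = tool_results[call["id"]]
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            # Structured results are serialized so that content is a string.
            "content": result if isinstance(result, str) else json.dumps(result),
        })
    return messages


assistant = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city":"Berlin"}'},
    }],
}
messages = build_followup_messages(
    "Wie ist das Wetter in Berlin?", assistant, {"call_abc": {"temp_c": 18}}
)
print([m["role"] for m in messages])
# ['user', 'assistant', 'tool']
```

The assistant message is passed back unchanged, which keeps the tool_call_id values consistent between the call and its result.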
Tool calling on /c1/chat
/c1 accepts the same tools / tool_choice fields as /v1. The benefit
is that intermediate tool-call/tool-result messages are persisted in SQLite,
so on a follow-up turn you don't have to resend the whole history.
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "Wie ist das Wetter in Berlin?",
    "tools": [ { "type": "function", "function": { "name": "get_weather", ... } } ],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 200
  }' \
  https://dgx.spass.fun/c1/chat
Models with verified tool support
These were tested end-to-end on 2026-04-29:
| Model | Latency (cold) | Notes |
|---|---|---|
| llama-4-scout (local) | 1 s | Strong at structured arguments |
| mistral-small-4 | 1 s | |
| qwen3-vl-30b-{thinking,instruct} | 1-7 s | Thinking variant slower, often higher quality |
| gemma-4-31b | 1 s | |
| claude-opus-4.7 | 1 s | Best at multi-step planning |
| gpt-5.5-pro | 5 s | Reasoning model, max_tokens >= 200 |
| gemini-3.1-pro | 2 s | Reasoning model, max_tokens >= 200 |
| grok-4.20 | 2 s | |
The image-only aliases (nano-banana, gpt-image, image-gen) do not
support tools; their constraints.tools is false.
Disabling or restricting tool use (per request)
Three orthogonal mechanisms let the frontend decide per request whether tools are used. Full documentation, the combination matrix, and curl examples are at /docs/tools, section "Tool-Use steuern". Quick reference:
// Variant 1: privacy mode via the SPASS-Augment header
SPASS-Augment: server-tools=off
// Variant 2: OpenAI-spec body fields
{"tools": [], "tool_choice": "none"}
// Variant 3 (recommended for strict "no tools"): both combined
SPASS-Augment: server-tools=off
{"tools": [], "tool_choice": "none"}
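For a client that toggles tools per request, variant 3 amounts to adding one header and overriding two body fields. A small Python sketch under that reading; strict_no_tools is a hypothetical helper name, and the function only prepares the request without sending it:

```python
def strict_no_tools(headers: dict, body: dict) -> tuple:
    """Return copies of (headers, body) with the combined variant 3 applied."""
    new_headers = {**headers, "SPASS-Augment": "server-tools=off"}
    # Any tools already present in the body are replaced by an empty list.
    new_body = {**body, "tools": [], "tool_choice": "none"}
    return new_headers, new_body


headers, body = strict_no_tools(
    {"Content-Type": "application/json"},
    {"model": "llama-4-scout", "messages": [], "tools": [{"type": "function"}]},
)
print(headers["SPASS-Augment"], body["tools"], body["tool_choice"])
# server-tools=off [] none
```

Returning copies rather than mutating the inputs lets the same base request be reused for turns where tools stay enabled.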
Cut 2.36 (CR-0007) adds defense in depth: if Llama-FP8 emits a hallucinated tool call in content even though tools are disabled, the JSON block is stripped automatically (see /docs/changelog, Cut 2.36).
Server-side Tool-Loop Resilience (Cut 2.32, CR-0001)
On /v1/chat/completions and /c1/chat, the server executes known stack tools internally and performs up to MAX_TOOL_ITERATIONS = 10 round trips of tool call → tool execution → LLM reflection (Cut 2.32 raised this from 5, following the Anthropic industry standard "start with 10-30"). If a multi-tool use case uses up all 10 iterations OR the LLM emits the same name+args signature twice in a row (anti-loop), the server automatically makes a synthesis LLM call with tools: [] plus the tool results collected so far in the prompt. This forces a narrative answer instead of a dangling tool_calls path.
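The control flow described above can be illustrated with a short Python sketch. It is not the server's code: call_llm and execute_tool are hypothetical callables supplied by the caller, and comparing raw argument strings is only one possible reading of "same name+args signature".

```python
import json

MAX_TOOL_ITERATIONS = 10


def run_tool_loop(call_llm, execute_tool, messages, tools):
    """Capped tool loop with an anti-loop check and a final synthesis call.

    call_llm(messages, tools) -> assistant message dict (hypothetical)
    execute_tool(name, args)  -> tool result string (hypothetical)
    Returns (content, code); code is None when the model finished on its own.
    """
    messages = list(messages)
    last_signature = None
    code = "tool_loop_max_iterations"
    for _ in range(MAX_TOOL_ITERATIONS):
        reply = call_llm(messages, tools)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply.get("content"), None
        signature = [(c["function"]["name"], c["function"]["arguments"]) for c in calls]
        if signature == last_signature:
            # Same name+args twice in a row: stop instead of looping.
            code = "tool_loop_anti_loop_synthesised"
            break
        last_signature = signature
        messages.append(reply)
        for c in calls:
            args = json.loads(c["function"]["arguments"])
            result = execute_tool(c["function"]["name"], args)
            messages.append({"role": "tool", "tool_call_id": c["id"], "content": result})
    # Synthesis call: no tools are offered, so the model has to answer in prose.
    final = call_llm(messages, [])
    return final.get("content"), code


def stuck_llm(messages, tools):
    """Fake model that keeps requesting the same tool call while tools are offered."""
    if not tools:
        return {"role": "assistant", "content": "In Berlin ist es sonnig."}
    return {"role": "assistant", "tool_calls": [{
        "id": "call_abc", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
    }]}


print(run_tool_loop(stuck_llm, lambda name, args: "sunny",
                    [{"role": "user", "content": "Wetter?"}], [{"type": "function"}]))
# ('In Berlin ist es sonnig.', 'tool_loop_anti_loop_synthesised')
```

In the sketch the repeated assistant message is not appended before the synthesis call, so the history never ends on a tool call without a matching result.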
Callers see:
- finish_reason: "stop" (not "tool_calls").
- content is NEVER empty.
- dgx_code in the body: "tool_loop_max_iterations" or "tool_loop_anti_loop_synthesised" (see errors.md).
- In stream mode: an additional named SSE event, event: spass.tool-cap or event: spass.tool-anti-loop, with payload {code, iterations, synth_called}, sent before the final content (see examples-streaming.md).
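In stream mode, a client can pick out these named events with a small SSE reader. The sketch below assumes standard SSE framing (event: and data: lines, with a blank line ending each event) and a JSON payload in data. The sample lines are made up for illustration, including the pairing of spass.tool-cap with tool_loop_max_iterations; examples-streaming.md remains the reference for the real stream.

```python
import json

TOOL_LOOP_EVENTS = {"spass.tool-cap", "spass.tool-anti-loop"}


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


def tool_loop_notices(lines):
    """Collect the decoded payloads of the tool-loop events in a stream."""
    return [json.loads(data) for event, data in iter_sse_events(lines)
            if event in TOOL_LOOP_EVENTS]


# Illustrative stream: one named event followed by an unnamed content chunk.
sample_stream = [
    "event: spass.tool-cap\n",
    'data: {"code": "tool_loop_max_iterations", "iterations": 10, "synth_called": true}\n',
    "\n",
    'data: {"choices": [{"delta": {"content": "Es ist sonnig."}}]}\n',
    "\n",
]

print(tool_loop_notices(sample_stream))
# [{'code': 'tool_loop_max_iterations', 'iterations': 10, 'synth_called': True}]
```

Unnamed events fall back to the default name "message", so regular content chunks pass through iter_sse_events without being mistaken for notices.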
In addition, Cut 2.32 extends tool_normalize.rs (internal) with a multi-document JSON parser (Llama-FP8 sometimes emits newline-separated JSON objects instead of an array), a partial-match policy (mixed known + unknown → only the known calls are normalized), and schema tolerance (accepts name|function|tool as keys, and parameters|arguments|args|input for the payload). This is transparent to callers.
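To make the three normalization rules concrete, here is a rough Python sketch of the idea. It is not the tool_normalize.rs implementation, which is internal; the function names, the output shape, and the tool names get_time and unknown_tool are assumptions made for the example.

```python
import json

NAME_KEYS = ("name", "function", "tool")
PAYLOAD_KEYS = ("parameters", "arguments", "args", "input")


def parse_json_documents(text):
    """Parse one or more whitespace-separated JSON documents from text."""
    decoder = json.JSONDecoder()
    docs, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        doc, pos = decoder.raw_decode(text, pos)
        # A top-level array contributes each of its elements.
        docs.extend(doc if isinstance(doc, list) else [doc])
    return docs


def normalize_calls(text, known_tools):
    """Return normalized calls for known tools only (partial-match policy)."""
    calls = []
    for doc in parse_json_documents(text):
        if not isinstance(doc, dict):
            continue
        # Schema tolerance: accept any of the alternative key spellings.
        name = next((doc[k] for k in NAME_KEYS if isinstance(doc.get(k), str)), None)
        payload = next((doc[k] for k in PAYLOAD_KEYS if k in doc), {})
        if name in known_tools:
            calls.append({"name": name, "arguments": payload})
    return calls


raw = (
    '{"name": "get_weather", "arguments": {"city": "Berlin"}}\n'
    '{"tool": "unknown_tool", "input": {}}\n'
    '{"function": "get_time", "args": {"tz": "UTC"}}'
)
print(normalize_calls(raw, {"get_weather", "get_time"}))
# [{'name': 'get_weather', 'arguments': {'city': 'Berlin'}}, {'name': 'get_time', 'arguments': {'tz': 'UTC'}}]
```

The same parser accepts a regular JSON array, so well-formed model output and the newline-separated variant go through one code path.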