DGX LLM Chat Gateway

Tool calling

All chat-capable models in the catalog (everything except the image-only ones) support OpenAI-style function calling. Pass a tools array and, when the model decides to call a function, the response carries a tool_calls array instead of plain content.

Single call

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [{"role": "user", "content": "Wie ist das Wetter in Berlin?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }],
    "max_tokens": 200
  }'

Response (only the relevant fields):

{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{\"city\":\"Berlin\"}" }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}
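A response handler only needs to branch on finish_reason. A minimal jq sketch against the sample response above, with the JSON inlined so it runs without a gateway call:

```shell
# Sample response from above, inlined so the snippet is self-contained.
RESP='{"choices":[{"message":{"role":"assistant","tool_calls":[{"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}}]},"finish_reason":"tool_calls"}]}'

if [ "$(echo "$RESP" | jq -r '.choices[0].finish_reason')" = "tool_calls" ]; then
  NAME=$(echo "$RESP" | jq -r '.choices[0].message.tool_calls[0].function.name')
  # arguments is a JSON-encoded string, so it needs a second parse via fromjson
  CITY=$(echo "$RESP" | jq -r '.choices[0].message.tool_calls[0].function.arguments | fromjson | .city')
  echo "tool call: $NAME city=$CITY"
else
  echo "$RESP" | jq -r '.choices[0].message.content'
fi
```

Note that arguments is always a string containing JSON, never a nested object, which is why the extra fromjson step is needed.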

Multi-turn loop (call → execute → respond)

# Turn 1: model emits a tool_call ($REQ1 is the JSON request body from the single-call example above)
RESP1=$(curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d "$REQ1" https://dgx-spark-4236.spass.fun/v1/chat/completions)

# Extract tool_call_id and arguments, run your function
CALL_ID=$(echo "$RESP1" | jq -r '.choices[0].message.tool_calls[0].id')
ARGS=$(echo "$RESP1"   | jq -r '.choices[0].message.tool_calls[0].function.arguments')
RESULT=$(./my_get_weather.sh "$ARGS")  # your code

# Turn 2: send back the tool result
curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d "{
    \"model\": \"llama-4-scout\",
    \"messages\": [
      {\"role\": \"user\", \"content\": \"Wie ist das Wetter in Berlin?\"},
      $(echo "$RESP1" | jq '.choices[0].message'),
      {\"role\": \"tool\", \"tool_call_id\": \"$CALL_ID\", \"content\": $(printf '%s' "$RESULT" | jq -Rs .)}
    ],
    \"max_tokens\": 200
  }" \
  https://dgx-spark-4236.spass.fun/v1/chat/completions

The second turn returns regular content: the model's natural-language answer based on the tool output.
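The two turns generalize to a loop: while finish_reason is tool_calls, execute the requested function, append the tool result, and call the model again. A runnable sketch where call_model stands in for the curl above (the stub and its canned replies are illustrative, not real gateway output):

```shell
# Stub for the curl call above: turn 1 emits a tool_call, turn 2 answers.
TURN=1
call_model() {
  if [ "$TURN" -eq 1 ]; then
    echo '{"choices":[{"message":{"tool_calls":[{"id":"call_abc","function":{"name":"get_weather","arguments":"{\"city\":\"Berlin\"}"}}]},"finish_reason":"tool_calls"}]}'
  else
    echo '{"choices":[{"message":{"content":"12 C and cloudy in Berlin."},"finish_reason":"stop"}]}'
  fi
}

RESP=$(call_model)
while [ "$(echo "$RESP" | jq -r '.choices[0].finish_reason')" = "tool_calls" ]; do
  # In real code: run the requested tool here, then append the assistant
  # message and a role=tool message to the request before calling again.
  TURN=$((TURN + 1))
  RESP=$(call_model)
done
ANSWER=$(echo "$RESP" | jq -r '.choices[0].message.content')
echo "$ANSWER"
```

A production loop should also cap the number of iterations, since a model can keep requesting tools indefinitely.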

Tool calling on /c1/chat

/c1 accepts the same tools / tool_choice fields as /v1. The benefit is that intermediate tool-call and tool-result messages are persisted in SQLite, so on a follow-up turn you don't have to resend the whole history.

curl -s -H "Authorization: Bearer $BEARER" -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "message": "Wie ist das Wetter in Berlin?",
    "tools": [ { "type": "function", "function": { "name": "get_weather", ... } } ],
    "tool_choice": "auto",
    "response_format": {"type": "text"},
    "stream": false,
    "max_tokens": 200
  }' \
  https://dgx-spark-4236.spass.fun/c1/chat

Models with verified tool support

These were tested end-to-end on 2026-04-29:

Model                            | Latency (cold) | Notes
llama-4-scout (local)            | 1 s            | Strong at structured arguments
mistral-small-4                  | 1 s            |
qwen3-vl-30b-{thinking,instruct} | 1-7 s          | Thinking variant slower, often higher quality
gemma-4-31b                      | 1 s            |
claude-opus-4.7                  | 1 s            | Best at multi-step planning
gpt-5.5-pro                      | 5 s            | Reasoning model; max_tokens >= 200
gemini-3.1-pro                   | 2 s            | Reasoning; max_tokens >= 200
grok-4.20                        | 2 s            |

The image-only aliases (nano-banana, gpt-image, image-gen) do not support tools; their constraints.tools is false.
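To pick tool-capable models programmatically you can filter on that flag. The catalog listing shape below is an assumption for illustration; only the constraints.tools field itself is documented above:

```shell
# Assumed catalog shape; only constraints.tools is taken from the docs.
CATALOG='{"data":[
  {"id":"llama-4-scout","constraints":{"tools":true}},
  {"id":"nano-banana","constraints":{"tools":false}}
]}'
# select(.constraints.tools) keeps entries where the flag is true
TOOL_MODELS=$(echo "$CATALOG" | jq -r '.data[] | select(.constraints.tools) | .id')
echo "$TOOL_MODELS"
```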