DGX LLM Chat Gateway

Vision

All vision-capable models accept inline base64-encoded images via the OpenAI-style image_url content-part. Server-side URL fetching is not supported — see the Error catalog for image_url_not_supported for the reason.

Encode-and-send recipe

B64=$(base64 -w 0 photo.jpg)

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64\"}}
      ]
    }],
    \"max_tokens\": 200
  }"

Notes:

- base64 -w 0 disables line wrapping (GNU coreutils wraps at 76 columns by default), so the encoding lands in the JSON string as a single line. On macOS/BSD the -w flag doesn't exist, but the output is typically unwrapped already.
- Match the data: URI prefix to the actual file type: data:image/jpeg;base64,... for JPEG, data:image/png;base64,... for PNG.

What gets rejected

An https://... URL gets you a 400 with code: image_url_not_supported:

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.7",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What logo?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/logo.png"}}
      ]
    }]
  }'
{
  "error": {
    "type": "invalid_request_error",
    "code": "image_url_not_supported",
    "message": "Cloud providers don't fetch URLs server-side. Encode the image as a base64 data URI ...",
    "param": "messages[0].content[1].image_url.url"
  }
}

The param field gives a dotted path to the offending field (messages[0].content[1].image_url.url above); clients can use it to highlight the exact input in their UI.
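
For example, a shell client could surface the rejected field like this (a sketch; it reuses the request above, relies on the error body shown, and assumes jq is installed):

# Capture the 400 response and print the machine-readable code plus the
# path of the rejected field, e.g. to highlight it in a form.
RESP=$(curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.7",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What logo?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/logo.png"}}
      ]
    }]
  }')

echo "$RESP" | jq -r '"\(.error.code): offending field \(.error.param)"'
# image_url_not_supported: offending field messages[0].content[1].image_url.url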

Vision-capable models

Alias | Notes
----- | -----
llama-4-scout (local) | Strong on screenshots, charts
mistral-small-4 | Good general-purpose vision
qwen3-vl-30b-instruct | Fast, multi-image, good for Asian-language captions
qwen3-vl-30b-thinking | Thinking variant; slower but better at complex visual reasoning
gemma-4-31b | Multi-image, 256 K context
claude-opus-4.7 | Best at nuanced descriptions
gemini-3.1-pro | Only model that also accepts audio + video
grok-4.20 | 2 M token context
gpt-5.5-pro | Reasoning over images; max_tokens >= 200
nano-banana, gpt-image, image-gen | Vision input + image output
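
To compare how the local and cloud models describe the same image, you can loop the encode-and-send recipe over a few aliases from the table (a sketch; it reuses $B64 and $BEARER from above and assumes every alias accepts the same request shape and OpenAI-style response):

# Send the same base64 image to several vision aliases and print each
# model's one-sentence description.
for MODEL in llama-4-scout qwen3-vl-30b-instruct claude-opus-4.7; do
  echo "== $MODEL =="
  curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
    -H "Authorization: Bearer $BEARER" \
    -H "Content-Type: application/json" \
    -d "{
      \"model\": \"$MODEL\",
      \"messages\": [{
        \"role\": \"user\",
        \"content\": [
          {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"},
          {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64\"}}
        ]
      }],
      \"max_tokens\": 200
    }" | jq -r '.choices[0].message.content'
done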

The images shortcut on /c1/chat

/c1/chat accepts a top-level images array as a shortcut for plain-text prompts with one or more images. The server rebuilds the OpenAI content-parts array internally:

B64=$(base64 -w 0 photo.jpg)

curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"message\": \"Describe this image\",
    \"images\": [{\"url\": \"data:image/jpeg;base64,$B64\"}],
    \"tools\": [],
    \"tool_choice\": \"auto\",
    \"response_format\": {\"type\": \"text\"},
    \"stream\": false,
    \"max_tokens\": 200
  }"

If you need fine control (multiple text + image parts in one message), use the content-parts array form on message directly:

"message": [
  {"type": "text", "text": "Compare these two photos"},
  {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
  {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
]
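
Put together, a two-image comparison on /c1/chat might look like this (a sketch: photo1.jpg and photo2.jpg are placeholder files, and the optional tools / response_format fields from the shortcut example are omitted on the assumption they default sensibly):

B64_A=$(base64 -w 0 photo1.jpg)
B64_B=$(base64 -w 0 photo2.jpg)

# One message made of a text part plus two image parts, sent directly
# as the content-parts array form described above.
curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"message\": [
      {\"type\": \"text\", \"text\": \"Compare these two photos\"},
      {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64_A\"}},
      {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64_B\"}}
    ],
    \"stream\": false,
    \"max_tokens\": 200
  }"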