## Vision
All vision-capable models accept inline base64-encoded images via the
OpenAI-style `image_url` content part. Server-side URL fetching is not
supported; see `image_url_not_supported` in the Error catalog for the
reason.
### Encode-and-send recipe
```bash
# Encode as a single line (-w 0 disables line wrapping)
B64=$(base64 -w 0 photo.jpg)

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64\"}}
      ]
    }],
    \"max_tokens\": 200
  }"
```
Notes:

- `base64 -w 0` produces a single-line encoding; multi-line encoded data trips MIME parsers.
- The MIME prefix is required: `data:image/jpeg;base64,`, `data:image/png;base64,`, etc. Missing it gets you `code: image_decode_error`.
- Use `-d @body.json` with a file when the payload exceeds shell-arg limits (see the sketch below).
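
For that last case, a minimal sketch of the file-based approach, assuming `jq` is available; `photo.b64` and `body.json` are arbitrary file names:

```bash
# Encode to a file, then build the JSON body with jq so shell quoting
# and argument-length limits never come into play.
base64 -w 0 photo.jpg > photo.b64

jq -n --rawfile b64 photo.b64 '{
  model: "qwen3-vl-30b-instruct",
  messages: [{
    role: "user",
    content: [
      {type: "text", text: "Describe this image in one sentence."},
      {type: "image_url",
       image_url: {url: ("data:image/jpeg;base64," + ($b64 | rtrimstr("\n")))}}
    ]
  }],
  max_tokens: 200
}' > body.json

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d @body.json
```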
### What gets rejected
An `https://...` URL gets you a 400 with `code: image_url_not_supported`:
```bash
curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.7",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What logo?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/logo.png"}}
      ]
    }]
  }'
```

```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "image_url_not_supported",
    "message": "Cloud providers don't fetch URLs server-side. Encode the image as a base64 data URI ...",
    "param": "messages[0].content[1].image_url.url"
  }
}
```
The `param` field is a dotted path into the request body (with bracketed
array indices); clients can use it to highlight the exact field in their UI.
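
A minimal handling sketch, reusing the `body.json` built above and assuming the standard OpenAI-style `choices` array on success:

```bash
# Print "code at param" for errors, or the reply text on success.
curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d @body.json |
jq -r 'if .error
       then "\(.error.code) at \(.error.param)"
       else .choices[0].message.content end'
```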
### Vision-capable models
| Alias | Notes |
|---|---|
| `llama-4-scout` (local) | Strong on screenshots, charts |
| `mistral-small-4` | Good general-purpose vision |
| `qwen3-vl-30b-instruct` | Fast, multi-image, good for Asian-language captions |
| `qwen3-vl-30b-thinking` | Thinking variant: slower but better at complex visual reasoning |
| `gemma-4-31b` | Multi-image, 256K context |
| `claude-opus-4.7` | Best at nuanced descriptions |
| `gemini-3.1-pro` | Only model that also accepts audio + video |
| `grok-4.20` | 2M-token context |
| `gpt-5.5-pro` | Reasoning over images; requires `max_tokens >= 200` |
| `nano-banana`, `gpt-image`, `image-gen` | Vision input + image output |
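
To check at runtime which aliases your key can reach, a sketch assuming the gateway also exposes the standard OpenAI-style `/v1/models` listing (not documented in this section):

```bash
# List available model aliases (assumes the OpenAI list shape:
# {"data": [{"id": ...}, ...]}).
curl -s https://dgx-spark-4236.spass.fun/v1/models \
  -H "Authorization: Bearer $BEARER" | jq -r '.data[].id'
```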
### On `/c1/chat` — images shortcut
`/c1/chat` accepts a top-level `images` array as a shortcut for plain-text
prompts with one or more images. The server rebuilds the OpenAI content-parts
array internally:
```bash
B64=$(base64 -w 0 photo.jpg)

curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"message\": \"Describe this image\",
    \"images\": [{\"url\": \"data:image/jpeg;base64,$B64\"}],
    \"tools\": [],
    \"tool_choice\": \"auto\",
    \"response_format\": {\"type\": \"text\"},
    \"stream\": false,
    \"max_tokens\": 200
  }"
```
If you need fine control (multiple text + image parts in one message), use the
content-parts array form on `message` directly:
"message": [
{"type": "text", "text": "Compare these two photos"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
]
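
Dropped into a full request, that form looks like this (a sketch; `before.jpg` and `after.jpg` are hypothetical file names):

```bash
B64_A=$(base64 -w 0 before.jpg)
B64_B=$(base64 -w 0 after.jpg)

curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"message\": [
      {\"type\": \"text\", \"text\": \"Compare these two photos\"},
      {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64_A\"}},
      {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64_B\"}}
    ],
    \"stream\": false,
    \"max_tokens\": 200
  }"
```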