## Vision
All vision-capable models accept inline base64-encoded images via the
OpenAI-style `image_url` content part. Server-side URL fetching is not
supported; see `image_url_not_supported` in the Error catalog for the
reason.
### Encode-and-send recipe
```bash
# Encode as a single line (-w 0 disables line wrapping)
B64=$(base64 -w 0 photo.jpg)

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Describe this image in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64\"}}
      ]
    }],
    \"max_tokens\": 200
  }"
```
Notes:

- `base64 -w 0` produces a single-line encoding; multi-line encoded data trips MIME parsers.
- The MIME prefix is required: `data:image/jpeg;base64,`, `data:image/png;base64,`, etc. Missing it gets you `code: image_decode_error`.
- Use `-d @body.json` with a file when the payload exceeds shell-arg limits (see the sketch below).
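
For that last case, a minimal sketch of the file-based approach, assuming `jq` is available; `photo.b64` and `body.json` are arbitrary file names:

```bash
# Encode to a file, then build the JSON body with jq so shell quoting
# and argument-length limits never come into play.
base64 -w 0 photo.jpg > photo.b64

jq -n --rawfile b64 photo.b64 '{
  model: "qwen3-vl-30b-instruct",
  messages: [{
    role: "user",
    content: [
      {type: "text", text: "Describe this image in one sentence."},
      {type: "image_url",
       image_url: {url: ("data:image/jpeg;base64," + ($b64 | rtrimstr("\n")))}}
    ]
  }],
  max_tokens: 200
}' > body.json

curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d @body.json
```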
### What gets rejected
An `https://...` URL gets you a 400 with `code: image_url_not_supported`:
```bash
curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.7",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What logo?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/logo.png"}}
      ]
    }]
  }'
```

```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "image_url_not_supported",
    "message": "Cloud providers don't fetch URLs server-side. Encode the image as a base64 data URI ...",
    "param": "messages[0].content[1].image_url.url"
  }
}
```
The `param` field is a dotted path into the request body (with bracketed
array indices); clients can use it to highlight the exact field in their UI.
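
A minimal handling sketch, reusing the `body.json` built above and assuming the standard OpenAI-style `choices` array on success:

```bash
# Print "code at param" for errors, or the reply text on success.
curl -s https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d @body.json |
jq -r 'if .error
       then "\(.error.code) at \(.error.param)"
       else .choices[0].message.content end'
```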
### Vision-capable models
| Alias | Notes |
|---|---|
| `llama-4-scout` (local) | Strong on screenshots, charts |
| `mistral-small-4` | Good general-purpose vision |
| `qwen3-vl-30b-instruct` | Fast, multi-image, good for Asian-language captions |
| `qwen3-vl-30b-thinking` | Thinking variant: slower but better at complex visual reasoning |
| `gemma-4-31b` | Multi-image, 256K context |
| `claude-opus-4.7` | Best at nuanced descriptions |
| `gemini-3.1-pro` | Only model that also accepts audio + video |
| `grok-4.20` | 2M-token context |
| `gpt-5.5-pro` | Reasoning over images; requires `max_tokens >= 200` |
| `nano-banana`, `gpt-image`, `image-gen` | Vision input + image output |
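
To check at runtime which aliases your key can reach, a sketch assuming the gateway also exposes the standard OpenAI-style `/v1/models` listing (not documented in this section):

```bash
# List available model aliases (assumes the OpenAI list shape:
# {"data": [{"id": ...}, ...]}).
curl -s https://dgx-spark-4236.spass.fun/v1/models \
  -H "Authorization: Bearer $BEARER" | jq -r '.data[].id'
```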
### On `/c1/chat` — images shortcut
`/c1/chat` accepts a top-level `images` array as a shortcut for plain-text
prompts with one or more images. The server rebuilds the OpenAI content-parts
array internally:
```bash
B64=$(base64 -w 0 photo.jpg)

curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"message\": \"Describe this image\",
    \"images\": [{\"url\": \"data:image/jpeg;base64,$B64\"}],
    \"tools\": [],
    \"tool_choice\": \"auto\",
    \"response_format\": {\"type\": \"text\"},
    \"stream\": false,
    \"max_tokens\": 200
  }"
```
If you need fine control (multiple text + image parts in one message), use the
content-parts array form on `message` directly:
"message": [
{"type": "text", "text": "Compare these two photos"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
]
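
Dropped into a full request, that form looks like this (a sketch; `before.jpg` and `after.jpg` are hypothetical file names):

```bash
B64_A=$(base64 -w 0 before.jpg)
B64_B=$(base64 -w 0 after.jpg)

curl -s https://dgx-spark-4236.spass.fun/c1/chat \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"qwen3-vl-30b-instruct\",
    \"message\": [
      {\"type\": \"text\", \"text\": \"Compare these two photos\"},
      {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64_A\"}},
      {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64_B\"}}
    ],
    \"stream\": false,
    \"max_tokens\": 200
  }"
```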