# Image generation
Three aliases are exposed through the chat-completions endpoint:
| Alias | Underlying model | Cold latency | Cost (rough) |
|---|---|---|---|
| `nano-banana` | google/gemini-3.1-flash-image-preview | 12-15 s | ~$3/M tokens |
| `gpt-image` | openai/gpt-5.4-image-2 | 100-180 s | ~$15/M tokens |
| `image-gen` | composite (`nano-banana` → `gpt-image`) | 12-180 s | mixed |
Output is base64-encoded image data inside the chat completion, specifically
`choices[0].message.images[0].image_url.url`, prefixed with
`data:image/jpeg;base64,`.
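A minimal Python sketch of pulling the image bytes out of a parsed response dict (field names as described above; the `decode_image` helper is illustrative, not part of any client library):

```python
import base64

def decode_image(completion: dict) -> bytes:
    """Extract and decode the base64 image from a parsed chat completion."""
    url = completion["choices"][0]["message"]["images"][0]["image_url"]["url"]
    # Everything after ";base64," is the payload; the prefix names the MIME type.
    return base64.b64decode(url.partition(";base64,")[2])

# Stub response with a one-byte "image" to show the shape end to end:
stub = {"choices": [{"message": {"images": [{"image_url": {
    "url": "data:image/jpeg;base64," + base64.b64encode(b"\xff").decode()}}]}}]}
print(decode_image(stub))  # b'\xff'
```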
## Generate one image
```bash
curl -s --max-time 240 https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana",
    "messages": [{
      "role": "user",
      "content": "Generate a photo of a small green frog wearing a tiny crown, photorealistic"
    }],
    "stream": false
  }' \
  | jq -r '.choices[0].message.images[0].image_url.url' \
  | sed 's/^data:image\/[a-z]*;base64,//' \
  | base64 -d > frog.jpg
```
Important details:
- Set the client timeout to at least 240 s: `gpt-image` regularly takes 150 s, and default curl/httpx timeouts are too short.
- The response is a chat completion, not the OpenAI `/images/generations` shape. The image bytes are inside `message.images[0].image_url.url`; a plain-text companion (a description) is in `message.content`. You can drop it if you only want the image.
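The same call can be sketched in stdlib Python with the long timeout made explicit (the `build_payload` and `generate` helpers are illustrative; the endpoint and `BEARER` variable are the ones used in the curl example):

```python
import base64
import json
import os
import urllib.request

API_URL = "https://dgx-spark-4236.spass.fun/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for a single-image generation call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def generate(model: str, prompt: str, timeout: float = 240.0) -> bytes:
    """POST the prompt and return decoded image bytes. The 240 s default
    leaves headroom over gpt-image's regular ~150 s latency."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('BEARER', '')}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    url = body["choices"][0]["message"]["images"][0]["image_url"]["url"]
    return base64.b64decode(url.partition(";base64,")[2])

# Usage (network call; needs BEARER set):
# open("frog.jpg", "wb").write(generate("nano-banana", "a frog wearing a crown"))
```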
## Compare aliases
```bash
PROMPT="A vintage scientific illustration of a hummingbird"
for ALIAS in nano-banana gpt-image image-gen; do
  echo "=== $ALIAS ==="
  T0=$(date +%s)
  curl -s --max-time 240 https://dgx-spark-4236.spass.fun/v1/chat/completions \
    -H "Authorization: Bearer $BEARER" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$ALIAS\", \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}], \"stream\": false}" \
    | jq -r '.choices[0].message.images[0].image_url.url' \
    | tee >(wc -c) \
    | sed 's/^data:image\/[a-z]*;base64,//' \
    | base64 -d > "${ALIAS}.jpg"
  echo "  time: $(($(date +%s) - T0)) s"
done
```
## When to pick which
- `nano-banana`: the default for fast iteration, photorealism, and simple composition. Good price/quality.
- `gpt-image`: when you need text rendered correctly inside the image (signs, labels, UI mockups), complex multi-element compositions, or detailed instruction following.
- `image-gen`: runs `nano-banana` first and falls back to `gpt-image` only on hard failure. Use when you don't care which provider answered.
## Image input + image output
Both image-generation aliases also accept image input, which is useful for variation / edit workflows:
```bash
B64=$(base64 -w 0 source.jpg)
curl -s --max-time 240 https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"nano-banana\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Make this look like an oil painting\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:image/jpeg;base64,$B64\"}}
      ]
    }],
    \"stream\": false
  }"
```
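Building that mixed text + image content array is easy to get wrong by hand; a small Python sketch (the `edit_message` helper is illustrative, not part of the API):

```python
import base64

def edit_message(instruction: str, image_bytes: bytes,
                 mime: str = "image/jpeg") -> dict:
    """One user message combining an edit instruction with an inline image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Two-byte stand-in for a real JPEG, just to show the resulting data URL:
msg = edit_message("Make this look like an oil painting", b"\x00\x01")
print(msg["content"][1]["image_url"]["url"])  # data:image/jpeg;base64,AAE=
```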
## Limitations
- No tool calling: `constraints.tools` is `false` for all three aliases.
- No streaming: image content is delivered atomically, not as SSE chunks.
- Body size: base64 encoding inflates payloads by ~33 %. Outputs are typically 500 KB to 2 MB; combined with input images you may hit `MAX_BODY_BYTES` (default 32 MB) on large multi-image requests. Resize aggressively before encoding.
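The inflation is easy to check up front. A rough pre-flight sketch (the `encoded_size` and `fits` helpers are illustrative; 32 MB is the default limit quoted above):

```python
# Base64 emits 4 output chars per 3 input bytes (padded), i.e. ~33 % overhead.
MAX_BODY_BYTES = 32 * 1024 * 1024  # default body cap mentioned above

def encoded_size(raw_bytes: int) -> int:
    """Base64 output length for raw_bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)

def fits(image_sizes: list[int], overhead: int = 4096) -> bool:
    """True if all encoded images plus some JSON overhead stay under the cap."""
    return sum(encoded_size(s) for s in image_sizes) + overhead < MAX_BODY_BYTES

print(encoded_size(3 * 1024 * 1024))   # a 3 MiB image encodes to 4 MiB
print(fits([8 * 1024 * 1024] * 3))     # three 8 MiB inputs overflow: False
```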