Image generation

Three model aliases generate images through the chat-completions endpoint:

Alias	Underlying model	Cold latency	Cost (rough)
`nano-banana`	google/gemini-3.1-flash-image-preview	12-15 s	~$3/M tokens
`gpt-image`	openai/gpt-5.4-image-2	100-180 s	~$15/M tokens
`image-gen`	composite (banana → gpt-image)	12-180 s	mixed

Output is base64-encoded image data inside the chat completion — specifically choices[0].message.images[0].image_url.url, prefixed with data:image/jpeg;base64,.
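As a rough sketch of what that decoding step involves, the helper below reads a data URL on stdin, checks that it really is a base64 image, and picks the file extension from the MIME subtype instead of assuming JPEG. The function name `data_url_to_file` is hypothetical and not part of the service.

```shell
# Hypothetical helper: read a data URL on stdin, write the decoded bytes to
# <basename>.<ext>, and print the resulting file name.
data_url_to_file() {
  local base="$1" url header subtype
  url=$(cat)
  header=${url%%,*}                 # e.g. data:image/jpeg;base64
  case "$header" in
    "data:image/"*";base64") ;;     # the only shape handled here
    *) echo "not a base64 image data URL" >&2; return 1 ;;
  esac
  subtype=${header#data:image/}     # jpeg;base64
  subtype=${subtype%%;*}            # jpeg
  [ "$subtype" = "jpeg" ] && subtype=jpg
  # printf is a shell builtin, so a multi-megabyte payload is not passed
  # through the process argument list.
  printf '%s' "${url#*,}" | base64 -d > "$base.$subtype"
  echo "$base.$subtype"
}
```

Usage would look like `jq -r '.choices[0].message.images[0].image_url.url' response.json | data_url_to_file frog`. A literal `null` from jq (no image in the response) is rejected instead of being decoded into a garbage file.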

Generate one image

curl -s --max-time 240 https://dgx-spark-4236.spass.fun/v1/chat/completions \
  -H "Authorization: Bearer $BEARER" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nano-banana",
    "messages": [{
      "role": "user",
      "content": "Generate a photo of a small green frog wearing a tiny crown, photorealistic"
    }],
    "stream": false
  }' \
  | jq -r '.choices[0].message.images[0].image_url.url' \
  | sed 's/^data:image\/[a-z]*;base64,//' \
  | base64 -d > frog.jpg

Important details:

Set the client timeout to at least 240 s; gpt-image regularly takes 150 s. Library defaults are far too short (httpx defaults to 5 s), and curl needs --max-time so a stalled request cannot hang indefinitely.
The response is a chat completion, not the OpenAI /images/generations shape. The image is a base64 data URL in message.images[0].image_url.url.
Companion text (a plain-text description) is in message.content. You can drop it if you only want the image.
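The details above can be sketched as a small helper that works on a saved response: it fails loudly when no image is present and writes the image and the companion text to separate files. The name `split_response` and the `.txt` suffix are illustrative choices, and the sketch assumes `jq` is installed.

```shell
# Hypothetical helper: split a saved chat-completion response into an image
# file ($2) and its companion text ($2.txt).
split_response() {
  local resp="$1" out="$2"
  # jq -e exits non-zero when the selected value is null or missing,
  # which is what an error response without images produces.
  if ! jq -e '.choices[0].message.images[0].image_url.url' "$resp" >/dev/null 2>&1; then
    echo "no image in response; first bytes of the body:" >&2
    head -c 300 "$resp" >&2
    return 1
  fi
  jq -r '.choices[0].message.images[0].image_url.url' "$resp" \
    | sed 's/^data:image\/[a-z]*;base64,//' \
    | base64 -d > "$out"
  # message.content holds the optional description; default to empty text.
  jq -r '.choices[0].message.content // ""' "$resp" > "$out.txt"
}
```

A typical call would save the curl output first (`curl ... -o response.json`) and then run `split_response response.json frog.jpg`, which avoids piping an error body straight into `base64 -d`.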

Compare aliases

PROMPT="A vintage scientific illustration of a hummingbird"

for ALIAS in nano-banana gpt-image image-gen; do
  echo "=== $ALIAS ==="
  T0=$(date +%s)
  # Build the body with jq so quotes in $PROMPT cannot break the JSON.
  jq -n --arg model "$ALIAS" --arg prompt "$PROMPT" \
      '{model: $model, messages: [{role: "user", content: $prompt}], stream: false}' \
    | curl -s --max-time 240 https://dgx-spark-4236.spass.fun/v1/chat/completions \
        -H "Authorization: Bearer $BEARER" \
        -H "Content-Type: application/json" \
        --data-binary @- \
    | jq -r '.choices[0].message.images[0].image_url.url' \
    | tee >(wc -c >&2) \
    | sed 's/^data:image\/[a-z]*;base64,//' \
    | base64 -d > "${ALIAS}.jpg"
  # wc reports the data-URL size on stderr; on stdout its count would be
  # piped into sed and corrupt the decoded image.
  echo "  time: $(($(date +%s) - T0)) s"
done

When to pick which

nano-banana — default for fast iteration, photorealism, simple composition. Good price/quality.
gpt-image — when you need text rendered correctly inside the image (signs, labels, UI mockups), complex multi-element compositions, or detailed instruction following.
image-gen — runs nano-banana first, falls back to gpt-image only on hard failure. Use when you don't care which provider answered.
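If you do care which provider answered, the same try-then-fall-back order can be approximated on the client. The sketch below is an assumption about how that might look, not the service's own logic: `generate` and `generate_with_fallback` are hypothetical names, and "hard failure" is interpreted here as any non-zero exit (HTTP error, timeout, or a response without an image).

```shell
# Hypothetical wrapper: generate <alias> <prompt> <outfile>
generate() {
  local alias="$1" prompt="$2" out="$3" url
  url=$(jq -n --arg m "$alias" --arg p "$prompt" \
          '{model: $m, messages: [{role: "user", content: $p}], stream: false}' \
        | curl -sf --max-time 240 https://dgx-spark-4236.spass.fun/v1/chat/completions \
            -H "Authorization: Bearer $BEARER" \
            -H "Content-Type: application/json" \
            --data-binary @- \
        | jq -er '.choices[0].message.images[0].image_url.url') || return 1
  # Strip everything up to the first comma (the data-URL header) and decode.
  printf '%s' "${url#*,}" | base64 -d > "$out"
}

# Try the fast alias first, then the slow one; print the alias that succeeded.
generate_with_fallback() {
  local prompt="$1" out="$2" alias
  for alias in nano-banana gpt-image; do
    if generate "$alias" "$prompt" "$out"; then
      echo "$alias"
      return 0
    fi
    echo "$alias failed, trying next" >&2
  done
  return 1
}
```

For example, `WINNER=$(generate_with_fallback "A red bicycle" bike.jpg)` leaves the image in `bike.jpg` and the answering alias in `$WINNER`.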

Image input + image output

The image-generation aliases also accept image input, which is useful for variation and edit workflows:

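# Build the JSON body with jq and stream it to curl on stdin. Inlining the
# base64 string in a -d argument fails with "Argument list too long" once the
# image exceeds the kernel's per-argument limit (128 KiB on Linux).
# -w 0 disables line wrapping (GNU coreutils).
base64 -w 0 source.jpg \
  | jq -Rs '{
      model: "nano-banana",
      messages: [{
        role: "user",
        content: [
          {type: "text", text: "Make this look like an oil painting"},
          {type: "image_url", image_url: {url: ("data:image/jpeg;base64," + .)}}
        ]
      }],
      stream: false
    }' \
  | curl -s --max-time 240 https://dgx-spark-4236.spass.fun/v1/chat/completions \
      -H "Authorization: Bearer $BEARER" \
      -H "Content-Type: application/json" \
      --data-binary @-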

Limitations

No tool calling — constraints.tools is false for all three aliases.
No streaming — image content is delivered atomically, not as SSE chunks.
Body size — base64 encoding inflates by ~33 %. Outputs are typically 500 KB to 2 MB; combined with input images you may hit MAX_BODY_BYTES (default 32 MB) on large multi-image requests. Resize aggressively before encoding.
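The body-size arithmetic can be checked before sending. The sketch below computes the exact base64 length for each input file (4 output characters per 3 input bytes, rounded up) and compares the total with a limit. The helper names are hypothetical, the 32 MB default is assumed to mean 32 × 1024 × 1024 bytes, and JSON framing and prompt text are not counted.

```shell
# base64 output length for n input bytes: 4 * ceil(n / 3)
b64_size() { echo $(( ( ($1 + 2) / 3 ) * 4 )); }

# Hypothetical pre-flight check: fits_body_limit <limit-bytes> <file>...
# Prints the total encoded size and succeeds only if it is within the limit.
fits_body_limit() {
  local limit="$1" total=0 f
  shift
  for f in "$@"; do
    total=$(( total + $(b64_size "$(wc -c < "$f")") ))
  done
  echo "$total"
  [ "$total" -le "$limit" ]
}
```

For instance, `fits_body_limit $((32 * 1024 * 1024)) a.jpg b.jpg c.jpg || echo "resize first"` flags an oversized multi-image request before any upload time is spent. A 3 MB source image already encodes to about 4 MB.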