Featured Models

vidu/viduq3-mix-r2v

Vidu Q3 Mix reference-to-video model. Generates a video featuring your characters from 1-7 reference images using mixed-style synthesis. Both `prompt` and `reference_images` are required. `aspect_ratio` supports 16:9, 9:16, and 1:1; `resolution` supports 720p and 1080p only; `duration` ranges from 1 to 16 seconds.

$0.0600~$0.0720/sec
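The constraints listed for this model can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own validation: the parameter names come from the description above, while the function name, the dict-shaped payload, and the assumption that `aspect_ratio`, `resolution`, and `duration` may be omitted are illustrative only.

```python
# Hypothetical client-side check for vidu/viduq3-mix-r2v parameters.
# The limits mirror the catalog description; the payload shape is an assumption.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def validate_vidu_r2v(params: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = []

    # Both prompt and reference_images are required.
    if not params.get("prompt"):
        errors.append("prompt is required")
    refs = params.get("reference_images") or []
    if not 1 <= len(refs) <= 7:
        errors.append("reference_images must contain 1-7 images")

    # The remaining fields are only checked when present (assumed optional).
    if "aspect_ratio" in params and params["aspect_ratio"] not in ALLOWED_ASPECT_RATIOS:
        errors.append("aspect_ratio must be 16:9, 9:16, or 1:1")
    if "resolution" in params and params["resolution"] not in ALLOWED_RESOLUTIONS:
        errors.append("resolution must be 720p or 1080p")
    if "duration" in params and not 1 <= params["duration"] <= 16:
        errors.append("duration must be between 1 and 16 seconds")

    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.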

openai/gpt-image-2

text-to-image

GPT Image 2 is a state-of-the-art image generation model for fast, high-quality output. Use the `size` parameter to set output dimensions, including 2K (2048x2048, 2048x1152) and 4K (3840x2160, 2160x3840). Quality defaults to low when omitted. Transparent backgrounds are not supported.

$0.0039~$0.4100/img

google/nano-banana-2

text-to-image

Nano Banana 2 text-to-image model. Supports 14 aspect ratios and 512-4K resolution.

$0.0410~$0.1300/img

bytedance/seedance-2.0-i2v

image-to-video

Dreamina Seedance 2.0 image-to-video model. Generate video from a text prompt, optionally conditioned on a first or last frame, up to nine reference images, and up to three reference audio tracks. Outputs 480p, 720p, or 1080p with configurable aspect ratio.

$0.0810~$0.4400/sec

alibaba/happyhorse-1.0-t2v

text-to-video

The HappyHorse text-to-video model generates physically realistic, smoothly animated video from text prompts. It supports various resolution and aspect ratio combinations, with durations of 3-15 seconds.

$0.1280~$0.2290/sec
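Per-unit price ranges like the one above translate directly into a cost range for a given clip length or image count. The sketch below parses the price strings used in this listing and multiplies them out. The function names are illustrative, and the listing does not say which settings decide where in the range a request falls, so only the lower and upper bounds are computed.

```python
import re
from decimal import Decimal

# Matches catalog prices such as "$0.0330/img" or "$0.1280~$0.2290/sec".
PRICE_RE = re.compile(r"^\$(\d+\.\d+)(?:~\$(\d+\.\d+))?/(sec|img)$")


def parse_price(text: str) -> tuple[Decimal, Decimal, str]:
    """Return (low, high, unit); a single price yields low == high."""
    match = PRICE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized price: {text!r}")
    low = Decimal(match.group(1))
    high = Decimal(match.group(2)) if match.group(2) else low
    return low, high, match.group(3)


def cost_range(text: str, units: int) -> tuple[Decimal, Decimal]:
    """Lower and upper cost bounds for `units` seconds or images."""
    low, high, _ = parse_price(text)
    return low * units, high * units
```

With this price, a 15-second clip would cost between $1.92 and $3.435, and a 3-second clip between $0.384 and $0.687. `Decimal` is used so that the four-digit prices multiply without binary floating-point rounding.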

minimax/hailuo-2.3-t2v

text-to-video

Hailuo 2.3 generates high-quality videos from text with exceptional instruction following and state-of-the-art extreme physics simulation.

$0.0540~$0.1200/sec

minimax/hailuo-02-t2v

text-to-video

Hailuo 02 generates videos from text with exceptional instruction following and sets a new standard in visual realism through extreme physics simulation.

$0.0540~$0.1200/sec

google/imagen-4.0-generate-001

text-to-image

Google Imagen 4.0 standard text-to-image model. High-quality photorealistic output. Supports batch generation (up to 4 images), person control, and output up to 2K resolution.

$0.0330/img

google/veo-3-t2v

text-to-video

Google Veo 3.0 stable text-to-video model. Supports `prompt`, `negativePrompt`, `aspectRatio`, `durationSeconds`, `resolution` (up to 1080p), and `personGeneration`.

$0.4600/sec
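The field names listed for this model can be collected into a request payload, and the flat per-second price makes the cost easy to estimate. This is a minimal sketch under stated assumptions: the six field names come from the description above, but the payload shape, the `model` key, the function names, and the treatment of `prompt` as required are hypothetical. Accepted values for each field are not given in the listing, so none are validated here.

```python
from decimal import Decimal

# Field names as listed in the catalog entry for google/veo-3-t2v.
VEO3_FIELDS = (
    "prompt",
    "negativePrompt",
    "aspectRatio",
    "durationSeconds",
    "resolution",
    "personGeneration",
)
VEO3_PRICE_PER_SECOND = Decimal("0.4600")  # from the listed price


def build_veo3_request(**fields) -> dict:
    """Assemble a payload, rejecting field names the listing does not mention."""
    unknown = set(fields) - set(VEO3_FIELDS)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if "prompt" not in fields:  # assumed required for text-to-video
        raise ValueError("prompt is required")
    return {"model": "google/veo-3-t2v", **fields}


def estimate_veo3_cost(duration_seconds: int) -> Decimal:
    """Estimated price in USD for a clip of the given length."""
    return VEO3_PRICE_PER_SECOND * duration_seconds
```

At the listed rate, an 8-second clip would come to 8 × $0.46 = $3.68.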

bytedance/seedream-5.0-lite

text-to-image

ByteDance Seedream 5.0 Lite text-to-image model with 2K/3K custom resolutions and configurable output format.

$0.0370/img

alibaba/z-image-turbo

text-to-image

Z-Image Turbo is a lightweight text-to-image model that quickly generates images with Chinese and English text rendering support. It always outputs 1 PNG image per request.

$0.0100~$0.0200/img

bytedance/seedance-2.0-fast-i2v

image-to-video

Faster variant of Dreamina Seedance 2.0 image-to-video. Accepts the same multimodal inputs as Seedance 2.0 I2V (a text prompt plus optional reference images and audio) with lower latency. Resolution is limited to 480p or 720p.

$0.0650~$0.1400/sec

google/nano-banana-pro

text-to-image

Nano Banana Pro image generation model with higher-quality output. Supports configurable aspect ratio and image size (1K, 2K, or 4K).

$0.1300~$0.2200/img

alibaba/qwen-image-2.0-pro

text-to-image

Qwen-Image 2.0 Pro is the most capable model in the Qwen-Image series, with superior text rendering, realistic textures, and semantic adherence. Supports larger resolutions and batch generation of 1-6 images.

$0.0650/img