
Vidu Q3 Mix reference-to-video model. Generates a video featuring your characters from 1-7 reference images using mixed-style synthesis. Both `prompt` and `reference_images` are required. `aspect_ratio` supports 16:9, 9:16, and 1:1; `resolution` supports 720p and 1080p only; `duration` ranges from 1 to 16 seconds.
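
The limits listed for Vidu Q3 Mix can be checked on the client before a request is sent. The following is a minimal sketch, not an SDK function: the helper name `validate_vidu_q3_mix` is hypothetical, and it assumes the parameters arrive as a plain dictionary with `duration` given as a number of seconds.

```python
# Values taken from the description above.
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}


def validate_vidu_q3_mix(params: dict) -> list:
    """Return a list of problems found in params (empty when none).

    Hypothetical helper: it only mirrors the documented limits and
    does not call any API.
    """
    errors = []
    if not params.get("prompt"):
        errors.append("prompt is required")
    images = params.get("reference_images") or []
    if not 1 <= len(images) <= 7:
        errors.append("reference_images must contain 1 to 7 images")
    # Optional fields are only checked when they are present.
    aspect_ratio = params.get("aspect_ratio")
    if aspect_ratio is not None and aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        errors.append("aspect_ratio must be 16:9, 9:16, or 1:1")
    resolution = params.get("resolution")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        errors.append("resolution must be 720p or 1080p")
    duration = params.get("duration")
    if duration is not None and not 1 <= duration <= 16:
        errors.append("duration must be between 1 and 16 seconds")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.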

GPT Image 2 is a state-of-the-art image generation model for fast, high-quality output. Use the `size` parameter to set output dimensions, including 2K (2048x2048, 2048x1152) and 4K (3840x2160, 2160x3840). Quality defaults to low when omitted. Transparent backgrounds are not supported.
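
Because quality falls back to low when it is omitted, it can help to build GPT Image 2 parameters in one place. This is a rough sketch under stated assumptions: the helper names and the `prompt` and `quality` dictionary keys are hypothetical, and only the size strings come from the description above, which lists them as examples rather than an exhaustive set.

```python
# Sizes named in the description; other sizes may exist.
DOCUMENTED_SIZES = {
    "2K": ["2048x2048", "2048x1152"],
    "4K": ["3840x2160", "2160x3840"],
}


def size_tier(size: str):
    """Return "2K" or "4K" for a documented size string, else None."""
    for tier, sizes in DOCUMENTED_SIZES.items():
        if size in sizes:
            return tier
    return None


def build_image_params(prompt: str, size: str, quality=None) -> dict:
    """Assemble a parameter dict (key names other than size are assumptions)."""
    # Parsing the string rejects malformed values such as "2048*2048".
    width, height = (int(part) for part in size.split("x"))
    params = {"prompt": prompt, "size": f"{width}x{height}"}
    # When quality is left out, the description says the model uses low.
    if quality is not None:
        params["quality"] = quality
    return params
```

The accepted quality values are not listed in the description, so the sketch passes through whatever the caller supplies.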

Nano Banana 2 text-to-image model. Supports 14 aspect ratios and 512-4K resolution.

Dreamina Seedance 2.0 image-to-video model. Generates video from a text prompt, optionally conditioned on a first or last frame, up to nine reference images, and up to three reference audio tracks. Outputs 480p, 720p, or 1080p with configurable aspect ratio.

HappyHorse text-to-video model. Generates video from text prompts with a focus on physical realism and motion fluidity. Supports various resolution and aspect ratio combinations, with durations of 3-15 seconds.

Hailuo 2.3 generates high-quality videos from text with exceptional instruction following and state-of-the-art extreme physics simulation.

Hailuo 02 masters text-to-video generation with exceptional instruction following and sets a new standard in visual realism via extreme physics.

Google Imagen 4.0 standard text-to-image model. High-quality photorealistic output. Supports batch generation (up to 4 images), person control, and resolutions up to 2K.

Google Veo 3.0 stable text-to-video model. Supports `prompt`, `negativePrompt`, `aspectRatio`, `durationSeconds`, `resolution` (up to 1080p), and `personGeneration`.
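
The Veo 3.0 parameter names above use camelCase, which is easy to get wrong when calling from Python. A minimal sketch of a payload builder follows; the function `build_veo_request` is hypothetical, the payload is a plain dictionary rather than any particular client library's request type, and the example values other than `1080p` are illustrative assumptions.

```python
def build_veo_request(prompt, negative_prompt=None, aspect_ratio=None,
                      duration_seconds=None, resolution=None,
                      person_generation=None):
    """Map snake_case arguments to the camelCase names listed above.

    Hypothetical helper: it only assembles a dict and sends nothing.
    """
    optional = {
        "negativePrompt": negative_prompt,
        "aspectRatio": aspect_ratio,
        "durationSeconds": duration_seconds,
        "resolution": resolution,
        "personGeneration": person_generation,
    }
    payload = {"prompt": prompt}
    # Leave out unset fields so the service can apply its own defaults.
    payload.update({key: value for key, value in optional.items()
                    if value is not None})
    return payload


# Illustrative values; only 1080p is confirmed by the description.
request = build_veo_request("a fox running through snow",
                            duration_seconds=8, resolution="1080p")
```

Dropping `None` values keeps the payload limited to what the caller actually set.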

ByteDance Seedream 5.0 Lite text-to-image model with 2K/3K custom resolutions and configurable output format.

Z-Image Turbo is a lightweight text-to-image model that quickly generates images with Chinese and English text rendering support. It always outputs 1 PNG image per request.

Faster variant of the Dreamina Seedance 2.0 image-to-video model. Accepts the same multimodal inputs as Seedance 2.0 I2V (a text prompt plus optional reference images and audio) with lower latency. Resolution is limited to 480p or 720p.
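
The practical difference between the two Seedance 2.0 image-to-video variants in this list is the resolution ceiling. It can be sketched as a small lookup table. The names `SeedanceLimits`, `LIMITS`, and `check_seedance_inputs` are hypothetical, and the sketch assumes the faster variant keeps the standard model's caps of nine reference images and three audio tracks, which the description implies but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedanceLimits:
    resolutions: tuple
    max_reference_images: int = 9  # stated for the standard model
    max_reference_audio: int = 3   # assumed to carry over to the fast variant


LIMITS = {
    "standard": SeedanceLimits(resolutions=("480p", "720p", "1080p")),
    "fast": SeedanceLimits(resolutions=("480p", "720p")),
}


def check_seedance_inputs(variant, resolution, n_images=0, n_audio=0):
    """Return a list of problems for the chosen variant (empty when none)."""
    limits = LIMITS[variant]
    problems = []
    if resolution not in limits.resolutions:
        problems.append(f"{resolution} is not available on the {variant} variant")
    if n_images > limits.max_reference_images:
        problems.append(f"at most {limits.max_reference_images} reference images")
    if n_audio > limits.max_reference_audio:
        problems.append(f"at most {limits.max_reference_audio} reference audio tracks")
    return problems
```

A caller that needs 1080p output can use the returned messages to decide to fall back to the standard variant.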

Nano Banana Pro image generation model with higher-quality output. Supports configurable aspect ratio and image size (1K, 2K, or 4K).

Qwen-Image 2.0 Pro is the most capable model in the Qwen-Image series, with superior text rendering, realistic textures, and semantic adherence. Supports larger resolutions and batch generation of 1-6 images.