image-to-video
Vidu Q3 Mix R2V

Vidu Q3 Mix reference-to-video model. Generates a video with your characters from 1-7 reference images using mixed-style synthesis. Both `prompt` and `reference_images` are required. `aspect_ratio` supports 16:9/9:16/1:1; `resolution` supports 720p/1080p only; `duration` range 1-16s.

Explore AI Models

minimax/hailuo-2.3-t2v
text-to-video

Hailuo 2.3 generates high-quality videos from text with exceptional instruction following and state-of-the-art extreme physics simulation.

$0.0540~$0.1200/sec
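
Prices in this listing appear either as a single value (`$0.0330/img`) or as a range (`$0.0540~$0.1200/sec`). The sketch below shows one way to parse such a string and bound the cost of a job. The names `Price`, `parse_price`, and `cost_bounds` are hypothetical helpers, not part of any API, and the listing does not say which setting selects a point within a range, so only lower and upper bounds are computed.

```python
import re
from typing import NamedTuple, Tuple


class Price(NamedTuple):
    low: float   # lower bound in USD per unit
    high: float  # upper bound in USD per unit (equals low for a single price)
    unit: str    # billing unit as written in the listing, e.g. "sec" or "img"


# Matches "$0.0330/img" and "$0.0540~$0.1200/sec" as they appear in the listing.
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?)(?:~\$(\d+(?:\.\d+)?))?/(\w+)$")


def parse_price(text: str) -> Price:
    """Parse a listing price string into numeric bounds and a unit."""
    match = _PRICE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized price format: {text!r}")
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return Price(low, high, match.group(3))


def cost_bounds(price: Price, quantity: float) -> Tuple[float, float]:
    """Return (min, max) cost in USD for `quantity` billing units."""
    return (round(price.low * quantity, 4), round(price.high * quantity, 4))


if __name__ == "__main__":
    hailuo = parse_price("$0.0540~$0.1200/sec")
    print(cost_bounds(hailuo, 6))  # bounds for a 6-second clip
```
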
minimax/hailuo-02-t2v
text-to-video

Hailuo 02 masters text-to-video generation with exceptional instruction following and sets a new standard in visual realism via extreme physics.

$0.0540~$0.1200/sec
google/imagen-4.0-generate-001
text-to-image

Google Imagen 4.0 standard text-to-image model. High-quality photorealistic output. Supports batch generation (up to 4), person control, and up to 2K.

$0.0330/img
google/veo-3-t2v
text-to-video

Google Veo 3.0 stable text-to-video model. Supports `prompt`, `negativePrompt`, `aspectRatio`, `durationSeconds`, `resolution` (up to 1080p), and `personGeneration`.

$0.4600/sec
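
As a rough illustration of the Veo 3 parameters named above, the following sketch assembles a request body and estimates its cost at the listed $0.4600/sec. Only the six parameter names come from the listing. The function names, the assumption that `durationSeconds` is always sent, and the idea that the body is a flat JSON object are hypothetical, and the endpoint and accepted values are not specified here.

```python
from typing import Optional

VEO3_PRICE_PER_SEC = 0.46  # USD per second, taken from the listing


def build_veo3_request(
    prompt: str,
    duration_seconds: int,
    negative_prompt: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    resolution: Optional[str] = None,
    person_generation: Optional[str] = None,
) -> dict:
    """Assemble a request body using the parameter names from the listing.

    Hypothetical helper: the real request envelope is an assumption.
    """
    if not prompt:
        raise ValueError("prompt must be non-empty")
    body = {"prompt": prompt, "durationSeconds": duration_seconds}
    optional = {
        "negativePrompt": negative_prompt,
        "aspectRatio": aspect_ratio,
        "resolution": resolution,
        "personGeneration": person_generation,
    }
    # Send only the optional fields the caller actually set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def estimate_veo3_cost(duration_seconds: int) -> float:
    """Estimated cost in USD, assuming billing by requested duration."""
    return round(duration_seconds * VEO3_PRICE_PER_SEC, 4)


if __name__ == "__main__":
    print(build_veo3_request("a lighthouse at dusk", 8, resolution="1080p"))
    print(estimate_veo3_cost(8))
```
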
bytedance/seedream-5.0-lite
text-to-image

ByteDance Seedream 5.0 Lite text-to-image model with 2K/3K custom resolutions and configurable output format.

$0.0370/img
alibaba/z-image-turbo
text-to-image

Z-Image Turbo is a lightweight text-to-image model that quickly generates images with Chinese and English text rendering support. It always outputs 1 PNG image per request.

$0.0100~$0.0200/img
bytedance/seedance-2.0-fast-i2v
image-to-video

Faster variant of Dreamina Seedance 2.0 image-to-video. Accepts the same multimodal inputs as Seedance 2.0 I2V—text prompt plus optional reference images and audio—with lower latency. Resolution limited to 480p/720p.

$0.0650~$0.1400/sec
google/nano-banana-pro
text-to-image

Nano Banana Pro image generation model with higher quality output. Supports aspect ratio and image size (1K/2K/4K resolution).

$0.1300~$0.2200/img

Latest Releases

vidu/viduq3-turbo-r2v
image-to-video

Vidu Q3 Turbo reference-to-video model. Fast generation with your characters from 1-7 reference images. Both `prompt` and `reference_images` are required. `aspect_ratio` supports 16:9/9:16/1:1; `resolution` supports 540p/720p/1080p; `duration` range 3-16s.

$0.0100~$0.0320/sec
vidu/viduq3-r2v
image-to-video

Vidu Q3 reference-to-video model. Balanced quality from 1-7 reference images. Both `prompt` and `reference_images` are required. `aspect_ratio` supports 16:9/9:16/1:1; `resolution` supports 540p/720p/1080p; `duration` range 3-16s.

$0.0180~$0.0370/sec
vidu/viduq3-mix-r2v
image-to-video

Vidu Q3 Mix reference-to-video model. Generates a video with your characters from 1-7 reference images using mixed-style synthesis. Both `prompt` and `reference_images` are required. `aspect_ratio` supports 16:9/9:16/1:1; `resolution` supports 720p/1080p only; `duration` range 1-16s.

$0.0600~$0.0720/sec
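
The three Vidu Q3 reference-to-video variants share the same required fields but differ in allowed resolutions and duration ranges. A client-side check along these lines could catch invalid requests before they are sent. The limits are copied from the descriptions above. The function name, the payload shape (a flat dict), and the choice to treat `aspect_ratio`, `resolution`, and `duration` as optional are assumptions.

```python
from typing import List

ASPECT_RATIOS = {"16:9", "9:16", "1:1"}

# Per-model limits as stated in the listing.
VIDU_Q3_R2V_LIMITS = {
    "vidu/viduq3-turbo-r2v": {"resolutions": {"540p", "720p", "1080p"}, "duration": (3, 16)},
    "vidu/viduq3-r2v": {"resolutions": {"540p", "720p", "1080p"}, "duration": (3, 16)},
    "vidu/viduq3-mix-r2v": {"resolutions": {"720p", "1080p"}, "duration": (1, 16)},
}


def validate_r2v_request(model: str, payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    limits = VIDU_Q3_R2V_LIMITS.get(model)
    if limits is None:
        return [f"unknown model: {model}"]

    errors = []
    if not payload.get("prompt"):
        errors.append("prompt is required")

    images = payload.get("reference_images") or []
    if not 1 <= len(images) <= 7:
        errors.append("reference_images must contain 1-7 images")

    # The fields below are checked only when present (assumed optional).
    if "aspect_ratio" in payload and payload["aspect_ratio"] not in ASPECT_RATIOS:
        errors.append("aspect_ratio must be 16:9, 9:16, or 1:1")
    if "resolution" in payload and payload["resolution"] not in limits["resolutions"]:
        errors.append(f"resolution must be one of {sorted(limits['resolutions'])}")
    if "duration" in payload:
        low, high = limits["duration"]
        if not low <= payload["duration"] <= high:
            errors.append(f"duration must be between {low} and {high} seconds")
    return errors


if __name__ == "__main__":
    request = {
        "prompt": "two characters walk through a night market",
        "reference_images": ["https://example.com/a.png"],  # placeholder URL
        "resolution": "540p",
        "duration": 2,
    }
    print(validate_r2v_request("vidu/viduq3-mix-r2v", request))
```

In this example the Mix variant rejects `540p`, while a 2-second duration is within its 1-16s range. The same payload sent to the Turbo variant would pass the resolution check but fail on duration.
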
vidu/one-click-trending-replicate
video-to-video

Vidu one-click trending replicate model. Recreates a trending video style with your own subjects. Both `video_url` (reference trend video) and `images` (subject images, 1-7) are required. Supports `prompt`, `aspect_ratio`, `resolution` (default 1080p), and `remove_audio`.

$0.0300~$0.0500/sec
vidu/lip-sync
video-to-video

Vidu lip sync model. Reanimates the lip movements in a video to match a replacement audio track. `video_url` is required. Provide `audio_url` as the new audio to sync lips to. Use `reference_face_image_url` to preserve face identity consistency across the video.

$0.0100/sec
vidu/motion-sync
video-to-video

Vidu motion sync model. Transfers motion from a source video onto a target character image. Both `image_url` and `video_url` are required.

$0.0250/sec
vidu/one-click-ad-film
image-to-video

Vidu one-click ad film model. Automatically generates a marketing video from 1-7 product or scene images. `images` is required. Supports `prompt` (up to 2000 chars), `duration` (10-60s, default 15), `aspect_ratio`, and `language` (zh/en).

$0.1000/sec
vidu/one-click-general-film
image-to-video

Vidu one-click general film model. Automatically generates a cinematic film from 1-7 images. Both `images` and `duration` (10-180s) are required. Optionally accepts `prompt` (up to 3000 chars) and `aspect_ratio`.

$0.1000/sec
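
The two one-click film models differ in their duration rules: the ad film defaults to 15 seconds within a 10-60s range, while the general film requires an explicit duration of 10-180s. The sketch below encodes those rules and estimates cost at the listed $0.1000/sec. The function name is hypothetical, and it assumes billing follows the requested duration, which the listing does not confirm.

```python
from typing import Optional

PRICE_PER_SEC = 0.10  # USD per second, listed for both one-click film models

# Duration and prompt limits as stated in the listing.
ONE_CLICK_FILMS = {
    "vidu/one-click-ad-film": {
        "duration": (10, 60), "default_duration": 15, "max_prompt_chars": 2000,
    },
    "vidu/one-click-general-film": {
        "duration": (10, 180), "default_duration": None, "max_prompt_chars": 3000,
    },
}


def estimate_film_cost(model: str, duration: Optional[int] = None, prompt: str = "") -> float:
    """Validate duration and prompt length, then return the estimated cost in USD."""
    spec = ONE_CLICK_FILMS[model]
    if duration is None:
        duration = spec["default_duration"]
    if duration is None:
        raise ValueError(f"{model} requires an explicit duration")
    low, high = spec["duration"]
    if not low <= duration <= high:
        raise ValueError(f"duration must be between {low} and {high} seconds")
    if len(prompt) > spec["max_prompt_chars"]:
        raise ValueError(f"prompt exceeds {spec['max_prompt_chars']} characters")
    return round(duration * PRICE_PER_SEC, 2)


if __name__ == "__main__":
    print(estimate_film_cost("vidu/one-click-ad-film"))           # uses the 15s default
    print(estimate_film_cost("vidu/one-click-general-film", 180))  # longest allowed film
```
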

Model Series

GPT Image

The GPT-Image series by OpenAI consists of advanced multimodal models, such as GPT-Image-1 and GPT-Image-2, designed for generating and editing photorealistic images from text and image inputs.

Hailuo 02

MiniMax's Hailuo 02 series is a top-ranked cinematic AI video suite for T2V/I2V, generating native 1080p clips with ultra-realistic physics, character consistency, and director-level controls.

Hailuo 2.3

MiniMax's Hailuo 2.3 series elevates cinematic AI video generation with 4K T2V/I2V, hyper-realistic physics and motion, extended clips, and advanced character consistency.

HappyHorse

HappyHorse is a leading open-source AI video generation model with 15 billion parameters that jointly produces high-quality 1080p videos and synchronized audio from text or image prompts, currently topping the Artificial Analysis Video Arena leaderboard.

Imagen

Google Imagen is Google's premier text-to-image diffusion model, excelling in photorealistic, high-resolution image generation from textual prompts with unmatched detail, creativity, and adherence to complex descriptions.

Kling V3

Kuaishou's Kling v3 series is an open multimodal AI suite for T2I/I2V/T2V, generating 4K cinematic visuals with native audio, multi-shot narratives, precise motion control, and consistent characters.

Nano Banana

Nano Banana is an advanced AI image generation and editing model based on Google's Gemini technology, delivering fast, precise transformations with exceptional prompt understanding, consistent character editing, and high-quality visuals.

Qwen Image

Qwen Image is Alibaba's unified 7B text-to-image generation and editing model series, renowned for high-fidelity visuals, superior text rendering, Photoshop-like layered editing, and top rankings on global leaderboards.

Seedance

ByteDance's Seedance is a multimodal AI video generation model that creates cinematic 1080p multi-shot videos from text, images, audio, or video prompts with immersive audio-visual realism and director-level creative controls.

Seedream

ByteDance's Seedream is a high-fidelity text-to-image and editing model supporting native 4K resolution, batch generation, superior typography, and consistent character rendering for professional creative workflows.

Veo 3

Google Veo 3 is Google DeepMind's groundbreaking text-to-video AI model, unveiled at Google I/O 2025, that generates high-fidelity 4K cinematic videos with native synchronized audio from text or image prompts, offering professional controls and multi-scene coherence.

Veo 3.1

Google Veo 3.1 is the advanced successor to Veo 3, released in October 2025, enhancing 4K video generation with richer native audio, superior narrative control, precise image-to-video conversion, and seamless character consistency for dynamic storytelling.

Vidu Q3

Vidu Q3 is Shengshu AI’s advanced text-to-video and image-to-video model that generates up to 16-second clips with native audio, enhanced motion, and precise camera control.

Wan 2.6

Alibaba's Wan 2.6 is a powerful open-source AI video generation model that creates cinematic 1080p multi-shot videos with native audio-visual synchronization, supporting text-to-video, image-to-video, and professional storytelling workflows.

Wan 2.7

Alibaba's Wan 2.7 series is a comprehensive open-weight AI suite for image generation/editing and video creation, featuring thinking mode reasoning, first/last frame control, up to 4K images and 1080p videos, native audio sync, and exceptional text rendering accuracy.

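Access Leading AI Media Models

Explore image, video, and audio models, and move to production quickly with transparent pricing, an online playground, and a unified API.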

Video generation: 101
Image generation: 66