Explore AI Models: Image, Video, and Audio APIs

[Core Function] Kling V3 Turbo T2V is a speed- and cost-optimized text-to-video model in the V3 family for fast short-form generation. [Strengths] It emphasizes lower latency and efficient throughput with native audio and improved lip-sync for talking-head style clips, typically targeting practical 720p/1080p short videos rather than maximum cinematic headroom. [Best For] Highly recommended for: rapid prototyping, social and ad iteration, batch short-form pipelines, and dialogue clips where turnaround time and unit cost matter most. [Limitations] Do NOT use this model if the user requires peak 4K cinematic fidelity, heavy multi-shot storyboard control, or maximum visual polish; use Kling V3 T2V or Kling V3 Omni T2V instead. Do NOT use it for image-conditioned animation; use Kling V3 Turbo I2V or Kling V3 I2V. [Routing] Choose Kling V3 Turbo T2V when the user says fast, quick, cheap, or high volume. Otherwise default to Kling V3 T2V for quality, or Kling V3 Omni T2V when consistency and Omni-class control are requested.

kling/kling-v3-omni-t2v

text-to-video

[Core Function] Kling V3 Omni T2V is a multimodal-leaning text-to-video model in the V3 family, oriented toward stronger semantic control and subject consistency in prompt-led generation. [Strengths] It targets high-fidelity cinematic clips with native audio options, flexible 3-15s duration, and better adherence when scenes demand coherent characters or multi-beat storytelling from text alone. [Best For] Highly recommended for: narrative T2V with recurring subjects, dialogue-aware scenes, brand or product continuity across beats, and premium short films where consistency matters more than raw throughput. [Limitations] Do NOT use this model if the user only needs the cheapest or fastest clip; prefer Kling V3 Turbo T2V. Do NOT use it when the workflow is image-first or needs multi-image references; use Kling V3 Omni I2V or Kling V3 I2V instead. Do NOT use it for deep physics-reasoning specialty tasks better served by Kling Video O1. [Routing] Choose Kling V3 Omni T2V when the user emphasizes Omni, consistency, multimodal quality, or complex text narratives. Prefer Kling V3 T2V as the default high-quality T2V baseline; prefer Kling V3 Turbo T2V when the user stresses speed, cost, or high-volume short-form output.
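The [Routing] notes for the Kling V3 text-to-video family reduce to a small decision rule. The following is a minimal Python sketch of that rule, assuming a plain keyword match on the user's request. The function name, the keyword lists, and the choice to let speed cues win over Omni cues are illustrative assumptions, not part of any Kling API. It returns display names rather than model IDs, because not every ID appears in this listing.

```python
# Hypothetical keyword lists distilled from the [Routing] notes; not an official API.
SPEED_WORDS = ("fast", "quick", "cheap", "high volume", "high-volume")
OMNI_WORDS = ("omni", "consistency", "consistent", "multimodal")


def route_kling_v3_t2v(request: str) -> str:
    """Return the display name of the Kling V3 text-to-video model to use."""
    text = request.lower()
    # Assumption: speed/cost cues take priority, since both entries defer to Turbo on speed.
    if any(word in text for word in SPEED_WORDS):
        return "Kling V3 Turbo T2V"
    if any(word in text for word in OMNI_WORDS):
        return "Kling V3 Omni T2V"
    # Default high-quality baseline when no cue is present.
    return "Kling V3 T2V"
```

Substring matching is deliberately naive (for example, "breakfast" contains "fast"); a production router would need proper intent detection.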

kling/kling-video-o1-t2v

text-to-video

[Core Function] Kling Video O1 T2V is the text-to-video slice of Kling O1 Omni Video. [Strengths] Reasoning-enhanced prompt planning with 3-10s duration and 720p/1080p output. [Best For] Complex physical interactions and logically demanding scenes from text alone. [Limitations] No native audio; duration capped at 10s; no multi_shot. [Routing] Prefer this for O1-quality T2V; use Kling Video O1 V2V when a source video is required.

google/gemini-omni-flash-t2v

text-to-video

[Core Function] Gemini Omni Flash T2V is Google's fast multimodal Text-to-Video generation model built on the Interactions API. [Strengths] It quickly turns a text prompt into a short 720p video with natively synchronized audio, offering low latency and solid prompt adherence. [Best For] Highly recommended for: rapid text-to-video prototyping, short social and marketing clips, quick concept visualization, and cases where speed and built-in audio matter more than 4K cinematic detail. [Limitations] Do NOT use this model if you need 1080p or 4K resolution or clips longer than 10 seconds; output is fixed at 720p, capped at 10 seconds, with aspect ratio limited to 16:9 or 9:16. [Routing] Choose this model when the user emphasizes 'fast', 'quick', or short multimodal clips with sound. If the user demands maximum cinematic quality, 4K, or longer videos, choose Veo 3.1 T2V instead.
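The limits listed for Gemini Omni Flash T2V can be checked before a request is routed. The sketch below is a hedged illustration: the function and argument names are hypothetical, and only the three limits (720p, 10 seconds, 16:9 or 9:16) come from the description above.

```python
from typing import List

# Limits taken from the description above; all names here are hypothetical.
FIXED_RESOLUTION = "720p"
MAX_DURATION_S = 10
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}


def gemini_omni_flash_violations(
    resolution: str, duration_s: float, aspect_ratio: str
) -> List[str]:
    """List the reasons a request does not fit this model (empty if it fits)."""
    problems = []
    if resolution != FIXED_RESOLUTION:
        problems.append(f"resolution {resolution} unsupported; output is fixed at 720p")
    if duration_s > MAX_DURATION_S:
        problems.append(f"duration {duration_s}s exceeds the 10s cap")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"aspect ratio {aspect_ratio} is not 16:9 or 9:16")
    return problems
```

A resolution or duration violation is the cue, per the routing note, to consider Veo 3.1 T2V instead.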

skywork/skyreels-t2v

text-to-video

[Core Function] SkyReels Text-to-Video generates a video purely from a text prompt, with no input media. [Strengths] Strong prompt adherence and smooth motion; supports optional audio, 480p/720p/1080p output, and a fast/std quality-speed trade-off. [Best For] Turning an idea or script into video, concept visualization, story beats, and social clips generated from text. [Limitations] Do NOT use this when the user provides an image, video, or audio input; route to Image-to-Video (skyreels-i2v), Reference-to-Video (skyreels-r2v), or the video-to-video / Omni models instead. Output is capped at 1080p and 15s per clip; fast mode currently supports only sound=false (no audio). [Routing] Use for text-only generation. Choose mode=fast for quicker results or mode=std for balanced quality; set resolution and aspect_ratio as needed.
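The SkyReels constraints interact: fast mode rules out audio, and every clip is capped at 1080p and 15s. A minimal sketch of validating these options before a call follows. The `mode`, `sound`, and `resolution` names come from the description above; the `prompt` and `duration` keys, the defaults, and the shape of the returned dict are assumptions for illustration, not the documented request schema.

```python
from typing import Any, Dict


def build_skyreels_t2v_params(
    prompt: str,
    mode: str = "std",
    resolution: str = "720p",
    duration: int = 5,      # hypothetical parameter name and default
    sound: bool = False,
) -> Dict[str, Any]:
    """Validate options against the stated limits and return a parameter dict."""
    if mode not in ("fast", "std"):
        raise ValueError("mode must be 'fast' or 'std'")
    if resolution not in ("480p", "720p", "1080p"):
        raise ValueError("resolution must be 480p, 720p, or 1080p")
    if not 0 < duration <= 15:
        raise ValueError("duration must be at most 15 seconds")
    if mode == "fast" and sound:
        raise ValueError("fast mode currently supports only sound=false")
    return {
        "prompt": prompt,
        "mode": mode,
        "resolution": resolution,
        "duration": duration,
        "sound": sound,
    }
```

Rejecting the `fast` plus `sound` combination up front avoids submitting a job that the description says the model cannot currently serve.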

pixverse/c1-t2v

text-to-video

[Core Function] PixVerse c1 T2V generates a video purely from a text prompt, with no input image. [Strengths] Strong prompt adherence and smooth motion; optional audio. [Best For] Turning an idea or script into video, concept visualization, story beats, social clips from text. [Limitations] Do NOT use this when the user provides a starting image, two frames, or reference images; route to Image-to-Video, First-Last-Frame, or Reference-to-Video instead. c1 does not support multi-clip. [Routing] Use for text-only generation. The c1 variant takes the same inputs as v6 except it does not support multi-clip generation; choose c1-t2v when the user requests the c1 model and multi-clip is not needed.

pixverse/v6-t2v

text-to-video

[Core Function] PixVerse v6 T2V generates a video purely from a text prompt, with no input image. [Strengths] Strong prompt adherence and smooth motion; optional audio and multi-clip generation. [Best For] Turning an idea or script into video, concept visualization, story beats, social clips from text. [Limitations] Do NOT use this when the user provides a starting image, two frames, or reference images; route to Image-to-Video, First-Last-Frame, or Reference-to-Video instead. [Routing] Use for text-only generation. v6 additionally supports multi-clip generation (generate_multi_clip_switch), which c1 does not; choose v6-t2v when multi-clip output is needed or the v6 model is requested.
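The choice between the two PixVerse text-to-video variants hinges on multi-clip support. The sketch below illustrates that rule under stated assumptions: the function and argument names are hypothetical, and defaulting to v6 when the user names no variant is a choice made here, not something the listing specifies.

```python
from typing import Optional


def choose_pixverse_t2v(needs_multi_clip: bool, requested: Optional[str] = None) -> str:
    """Pick a PixVerse T2V model ID from the multi-clip need and any requested variant."""
    if requested not in (None, "c1", "v6"):
        raise ValueError("requested must be 'c1', 'v6', or None")
    if needs_multi_clip:
        if requested == "c1":
            # c1 takes the same inputs as v6 except multi-clip generation.
            raise ValueError("c1 does not support multi-clip generation")
        return "pixverse/v6-t2v"
    if requested == "c1":
        return "pixverse/c1-t2v"
    # Assumption: fall back to v6 when no variant is requested.
    return "pixverse/v6-t2v"
```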

bytedance/seedance-2.0-mini-t2v

text-to-video

[Core Function] Seedance 2.0 Mini T2V is the lightweight text-to-video variant. [Strengths] Supports 480p/720p, 24 fps, 4-15s MP4 output. [Routing] Use for cost-efficient text-to-video.

bytedance/seedance-2.0-fast-t2v

text-to-video

[Core Function] Seedance 2.0 Fast T2V is the faster text-to-video variant. [Strengths] Supports 480p/720p, 24 fps, 4-15s MP4 output. [Routing] Use when speed is preferred over maximum resolution.
