OpenAI shut down Sora’s consumer service in April 2026. The calculation was simple: compute costs were unsustainable, and the pivot to enterprise AGI meant consumer video generation no longer fit the roadmap.

That left a real gap. And the most prepared beneficiary is a model most Western developers have underestimated: Kling AI, from Kuaishou.

As of April 2026, Kling is among the most actively maintained video generation model families with a public API. It spans over 30 variants across text-to-image, image-to-image, text-to-video, image-to-video, avatar, and video effects. Kling V3 now supports multi-shot storyboard generation. Kling V2.6 generates synchronized audio and video in a single API call, with no separate TTS step required.

This guide covers the complete Kling model lineup, how pricing is structured by version and mode, which variant fits which use case, and how to start generating in under 10 minutes.

The Kling AI Model Family: 30+ Variants Explained

Most developers who have heard of Kling think “video generation.” That is accurate but incomplete. The Kling family covers five distinct capability categories, each with multiple model generations.

Text-to-Video (T2V). The range runs from Kling V1 T2V, which generates 5 to 10 second clips with camera motion presets (pan, tilt, zoom), to Kling V3 T2V, which supports multi-shot storyboard generation with 3 to 15 seconds per clip. Kling V2.5 Turbo T2V delivers 1080p output starting at $0.050/s (standard mode), a fraction of the $0.320/s charged for V2.1 Master T2V, making it the default choice for high-volume workloads.

Image-to-Video (I2V). Kling V1.6 I2V introduced first-and-last frame control, letting developers define both the opening and closing frame of a clip for precise narrative transitions. Kling V1.6 MI2V takes up to 4 reference images and fuses them into a single cohesive video sequence. Kling V2.6 I2V is the first model in the family to natively output synchronized audio and video in one API call.

Text-to-Image and Image-to-Image. Kling V3 T2I supports 1K and 2K output with improved prompt adherence over V2.1. Kling V3 Omni Image generates up to 4K resolution with support for up to 4 reference images. At the 2K tier, these image models are increasingly competitive with Flux and Seedream.

Avatar and Talking-Head Video. Kling Avatar generates realistic talking-head videos from a reference image and audio input. It supports precise lip synchronization, expressive gestures, and multiple languages. Audio input is either a TTS-generated audio ID or a direct sound file (MP3, WAV, M4A, AAC, up to 5MB, 2 to 300 seconds).
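The audio limits above are easy to check before uploading. The following is a minimal sketch of such a pre-flight check, not part of any Kling or Modellix SDK: the function name is hypothetical, and reading "5MB" as 5 * 1024 * 1024 bytes is an assumption, since the exact byte threshold is not stated.

```python
from pathlib import Path

# Limits quoted in the text above. Treating "5MB" as 5 * 1024 * 1024 bytes
# is an assumption; the exact byte threshold is not stated.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac"}
MAX_BYTES = 5 * 1024 * 1024
MIN_SECONDS, MAX_SECONDS = 2, 300


def avatar_audio_problems(filename: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return the reasons a file would likely be rejected (empty list if none)."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 5MB")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append("duration outside 2 to 300 seconds")
    return problems
```

Running this locally catches the obvious rejections without spending a request. The check looks only at the file extension, so it does not verify the actual audio encoding.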

Creative Effects and Utility Models. Kling Video Effects applies 212 preset creative effects to person images. Kling Image Expansion extends images in any direction with prompt-guided generation. Kling Image Recognize segments image content into object, head (with hair), face (without hair), and clothing categories, useful for downstream masking workflows.

Kling API Pricing: How the Cost Structure Works

Most write-ups on Kling API pricing are either outdated (still quoting V1 rates) or incomplete (covering only one model version). Here is how to think about pricing as of April 2026.

[Figure: Kling AI API pricing tiers: V1, V2.1 Master, V2.5 Turbo, and V3 compared by cost and capability]

Pricing scales with model generation and quality mode.

Every Kling video model offers two modes: standard (std) and professional (pro). Standard mode is the default. Professional mode produces higher visual quality at a higher cost per request. Most production workloads start with standard mode and upgrade selectively for user-facing output.

Kling V2.5 Turbo is the price-performance sweet spot.

Kling V2.5 Turbo T2V and V2.5 Turbo I2V deliver cinematic 1080p output starting at $0.050/s (standard mode). V2.1 Master T2V runs at $0.320/s, so Turbo standard costs roughly one-sixth as much (about 84% less) while maintaining the same output resolution and physics-accurate motion. At those rates a 10-second clip costs $0.50 on Turbo standard versus $3.20 on Master, a gap that compounds quickly for teams running thousands of generations per month. The turbo designation refers to cost efficiency, not a quality reduction.
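To see how the per-second gap compounds at volume, here is a rough budgeting sketch. It assumes spend is simply rate × clip duration × clip count, uses only the per-second figures quoted in this guide, and the dictionary keys and function name are illustrative labels rather than API identifiers.

```python
# Per-second rates quoted in this guide as of April 2026 (USD).
# The pro rate for V2.5 Turbo is the one given in the FAQ; verify all of
# these against the current pricing page before relying on the numbers.
RATE_PER_SECOND = {
    "v2.5-turbo-std": 0.050,
    "v2.5-turbo-pro": 0.082,
    "v2.1-master": 0.320,
}


def estimate_cost(tier: str, seconds_per_clip: int, clips: int) -> float:
    """Estimated spend in USD, assuming cost = rate x duration x clip count."""
    rate = RATE_PER_SECOND[tier]
    return round(rate * seconds_per_clip * clips, 2)


turbo = estimate_cost("v2.5-turbo-std", seconds_per_clip=10, clips=1000)
master = estimate_cost("v2.1-master", seconds_per_clip=10, clips=1000)
print(f"Turbo std: ${turbo:,.2f}  Master: ${master:,.2f}")
```

Under these assumptions, 1,000 ten-second clips come to $500 on Turbo standard versus $3,200 on Master.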

V3 pricing is higher, but V3 unlocks capabilities V2 cannot deliver.

Kling V3 supports multi-shot storyboard generation (chaining coherent shots with per-scene prompts), extended duration up to 15 seconds per clip, and V3 Omni formats for 4K image output and video-to-video with element references. These are not incremental improvements. If your workflow requires narrative video with consistent characters across scenes, V3 is the only option in the Kling family.

Per-request pricing, no monthly minimums.

Kling API access through Modellix uses per-request pricing. You pay for what you generate. There are no seat fees or monthly minimums. Rate limits are 100 requests per minute across Kling endpoints, with rate limit headers returned on each response (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset).
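A client can use those headers to decide whether to pause before its next request. The sketch below is one possible reading, not documented behavior: it assumes X-RateLimit-Reset carries a Unix timestamp in seconds, which the text does not specify, and the function name is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """How long to pause before the next request, based on rate limit headers.

    ASSUMPTION: X-RateLimit-Reset is a Unix timestamp in seconds. If the API
    returns "seconds until reset" instead, use that value directly.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # budget left in the current window
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)
```

With the requests library, the function can be called as `time.sleep(seconds_to_wait(response.headers))` between submissions. Missing headers are treated as "no need to wait".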

For current per-second and per-clip rates, check Modellix pricing directly. Rates change, and stale numbers are worse than no numbers.

Kling 3.0 API: What Changes at V3

Kling V3 (referred to as Kling 3.0 in Kuaishou’s consumer product) is the most significant capability upgrade since V2. Three things change materially.

```mermaid
flowchart LR
    STORY["Storyboard prompts"] --> SHOT1["Shot 1<br/>Establish scene"]
    STORY --> SHOT2["Shot 2<br/>Add motion"]
    STORY --> SHOT3["Shot 3<br/>Close with transition"]
    REFS["Shared character and element references"] -.-> SHOT1
    REFS -.-> SHOT2
    REFS -.-> SHOT3
    SHOT1 --> SHOT2 --> SHOT3 --> OUT["One coherent multi-shot video"]
    classDef accent fill:#eef2ff,stroke:#6366f1,color:#111827,stroke-width:1px;
    classDef node fill:#ffffff,stroke:#94a3b8,color:#111827,stroke-width:1px;
    classDef success fill:#ecfdf5,stroke:#10b981,color:#111827,stroke-width:1px;
    class STORY,REFS accent;
    class SHOT1,SHOT2,SHOT3 node;
    class OUT success;
```

Multi-shot storyboard generation. Kling V3 T2V and I2V both support storyboard-based multi-shot workflows. Instead of a single prompt producing a single clip, you define a sequence with separate prompts per shot. Kling V3 maintains character and scene consistency across shots. This is the mechanism behind the consumer product’s “long-form video” feature: multiple coherent shots assembled into a longer narrative.

Duration range: 3 to 15 seconds per clip. Kling V1 and V2 cap at 10 seconds per clip. V3 raises the ceiling to 15 seconds and introduces a new 3-second minimum, useful for short motion loops where V1’s fixed 5-second minimum wasted generation budget.
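Because the allowed range differs by generation, it can help to validate a requested duration before submitting. This is a small sketch using only the limits described in this guide. The generation labels and function name are illustrative, and the V2 lower bound is left unchecked because the text does not state it.

```python
# Clip duration limits as described in this guide. The V2 lower bound is not
# stated in the text, so it is left as None (unchecked) here.
DURATION_LIMITS = {
    "v1": (5, 10),
    "v2": (None, 10),
    "v3": (3, 15),
}


def duration_is_valid(generation: str, seconds: int) -> bool:
    """True if `seconds` fits the per-clip range for the model generation."""
    low, high = DURATION_LIMITS[generation]
    if low is not None and seconds < low:
        return False
    return seconds <= high
```

For example, a 3-second loop passes for "v3" but not for "v1", and a 15-second clip passes only for "v3".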

V3 Omni formats. Kling V3 Omni Image generates up to 4K with support for up to 4 reference images for character and element consistency. Kling V3 Omni Video extends this to video-to-video: prompts, storyboard segments, image references, and element references combine to produce a synthesized output. This is purpose-built for creative productions that require fine-grained control over every element in frame.

Kling V2.6: Native Audio and Video in One API Call

Kling V2.6 I2V and V2.6 T2V are the first models in the Kling family to generate synchronized audio and video in a single pass.

Before V2.6, a talking-head or avatar workflow required three steps: generate audio via a TTS API, animate a reference image with Kling Avatar, then sync audio and video in post. With Kling V2.6, these collapse into one call.

```mermaid
flowchart LR
    subgraph BEFORE["Before V2.6: 3 separate API calls"]
        direction TB
        A1["1. TTS API<br/>Generate audio"] --> A2["2. Kling Avatar<br/>Animate image"] --> A3["3. Post-processing<br/>Sync audio and video"]
    end
    subgraph AFTER["After V2.6: 1 API call"]
        direction TB
        B1["Image + voice_ids<br/>sound: on"] --> B2["Kling V2.6 I2V<br/>Async request"] --> B3["Video output<br/>Dialogue + lip-sync + SFX"]
    end
    BEFORE -.->|"Collapse into one API call"| AFTER
    classDef before fill:#fff1f2,stroke:#ef4444,color:#111827,stroke-width:1px;
    classDef core fill:#eef2ff,stroke:#6366f1,color:#111827,stroke-width:1px;
    classDef after fill:#ecfdf5,stroke:#10b981,color:#111827,stroke-width:1px;
    class A1,A2,A3 before;
    class B2 core;
    class B1,B3 after;
```

The model accepts voice_ids (up to 10 voice IDs for audio generation) and a sound parameter. Set "sound": "on", pass your voice IDs, and the model outputs a video clip with dialogue, sound effects, lip-sync, and motion already embedded. No post-processing step required.
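A small helper can enforce the 10-voice limit before a request is sent. The sketch below is illustrative only: the function name is hypothetical, the key names follow the request examples used in this guide, and the real endpoint may validate additional fields.

```python
MAX_VOICE_IDS = 10  # limit quoted in the text above


def build_audio_payload(image_url: str, prompt: str, voice_ids: list[str],
                        duration: int, mode: str = "std") -> dict:
    """Build a request body for V2.6 I2V with native audio switched on."""
    if len(voice_ids) > MAX_VOICE_IDS:
        raise ValueError(f"at most {MAX_VOICE_IDS} voice IDs are accepted")
    if mode not in ("std", "pro"):
        raise ValueError("mode must be 'std' or 'pro'")
    return {
        "image": image_url,
        "prompt": prompt,
        "sound": "on",
        "voice_ids": list(voice_ids),
        "mode": mode,
        "duration": duration,
    }
```

The returned dictionary would be passed as the JSON body of the async request. Failing locally on an oversized voice list avoids a wasted round trip.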

For localization pipelines and short-form content at scale, this is a meaningful reduction in engineering complexity and per-clip latency.

Which Kling API Version Should You Use?

| Use case | Recommended model | Why |
| --- | --- | --- |
| High-volume T2V at lowest cost | Kling V2.5 Turbo T2V | $0.050/s std, same 1080p as Master |
| High-volume I2V at lowest cost | Kling V2.5 Turbo I2V | $0.050/s std, vs $0.320/s for Master |
| Best single-clip I2V quality | Kling V2.1 Master I2V | Recommended quality tier in V2.1 series |
| Multi-shot narrative video | Kling V3 T2V or V3 I2V | Only V3 supports storyboard generation |
| Talking-head with external audio | Kling Avatar | Audio file or TTS ID input, lip-sync |
| Talking-head with native audio | Kling V2.6 I2V | Audio synthesis embedded in one call |
| 4K image generation | Kling V3 Omni Image | Up to 4K, 4 reference images |
| Multi-image video fusion | Kling V1.6 MI2V | Up to 4 reference images fused into video |
| Creative effects on person images | Kling Video Effects | 212 preset effects |
| Longest established track record | Kling V1 T2V | Most widely deployed, lowest per-clip cost |

Standard mode is appropriate for most production workloads. Upgrade to professional mode when output quality directly affects end-user experience: B2C video products, marketing assets, client deliverables.

How to Access the Kling AI API via Modellix

Kling is a Kuaishou product. Direct access to Kuaishou’s developer API requires a Chinese business account with regional verification, which blocks most international developers from accessing even V1. Modellix routes access to all Kling models under a standard Bearer token, with no Kuaishou developer account or Chinese business verification required.

Here is a minimal Python example for Kling V2.5 Turbo T2V:

```python
import time

import requests

API_KEY = "YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the generation request
response = requests.post(
    "https://api.modellix.ai/api/v1/kling/kling-v2.5-turbo-t2v/async",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={
        "prompt": "A red panda sitting in a bamboo forest, wind blowing, cinematic lighting",
        "mode": "std",
        "aspect_ratio": "16:9",
        "duration": 10,
    },
    timeout=30,
)
response.raise_for_status()  # surface HTTP errors before reading the body

task_id = response.json()["data"]["task_id"]

# Poll every 5 seconds; the cap of 120 attempts (about 10 minutes) is an
# arbitrary safety limit so the loop cannot run forever
for _ in range(120):
    result = requests.get(
        f"https://api.modellix.ai/api/v1/tasks/{task_id}",
        headers=HEADERS,
        timeout=30,
    ).json()
    if result["data"]["status"] == "completed":
        print(result["data"]["video_url"])
        break
    time.sleep(5)
else:
    raise TimeoutError(f"Task {task_id} did not complete in time")
```

To switch to Kling 3.0 (V3 T2V), change the endpoint path:

```python
"https://api.modellix.ai/api/v1/kling/kling-v3-t2v/async"
```

All authentication headers and JSON parameter keys stay consistent across model versions. Switching models is a one-line change.
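One way to keep that change in a single place is to build the URL from a model slug. This is a minimal sketch based on the two endpoint URLs shown in this guide. The helper name is hypothetical, and slugs for other models are not confirmed here, so check them against the Modellix documentation.

```python
BASE_URL = "https://api.modellix.ai/api/v1/kling"


def async_endpoint(model_slug: str) -> str:
    """Return the async generation URL for a Kling model slug."""
    return f"{BASE_URL}/{model_slug}/async"


# The two slugs that appear in this guide:
turbo_url = async_endpoint("kling-v2.5-turbo-t2v")
v3_url = async_endpoint("kling-v3-t2v")
```

Reading the slug from configuration or an environment variable then makes model selection a deployment setting rather than a code change.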

For Kling V2.6 I2V with native audio synthesis:

```python
# Request body for Kling V2.6 I2V; pass it as the json= argument of the POST
payload = {
    "image": "https://example.com/portrait.jpg",
    "prompt": "Person speaking to camera, natural gesture, warm studio lighting",
    "sound": "on",
    "voice_ids": ["your-voice-id"],
    "mode": "pro",
    "duration": 10,
}
```

Kling API vs fal.ai vs WaveSpeed: What Each Platform Actually Covers

| Platform | Kling V1 | Kling V2.x | Kling V2.6 (audio) | Kling V3 | Avatar | Image models |
| --- | --- | --- | --- | --- | --- | --- |
| Modellix | All variants | All variants | Yes | Yes | Yes | Yes |
| fal.ai | V1 I2V only | Partial | No | No | No | No |
| WaveSpeed AI | Partial | Partial | Unknown | Unknown | No | No |

fal.ai’s Kling listing covers Kling V1 image-to-video standard. WaveSpeed AI covers selected Kling models but is primarily optimized for inference speed on a smaller model set.

Neither covers V3 multi-shot storyboard, V2.6 native audio synthesis, Kling Avatar, Kling Video Effects, or the full image generation family (T2I, I2I, Omni). If your requirement is a single Kling V1 endpoint, fal.ai is a valid option. If you need the full Kling model family under one integration, from V1 T2V to V3 Omni, with Avatar and image capabilities included, the only gateway that currently covers all variants is Modellix.

Kling is not the only model worth considering. Google Veo 3.1 currently leads on Artificial Analysis for overall cinematic quality, and Seedance 2.0 from ByteDance remains a strong option for image-to-video workflows with competitive pricing. Both are available on the same Modellix integration, so teams can test across models without managing separate accounts.

Access the Full Kling Model Family via a Single API

Modellix provides unified access to all Kling variants from V1 T2I to V3 Omni, plus Kling Avatar, Video Effects, Image Expansion, and image recognition. One API key, one consistent endpoint pattern, transparent per-request pricing. No Kuaishou developer account or Chinese business verification required. Veo 3.1, Seedance 2.0, Wan 2.6, and other leading models are available under the same integration.


Frequently Asked Questions About Kling AI API (April 2026)

Does Kling AI have an API?

Yes. Kling AI is developed by Kuaishou and has a developer API. However, direct access requires a Kuaishou developer account with Chinese business verification, which restricts international access. Developers outside China typically access Kling through a third-party gateway such as Modellix, which provides all Kling V1 through V3 models under a standard Bearer token with no regional restrictions.

How much does Kling AI API cost?

Kling API pricing varies by model version and generation mode. Standard mode costs less than professional mode. Kling V2.5 Turbo starts at $0.050/s in standard mode, compared with $0.320/s for Kling V2.1 Master at the same 1080p resolution, which makes it roughly 84% cheaper. V3 is priced higher than V2 to reflect additional capabilities. Modellix uses per-request pricing with no monthly minimums. Current per-clip rates are listed at docs.modellix.ai/get-started/pricing.

How much is the Kling 2.5 Turbo API?

As of April 2026, Kling V2.5 Turbo T2V starts at $0.050/s (standard) and $0.082/s (pro). V2.5 Turbo I2V is priced identically. For comparison, V2.1 Master T2V and I2V both run at $0.320/s — the Turbo tier is purpose-built for production volume where per-clip cost matters. Output is still cinematic 1080p with physics-accurate motion. Current rates are listed at docs.modellix.ai/get-started/pricing.

What is Kling 3.0 API and what does it add over V2?

Kling V3 (Kling 3.0 in the consumer product) adds three material capabilities: multi-shot storyboard generation for narrative video sequences, extended clip duration from 3 to 15 seconds (V2 maxes at 10 seconds), and V3 Omni formats supporting up to 4K image output and video-to-video with element references. V3 also delivers improved prompt adherence compared to V2.1.

Can the Kling API generate audio and video together?

Yes, starting with Kling V2.6. Both V2.6 T2V and V2.6 I2V support native synchronized audio and video generation in a single API call. Pass voice_ids and set sound: on, and the model outputs a clip with dialogue, sound effects, and lip-sync already embedded. No separate TTS pipeline required.

What is the rate limit for Kling API on Modellix?

Kling endpoints via Modellix are rate-limited at 100 requests per minute. Rate limit status is returned in response headers: X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
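To stay under that limit on the client side, a simple rolling-window counter is often enough. The sketch below is a generic technique, not something Modellix provides. Whether the server counts a rolling or a fixed one-minute window is not stated, so the rolling window here is a conservative assumption.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recent calls

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits the budget, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True
```

A caller would check `limiter.try_acquire()` before each submission and sleep briefly when it returns False. The injectable `clock` argument exists only to make the class easy to test.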

Is Kling API a good replacement for Sora?

Sora’s consumer service was shut down by OpenAI in April 2026. For developers who need a production video generation API, Kling is among the most actively maintained alternatives, with a broader model family and a well-documented API. For the highest cinematic quality, Veo 3.1 from Google leads on Artificial Analysis benchmarks as of April 2026. Seedance 2.0 from ByteDance is a strong alternative for image-to-video with competitive pricing. All three are available via Modellix under one integration.

How do I switch between Kling model versions without rewriting my integration?

Via Modellix, switching Kling model versions requires changing one value: the endpoint path. Changing /kling-v2.5-turbo-t2v/async to /kling-v3-t2v/async switches from V2.5 Turbo to V3. Authentication headers, parameter keys, and response parsing stay identical. No architectural changes required.


Access all Kling variants (V1 through V3, Avatar, Effects, Image) alongside Seedance 2.0, Veo 3.1, Hailuo, Wan, and every other major AI media model through a single API key at modellix.ai.