alibaba/happyhorse-1.0-r2v

HappyHorse is a reference-to-video model that generates fluid video by combining characters from 1 to 9 reference images. The text prompt refers to each image by its position in the list (character1, character2, and so on). It supports 720P and 1080P resolution, multiple aspect ratios, and durations of 3 to 15 seconds.
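Because each characterN token in the prompt maps to an image by position, it can be useful to check a prompt against the image list before submitting. The following is a minimal sketch, not part of the HappyHorse API: the function names are hypothetical, and it assumes the tokens are written in lowercase exactly as shown above.

```python
import re

MAX_IMAGES = 9  # the description above allows 1 to 9 reference images


def referenced_characters(prompt: str) -> set[int]:
    """Return the index N of every characterN token found in the prompt."""
    return {int(n) for n in re.findall(r"\bcharacter(\d+)\b", prompt)}


def check_prompt(prompt: str, reference_images: list[str]) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    count = len(reference_images)
    if not 1 <= count <= MAX_IMAGES:
        problems.append(f"expected 1 to {MAX_IMAGES} reference images, got {count}")
    for n in sorted(referenced_characters(prompt)):
        # character1 is the first image, so valid indices are 1..count
        if n < 1 or n > count:
            problems.append(f"character{n} has no matching image")
    return problems


if __name__ == "__main__":
    images = ["https://example.com/a.png", "https://example.com/b.png"]
    print(check_prompt("character1 waves at character3", images))
```

With two images, a prompt that mentions character3 is flagged, since only character1 and character2 have images behind them.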

Price: $0.1280 to $0.2290 per second
image-to-video
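As a rough guide to cost, the per-second price range can be multiplied by the requested duration. This is a sketch under two assumptions that the page does not confirm: that billing is linear in the output duration, and that the two prices are simply the lower and upper bounds (the page does not say which settings select which price). The function name is hypothetical.

```python
PRICE_MIN_PER_SEC = 0.1280  # lower bound listed on this page, in USD
PRICE_MAX_PER_SEC = 0.2290  # upper bound listed on this page, in USD


def estimate_cost_range(duration_s: int) -> tuple[float, float]:
    """Return (lowest, highest) estimated cost in USD for one video."""
    if not 3 <= duration_s <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    # Assumption: cost scales linearly with duration.
    low = round(duration_s * PRICE_MIN_PER_SEC, 4)
    high = round(duration_s * PRICE_MAX_PER_SEC, 4)
    return low, high


if __name__ == "__main__":
    low, high = estimate_cost_range(5)
    print(f"5 s video: ${low:.2f} to ${high:.2f}")
```

Under these assumptions, a 5-second video would cost somewhere between about $0.64 and about $1.15.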

Input

Prompt (required). Text prompt describing the scene and referencing characters. Use character1, character2, character3, and so on to refer to images in the reference_images array (first image = character1, second image = character2, etc.). Prompts may be written in any language. Maximum length: 5000 non-Chinese characters or 2500 Chinese characters; longer prompts are truncated automatically.
Duration. Video duration in seconds. Must be an integer between 3 and 15. Default: 5.
Aspect ratio. Determines the dimensions of the output video. Default: 16:9.
Reference images. Array of reference image URLs (HTTP or HTTPS), 1 to 9 images. Images are referenced in the prompt as character1, character2, etc., following array order. Image requirements: format JPEG, JPG, PNG, or WEBP; short edge at least 400 pixels (720P or higher recommended); file size at most 10 MB per image.
Resolution. Video resolution level. The output is automatically scaled to the nearest total pixel count for the selected resolution. Default: 1080P.
Seed. Random seed for reproducibility. If not specified, the system generates one. Because generation is probabilistic, results may not be completely identical even with the same seed.
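The constraints above can be checked on the client before a request is sent. The sketch below is illustrative only: apart from reference_images, which the prompt description names, every field name (prompt, duration, aspect_ratio, resolution, seed) is a hypothetical guess and not a confirmed API parameter. It also assumes that one Chinese character counts as two units toward the 5000-character limit, which is one reading of the "5000 non-Chinese or 2500 Chinese" rule. Aspect ratio is passed through unchecked because the page does not list the allowed values.

```python
from urllib.parse import urlparse

RESOLUTIONS = {"720P", "1080P"}
MAX_PROMPT_UNITS = 5000


def prompt_units(prompt: str) -> int:
    """Weighted length. Assumption: a CJK ideograph counts as two units."""
    return sum(2 if "\u4e00" <= ch <= "\u9fff" else 1 for ch in prompt)


def build_input(prompt, reference_images, duration=5, aspect_ratio="16:9",
                resolution="1080P", seed=None):
    """Validate the inputs and return a dict (field names are hypothetical)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    # The service truncates long prompts; this sketch rejects them instead
    # so that truncation never happens silently.
    if prompt_units(prompt) > MAX_PROMPT_UNITS:
        raise ValueError("prompt exceeds the maximum length")
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15")
    if not 1 <= len(reference_images) <= 9:
        raise ValueError("provide 1 to 9 reference images")
    for url in reference_images:
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not an HTTP/HTTPS URL: {url}")
    if resolution not in RESOLUTIONS:
        raise ValueError("resolution must be 720P or 1080P")
    payload = {
        "prompt": prompt,
        "reference_images": list(reference_images),
        "duration": duration,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
    }
    if seed is not None:
        payload["seed"] = seed  # omit the field to let the system pick a seed
    return payload


if __name__ == "__main__":
    print(build_input("character1 walks along a beach",
                      ["https://example.com/a.png"], duration=8))
```

Image format, short-edge, and file-size requirements are not checked here, since verifying them would require downloading and decoding each image.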
