alibaba/happyhorse-1.0-r2v

happyhorse-1.0-r2v

The HappyHorse reference-to-video model generates fluid videos that combine characters from 1 to 9 reference images, guided by a text prompt that refers to each image as character1, character2, and so on. It supports 720P and 1080P resolution, multiple aspect ratios, and durations of 3 to 15 seconds.

Price: $0.1280 to $0.2290 per second
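To budget a request, the listed price range can be turned into per-video bounds. The helper below is hypothetical and assumes billing is per second of output video; the page does not say which rate applies to which setting (for example, per resolution), so it only reports the lowest and highest possible cost.

```python
# Hypothetical cost estimator based only on the published per-second price range.
# ASSUMPTION: billing is per second of generated video; which rate applies to
# which configuration is not documented, so only (min, max) bounds are returned.
MIN_RATE_USD_PER_SEC = 0.1280
MAX_RATE_USD_PER_SEC = 0.2290


def estimate_cost_range(duration_sec: int) -> tuple[float, float]:
    """Return (min, max) estimated cost in USD for one video of the given length."""
    if not isinstance(duration_sec, int) or not 3 <= duration_sec <= 15:
        raise ValueError("duration must be an integer between 3 and 15 seconds")
    low = round(duration_sec * MIN_RATE_USD_PER_SEC, 4)
    high = round(duration_sec * MAX_RATE_USD_PER_SEC, 4)
    return low, high


if __name__ == "__main__":
    print(estimate_cost_range(10))  # (1.28, 2.29)
```

For a 10-second video this gives a range of $1.28 to $2.29; the actual charge depends on how the provider maps settings to rates.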

image-to-video

Input

prompt (required): Text prompt describing the scene and referencing the characters. Use character1, character2, character3, and so on to refer to images in the reference_images array, in array order (the first image is character1, the second is character2, etc.). Prompts may be written in any language. Maximum length: 5000 non-Chinese characters or 2500 Chinese characters; longer prompts are truncated automatically.
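Because characterN placeholders map to positions in reference_images, a prompt can silently refer to an image that was never supplied. The following hypothetical pre-flight check catches that and flags prompts that would be truncated. How the service counts mixed-language prompts is not documented, so applying the 2500-character limit whenever any CJK ideograph appears is an assumption of this sketch.

```python
import re

# Hypothetical prompt checker. characterN refers to reference_images[N - 1].
CHARACTER_REF = re.compile(r"character([0-9]+)")
# ASSUMPTION: any CJK Unified Ideograph switches to the 2500-character limit.
CJK = re.compile(r"[\u4e00-\u9fff]")


def check_prompt(prompt: str, reference_images: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the prompt looks valid."""
    problems = []
    referenced = {int(n) for n in CHARACTER_REF.findall(prompt)}
    for n in sorted(referenced):
        if not 1 <= n <= len(reference_images):
            problems.append(f"character{n} has no matching reference image")
    limit = 2500 if CJK.search(prompt) else 5000
    if len(prompt) > limit:
        problems.append(f"prompt exceeds {limit} characters and will be truncated")
    return problems


if __name__ == "__main__":
    images = ["https://example.com/a.png", "https://example.com/b.png"]  # placeholder URLs
    print(check_prompt("character1 hands character2 a cup", images))  # []
    print(check_prompt("character3 waves", images))
```

The second call reports that character3 has no matching image, since only two URLs were supplied.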

duration: Video duration in seconds. Must be an integer from 3 to 15.

ratio: Video aspect ratio; determines the output video dimensions. Default: 16:9

reference_images (required): Array of reference image URLs (HTTP or HTTPS), 1 to 9 images. In the prompt, images are referenced as character1, character2, etc., following array order. Image requirements:
- Format: JPEG, JPG, PNG, or WEBP
- Resolution: short edge of at least 400 pixels (720P or higher recommended)
- File size: at most 10 MB per image
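The image requirements can be checked locally before submitting a job. This is a hypothetical sketch: the caller supplies each image's width, height, and byte size (for example, from a local copy), the format is inferred from the URL's file extension, and "10MB" is treated as 10 MiB. All three are assumptions, not documented behavior.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".webp"}
MIN_SHORT_EDGE_PX = 400
MAX_FILE_BYTES = 10 * 1024 * 1024  # ASSUMPTION: "10MB" read as 10 MiB


@dataclass
class ReferenceImage:
    """Hypothetical container for metadata the caller already knows."""
    url: str
    width: int
    height: int
    size_bytes: int


def validate_reference_images(images: list[ReferenceImage]) -> list[str]:
    """Return a list of problems; each is labeled with its characterN name."""
    problems = []
    if not 1 <= len(images) <= 9:
        problems.append(f"expected 1-9 images, got {len(images)}")
    for i, img in enumerate(images, start=1):
        label = f"character{i}"
        parsed = urlparse(img.url)
        if parsed.scheme not in ("http", "https"):
            problems.append(f"{label}: URL must use http or https")
        # ASSUMPTION: the file extension reflects the real image format.
        if PurePosixPath(parsed.path).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"{label}: format must be JPEG, JPG, PNG or WEBP")
        if min(img.width, img.height) < MIN_SHORT_EDGE_PX:
            problems.append(f"{label}: short edge is below {MIN_SHORT_EDGE_PX} px")
        if img.size_bytes > MAX_FILE_BYTES:
            problems.append(f"{label}: file is larger than 10 MB")
    return problems
```

Labeling each problem with its characterN name makes it easy to match an error back to the placeholder used in the prompt.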


resolution: Video resolution level, 720P or 1080P. The model automatically scales the output so that its total pixel count is as close as possible to that of the selected resolution. Default: 1080P
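The exact scaling rule is not documented, but the idea of matching a total pixel count while keeping the aspect ratio can be illustrated. This sketch ASSUMES a pixel budget of 1280×720 for 720P and 1920×1080 for 1080P and rounds each side to an even number; the dimensions the service actually returns may differ.

```python
import math

# ASSUMPTION: pixel budgets taken from the standard 16:9 frame sizes.
PIXEL_BUDGET = {"720P": 1280 * 720, "1080P": 1920 * 1080}


def approx_dimensions(ratio: str, resolution: str) -> tuple[int, int]:
    """Estimate (width, height) with the given aspect ratio and pixel budget."""
    w_part, h_part = (int(x) for x in ratio.split(":"))
    budget = PIXEL_BUDGET[resolution]
    # width * height == budget and width / height == w_part / h_part
    width = math.sqrt(budget * w_part / h_part)
    height = math.sqrt(budget * h_part / w_part)
    # ASSUMPTION: round to even sizes, a common video-codec constraint.
    return round(width / 2) * 2, round(height / 2) * 2


if __name__ == "__main__":
    print(approx_dimensions("16:9", "1080P"))  # (1920, 1080)
    print(approx_dimensions("1:1", "720P"))    # (960, 960)
```

Under these assumptions a square video at 720P keeps roughly the same pixel count as a 1280×720 frame rather than being cropped to 720×720.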

seed: Random seed for reproducibility. If omitted, the system generates a random seed. Because model generation is probabilistic, results may not be completely identical even with the same seed.
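The parameters above can be collected into a single input object. The builder below is hypothetical: it uses the documented field names, omits optional fields that are not set (their server-side defaults are not all documented), and does not show an endpoint, authentication, or request envelope, none of which appear on this page. The example.com URLs are placeholders.

```python
import json
from typing import Optional


def build_r2v_input(
    prompt: str,
    reference_images: list[str],
    duration: Optional[int] = None,
    ratio: Optional[str] = None,
    resolution: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict:
    """Assemble a flat parameter dict, validating the documented constraints."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 1 <= len(reference_images) <= 9:
        raise ValueError("reference_images must contain 1-9 URLs")
    body = {"prompt": prompt, "reference_images": list(reference_images)}
    if duration is not None:
        if not isinstance(duration, int) or not 3 <= duration <= 15:
            raise ValueError("duration must be an integer between 3 and 15")
        body["duration"] = duration
    if ratio is not None:
        body["ratio"] = ratio
    if resolution is not None:
        if resolution not in ("720P", "1080P"):
            raise ValueError("resolution must be 720P or 1080P")
        body["resolution"] = resolution
    if seed is not None:
        # A fixed seed improves, but does not guarantee, reproducibility.
        body["seed"] = seed
    return body


if __name__ == "__main__":
    body = build_r2v_input(
        "character1 and character2 share an umbrella in the rain",
        ["https://example.com/alice.png", "https://example.com/bob.png"],
        duration=8, ratio="16:9", resolution="1080P", seed=42,
    )
    print(json.dumps(body, indent=2))
```

Leaving unset fields out, rather than filling in guessed defaults, keeps the service's own defaults in effect.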


More in this series

alibaba/happyhorse-1.0-t2v (text-to-video)

alibaba/happyhorse-1.0-i2v (image-to-video)

alibaba/happyhorse-1.0-video-edit (video-to-video)
