
The HappyHorse text-to-video model generates video from text prompts, with a focus on physical realism and motion fluidity. It supports multiple resolution and aspect ratio combinations and durations of 3-15 seconds.

The HappyHorse image-to-video model generates physically realistic, smoothly animated video from a first-frame image. A text prompt can optionally be supplied for guidance. The model supports 720P and 1080P resolution and durations of 3-15 seconds. The aspect ratio of the output video automatically follows that of the input first-frame image.

The HappyHorse reference-to-video model generates fluid video by fusing characters from 1-9 reference images. The text prompt addresses these characters through character references (character1, character2, etc.). The model supports 720P and 1080P resolution, multiple aspect ratios, and durations of 3-15 seconds.
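
The relationship between the prompt's character references and the reference images can be checked before a request is submitted. The sketch below is only an illustration: the function name check_character_refs is hypothetical, and it assumes that characterN refers to the N-th reference image, counting from 1, which the description above does not state explicitly.

```python
import re

# Limits taken from the description above: 1-9 reference images.
MIN_IMAGES, MAX_IMAGES = 1, 9


def check_character_refs(prompt: str, num_images: int) -> list[int]:
    """Return the sorted character indices used in the prompt.

    Hypothetical helper, not part of any HappyHorse SDK. Assumes that
    characterN maps to the N-th reference image (1-based).
    """
    if not MIN_IMAGES <= num_images <= MAX_IMAGES:
        raise ValueError(
            f"expected {MIN_IMAGES}-{MAX_IMAGES} reference images, got {num_images}"
        )

    # Collect every "character<digits>" token, e.g. character1, character2.
    indices = sorted({int(n) for n in re.findall(r"\bcharacter(\d+)\b", prompt)})

    # Under the 1-based assumption, an index must point at a supplied image.
    out_of_range = [i for i in indices if i < 1 or i > num_images]
    if out_of_range:
        raise ValueError(
            f"prompt references {out_of_range}, but only {num_images} image(s) were supplied"
        )
    return indices
```

For example, check_character_refs("character1 hands a cup to character2", 2) returns [1, 2], while the same prompt with a single reference image raises ValueError.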

The HappyHorse video editing model performs style transformation and local replacement by combining an input video with 0-5 reference images and text instructions. Input video duration: 3-60 seconds. Output video duration: 3-15 seconds (an input longer than 15 seconds is automatically truncated to its first 15 seconds). Processing time: typically 1-5 minutes.
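
The duration rule for video editing can be written out as a small pre-flight check. This is a minimal sketch under stated assumptions: the function name expected_output_duration is hypothetical, and it assumes the output keeps the input's length whenever the input is 15 seconds or shorter, which the description implies but does not state outright.

```python
# Limits taken from the description above.
MIN_INPUT_S, MAX_INPUT_S = 3.0, 60.0   # accepted input video duration
MAX_OUTPUT_S = 15.0                    # output is cut to the first 15 seconds
MAX_REFERENCE_IMAGES = 5               # 0-5 reference images


def expected_output_duration(input_seconds: float, num_reference_images: int = 0) -> float:
    """Return the expected output duration in seconds for an editing request.

    Hypothetical helper, not part of any HappyHorse SDK. Assumes the output
    matches the input length when the input is at most 15 seconds long.
    """
    if not MIN_INPUT_S <= input_seconds <= MAX_INPUT_S:
        raise ValueError(
            f"input video must be {MIN_INPUT_S:g}-{MAX_INPUT_S:g} s, got {input_seconds:g} s"
        )
    if not 0 <= num_reference_images <= MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"expected 0-{MAX_REFERENCE_IMAGES} reference images, got {num_reference_images}"
        )
    # Inputs longer than 15 s are truncated to their first 15 s.
    return min(float(input_seconds), MAX_OUTPUT_S)
```

With these assumptions, a 10-second input yields a 10-second output, and a 42.5-second input yields a 15-second output covering only the first 15 seconds of the source.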