Generate video from audio

Generate a video from an audio file using AI models, guided by an input image, a text prompt, or both (at least one of the two is required). The output video is 25 fps.
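Because the frame rate is fixed at 25 fps, frame counts and frame timestamps follow from simple arithmetic. The helpers below are a minimal sketch of that arithmetic. The function names are hypothetical, and this page does not state whether the output duration always matches the audio duration, so pass in whatever duration you actually measure.

```python
FPS = 25  # documented output frame rate


def frame_count(duration_s: float) -> int:
    """Number of frames in a clip of the given length at 25 fps, rounded to the nearest frame."""
    return round(duration_s * FPS)


def frame_timestamp(index: int) -> float:
    """Presentation time in seconds of the zero-based frame `index`."""
    return index / FPS
```

For example, a 4-second clip contains 100 frames, and frame 50 is shown at the 2-second mark.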

Authentication

Authorization (Bearer)
API key authentication: send your API key in the Authorization header as Bearer <API key>.
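A minimal sketch of the request headers is shown below. Only the Authorization: Bearer scheme is documented here. The Content-Type header is an assumption that the request object is sent as a JSON body, and the helper name build_auth_headers is hypothetical.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return headers for bearer-token (API key) authentication.

    Assumption: the request body is JSON, so Content-Type is set to
    application/json. Only the Authorization header is documented.
    """
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Load the key from configuration or an environment variable rather than hard-coding it in source.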

Request

This endpoint expects an object.
audio_uri (string, required)

Audio file to be used as the soundtrack for the video. See Input Formats for supported formats and size limits.

image_uri (string, optional)

Input image to be used as the first frame of the video. Required if prompt is not provided. See Input Formats for supported formats and size limits.

prompt (string, optional, at most 5000 characters)

Text description of how the video should be generated. Required if image_uri is not provided. Can be an empty string when image_uri is provided. If image_uri is provided, this describes how the image should be animated. If no image_uri is provided, this describes the video content.
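The rules for image_uri and prompt can be checked on the client before sending a request. The sketch below shows one interpretation, which is an assumption: without image_uri, the prompt must be a non-empty string; with image_uri, the prompt may be omitted or empty. The function name validate_inputs is hypothetical, and the example URL and prompt are placeholders.

```python
from typing import Optional

MAX_PROMPT_CHARS = 5000  # documented limit for `prompt`


def validate_inputs(image_uri: Optional[str], prompt: Optional[str]) -> None:
    """Raise ValueError if the documented image_uri/prompt rules are violated.

    Interpretation (assumption): an empty prompt counts as "not provided"
    when there is no image_uri.
    """
    if prompt is not None and len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if image_uri is None and not prompt:
        raise ValueError("prompt is required when image_uri is not provided")


# Image-to-video with an empty prompt: allowed.
validate_inputs("https://example.com/first-frame.png", "")
# Text-to-video: allowed.
validate_inputs(None, "A drummer on a rooftop at dusk, slow dolly-in")
```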

resolution (string, optional, defaults to 1920x1080)

The resolution of the generated video in WIDTHxHEIGHT format. Currently only 1920x1080 is supported.
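A minimal sketch of parsing and checking the WIDTHxHEIGHT string on the client is shown below. The function name parse_resolution is hypothetical. The supported set contains only 1920x1080, as stated above, and the check is deliberately strict: it does not normalize case or alternative separators.

```python
SUPPORTED_RESOLUTIONS = {(1920, 1080)}  # currently the only documented option


def parse_resolution(value: str = "1920x1080") -> tuple[int, int]:
    """Parse a WIDTHxHEIGHT string into (width, height) and check support."""
    try:
        width_str, height_str = value.split("x")
        width, height = int(width_str), int(height_str)
    except ValueError as exc:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}") from exc
    if (width, height) not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution {value!r}")
    return width, height
```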

guidance_scale (double, optional, range 1 to 50)

Guidance scale (classifier-free guidance, CFG) for video generation. Higher values make the output follow the prompt more closely but may reduce quality. Defaults to 5 for text-to-video, or 9 when an image is provided.
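The sketch below resolves the guidance scale that will apply for a request. It uses the documented defaults (5 without an image, 9 with one) and the documented range of 1 to 50. The function name effective_guidance_scale is hypothetical. In a real request you would normally just omit the field to get the default.

```python
from typing import Optional

GUIDANCE_MIN, GUIDANCE_MAX = 1.0, 50.0  # documented range


def effective_guidance_scale(guidance_scale: Optional[float], has_image: bool) -> float:
    """Return the documented default when unset, otherwise the validated value."""
    if guidance_scale is None:
        return 9.0 if has_image else 5.0
    if not GUIDANCE_MIN <= guidance_scale <= GUIDANCE_MAX:
        raise ValueError(f"guidance_scale must be in [{GUIDANCE_MIN}, {GUIDANCE_MAX}]")
    return float(guidance_scale)
```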

model (enum, optional, defaults to ltx-2-pro)

Model to use for video generation. Currently only ltx-2-pro is supported.

Allowed values: ltx-2-pro
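Putting the documented fields together, the sketch below builds a request body and serializes it as JSON. Sending it as JSON is an assumption; this page only says the endpoint expects an object. The function name build_request_body and the example URIs are hypothetical, and the endpoint URL is not given here, so the sketch stops short of sending anything. Optional fields left as None are omitted so the server applies its documented defaults. An empty-string prompt is kept, because it is allowed alongside image_uri.

```python
import json
from typing import Any, Optional


def build_request_body(
    audio_uri: str,
    image_uri: Optional[str] = None,
    prompt: Optional[str] = None,
    guidance_scale: Optional[float] = None,
    resolution: str = "1920x1080",
    model: str = "ltx-2-pro",
) -> str:
    """Serialize the documented request fields to JSON, omitting unset optionals."""
    if model != "ltx-2-pro":
        raise ValueError("only ltx-2-pro is currently supported")
    if image_uri is None and not prompt:
        raise ValueError("provide image_uri, a non-empty prompt, or both")
    body: dict[str, Any] = {
        "audio_uri": audio_uri,
        "resolution": resolution,
        "model": model,
    }
    optional = {"image_uri": image_uri, "prompt": prompt, "guidance_scale": guidance_scale}
    # Drop only None values; an empty-string prompt is meaningful with image_uri.
    body.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(body)
```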

Response

Video generated successfully

Errors