Generate video from audio

Generate a video from an audio file using AI models, optionally guided by an input image and a text prompt. The output video is 25 fps.

Authentication

Authorization Bearer
API key authentication
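
As a minimal sketch of the Bearer scheme named above, the API key can be placed in an Authorization header. The helper name build_headers is hypothetical, and the JSON Content-Type is an assumption based on the endpoint expecting an object; neither is confirmed by this reference.

```python
def build_headers(api_key: str) -> dict:
    """Return HTTP headers that carry the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # Standard Bearer token form: "Bearer <token>".
        "Authorization": f"Bearer {api_key}",
        # Assumption: the request object is sent as JSON.
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is used.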

Request

This endpoint expects an object.
audio_uri string Required

Audio file to be used as the soundtrack for the video. Duration must be between 2 and 20 seconds. See Input Formats for supported formats and size limits.
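
The duration limit can be checked on the client before a request is sent. The sketch below assumes the 2 and 20 second bounds are inclusive, which the text does not state. It measures a local WAV file only because Python's standard library can read that format; whether WAV is accepted is governed by Input Formats, not by this example. Both function names are hypothetical.

```python
import wave

MIN_AUDIO_SECONDS = 2.0   # lower bound from the field description
MAX_AUDIO_SECONDS = 20.0  # upper bound from the field description


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a local WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_audio_duration(seconds: float) -> bool:
    """Check the duration against the documented limits (assumed inclusive)."""
    return MIN_AUDIO_SECONDS <= seconds <= MAX_AUDIO_SECONDS
```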

image_uri string Optional

Input image to be used as the first frame of the video. Required if prompt is not provided. See Input Formats for supported formats and size limits.

prompt string Optional <=5000 characters

Text description of how the video should be generated. Required if image_uri is not provided. May be an empty string when image_uri is provided. If image_uri is provided, the prompt describes how the image should be animated; otherwise, it describes the video content.
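
The interplay between image_uri and prompt can be sketched as a client-side check that builds the request object. The function name build_request_body is hypothetical, and rejecting an empty prompt when no image is supplied is an interpretation of the rules above rather than documented behavior.

```python
from typing import Optional

MAX_PROMPT_CHARS = 5000  # documented prompt length limit


def build_request_body(
    audio_uri: str,
    image_uri: Optional[str] = None,
    prompt: Optional[str] = None,
) -> dict:
    """Validate the documented field rules and return the request object."""
    if not audio_uri:
        raise ValueError("audio_uri is required")
    # At least one of image_uri or prompt must be given. An empty prompt
    # is only accepted when image_uri is present (interpretation).
    if image_uri is None and not prompt:
        raise ValueError("prompt is required when image_uri is not provided")
    if prompt is not None and len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt must be at most 5000 characters")

    body = {"audio_uri": audio_uri}
    if image_uri is not None:
        body["image_uri"] = image_uri
    if prompt is not None:
        body["prompt"] = prompt
    return body
```

Optional fields that are not supplied are left out of the object so that server-side defaults apply.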

resolution enum Optional

The resolution of the generated video in WIDTHxHEIGHT format. When omitted, the resolution is determined automatically from the orientation of the input image: portrait images produce 1080x1920 video and landscape images produce 1920x1080 video. If no image is provided, the resolution defaults to 1920x1080.
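
A client that wants to predict the output size when resolution is omitted could mirror the rule above as follows. This is a sketch only: the function name default_resolution is hypothetical, and treating a square image as landscape is an assumption, since the text does not cover that case.

```python
from typing import Optional, Tuple


def default_resolution(image_size: Optional[Tuple[int, int]] = None) -> str:
    """Predict the resolution used when the resolution field is omitted.

    image_size is (width, height) in pixels, or None when no image is sent.
    """
    if image_size is None:
        # No image provided: documented default.
        return "1920x1080"
    width, height = image_size
    # Portrait images are taller than wide. Square images fall through
    # to landscape here, which is an assumption.
    return "1080x1920" if height > width else "1920x1080"
```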

Allowed values:

guidance_scale double Optional 1-50

Optional guidance scale (also known as CFG) for video generation. Higher values make the output more closely follow the prompt but may reduce quality. Defaults to 5 for text-to-video, or 9 when providing an image.
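
The default and range rules for guidance_scale can be sketched as a small helper. The name resolve_guidance_scale is hypothetical, and the 1-50 range is assumed to be inclusive at both ends.

```python
from typing import Optional


def resolve_guidance_scale(value: Optional[float] = None, has_image: bool = False) -> float:
    """Return the effective guidance scale for a request."""
    if value is None:
        # Documented defaults: 9 when an image is provided, otherwise 5.
        return 9.0 if has_image else 5.0
    if not 1 <= value <= 50:
        raise ValueError("guidance_scale must be between 1 and 50")
    return float(value)
```

In practice the field can simply be omitted to get the server-side default; the helper is useful mainly for logging or for validating user input before a request is sent.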

model enum Optional Defaults to ltx-2-3-pro

Model to use for video generation.
Allowed values:

Response

Video generated successfully

Errors