Generate video from audio
Authentication
Request
Audio file to be used as the soundtrack for the video. Duration must be between 2 and 20 seconds. See Input Formats for supported formats and size limits.
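A client can check the duration limit locally before uploading. The following is a minimal sketch, not part of the API: it assumes the audio is a WAV file (which may or may not be among the supported formats) and that the 2 and 20 second bounds are inclusive. The helper names `wav_duration_seconds` and `duration_is_allowed` are hypothetical.

```python
import io
import wave

# Bounds taken from the parameter description; treating them as
# inclusive is an assumption.
MIN_SECONDS = 2.0
MAX_SECONDS = 20.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the playback duration of in-memory WAV data, in seconds."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_is_allowed(seconds: float) -> bool:
    """Return True if the duration falls within the documented range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

Other audio formats would need a different way to read the duration; only the range check carries over.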
Input image to be used as the first frame of the video. Required if prompt is not provided. See Input Formats for supported formats and size limits.
Text description of how the video should be generated. Required if image_uri is not provided; may be an empty string when image_uri is provided. With image_uri, the prompt describes how the image should be animated. Without image_uri, it describes the video content.
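The interplay between prompt and image_uri can be checked on the client before a request is sent. This sketch is illustrative only: the function name `check_prompt_and_image` is hypothetical, and it assumes that an empty or whitespace-only prompt does not count as "provided" when there is no image.

```python
from typing import Optional


def check_prompt_and_image(prompt: Optional[str],
                           image_uri: Optional[str]) -> Optional[str]:
    """Return an error message, or None if the combination looks acceptable."""
    if image_uri:
        # With an image, the prompt may be omitted or left empty.
        return None
    if prompt is None or prompt.strip() == "":
        # Assumption: without an image, a non-empty prompt is needed.
        return "either image_uri or a non-empty prompt is required"
    return None
```

For example, `check_prompt_and_image("", None)` reports an error, while the same empty prompt paired with an image URI passes.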
The resolution of the generated video in WIDTHxHEIGHT format. When omitted, the resolution is determined automatically from the orientation of the input image: portrait images produce 1080x1920 video, and landscape images produce 1920x1080 video. If no image is provided, the resolution defaults to 1920x1080.
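The default-resolution rule can be mirrored on the client to predict the output size. This is a sketch under stated assumptions, not the service's implementation: the function name `default_resolution` is hypothetical, and because the description does not say how square images are handled, the sketch treats them as landscape.

```python
from typing import Optional, Tuple

PORTRAIT = "1080x1920"
LANDSCAPE = "1920x1080"


def default_resolution(image_size: Optional[Tuple[int, int]]) -> str:
    """Predict the resolution used when none is given in the request.

    image_size is (width, height) of the input image in pixels,
    or None when the request has no image.
    """
    if image_size is None:
        return LANDSCAPE  # documented default without an image
    width, height = image_size
    if height > width:
        return PORTRAIT
    # Landscape; square images also land here (assumption).
    return LANDSCAPE
```

For instance, a 720x1280 input image would be expected to yield a 1080x1920 video when no resolution is supplied.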
Optional guidance scale (also known as CFG) for video generation. Higher values make the output more closely follow the prompt but may reduce quality. Defaults to 5 for text-to-video, or 9 when providing an image.
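The documented defaults can be expressed as a small helper that resolves the effective guidance scale for a request. The function name `effective_guidance_scale` and its parameters are hypothetical, used here only to illustrate the rule.

```python
from typing import Optional

TEXT_TO_VIDEO_DEFAULT = 5.0  # documented default without an image
IMAGE_DEFAULT = 9.0          # documented default when an image is provided


def effective_guidance_scale(requested: Optional[float],
                             has_image: bool) -> float:
    """Return the requested value, or the documented default if omitted."""
    if requested is not None:
        return requested
    return IMAGE_DEFAULT if has_image else TEXT_TO_VIDEO_DEFAULT
```

An explicit value always wins in this sketch; the defaults apply only when the field is left out.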