API Documentation

Power your creativity with LTX — an advanced model built for seamless video generation

Generate video with synchronized audio from text, images, and audio inputs. One HTTP call, one video back — no polling, no webhooks, no infrastructure to manage.

Powered by the most downloaded open-source video model on Hugging Face. Engineered for real-world workloads with predictable performance at any volume. Stable outputs, consistent fidelity, and infrastructure-grade reliability.

LTX API Capabilities

All endpoints return video with synchronized audio — dialogue, music, and ambient sound are generated together with the visuals.

Text-to-Video

Generate video from a text description. Describe a scene, camera movement, and mood — the API returns a complete video with matching audio. Up to 4K resolution and 20 seconds per request.
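As a minimal sketch, a text-to-video request body might look like the following. The field names (`prompt`, `resolution`, `duration`) are assumptions for illustration, not the actual schema; the values reflect the stated limits of up to 4K and 20 seconds:

```python
import json

# Hypothetical text-to-video request body; field names are illustrative assumptions.
payload = {
    "prompt": (
        "A slow dolly shot through a rain-soaked neon alley at night, "
        "with a moody synth score and distant thunder"
    ),
    "resolution": "3840x2160",  # up to 4K
    "duration": 12,             # seconds; the stated cap is 20 per request
}

body = json.dumps(payload)
```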

Image-to-Video

Animate a still image with realistic motion, depth, and audio. Provide a reference image and a prompt describing the desired motion. The output preserves the visual identity of the source image.
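A sketch of an image-to-video request, assuming the reference image is sent base64-encoded alongside the motion prompt. The field names and encoding choice are assumptions; a real call would read an actual image file rather than the placeholder bytes used here:

```python
import base64
import json

# Hypothetical image-to-video request body. A short placeholder byte string
# stands in for real PNG data read from disk.
image_bytes = b"\x89PNG\r\n\x1a\nplaceholder"
payload = {
    "image": base64.b64encode(image_bytes).decode("ascii"),
    "prompt": "Gentle parallax as fog rolls across the valley, distant birdsong",
}
body = json.dumps(payload)
```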

Audio-to-Video

Generate video driven by an audio track. Supply dialogue, music, or ambient sound and the API produces visuals synchronized to the audio. Optionally condition on a reference image for visual direction.
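A sketch of an audio-to-video request, with the optional reference image included for visual direction. Field names and the base64 encoding are assumptions for illustration; the byte strings are placeholders for real audio and image files:

```python
import base64
import json

# Hypothetical audio-to-video request body; field names are illustrative assumptions.
audio_bytes = b"RIFFplaceholderWAVE"  # stand-in for real WAV data
payload = {
    "audio": base64.b64encode(audio_bytes).decode("ascii"),
    "prompt": "A busker playing guitar in a sunlit subway station",
}

# Optional visual conditioning with a reference image:
payload["reference_image"] = base64.b64encode(b"\x89PNGplaceholder").decode("ascii")

body = json.dumps(payload)
```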

Retake

Re-generate a specific section of an existing video without starting over. Select a time range and mode (replace video, audio, or both) to iterate on parts of a generation while keeping the rest intact.
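A retake request might select the time range and mode like this. The asset identifier, field names, and mode values are assumptions for illustration, mirroring the replace-video/audio/both choice described above:

```python
import json

# Hypothetical retake request: re-generate only seconds 4.0-7.5 of an existing
# video, replacing both video and audio in that range.
payload = {
    "video_id": "vid_placeholder",  # identifier of an existing generation
    "start": 4.0,                   # seconds into the source video
    "end": 7.5,
    "mode": "both",                 # illustrative modes: "video", "audio", or "both"
    "prompt": "The character smiles instead of frowning",
}

# Basic sanity checks before sending: a valid mode and a non-empty range.
assert payload["mode"] in {"video", "audio", "both"}
assert payload["start"] < payload["end"]

body = json.dumps(payload)
```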

Extend

Lengthen an existing video from the beginning or end. Provide a video, a duration, and a context window — the API generates new frames that continue seamlessly from the original, preserving audio and visual continuity.
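An extend request could carry the three inputs named above: the source video, a duration, and a context window. The field names and direction values are assumptions for illustration:

```python
import json

# Hypothetical extend request: add 5 seconds to the end of an existing video,
# conditioning on its last 3 seconds so the new frames continue seamlessly.
payload = {
    "video_id": "vid_placeholder",  # identifier of an existing generation
    "direction": "end",             # illustrative: "start" or "end"
    "duration": 5.0,                # seconds of new footage to generate
    "context_window": 3.0,          # seconds of the original used as conditioning
}

body = json.dumps(payload)
```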

Get Started