API Documentation
Power your creativity with LTX — an advanced model built for seamless video generation
Generate video with synchronized audio from text, images, and audio inputs. One HTTP call, one video back — no polling, no webhooks, no infrastructure to manage.
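The single-call pattern above can be sketched in a few lines. The endpoint URL, field names, and auth header here are placeholders for illustration, not the real API schema; consult the endpoint reference for actual values.

```python
import json

# Placeholder endpoint; the real route will differ.
API_URL = "https://api.example.com/v1/generate"

def build_request(prompt: str, duration_s: int = 5) -> dict:
    """Assemble a JSON body for one synchronous generation call.
    Field names ("prompt", "duration") are assumptions."""
    return {"prompt": prompt, "duration": duration_s}

# One POST, one video back -- no job IDs to poll, no webhook to host:
# resp = requests.post(API_URL, json=build_request("a foggy harbor at dawn"),
#                      headers={"Authorization": "Bearer <API_KEY>"})
# open("out.mp4", "wb").write(resp.content)
```

Because the call is synchronous, the video bytes arrive in the same HTTP response; there is no job queue to track on the client side.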
Powered by the most downloaded open-source video model on Hugging Face. Engineered for real-world workloads with predictable performance at any volume. Stable outputs, consistent fidelity, and infrastructure-grade reliability.
LTX API Capabilities
All endpoints return video with synchronized audio — dialogue, music, and ambient sound are generated together with the visuals.
Text-to-Video
Generate video from a text description. Describe a scene, camera movement, and mood — the API returns a complete video with matching audio. Up to 4K resolution and 20 seconds per request.
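A minimal sketch of a text-to-video request body, enforcing the 20-second limit stated above on the client side. The field names are assumptions for illustration only.

```python
MAX_DURATION_S = 20  # per-request cap from the docs above

def text_to_video_payload(prompt: str, resolution: str = "1920x1080",
                          duration_s: int = 10) -> dict:
    """Build a request body; "prompt"/"resolution"/"duration" are
    hypothetical field names, not the confirmed schema."""
    if duration_s > MAX_DURATION_S:
        raise ValueError(f"maximum clip length is {MAX_DURATION_S} seconds")
    return {"prompt": prompt, "resolution": resolution, "duration": duration_s}
```

Validating the duration locally gives an immediate error instead of a rejected API call.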
Image-to-Video
Animate a still image with realistic motion, depth, and audio. Provide a reference image and a prompt describing the desired motion. The output preserves the visual identity of the source image.
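One common way to ship a reference image in a JSON body is base64 encoding; a hedged sketch, assuming hypothetical "image" and "prompt" fields:

```python
import base64

def image_to_video_payload(image_bytes: bytes, motion_prompt: str) -> dict:
    """Pair a base64-encoded reference image with a motion description.
    Field names are illustrative; the real API may accept a URL or
    multipart upload instead."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": motion_prompt,
    }
```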
Audio-to-Video
Generate video driven by an audio track. Supply dialogue, music, or ambient sound and the API produces visuals synchronized to the audio. Optionally condition on a reference image for visual direction.
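The optional image conditioning described above maps naturally to an optional field in the request body. A sketch under the same caveat: these field names are assumptions, not the documented schema.

```python
def audio_to_video_payload(audio_b64: str, prompt: str = None,
                           image_b64: str = None) -> dict:
    """Audio is required; a text prompt and a reference image are
    optional conditioning inputs and are omitted when not given."""
    body = {"audio": audio_b64}
    if prompt is not None:
        body["prompt"] = prompt
    if image_b64 is not None:
        body["image"] = image_b64  # optional visual direction
    return body
```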
Retake
Regenerate a specific section of an existing video without starting over. Select a time range and mode (replace video, audio, or both) to iterate on parts of a generation while keeping the rest intact.
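The time range and mode above suggest a request body like the following sketch. The field names ("video_id", "start", "end", "mode") and mode values are assumptions for illustration.

```python
VALID_MODES = {"video", "audio", "both"}  # assumed mode values

def retake_payload(video_id: str, start_s: float, end_s: float,
                   mode: str = "both", prompt: str = None) -> dict:
    """Build a body that regenerates only the span [start_s, end_s)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not (0 <= start_s < end_s):
        raise ValueError("need 0 <= start < end")
    body = {"video_id": video_id, "start": start_s, "end": end_s, "mode": mode}
    if prompt is not None:
        body["prompt"] = prompt  # optional new direction for the retaken span
    return body
```

Everything outside the selected range is left untouched, so retakes are cheap to iterate on.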
Extend
Lengthen an existing video from the beginning or end. Provide a video, a duration, and a context window — the API generates new frames that continue seamlessly from the original, preserving audio and visual continuity.
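The three inputs named above (video, duration, context window) can be sketched as one request body. As with the other examples, the field names and the "start"/"end" direction values are hypothetical.

```python
def extend_payload(video_id: str, duration_s: float,
                   direction: str = "end", context_s: float = 2.0) -> dict:
    """Build a body that grows a clip from its start or end."""
    if direction not in ("start", "end"):
        raise ValueError("direction must be 'start' or 'end'")
    if duration_s <= 0 or context_s <= 0:
        raise ValueError("duration and context window must be positive")
    return {
        "video_id": video_id,
        "duration": duration_s,   # seconds of new footage to generate
        "direction": direction,   # which side of the clip to extend
        "context": context_s,     # seconds of original used as conditioning
    }
```

The context window is what lets the new frames continue seamlessly: it tells the model how much of the original to condition on at the seam.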