API Documentation
Power your creativity with LTX — an advanced model built for seamless video generation
Generate video with synchronized audio from text, images, and audio inputs. One HTTP call, one video back — no polling, no webhooks, no infrastructure to manage.
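The single-call pattern above can be sketched in Python. The base URL, path, and bearer-token auth scheme below are placeholder assumptions for illustration, not the documented API surface; only the "one request in, one video back" shape comes from the description.

```python
import json
import urllib.request

# Placeholder endpoint and key: the real base URL, path, and auth
# scheme are assumptions, not documented values.
API_URL = "https://api.example.com/v1/generate"
API_KEY = "YOUR_API_KEY"

def build_request(prompt: str) -> urllib.request.Request:
    """Build one synchronous POST: one HTTP call in, one video back."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request would be a single blocking call -- no polling loop,
# no webhook handler:
#   with urllib.request.urlopen(build_request("...")) as resp:
#       video_bytes = resp.read()
req = build_request("A slow dolly shot through a rain-lit street at night")
```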
Powered by the most downloaded open-source video model on Hugging Face. Engineered for real-world workloads with predictable performance at any volume. Stable outputs, consistent fidelity, and infrastructure-grade reliability.
LTX API Capabilities
All endpoints return video with synchronized audio — dialogue, music, and ambient sound are generated together with the visuals.
Generate video from a text description. Describe a scene, camera movement, and mood — the API returns a complete video with matching audio. Up to 4K resolution and 20 seconds per request.
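A minimal sketch of a text-to-video request body. The field names (`prompt`, `resolution`, `duration_seconds`) are illustrative assumptions; the limits (up to 4K, 20 seconds per request) come from the capability description above.

```python
import json

MAX_DURATION_S = 20  # per-request cap stated in the docs

def text_to_video_body(prompt: str, resolution: str = "1080p",
                       duration_s: int = 10) -> str:
    """Serialize a text-to-video request; field names are assumptions."""
    if duration_s > MAX_DURATION_S:
        raise ValueError(f"duration is capped at {MAX_DURATION_S}s per request")
    return json.dumps({
        "prompt": prompt,               # scene, camera movement, mood
        "resolution": resolution,       # e.g. "720p", "1080p", "4k"
        "duration_seconds": duration_s,
    })

body = text_to_video_body(
    "Golden-hour drone shot over a coastline, gentle waves, ambient surf",
    resolution="4k", duration_s=15,
)
```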
Animate a still image with realistic motion, depth, and audio. Provide a reference image and a prompt describing the desired motion. The output preserves the visual identity of the source image.
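An image-to-video request could inline the reference image so the whole call stays a single JSON request. The base64 encoding and the `image`/`prompt` field names are assumptions for illustration.

```python
import base64
import json

def image_to_video_body(image_bytes: bytes, motion_prompt: str) -> str:
    """Sketch of an image-to-video request; field names are assumptions."""
    return json.dumps({
        # Reference image inlined as base64 to keep the request to one call.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        # Describes the desired motion; the scene comes from the image itself.
        "prompt": motion_prompt,
    })

# Usage sketch:
#   with open("portrait.png", "rb") as f:
#       body = image_to_video_body(f.read(), "slow head turn toward camera")
```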
Generate video driven by an audio track. Supply dialogue, music, or ambient sound and the API produces visuals synchronized to the audio. Optionally condition on a reference image for visual direction.
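The audio-driven flow mirrors the image one, with the reference image as an optional extra. Again, every field name here is an illustrative assumption; only the inputs (audio track, prompt, optional reference image) come from the description.

```python
import base64
import json

def audio_to_video_body(audio_bytes: bytes, prompt: str,
                        image_bytes=None) -> str:
    """Audio-driven generation; optional reference image for visual direction.

    Field names are assumptions, not documented parameters.
    """
    body = {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "prompt": prompt,  # visual direction to pair with the audio
    }
    if image_bytes is not None:
        # Optional reference image to condition the visuals on.
        body["image"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(body)
```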
Regenerate a specific section of an existing video without starting over. Select a time range and mode (replace video, audio, or both) to iterate on parts of a generation while keeping the rest intact.
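A sketch of the section-regeneration request. The three mode values come from the description (replace video, audio, or both); the `video_id` handle and the field names are hypothetical.

```python
import json

VALID_MODES = {"video", "audio", "both"}  # modes named in the description

def regenerate_body(video_id: str, start_s: float, end_s: float,
                    mode: str, prompt: str) -> str:
    """Serialize a section-regeneration request; field names are assumptions."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 0 <= start_s < end_s:
        raise ValueError("need 0 <= start_s < end_s")
    return json.dumps({
        "video_id": video_id,      # hypothetical handle to a prior generation
        "start_seconds": start_s,  # time range to regenerate;
        "end_seconds": end_s,      # everything outside it is kept intact
        "mode": mode,              # "video", "audio", or "both"
        "prompt": prompt,          # what the regenerated section should contain
    })
```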
Lengthen an existing video from the beginning or end. Provide a video, a duration, and a context window — the API generates new frames that continue seamlessly from the original, preserving audio and visual continuity.
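The extend request can be sketched the same way. The three inputs (video, duration, context window) come from the description; the two directions follow from "from the beginning or end". Parameter names and the `video_id` handle are assumptions.

```python
import json

def extend_body(video_id: str, direction: str, added_s: float,
                context_s: float = 2.0) -> str:
    """Sketch of a video-extension request; parameter names are assumptions.

    direction:  "start" (prepend frames) or "end" (append frames).
    context_s:  how much of the original to condition on, so the new
                frames continue seamlessly in audio and visuals.
    """
    if direction not in {"start", "end"}:
        raise ValueError('direction must be "start" or "end"')
    if added_s <= 0 or context_s <= 0:
        raise ValueError("durations must be positive")
    return json.dumps({
        "video_id": video_id,
        "direction": direction,
        "added_seconds": added_s,
        "context_seconds": context_s,
    })
```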