API Documentation
Power your creativity with LTX — an advanced model built for seamless video generation
Generate video with synchronized audio from text, images, and audio inputs. Two APIs are available: a synchronous API that returns the video in a single HTTP call (simplest for short clips and quick experiments), and an asynchronous API that submits a job and polls for the result (recommended for production, where polling beats holding a long-lived connection).
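The async flow can be sketched as a simple polling loop. This is a minimal illustration, not the documented client: the job dictionary, its "status" field, and the status values are assumptions; only the submit-then-poll pattern comes from the description above.

```python
import time

def poll_until_done(fetch_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll an async generation job until it reaches a terminal state.

    fetch_status is any callable that performs the status request and
    returns a dict. The "status" field and the "completed"/"failed"
    values are hypothetical placeholders for the real job schema.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval)  # wait between polls instead of holding one long connection
    raise TimeoutError("generation job did not finish before the timeout")
```

In production the callable would issue the real status request; injecting it as a parameter also makes the loop trivial to test with a stub.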
Powered by the most downloaded open-source video model on Hugging Face. Engineered for real-world workloads with predictable performance at any volume. Stable outputs, consistent fidelity, and infrastructure-grade reliability.
LTX API Capabilities
All endpoints return video with synchronized audio — dialogue, music, and ambient sound are generated together with the visuals.
Generate video from a text description. Describe a scene, camera movement, and mood — the API returns a complete video with matching audio. Up to 4K resolution and 20 seconds per request.
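A text-to-video request might be assembled as below. The field names and resolution option names are assumptions for illustration; only the 4K and 20-second ceilings are taken from the description above.

```python
MAX_DURATION_S = 20                       # per-request ceiling stated above
RESOLUTIONS = ("720p", "1080p", "4k")     # hypothetical option names

def build_text_to_video_request(prompt, resolution="1080p", duration_s=5):
    """Build a JSON-serializable payload for a hypothetical text-to-video call."""
    if not prompt.strip():
        raise ValueError("prompt must be non-empty")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if not 0 < duration_s <= MAX_DURATION_S:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    return {"prompt": prompt, "resolution": resolution, "duration": duration_s}
```

Validating limits client-side gives immediate feedback instead of a round trip that ends in a 4xx response.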
Animate a still image with realistic motion, depth, and audio. Provide a reference image and a prompt describing the desired motion. The output preserves the visual identity of the source image.
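An image-to-video request pairs the reference image with a motion prompt, as described above. The sketch below is a hypothetical payload shape; the field names are assumptions.

```python
def build_image_to_video_request(image_url, motion_prompt, duration_s=5):
    """Payload for a hypothetical image-animation call.

    image_url points at the reference image whose visual identity is
    preserved; motion_prompt describes the desired motion. Field names
    are illustrative, not the documented schema.
    """
    if not image_url or not motion_prompt.strip():
        raise ValueError("both a reference image and a motion prompt are required")
    return {"image_url": image_url, "prompt": motion_prompt, "duration": duration_s}
```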
Generate video driven by an audio track. Supply dialogue, music, or ambient sound and the API produces visuals synchronized to the audio. Optionally condition on a reference image for visual direction.
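An audio-driven request supplies the driving track and, optionally, a reference image for visual direction. Again a sketch under assumed field names, not the documented schema:

```python
def build_audio_to_video_request(audio_url, prompt=None, reference_image_url=None):
    """Payload for a hypothetical audio-driven generation call.

    The audio track drives the visuals' timing; reference_image_url
    optionally conditions the visual direction, as described above.
    Field names are assumptions.
    """
    if not audio_url:
        raise ValueError("an audio input is required")
    payload = {"audio_url": audio_url}
    if prompt:
        payload["prompt"] = prompt
    if reference_image_url:
        payload["reference_image_url"] = reference_image_url
    return payload
```

Optional fields are omitted rather than sent as nulls, which keeps the request minimal.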
Re-generate a specific section of an existing video without starting over. Select a time range and mode (replace video, audio, or both) to iterate on parts of a generation while keeping the rest intact.
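Selecting a section boils down to a time range plus one of the three modes named above. The validation sketch below uses hypothetical parameter and field names; only the range-plus-mode idea comes from the text.

```python
MODES = ("video", "audio", "both")  # the three replace modes described above

def build_regenerate_request(video_id, start_s, end_s, mode="both"):
    """Validate a section-regeneration request for a hypothetical endpoint.

    start_s/end_s select the time range to re-generate; mode chooses
    whether the video track, the audio track, or both are replaced
    while the rest of the generation is kept intact.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if start_s < 0 or end_s <= start_s:
        raise ValueError("need 0 <= start_s < end_s")
    return {"video_id": video_id, "start": start_s, "end": end_s, "mode": mode}
```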
Lengthen an existing video from the beginning or end. Provide a video, a duration, and a context window — the API generates new frames that continue seamlessly from the original, preserving audio and visual continuity.
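The context-window idea above can be made concrete with a small planning helper: given the original length, the extension length, and the context window, it works out which segment of the original the new frames must continue from. The function name, parameters, and direction values are assumptions for illustration.

```python
def plan_extension(original_s, extend_by_s, context_s, direction="end"):
    """Plan a hypothetical extend-video request.

    direction "end" appends new frames after the video; "start" prepends
    them. The returned context window is the slice of the original (in
    seconds) used as conditioning so the new frames continue seamlessly.
    """
    if direction not in ("start", "end"):
        raise ValueError('direction must be "start" or "end"')
    if extend_by_s <= 0 or context_s <= 0:
        raise ValueError("extension and context durations must be positive")
    context_s = min(context_s, original_s)  # cannot use more context than exists
    if direction == "end":
        window = (original_s - context_s, original_s)   # trailing seconds
    else:
        window = (0.0, context_s)                        # leading seconds
    return {"context_window": window, "new_total_s": original_s + extend_by_s}
```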