PyTorch API

For developers who want direct Python integration or custom workflows beyond ComfyUI, LTX-2 offers two paths: the native ltx-pipelines package (full control, all features) and the HuggingFace Diffusers integration (familiar API, quick start).

Repository Structure

The LTX-2 codebase is a monorepo with three packages:

Package         Purpose
ltx-core        Model architecture, schedulers, guiders, noisers, and patchifiers
ltx-pipelines   High-level inference pipelines for text-to-video, image-to-video, and IC-LoRA workflows
ltx-trainer     LoRA, IC-LoRA, and full fine-tuning (see Trainer docs)

Requirements

  • Python >= 3.10
  • CUDA > 12.7
  • PyTorch ~= 2.7

See System Requirements for full hardware specifications.
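
The requirements above can be sanity-checked at runtime before downloading any models. The helper below is an illustrative sketch using only the standard library (parse_version and check_requirements are not part of the LTX-2 packages); note that PyTorch ~= 2.7 means >= 2.7 but < 3.0 under compatible-release semantics:

```python
import re
import sys

def parse_version(v: str) -> tuple:
    """Extract leading numeric components, e.g. '2.7.1+cu128' -> (2, 7, 1)."""
    return tuple(int(x) for x in re.findall(r"\d+", v.split("+")[0])[:3])

def check_requirements(python_version: tuple, torch_version: str) -> list:
    """Return a list of human-readable problems; an empty list means the basics look OK."""
    problems = []
    if python_version < (3, 10):
        problems.append("Python >= 3.10 required")
    major, minor = parse_version(torch_version)[:2]
    # "PyTorch ~= 2.7" means any 2.x release at or above 2.7
    if not (major == 2 and minor >= 7):
        problems.append("PyTorch ~= 2.7 required")
    return problems

# In practice you would pass sys.version_info[:2] and torch.__version__
print(check_requirements(sys.version_info[:2], "2.7.1+cu128"))
```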

Installation

$# Clone the repository
$git clone https://github.com/Lightricks/LTX-2.git
$cd LTX-2
$
$# Set up the environment
$uv sync --frozen
$source .venv/bin/activate

Download Models

$# Download the distilled model (recommended for speed)
$huggingface-cli download Lightricks/LTX-2.3 \
> --include "ltx-2.3-22b-distilled.safetensors" \
> --local-dir models/
$
$# Download the full model (higher quality, slower generation)
$huggingface-cli download Lightricks/LTX-2.3 \
> --include "ltx-2.3-22b-dev.safetensors" \
> --local-dir models/
$
$# FP8 variant (smaller download, requires less VRAM)
$huggingface-cli download Lightricks/LTX-2.3-fp8 \
> --include "ltx-2.3-22b-dev-fp8.safetensors" \
> --local-dir models/

A full list of available checkpoints (including spatial/temporal upscalers, distilled LoRA, IC-LoRA variants, and camera control LoRAs) is on the LTX-2 GitHub repo.


Option 1: Native Pipelines (ltx-pipelines)

The ltx-pipelines package provides the most complete feature set, including two-stage generation, IC-LoRA, and fine-grained guidance control.

Available Pipelines

Pipeline                        Use Case
TI2VidTwoStagesPipeline         Text-to-video and image-to-video with two-stage upscaling. Best for high-quality production use.
TI2VidTwoStagesRes2sPipeline    Two-stage generation with the res_2s second-order sampler. Different quality/speed tradeoff; requires fewer steps.
TI2VidOneStagePipeline          Single-stage text/image-to-video for quick prototyping without upscaling.
DistilledPipeline               Fast two-stage generation using the distilled checkpoint. Best for speed and batch processing.
ICLoraPipeline                  Video-to-video with IC-LoRA control signals (depth, pose, canny edges). Uses the distilled model. Best for guided transformations.
A2VidPipelineTwoStage           Audio-to-video generation conditioned on input audio.
RetakePipeline                  Regenerates specific time regions of an existing video without starting over.
KeyframeInterpolationPipeline   Interpolates between keyframe images to generate smooth transitions.

Text-to-Video Example

This example generates a video using the distilled pipeline with a two-stage approach — base generation followed by upscale refinement.

import torch
from ltx_pipelines.distilled_pipeline import DistilledPipeline

# Initialize
pipeline = DistilledPipeline.from_config("path/to/config.yaml")

# Generate
output = pipeline(
    prompt="A golden retriever running through a sunlit meadow, "
           "wildflowers swaying in a gentle breeze. "
           "Camera follows at ground level, tracking the dog. "
           "Warm afternoon light with soft bokeh in the background. "
           "Sound of panting, rustling grass, distant birdsong.",
    width=768,
    height=512,
    num_frames=97,
    fps=24.0,
    seed=42,
)

Dimension constraints: Width and height must be divisible by 32. Frame count must follow the pattern 8n + 1 (valid values: 1, 9, 17, 25, ..., 97, 105, 113, 121, and so on).
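
A small pre-flight check can catch invalid geometry before committing to a long generation run. The helper names below are illustrative, not part of ltx-pipelines:

```python
def validate_dimensions(width: int, height: int, num_frames: int) -> None:
    """Raise ValueError if the requested geometry violates LTX-2's constraints."""
    if width % 32 or height % 32:
        raise ValueError(f"width and height must be divisible by 32, got {width}x{height}")
    if num_frames % 8 != 1:
        raise ValueError(f"num_frames must be of the form 8n + 1, got {num_frames}")

def nearest_valid_frames(n: int) -> int:
    """Snap an arbitrary frame count to the nearest valid 8n + 1 value (minimum 1)."""
    return max(1, round((n - 1) / 8) * 8 + 1)

validate_dimensions(768, 512, 97)        # passes silently
print(nearest_valid_frames(100))         # snaps 100 to a valid frame count
```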

Guidance Parameters

The native pipelines expose MultiModalGuiderParams for fine-grained control over generation:

Parameter        Range       Description
cfg_scale        2.0–5.0     Classifier-Free Guidance. Higher values increase prompt adherence. Set to 1.0 to disable.
stg_scale        0.5–1.5     Spatio-Temporal Guidance for temporal coherence. Set to 0.0 to disable.
stg_blocks       e.g. [29]   Transformer blocks to perturb for STG. Set to [] to disable.
rescale_scale    ~0.7        Rescales the guided prediction to prevent over-saturation.
modality_scale   1.0–3.0     Audio-visual sync strength. Set above 1.0 when generating with audio.
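
The table maps naturally onto a settings object. The GuiderSettings dataclass below is a hypothetical stand-in (the real MultiModalGuiderParams field names and defaults may differ); it only illustrates the ranges and disable semantics described above:

```python
from dataclasses import dataclass, field

@dataclass
class GuiderSettings:
    """Hypothetical mirror of the guidance knobs above; defaults follow the listed ranges."""
    cfg_scale: float = 3.0        # 2.0-5.0; 1.0 disables CFG
    stg_scale: float = 1.0        # 0.5-1.5; 0.0 disables STG
    stg_blocks: list = field(default_factory=lambda: [29])  # [] disables STG
    rescale_scale: float = 0.7    # prevents over-saturation
    modality_scale: float = 1.5   # above 1.0 when generating with audio

    def validate(self) -> None:
        # STG needs at least one block to perturb, otherwise the scale has no effect.
        if self.stg_scale > 0.0 and not self.stg_blocks:
            raise ValueError("stg_scale is set but stg_blocks is empty; STG has no effect")

settings = GuiderSettings(cfg_scale=4.0, modality_scale=2.0)
settings.validate()
```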

Memory Optimization

For consumer GPUs, enable FP8 quantization to reduce VRAM usage by ~40% with minimal quality loss:

from ltx_core.quantization.policy import QuantizationPolicy

# FP8 cast: broad GPU compatibility
pipeline = DistilledPipeline.from_config(
    "path/to/config.yaml",
    quantization=QuantizationPolicy.fp8_cast(),
)

# FP8 scaled MM: optimized for Hopper GPUs (H100)
# Requires: uv sync --frozen --extra fp8-trtllm
pipeline = DistilledPipeline.from_config(
    "path/to/config.yaml",
    quantization=QuantizationPolicy.fp8_scaled_mm(),
)

From the command line:

$# FP8 Cast (works on most GPUs)
$python run_pipeline.py --quantization fp8-cast
$
$# FP8 Scaled MM (Hopper GPUs only, uses TensorRT-LLM)
$python run_pipeline.py --quantization fp8-scaled-mm

Additional tip: Set the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to reduce memory fragmentation in PyTorch's CUDA allocator and avoid spurious out-of-memory errors.
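
When launching from Python rather than a shell, the variable must be set before PyTorch initializes its CUDA allocator, which in practice means before the first import torch:

```python
import os

# Must be set before `import torch` so the CUDA caching allocator picks it up.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```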


Option 2: HuggingFace Diffusers

If you’re already working within the Diffusers ecosystem, LTX-2 is available as a native pipeline:

import torch
from diffusers import LTX2Pipeline

pipeline = LTX2Pipeline.from_pretrained(
    "Lightricks/LTX-2",
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda")

# Generate
result = pipeline(
    prompt="A golden retriever running through a sunlit meadow, "
           "wildflowers swaying in a gentle breeze.",
    width=768,
    height=512,
    num_frames=97,
)

The Diffusers integration provides a simpler interface but may not expose all features available in the native ltx-pipelines package (e.g., IC-LoRA, advanced guidance parameters). For full feature access, use the native pipelines.

For more on the Diffusers integration, see the HuggingFace documentation.


Generation Parameters Reference

Resolution

Standard aspect ratios:

Resolution   Aspect Ratio    Notes
768×512      3:2 landscape   Good default for wide shots
512×768      2:3 portrait    Vertical/mobile content
704×512      4:3 standard    Classic frame
512×704      3:4 vertical
640×640      1:1 square      Social media

Higher resolutions are supported (up to 4K) but require significantly more VRAM. Start with lower resolutions for testing.

Frame Count & Duration

Frames   Duration (24 fps)   Duration (25 fps)
65       ~2.7s               ~2.6s
97       ~4.0s               ~3.9s
121      ~5.0s               ~4.8s
161      ~6.7s               ~6.4s
257      ~10.7s              ~10.3s
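
The durations above are simply num_frames / fps. A throwaway helper (not part of any LTX-2 package) reproduces the table:

```python
def duration_seconds(num_frames: int, fps: float) -> float:
    """Approximate clip duration for a given frame count and frame rate."""
    return round(num_frames / fps, 1)

for frames in (65, 97, 121, 161, 257):
    print(frames, duration_seconds(frames, 24.0), duration_seconds(frames, 25.0))
```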

Sampling

Parameter     Distilled Model   Full Model
Steps         4–8               20–50
CFG Scale     1.0               2.0–5.0
Recommended   3.0–3.5           3.0–3.5

What’s Next