Trainer Quick Start
Get up and running with LTX-2 training in a few steps.
New to training? Start with the agent. You don't have to run these steps by hand: open the repo in Claude Code
and run /train-model. It makes the same decisions described below and explains each step as it goes, pausing for your
approval before any heavy work. See the train-model skill for the full phase-by-phase reference.
Prerequisites
Before you begin, ensure you have:
- LTX-2 model checkpoint: a local .safetensors file with the model weights. Download ltx-2.3-22b-dev.safetensors from the LTX-2.3 collection on HuggingFace.
- Gemma text encoder: a local directory with the Gemma model (required for LTX-2). Download from HuggingFace.
- Linux with CUDA: the trainer requires triton, which is Linux-only.
- A GPU with enough VRAM: 80GB recommended for the standard config.
For 32GB GPUs (e.g. RTX 5090), use the low-VRAM config, which enables INT8 quantization and other memory optimizations.
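The file-system and platform prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the trainer: the function name `preflight` and its parameters are hypothetical, and it only checks that the paths exist and that the OS is Linux. It does not check CUDA or VRAM.

```python
import platform
from pathlib import Path
from typing import List, Optional


def preflight(checkpoint: str, gemma_dir: str, system: Optional[str] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only; the trainer does not ship it.
    """
    problems = []

    # The checkpoint should be a local .safetensors file.
    ckpt = Path(checkpoint)
    if ckpt.suffix != ".safetensors" or not ckpt.is_file():
        problems.append(f"checkpoint missing or not a .safetensors file: {ckpt}")

    # The Gemma text encoder should be a local directory.
    if not Path(gemma_dir).is_dir():
        problems.append(f"Gemma text encoder directory not found: {gemma_dir}")

    # triton is Linux-only, so any other OS is reported as a problem.
    # `system` can be passed explicitly to make the check testable.
    if (system or platform.system()) != "Linux":
        problems.append("the trainer requires Linux (triton is Linux-only)")

    return problems
```

Calling `preflight("/path/to/ltx-2.3-22b-dev.safetensors", "/path/to/gemma")` with your own paths returns an empty list when the three checks pass, or one message per failed check otherwise.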
Installation
First install uv if you haven’t already, then clone the repository:
The ltx-trainer package is part of the LTX-2 monorepo. Install dependencies from the repository root, then move
into the trainer package:
The trainer depends on the ltx-core and ltx-pipelines packages, which are installed automatically from the
monorepo.
Training Workflow
1. Prepare your dataset
Organize your videos with captions, then preprocess them into cached latents and text embeddings:
Audio latents are extracted from your videos automatically. Optional scene splitting (split_scenes.py) and automatic
captioning (caption_videos.py) are also available. For the full preprocessing workflow — dataset format, captioning
setup, resolution buckets, masks, and all CLI options — see the
Dataset Preparation guide on GitHub.
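Since training needs a caption for every video, it can help to check for missing captions before preprocessing. The sketch below assumes a JSON metadata file containing a list of records; the key names `caption` and `media_path` and the function `find_missing_captions` are assumptions for illustration. The actual dataset format is described in the Dataset Preparation guide.

```python
import json
from pathlib import Path


def find_missing_captions(metadata_file, caption_key="caption", video_key="media_path"):
    """Return the video paths whose caption is absent or blank.

    Assumes `metadata_file` is a JSON list of objects; the key names are
    hypothetical defaults and should be adjusted to your dataset format.
    """
    records = json.loads(Path(metadata_file).read_text(encoding="utf-8"))
    missing = []
    for record in records:
        # Treat a missing key, None, and whitespace-only text the same way.
        caption = str(record.get(caption_key) or "")
        if not caption.strip():
            missing.append(record.get(video_key, "<unknown>"))
    return missing
```

An empty result means every record has a non-blank caption; anything else lists the videos to caption, by hand or with the optional captioning script, before preprocessing.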
2. Configure training
Create or modify a configuration YAML file. Start from one of the example configs:
- t2v_lora.yaml: text-to-video LoRA
- t2v_lora_low_vram.yaml: same, tuned for ~32GB VRAM (INT8 quantization and memory optimizations)
- v2v_ic_lora.yaml: IC-LoRA video-to-video
Key settings to update:
See the Configuration Reference for all available options.
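For text-to-video LoRA, the choice between the standard and low-VRAM example configs follows from the VRAM guidance in the prerequisites. A minimal sketch of that decision is below. The function name `pick_example_config` is hypothetical, and two behaviours are assumptions rather than documented rules: GPUs between 32GB and 80GB are sent to the low-VRAM config, and anything under 32GB is rejected.

```python
def pick_example_config(vram_gb: float) -> str:
    """Pick a text-to-video LoRA example config from available VRAM in GB.

    Thresholds mirror the quick-start guidance (80GB standard, ~32GB low-VRAM);
    the handling of in-between and smaller GPUs is an assumption.
    """
    if vram_gb >= 80:
        return "t2v_lora.yaml"
    if vram_gb >= 32:
        # Assumption: anything under the 80GB recommendation uses the low-VRAM config.
        return "t2v_lora_low_vram.yaml"
    # Assumption: the guide does not describe a config for GPUs under ~32GB.
    raise ValueError(f"{vram_gb}GB is below the ~32GB the low-VRAM config is tuned for")
```

For example, `pick_example_config(32)` returns `"t2v_lora_low_vram.yaml"`, matching the RTX 5090 case mentioned in the prerequisites.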
3. Start training
For multi-GPU training:
See the Training Guide for distributed training (DDP/FSDP), HuggingFace Hub uploads, and Weights & Biases logging.
Training Modes
First time? Start with t2v_lora.yaml — it’s the simplest mode and only requires videos with captions. Explore other modes once you’ve confirmed your setup works.
All modes are expressed through the single flexible training strategy. The trainer supports:
See Training Modes for detailed explanations of each mode.
Reference docs on GitHub
- Dataset Preparation: preprocess videos, generate captions, and build resolution buckets.
- Configuration Reference: every available training parameter.
- Training Guide: distributed training (DDP/FSDP), HuggingFace Hub, and W&B logging.
- Tools for dataset management and debugging.
- Go beyond the flexible strategy with your own training logic.
- Solutions to common training problems.
🎬 Happy training! May your loss curves trend down and your VRAM never run out.