Trainer Quick Start

Get up and running with LTX-2 training in a few steps.

New to training? Start with the agent. You don’t have to run these steps by hand — open the repo in Claude Code and run /train-model. It makes the same decisions described below and explains each step as it goes, pausing for your approval before any heavy work. See the train-model skill for the full phase-by-phase reference.

Prerequisites

Before you begin, ensure you have:

  1. LTX-2 model checkpoint — a local .safetensors file with the model weights. Download ltx-2.3-22b-dev.safetensors from the LTX-2.3 collection on HuggingFace.
  2. Gemma text encoder — a local directory with the Gemma model (required for LTX-2). Download from HuggingFace.
  3. Linux with CUDA — the trainer requires triton, which is Linux-only.
  4. A GPU with enough VRAM — 80GB recommended for the standard config.

For 32GB GPUs (e.g. RTX 5090), use the low-VRAM config, which enables INT8 quantization and other memory optimizations.
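The hardware guidance above can be sketched as a small decision helper. This is an illustration only: the function name, the tier labels, and the exact cut-offs are assumptions taken from the 80GB and 32GB figures quoted here, and are not constants from the trainer. Pass the nominal card size, since the memory a driver reports is usually slightly below the advertised figure.

```python
import sys

# Assumed cut-offs, taken from the figures in the prerequisites list.
STANDARD_MIN_GB = 80   # "80GB recommended for the standard config"
LOW_VRAM_MIN_GB = 32   # "For 32GB GPUs ... use the low-VRAM config"


def choose_config_tier(vram_gb: float, platform: str = sys.platform) -> str:
    """Return 'standard', 'low-vram', or 'unsupported' for a machine.

    vram_gb is the nominal card size (e.g. 80 or 32), not a measured value.
    """
    # triton is Linux-only, so other platforms cannot run the trainer.
    if not platform.startswith("linux"):
        return "unsupported"
    if vram_gb >= STANDARD_MIN_GB:
        return "standard"
    if vram_gb >= LOW_VRAM_MIN_GB:
        return "low-vram"
    # Below 32GB is not covered by this guide; treated as unsupported here.
    return "unsupported"
```

Cards between 32GB and 80GB fall into the low-VRAM tier in this sketch; that is a conservative guess, not documented behavior.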

Installation

First install uv if you haven’t already, then clone the repository:

$ git clone https://github.com/Lightricks/LTX-2

The ltx-trainer package is part of the LTX-2 monorepo. Install dependencies from the repository root, then move into the trainer package:

$ # From the repository root
$ uv sync
$ cd packages/ltx-trainer

The trainer depends on the ltx-core and ltx-pipelines packages, which are installed automatically from the monorepo.

Training Workflow

1. Prepare your dataset

Organize your videos with captions, then preprocess them into cached latents and text embeddings:

$ uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x49" \
    --model-path /path/to/ltx-2-model.safetensors \
    --text-encoder-path /path/to/gemma-model

Audio latents are extracted from your videos automatically. Optional scene splitting (split_scenes.py) and automatic captioning (caption_videos.py) are also available. For the full preprocessing workflow — dataset format, captioning setup, resolution buckets, masks, and all CLI options — see the Dataset Preparation guide on GitHub.
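A bucket string such as "960x544x49" can be sanity-checked before a long preprocessing run. The sketch below rests on two assumptions that should be verified against the Dataset Preparation guide: that the fields are ordered width x height x frames, and that the sides must be divisible by 32 while the frame count must be a multiple of 8 plus 1. The example value in the command above satisfies both, but neither rule is stated on this page. `Bucket`, `parse_bucket`, and `check_bucket` are hypothetical helper names.

```python
from typing import NamedTuple


class Bucket(NamedTuple):
    width: int
    height: int
    frames: int


def parse_bucket(spec: str) -> Bucket:
    """Parse a 'WxHxF' string such as '960x544x49' (field order is assumed)."""
    parts = spec.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected WxHxF, got {spec!r}")
    # int() raises ValueError on non-numeric fields.
    width, height, frames = (int(p) for p in parts)
    if min(width, height, frames) <= 0:
        raise ValueError(f"all dimensions must be positive: {spec!r}")
    return Bucket(width, height, frames)


def check_bucket(bucket: Bucket, spatial_multiple: int = 32,
                 frame_stride: int = 8) -> list[str]:
    """Return warnings under the ASSUMED divisibility rules (empty list if none)."""
    warnings = []
    for name in ("width", "height"):
        value = getattr(bucket, name)
        if value % spatial_multiple != 0:
            warnings.append(f"{name} {value} is not a multiple of {spatial_multiple}")
    if (bucket.frames - 1) % frame_stride != 0:
        warnings.append(f"frames {bucket.frames} is not {frame_stride}*k + 1")
    return warnings
```

Returning warnings instead of raising keeps the check advisory, which suits rules that are assumed and not confirmed.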

2. Configure training

Create or modify a configuration YAML file, starting from one of the example configs listed under Training Modes (for example, t2v_lora.yaml).

Key settings to update:

model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"

data:
  preprocessed_data_root: "/path/to/preprocessed/data"

output_dir: "outputs/my_training_run"
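Placeholder paths left in a config are an easy mistake to make, so a quick check before launching can save a failed run. The following is a minimal sketch, assuming the YAML has already been loaded into a plain dict (for example with PyYAML's `yaml.safe_load`). The helper names are hypothetical and the trainer may perform its own validation.

```python
from pathlib import Path

# Dotted keys taken from the "key settings" snippet above.
REQUIRED_PATHS = (
    "model.model_path",
    "model.text_encoder_path",
    "data.preprocessed_data_root",
)


def lookup(config: dict, dotted_key: str):
    """Follow a dotted key through nested dicts; return None if any level is missing."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_paths(config: dict) -> list[str]:
    """List required keys that are unset or point at a path that does not exist."""
    problems = []
    for key in REQUIRED_PATHS:
        value = lookup(config, key)
        if not value or not Path(value).exists():
            problems.append(key)
    return problems
```

An empty list from `missing_paths` means all three paths resolve on disk; `output_dir` is left out because it is presumably created by the run.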

See the Configuration Reference for all available options.

3. Start training

$ uv run python scripts/train.py configs/t2v_lora.yaml

For multi-GPU training:

$ uv run accelerate launch scripts/train.py configs/t2v_lora.yaml

See the Training Guide for distributed training (DDP/FSDP), HuggingFace Hub uploads, and Weights & Biases logging.
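When training is launched from a script instead of a terminal, the two commands above differ only in the launcher. A minimal sketch of building either one follows. `train_command` is a hypothetical helper, not part of the trainer, and it uses only the arguments shown on this page.

```python
import shlex


def train_command(config_path: str, multi_gpu: bool = False) -> list[str]:
    """Build the argv for a training run, mirroring the two commands above."""
    # Single GPU runs the script with python; multi-GPU goes through accelerate.
    launcher = ["accelerate", "launch"] if multi_gpu else ["python"]
    return ["uv", "run", *launcher, "scripts/train.py", config_path]


def as_shell(argv: list[str]) -> str:
    """Render an argv list as a copy-pasteable shell line."""
    return shlex.join(argv)
```

The list can be passed to `subprocess.run(..., check=True)` with the working directory set to packages/ltx-trainer, so the relative script and config paths resolve.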

Training Modes

First time? Start with t2v_lora.yaml — it’s the simplest mode and only requires videos with captions. Explore other modes once you’ve confirmed your setup works.

All modes are expressed through the single flexible training strategy. The trainer supports:

| Mode | Description | Example Config |
|---|---|---|
| Text-to-Video | Generate video+audio from text prompts | t2v_lora.yaml |
| Image-to-Video | Animate from a starting image | i2v_lora.yaml |
| Video Extension | Extend videos temporally (forward/backward) | video_extend_lora.yaml |
| IC-LoRA (V2V) | Video-to-video transformations | v2v_ic_lora.yaml |
| Audio-to-Video | Generate video conditioned on audio | a2v_lora.yaml |
| Video-to-Audio | Generate audio/foley from video | v2a_lora.yaml |
| Video Inpainting | Fill in masked regions of video | video_inpainting_lora.yaml |
| Video Outpainting | Extend video spatially | video_outpainting_lora.yaml |
| Text-to-Audio | Generate audio from text prompts | t2a_lora.yaml |
| Audio Extension | Extend audio temporally | audio_extend_lora.yaml |
| Audio Inpainting | Fill in masked regions of audio | audio_inpainting_lora.yaml |
| IC-LoRA (A2A) | Audio-to-audio transformations | a2a_ic_lora.yaml |
| AV2AV IC-LoRA | Audio+video IC-LoRA transformations | av2av_ic_lora.yaml |
| Full Fine-tuning | Full model training (any mode above) | Set model.training_mode: "full" |

See Training Modes for detailed explanations of each mode.
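For scripted runs, the mode list can be held as a lookup, with full fine-tuning applied as an override on a loaded config. This is a sketch under stated assumptions: the mode slugs are invented for illustration, the file names are the ones listed in this section, and the `configs/` prefix is assumed from the t2v_lora.yaml path used in the training command. Only the `model.training_mode: "full"` setting comes from this page.

```python
import copy

# Keys are hypothetical slugs; file names are the example configs listed above.
EXAMPLE_CONFIGS = {
    "text-to-video": "t2v_lora.yaml",
    "image-to-video": "i2v_lora.yaml",
    "video-extension": "video_extend_lora.yaml",
    "ic-lora-v2v": "v2v_ic_lora.yaml",
    "audio-to-video": "a2v_lora.yaml",
    "video-to-audio": "v2a_lora.yaml",
    "video-inpainting": "video_inpainting_lora.yaml",
    "video-outpainting": "video_outpainting_lora.yaml",
    "text-to-audio": "t2a_lora.yaml",
    "audio-extension": "audio_extend_lora.yaml",
    "audio-inpainting": "audio_inpainting_lora.yaml",
    "ic-lora-a2a": "a2a_ic_lora.yaml",
    "av2av-ic-lora": "av2av_ic_lora.yaml",
}


def example_config(mode: str, config_dir: str = "configs") -> str:
    """Return the example config path for a mode slug (case-insensitive)."""
    key = mode.strip().lower()
    if key not in EXAMPLE_CONFIGS:
        raise KeyError(f"unknown mode {mode!r}; choose from {sorted(EXAMPLE_CONFIGS)}")
    return f"{config_dir}/{EXAMPLE_CONFIGS[key]}"


def with_full_finetuning(config: dict) -> dict:
    """Return a copy of a loaded config with model.training_mode set to 'full'."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated.setdefault("model", {})["training_mode"] = "full"
    return updated
```

Copying before the override keeps the original LoRA config reusable for a comparison run.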

🎬 Happy training! May your loss curves trend down and your VRAM never run out.