LTX-2 Training

The LTX-2 Trainer exposes the same toolkit we use to train production LTX-2 models. Use it to fine-tune LTX-2 on your own data, from a quick style LoRA to full multimodal fine-tuning. The train-model agent makes training approachable even without ML expertise, while experts keep full manual control over every parameter. The trainer is built around a single conditioning framework that covers every training scenario.

Start with the agent. Run /train-model in Claude Code, describe what you want, and the agent trains it for you. To train by hand instead, follow the Quick Start.

What You Can Train

One strategy, every mode. A single flexible conditioning framework expresses 13+ training modes — switch between them by editing a few lines of training config (set which modality is generated and compose conditions), rather than picking a separate strategy or writing code. Text- and image-to-video, video and audio extension, inpainting and outpainting, audio-to-video, video-to-audio (Foley), text-to-audio, and in-context (IC-LoRA) transformations are all driven by the same config block.
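To make the "one strategy, every mode" idea concrete, here is a minimal Python sketch of how several of the modes above can fall out of just two choices: which modality is generated, and which conditions are composed. The class, field, and condition names (`TrainingMode`, `generate`, `conditions`, `first_frame`) are hypothetical and chosen only for illustration; they are not the trainer's config keys, and the exact conditions each real mode uses may differ.

```python
from dataclasses import dataclass

# Hypothetical illustration only: these names are NOT the trainer's real config keys.
@dataclass(frozen=True)
class TrainingMode:
    generate: frozenset                    # modalities the model learns to produce
    conditions: frozenset = frozenset()    # inputs held fixed as conditioning


def mode(generate, conditions=()):
    """Build a mode from plain iterables of modality names."""
    return TrainingMode(frozenset(generate), frozenset(conditions))


# A few of the modes named above, expressed as (generated, conditions) pairs.
MODES = {
    "text-to-video": mode({"video"}, {"text"}),
    "image-to-video": mode({"video"}, {"text", "first_frame"}),
    "audio-to-video": mode({"video"}, {"audio"}),
    "video-to-audio (Foley)": mode({"audio"}, {"video"}),
    "text-to-audio": mode({"audio"}, {"text"}),
    "joint audio + video": mode({"video", "audio"}, {"text"}),
}


def describe(name):
    """Render one mode as a readable sentence."""
    m = MODES[name]
    return f"{name}: generate {sorted(m.generate)} given {sorted(m.conditions)}"


for name in MODES:
    print(describe(name))
```

In this sketch, moving from audio-to-video to Foley only swaps which modality is generated and which is held fixed, which is the sense in which one conditioning block can cover many modes.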

Joint audio + video. Train both modalities together through the model’s cross-modal attention, or freeze one to condition the other (e.g. generate Foley from a fixed video, or video from fixed audio).

In-context control (IC-LoRA). Learn transformations from paired videos or audio — depth and pose control, style transfer, deblurring, colorization, and more.

LoRA or full fine-tuning. Train lightweight, portable LoRA adapters, or update all model parameters with distributed FSDP for larger adaptations.
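For readers new to LoRA, the following is a minimal pure-Python sketch of the general technique, not the trainer's implementation: the base weight stays frozen and only two small low-rank matrices are trained. The helper names (`matmul`, `lora_forward`, `lora_param_count`) and the toy dimensions are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, a, b, alpha):
    """Return x @ w + (alpha / rank) * (x @ a) @ b, with w frozen and a, b trainable."""
    rank = len(b)  # b has shape (rank, d_out)
    base = matmul(x, w)
    update = matmul(matmul(x, a), b)
    scale = alpha / rank
    return [[y + scale * u for y, u in zip(yr, ur)] for yr, ur in zip(base, update)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in one adapted layer: a is (d_in, rank), b is (rank, d_out)."""
    return rank * (d_in + d_out)


random.seed(0)
d_in, d_out, rank = 8, 6, 2
w = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]    # frozen base weight
a = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]  # trainable down-projection
b = [[0.0] * d_out for _ in range(rank)]                                 # trainable up-projection, zero init
x = [[random.gauss(0, 1) for _ in range(d_in)]]

# With b initialised to zero, the adapted layer starts out identical to the base layer.
print(lora_forward(x, w, a, b, alpha=4) == matmul(x, w))

# Adapter size versus the full weight for a hypothetical 4096 x 4096 layer at rank 16.
print(lora_param_count(4096, 4096, 16), "adapter params vs", 4096 * 4096, "full params")
```

Because the adapter is a small fraction of the layer it modifies, it is cheap to train and easy to ship separately from the base model. Full fine-tuning instead updates every entry of the base weight, which is why it is paired with distributed FSDP.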

Runs on accessible hardware. 80GB VRAM is recommended, but a low-VRAM path (INT8 quantization, 8-bit optimizer, reduced rank) brings LoRA training to 32GB consumer GPUs such as the RTX 5090.
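As a rough, back-of-the-envelope illustration of why each low-VRAM lever helps, the sketch below estimates static memory for LoRA training. All numbers are placeholder assumptions, not LTX-2's real parameter counts or measured usage, and the estimate ignores activations, quantization constants, and framework overhead, which are often significant in practice. It also assumes fp32 adapters and gradients and an Adam-style optimizer with two moments per trainable parameter.

```python
GIB = 2 ** 30


def static_memory_gib(base_params, trainable_params, weight_bytes, optim_state_bytes):
    """Rough static memory for LoRA training, ignoring activations and buffers."""
    frozen_weights = base_params * weight_bytes                   # frozen base model
    adapter_weights = trainable_params * 4                        # assumed fp32 adapters
    adapter_grads = trainable_params * 4                          # assumed fp32 gradients
    optimizer_state = trainable_params * 2 * optim_state_bytes    # two moments per parameter
    return (frozen_weights + adapter_weights + adapter_grads + optimizer_state) / GIB


# Placeholder sizes for illustration, NOT LTX-2's real parameter counts.
BASE_PARAMS = 10_000_000_000

# 16-bit base weights, 32-bit optimizer state, full-size adapter.
default = static_memory_gib(BASE_PARAMS, 100_000_000, weight_bytes=2, optim_state_bytes=4)

# INT8 base weights, 8-bit optimizer state, and half the rank (so half the adapter parameters).
low_vram = static_memory_gib(BASE_PARAMS, 50_000_000, weight_bytes=1, optim_state_bytes=1)

print(f"16-bit weights, 32-bit optimizer: {default:.1f} GiB")
print(f"INT8 weights, 8-bit optimizer, half rank: {low_vram:.1f} GiB")
```

In this simplified model the frozen base weights dominate, so quantizing them to INT8 accounts for most of the saving, while the 8-bit optimizer and reduced rank trim the smaller trainable portion.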

Existing LoRAs and IC-LoRAs trained on previous versions of the trainer do not need to be retrained.

Start Here