LTX-2 ComfyUI Nodes
This page documents ComfyUI nodes released for LTX-2. These nodes address specific workflow pain points, such as prompt iteration speed, quality stability, advanced guidance control, and IC-LoRA strength tuning. All nodes are designed to be optional drop-ins that enhance existing workflows.
Gemma Text Encoding:
Audio Identity (LipDub):
Advanced Guidance:
Quality Enhancement:
Location: gemma_api_conditioning.py
What it does
Encodes text prompts using Lightricks’ free API endpoint, bypassing the need to load Gemma locally. This eliminates all local VRAM usage for text encoding and enables sub-second prompt encoding.
Why use it
Gemma’s large memory footprint means it has to be loaded into and unloaded from VRAM, which can create a bottleneck on consumer hardware, particularly during prompt iteration. Every time you change a prompt, Gemma must be reloaded, adding significant time to the workflow. This node solves that problem by offloading text encoding to a free API endpoint.
When to use
Use this node when:
Parameters
api_key - Your LTX API key
prompt - The text prompt to encode
ckpt_name - The LTX-2 checkpoint file (used to extract model ID for encoding compatibility)
Returns
conditioning - Encoded prompt conditioning ready for LTX-2 generation
Getting an API key
api_key parameter
Example workflow
With API:
Without API:
Location: conditioning_saver.py
What it does
Saves computed text conditioning to disk as a .safetensors file, allowing you to reuse the exact same conditioning across multiple workflow sessions without re-encoding.
Why use it
Useful when:
When to use
Use this node when:
Parameters
conditioning - The conditioning to save (from any text encoder or the API node)
filename - Base filename (without extension)
dtype - Precision for storage: “bfloat16” or “float16”
Returns
Output location
Files are saved to: ComfyUI/models/embeddings/
Storage
Files are stored as .safetensors using the selected numerical precision:
Location: conditioning_loader.py
What it does
Loads previously saved conditioning from disk, bypassing text encoding entirely.
Why use it
Works in tandem with LTXVSaveConditioning to enable instant conditioning loading. Perfect for workflows that reuse the same prompts or when you need guaranteed consistency.
When to use
Use this node when:
Parameters
file_name - The .safetensors file to load (from the embeddings folder)
device - Where to load the conditioning: “cpu” or “gpu”
Returns
conditioning - Loaded conditioning ready for generation
Device selection
Workflow integration
Pair with LTXVSaveConditioning to create prompt libraries:
Location: multimodal_guider.py
What it does
Provides independent, per-modality control over guidance parameters for audio and video. This is an extension of Classifier-Free Guidance (CFG) that allows you to separately control prompt adherence, artifact reduction, and cross-modal synchronization for each modality.
Why use it
Standard guidance treats audio and video as a single unit. When you increase guidance to improve video quality, it affects audio synchronization. When you fix synchronization, your visual style can break. The MultimodalGuider decouples these controls, letting you tune video guidance independently from audio guidance without trade-offs.
When to use
Use this node when:
How it works
The guider can make up to four separate model inference calls per step:
By combining these strategically, you get independent control over:
Parameters
model - The LTX-2 model to apply guidance to
positive - Positive conditioning
negative - Negative conditioning
parameters - A GUIDER_PARAMETERS object containing per-modality settings
skip_blocks - Comma-separated list of transformer blocks to skip for STG
GUIDER_PARAMETERS structure
The parameters object exposes three independent guidance controls, each configurable per modality (audio and video separately):
1. CFG Guidance (cfg > 1)
Controls prompt adherence and semantic accuracy. Pushes the model toward the positive prompt and away from the negative prompt.
2. Spatio-Temporal Guidance (stg > 0)
Reduces artifacts by pushing the model away from a degraded, perturbed version of itself. Prevents breakup of rigid objects. Based on the STG technique.
3. Cross-Modal Guidance (modality_scale > 1)
Controls synchronization between audio and video. Pushes the model away from versions where modalities ignore each other.
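The three guidance terms described above can be sketched for a single modality as follows. This is a minimal illustration, not the node's actual implementation: the `ModalityParams` class, the function names, and the exact weighting `(cfg - 1)`, `stg`, `(modality_scale - 1)` are assumptions chosen to match the documented activation thresholds (cfg > 1, stg > 0, modality_scale > 1). Plain Python lists stand in for model predictions.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


@dataclass
class ModalityParams:
    """Hypothetical container; field names mirror the documented parameters."""
    cfg: float = 1.0             # 1.0 disables CFG
    stg: float = 0.0             # 0.0 disables STG
    modality_scale: float = 1.0  # 1.0 disables cross-modal guidance


def required_model_calls(p: ModalityParams) -> int:
    """Count the inference calls a step would need under these assumptions."""
    calls = 1  # the positive-prompt prediction is always needed
    if p.cfg != 1.0:
        calls += 1  # negative-prompt prediction
    if p.stg != 0.0:
        calls += 1  # perturbed (degraded) prediction
    if p.modality_scale != 1.0:
        calls += 1  # prediction where the modalities ignore each other
    return calls


def combine(
    p: ModalityParams,
    positive: Vector,
    negative: Optional[Vector] = None,
    perturbed: Optional[Vector] = None,
    isolated: Optional[Vector] = None,
) -> Vector:
    """Push the positive prediction away from each available reference."""
    out = list(positive)
    for i, pos in enumerate(positive):
        if negative is not None:
            out[i] += (p.cfg - 1.0) * (pos - negative[i])
        if perturbed is not None:
            out[i] += p.stg * (pos - perturbed[i])
        if isolated is not None:
            out[i] += (p.modality_scale - 1.0) * (pos - isolated[i])
    return out


params = ModalityParams(cfg=3.0, stg=1.0)
guided = combine(params, [1.0, 2.0], negative=[0.0, 0.0], perturbed=[0.5, 2.0])
```

With every control at its neutral value the sketch needs only one call and returns the positive prediction unchanged, which is why disabling unused guidance terms reduces per-step cost.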
Additional Per-Modality Parameters
skip_step - Periodically skip diffusion steps for this modality
0: No skipping
1: Skip every other step
2: Skip two out of every three steps
rescale - Normalization after applying CFG, STG, and cross-modal guidance
0: No normalization
1: Full renormalization to match the norm of the positive-prompt prediction
0-1: Partial normalization
perturb_attn - Boolean controlling whether this modality is perturbed in the degraded model pass used for STG. Normally set to True.
cross_attn - Boolean controlling whether cross-attention layers from this modality to the other modality are active. Normally set to True.
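To make the skip_step and rescale settings concrete, here is a small sketch under stated assumptions. The function names are hypothetical, the choice of which step in each group actually runs is a guess, and the linear blend used for partial normalization is one plausible reading of "0-1: Partial normalization" rather than the node's confirmed formula.

```python
import math
from typing import List


def modality_runs(step_index: int, skip_step: int) -> bool:
    """Return True when this modality is evaluated at the given step.

    skip_step=0 runs every step, 1 runs every other step,
    2 runs one step out of every three (assumed: the first of each group).
    """
    return step_index % (skip_step + 1) == 0


def rescale_prediction(
    guided: List[float], positive: List[float], rescale: float
) -> List[float]:
    """Blend the guided prediction toward the norm of the positive prediction.

    rescale=0 leaves the prediction untouched; rescale=1 matches the norm of
    the positive-prompt prediction; values in between interpolate linearly.
    """
    guided_norm = math.sqrt(sum(x * x for x in guided))
    positive_norm = math.sqrt(sum(x * x for x in positive))
    if guided_norm == 0.0:
        return list(guided)  # nothing to rescale
    factor = rescale * (positive_norm / guided_norm) + (1.0 - rescale)
    return [x * factor for x in guided]


active_steps = [i for i in range(6) if modality_runs(i, skip_step=2)]
full = rescale_prediction([3.0, 4.0], [0.6, 0.8], rescale=1.0)
half = rescale_prediction([3.0, 4.0], [0.6, 0.8], rescale=0.5)
```

In this sketch a guided prediction with norm 5 and a positive prediction with norm 1 end up at norm 1 with rescale=1 and at norm 3 with rescale=0.5.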
Returns
guider - Configured guider ready for sampling
Use cases
Use Case 1: Prioritize video quality, loose audio sync
Use Case 2: Tight lip-sync for dialogue
Use Case 3: Performance optimization
Integration with other nodes
Location: iclora.py
What it does
Applies an IC-LoRA control adapter with granular strength control, replacing the fixed 1.0 strength behavior of the standard IC-LoRA node. Allows global strength adjustment and optional spatial/spatiotemporal masking.
Why use it
The standard IC-LoRA workflow applies control at full strength everywhere, which can over-constrain generation. This node lets you dial in exactly how much influence the control signal has, and where.
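As a rough illustration of how a global strength and an optional mask might combine, the sketch below multiplies the node's attention_strength by each entry of a spatial attention_mask. The function name is hypothetical, nested lists stand in for a MASK tensor, and how the resulting values are applied inside the attention layers is not shown here.

```python
from typing import List, Optional, Union

Mask2D = List[List[float]]


def effective_strength(
    attention_strength: float, attention_mask: Optional[Mask2D] = None
) -> Union[float, Mask2D]:
    """Return the per-location control strength.

    Without a mask the global strength applies everywhere. With a spatial
    (H x W) mask, each location gets attention_strength * mask value.
    """
    if not 0.0 <= attention_strength <= 1.0:
        raise ValueError("attention_strength must be in [0.0, 1.0]")
    if attention_mask is None:
        return attention_strength
    return [[attention_strength * m for m in row] for row in attention_mask]


# Half-strength control, fully disabled in the top-right region.
strengths = effective_strength(0.5, [[1.0, 0.0], [0.5, 1.0]])
```

A mask value of 0 removes the control signal at that location regardless of the global strength, while a value of 1 leaves the global strength unchanged.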
When to use
Use this node when:
Parameters
attention_strength (float, 0.0-1.0) - Global scaling factor for IC-LoRA cross-attention scores. Default: 1.0
attention_mask (MASK, optional) - Spatial (H×W) or spatiotemporal (T×H×W) mask multiplied with attention_strength
Returns
Location: easy_samplers.py
What it does
A specialized sampler that applies statistical normalization to latents during generation to prevent overbaking (oversaturation) and audio clipping issues.
Why use it
Without normalization, latent values can drift into problematic ranges during the denoising process. This causes:
The NormalizingSampler keeps latent statistics in optimal ranges throughout generation, dramatically improving output quality.
When to use
Use this node when:
How it works
The sampler monitors latent statistics during the denoising process and applies normalization to keep values within target ranges. This is done using percentile-based statistics (excluding extreme outliers) to prevent both overbaking and excessive normalization.
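The idea of percentile-based normalization can be sketched as below. This is only an illustration of the general technique: the percentile cut-offs, the target standard deviation, and the rule of rescaling only when the spread exceeds the target are all assumptions, and a flat list of floats stands in for a latent tensor.

```python
import statistics
from typing import List


def percentile(ordered: List[float], q: float) -> float:
    """Linear-interpolated percentile of an already sorted list (q in 0-100)."""
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def normalize_latent(
    values: List[float],
    target_std: float = 1.0,   # assumed target range
    lower_pct: float = 1.0,    # assumed outlier cut-offs
    upper_pct: float = 99.0,
) -> List[float]:
    """Shrink values toward the mean when the inlier spread exceeds the target."""
    ordered = sorted(values)
    lo = percentile(ordered, lower_pct)
    hi = percentile(ordered, upper_pct)
    # Statistics are computed without extreme outliers so they cannot dominate.
    inliers = [v for v in values if lo <= v <= hi]
    if len(inliers) < 2:
        return list(values)
    std = statistics.pstdev(inliers)
    if std <= target_std:
        return list(values)  # already in range: avoid excessive normalization
    mean = statistics.fmean(inliers)
    scale = target_std / std
    return [mean + (v - mean) * scale for v in values]


drifted = [float(i) for i in range(-10, 11)]
normalized = normalize_latent(drifted)
```

Leaving in-range latents untouched reflects the stated goal of preventing both overbaking and excessive normalization.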
Key benefits
Integration
This is a drop-in replacement for standard samplers in LTX-2 workflows. It maintains full compatibility with:
Performance impact
Minimal - the normalization adds negligible computational overhead while significantly improving output quality.
This node provides speaker identity conditioning for the LipDub IC-LoRA two-stage dubbing pipeline.
Location: iclora.py
What it does
Attaches an audio latent as ref_audio tokens on conditioning for speaker identity transfer. Also outputs a frozen_audio copy with noise_mask=0, ensuring Stage 1 audio passes through Stage 2 unchanged without needing a mask-by-time node.
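The two outputs described above can be sketched with plain Python containers. This is a hedged illustration only: the function name is hypothetical, the latent is assumed to be a dict with "samples" and "noise_mask" keys, the conditioning is assumed to be a list of [embedding, options] pairs, and lists stand in for tensors. How the real node turns the latent into reference tokens is not reproduced here.

```python
from typing import Any, Dict, List, Tuple

Conditioning = List[List[Any]]  # assumed: [[embedding, options_dict], ...]
Latent = Dict[str, Any]         # assumed: {"samples": ..., "noise_mask": ...}


def attach_reference_audio(
    conditioning: Conditioning, audio_latent: Latent
) -> Tuple[Conditioning, Latent]:
    """Return (conditioning with ref_audio attached, frozen audio latent)."""
    new_conditioning: Conditioning = []
    for embedding, options in conditioning:
        updated = dict(options)  # copy so the input conditioning is untouched
        updated["ref_audio"] = audio_latent["samples"]
        new_conditioning.append([embedding, updated])

    # A zero noise mask marks every position as "do not denoise",
    # so the audio passes through the next stage unchanged.
    frozen_audio = dict(audio_latent)
    frozen_audio["noise_mask"] = [0.0] * len(audio_latent["samples"])
    return new_conditioning, frozen_audio


latent = {"samples": [0.1, 0.2, 0.3]}
cond = [[[1.0], {}]]
new_cond, frozen = attach_reference_audio(cond, latent)
```

Because the frozen copy carries its own all-zero mask, no separate mask-by-time step is needed to protect the Stage 1 audio in Stage 2.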
Why use it
The LipDub pipeline needs to preserve the original speaker’s voice across both generation stages. This node handles two things in one step: it gives the model the speaker’s audio identity as reference tokens, and it freezes the audio latent so it carries forward unchanged into Stage 2.
When to use
Used once per stage in the LipDub two-stage pipeline. Stage 1 receives the VAE-encoded reference audio; Stage 2 receives the Stage 1 audio output.
Parameters
conditioning - The text conditioning to attach reference tokens to
audio_latent - The audio latent to use as reference (from VAE encode in Stage 1, or from Stage 1 output in Stage 2)
Returns
conditioning - Conditioning with ref_audio tokens prepended
frozen_audio - Audio latent with zero noise mask for pass-through
All nodes are available in the ComfyUI-LTXVideo repository.
Via ComfyUI Manager (Recommended):
Manual Update:
After installation/update, restart ComfyUI. The new nodes will appear under the “Lightricks” category.
“Invalid API key” error
“Model ID cannot be identified” error
Timeout errors
File not found
ComfyUI/models/embeddings/models/embeddings/ folder with models/text_encoders/
Out of memory when loading
No quality improvement vs standard guider
Generation is slower than expected
Still seeing overbaking
Quality seems worse