LTX-2 ComfyUI Nodes
This page documents ComfyUI nodes released for LTX-2. These nodes address specific workflow pain points: prompt iteration speed, quality stability, advanced guidance control, and IC-LoRA strength tuning. All nodes are designed to be optional drop-ins that enhance existing workflows without requiring complete restructuring.
Overview
Gemma Text Encoding:
- GemmaAPITextEncode - Free API-based text encoder that replaces the local Gemma and allows for reduced VRAM usage and faster runtimes
- LTXVSaveConditioning - Save text encodings to disk for reuse
- LTXVLoadConditioning - Load pre-saved text encodings
Advanced Guidance:
- MultimodalGuider - Independent control over audio and video guidance parameters
- LTX Add Video IC-LoRA Guide Advanced - Granular IC-LoRA strength control with global scaling and spatial masking
Quality Enhancement:
- LTXVNormalizingSampler - Latent normalization to prevent overbaking and audio clipping
Gemma Text Encoding Nodes
GemmaAPITextEncode
Location: gemma_api_conditioning.py
What it does
Encodes text prompts using Lightricks’ free API endpoint, bypassing the need to load Gemma locally. This eliminates all local VRAM usage for text encoding and enables sub-second prompt encoding.
Why use it
Gemma’s large memory footprint can create a bottleneck on consumer hardware, particularly during prompt iteration, because the encoder has to be loaded into and unloaded from VRAM. Every time you change a prompt, Gemma must be reloaded, adding significant time to the workflow. This node solves that problem by offloading text encoding to a free API endpoint.
When to use
Use this node when:
- Working on consumer GPUs with limited VRAM
- Multiple generations use different prompts
Parameters
- api_key - Your LTX API key
- prompt - The text prompt to encode
- ckpt_name - The LTX-2 checkpoint file (used to extract model ID for encoding compatibility)
Returns
- conditioning - Encoded prompt conditioning ready for LTX-2 generation
Getting an API key
- Visit console.ltx.video
- Sign up or log in
- Generate a free API key
- Copy the key into the node’s api_key parameter
Example workflow
With API:
Without API:
LTXVSaveConditioning
Location: conditioning_saver.py
What it does
Saves computed text conditioning to disk as a .safetensors file, allowing you to reuse the exact same conditioning across multiple workflow sessions without re-encoding.
Why use it
Useful when:
- You have a prompt that works well and want to preserve its exact encoding
- Running batch generations with identical conditioning
- Building reusable workflow templates with pre-encoded prompts
- Working offline without API access
When to use
Use this node when:
- You want to lock in a specific prompt’s encoding
- Multiple workflow sessions will use the same conditioning
- You need reproducible conditioning across different machines
- Building libraries of validated prompts
Parameters
- conditioning - The conditioning to save (from any text encoder or the API node)
- filename - Base filename (without extension)
- dtype - Precision for storage: “bfloat16” or “float16”
Returns
- UI notification showing saved filename and file size
Output location
Files are saved to: ComfyUI/models/embeddings/
Storage
Files are stored as .safetensors using the selected numerical precision:
- bfloat16: Wider dynamic range (same exponent width as float32) with fewer mantissa bits, more commonly used
- float16: More mantissa bits but a narrower range, minimal practical difference
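The two storage formats split their 16 bits differently: float16 uses 5 exponent and 10 mantissa bits, while bfloat16 uses 8 exponent and 7 mantissa bits. The sketch below illustrates that trade-off using only the Python standard library. It is not how the node stores tensors; the helper names are hypothetical, and the bfloat16 helper truncates rather than rounding to nearest as real libraries usually do.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip x through IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    # struct's "e" format is IEEE 754 binary16; it raises OverflowError out of range.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Round-trip x through bfloat16 by keeping the top 16 bits of its float32 form.

    Assumption: simple truncation, chosen here for brevity.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    # Small value near 1.0: float16 keeps more fractional detail.
    print(to_float16(1.001), to_bfloat16(1.001))
    # Large value: bfloat16 still represents it, float16 cannot.
    print(to_bfloat16(70000.0))
    try:
        to_float16(70000.0)
    except OverflowError as err:
        print("float16 overflow:", err)
```

For values in the range typical of text conditioning, either format is usually adequate, which is consistent with the small practical difference noted above.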
LTXVLoadConditioning
Location: conditioning_loader.py
What it does
Loads previously saved conditioning from disk, bypassing text encoding entirely.
Why use it
Works in tandem with LTXVSaveConditioning to enable instant conditioning loading. Perfect for workflows that reuse the same prompts or when you need guaranteed consistency.
When to use
Use this node when:
- Reusing conditioning saved in previous sessions
- Running batch workflows with preset prompts
- Working offline without API access
- You need bit-perfect conditioning reproducibility
Parameters
- file_name - The .safetensors file to load (from the embeddings folder)
- device - Where to load the conditioning: “cpu” or “gpu”
Returns
- conditioning - Loaded conditioning ready for generation
Device selection
- cpu: Loads to system RAM (slower but works on any system)
- gpu: Loads directly to VRAM if available (faster for generation)
Workflow integration
Pair with LTXVSaveConditioning to create prompt libraries:
- Create and refine prompts with text encoder or API
- Save successful conditioning with LTXVSaveConditioning
- Load instantly in future sessions with LTXVLoadConditioning
Advanced Model Guidance
MultimodalGuider
Location: multimodal_guider.py
What it does
Provides independent, per-modality control over guidance parameters for audio and video. This is an extension of Classifier-Free Guidance (CFG) that allows you to separately control prompt adherence, artifact reduction, and cross-modal synchronization for each modality.
Why use it
Standard guidance treats audio and video as a single unit. When you increase guidance to improve video quality, it affects audio synchronization. When you fix synchronization, your visual style can break. The MultimodalGuider decouples these controls, letting you tune video guidance independently from audio guidance without trade-offs.
When to use
Use this node when:
- You need different guidance strengths for audio vs video
- Video quality needs to be prioritized over tight audio sync (or vice versa)
- You want to prevent the common issue where fixing synchronization breaks visual style
- You need fine-grained control over cross-modal attention
How it works
The guider can make up to four separate model inference calls per step:
- Positive conditioning - Your prompt
- Negative conditioning - Your negative prompt (for CFG)
- Perturbed conditioning - Degraded version (for STG artifact reduction)
- Modality-isolated conditioning - Each modality without cross-attention (for sync control)
By combining these strategically, you get independent control over:
- CFG strength per modality (prompt adherence)
- STG strength per modality (artifact reduction)
- Cross-modal attention strength (synchronization tightness)
- Step skipping per modality (performance optimization)
Parameters
- model - The LTX-2 model to apply guidance to
- positive - Positive conditioning
- negative - Negative conditioning
- parameters - A GUIDER_PARAMETERS object containing per-modality settings
- skip_blocks - Comma-separated list of transformer blocks to skip for STG
GUIDER_PARAMETERS structure
The parameters object exposes three independent guidance controls, each configurable per modality (audio and video separately):
1. CFG Guidance (cfg > 1)
Controls prompt adherence and semantic accuracy. Pushes the model toward the positive prompt and away from the negative prompt.
- When to increase: When visual style or object fidelity matters most
- Effect: Stronger prompt following, more accurate semantic content
- Configurable per modality: Yes
2. Spatio-Temporal Guidance (stg > 0)
Reduces artifacts by pushing the model away from a degraded, perturbed version of itself. Prevents breakup of rigid objects. Based on the STG technique.
- When to increase: If you see structural artifacts or object breakup
- Effect: Fewer visual artifacts, more stable structures
- Configurable per modality: Yes
3. Cross-Modal Guidance (modality_scale > 1)
Controls synchronization between audio and video. Pushes the model away from versions where modalities ignore each other.
- When to adjust: To balance synchronization versus natural motion
- Higher values: Tighter alignment (perfect for lip-sync or rhythmic action)
- Lower values: Looser, more natural coupling
- Configurable per modality: Yes
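The three controls can be read as three separate "push away from" terms added to the positive prediction. The sketch below shows one common way to combine them for a single modality. It is an assumption based on the parameter conventions described here (cfg > 1, stg > 0, modality_scale > 1 enable each term), not the node's confirmed implementation; the function name and the use of plain Python lists in place of tensors are illustrative only.

```python
from typing import List, Sequence


def combine_guidance(
    positive: Sequence[float],
    negative: Sequence[float],
    perturbed: Sequence[float],
    isolated: Sequence[float],
    cfg: float = 1.0,
    stg: float = 0.0,
    modality_scale: float = 1.0,
) -> List[float]:
    """Combine four model predictions into one guided prediction.

    With the defaults (cfg=1, stg=0, modality_scale=1) every extra term
    vanishes and the result equals the positive prediction.
    """
    return [
        pos
        + (cfg - 1.0) * (pos - neg)             # CFG: away from the negative prompt
        + stg * (pos - per)                     # STG: away from the perturbed pass
        + (modality_scale - 1.0) * (pos - iso)  # cross-modal: away from the isolated pass
        for pos, neg, per, iso in zip(positive, negative, perturbed, isolated)
    ]


if __name__ == "__main__":
    pos, neg, per, iso = [1.0, 2.0], [0.0, 0.0], [0.5, 0.5], [0.2, 0.2]
    print(combine_guidance(pos, neg, per, iso))           # neutral settings
    print(combine_guidance(pos, neg, per, iso, cfg=3.0))  # CFG only
```

Under this reading, a term whose scale is at its neutral value contributes nothing, so the corresponding extra model call is not needed for that step.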
Additional Per-Modality Parameters
- skip_step - Periodically skip diffusion steps for this modality. Use for performance optimization.
  - 0: No skipping
  - 1: Skip every other step
  - 2: Skip two out of every three steps
- rescale - Normalization after applying CFG, STG, and cross-modal guidance. Especially helpful for preventing oversaturation when using high CFG or STG values.
  - 0: No normalization
  - 1: Full renormalization to match the norm of the positive-prompt prediction
  - Between 0 and 1: Partial normalization
- perturb_attn - Boolean controlling whether this modality is perturbed in the perturbed model pass used for STG. Normally set to True.
- cross_attn - Boolean controlling whether cross-attention layers from this modality to the other modality are active. Normally set to True.
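To make the rescale setting concrete, the sketch below blends between "no normalization" and "match the norm of the positive-prompt prediction". It assumes an L2 norm over the whole prediction and a linear blend of the scale factor; the node may use a different statistic or apply it per channel. Function names are hypothetical, and plain lists stand in for tensors.

```python
import math
from typing import List, Sequence


def l2_norm(values: Sequence[float]) -> float:
    """Euclidean norm of a flat sequence."""
    return math.sqrt(sum(v * v for v in values))


def rescale_prediction(
    guided: Sequence[float], positive: Sequence[float], rescale: float
) -> List[float]:
    """Scale the guided prediction toward the norm of the positive prediction.

    rescale=0 returns the guided prediction unchanged; rescale=1 gives it
    the same norm as the positive prediction; values in between interpolate.
    """
    if not 0.0 <= rescale <= 1.0:
        raise ValueError("rescale must be between 0 and 1")
    guided_norm = l2_norm(guided)
    if guided_norm == 0.0:
        return list(guided)  # nothing to scale
    full_factor = l2_norm(positive) / guided_norm
    factor = rescale * full_factor + (1.0 - rescale)
    return [v * factor for v in guided]


if __name__ == "__main__":
    positive = [3.0, 4.0]  # norm 5
    guided = [6.0, 8.0]    # norm 10, inflated by strong guidance
    for r in (0.0, 0.5, 1.0):
        print(r, rescale_prediction(guided, positive, r))
```

The direction of the guided prediction is preserved; only its magnitude is pulled back, which is why this helps with oversaturation at high CFG or STG values.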
Returns
guider- Configured guider ready for sampling
Use cases
Use Case 1: Prioritize video quality, loose audio sync
- Video: High CFG, moderate STG, low modality scale
- Audio: Low CFG, low STG, low modality scale
- Result: Beautiful video, audio follows general mood but not frame-locked
Use Case 2: Tight lip-sync for dialogue
- Video: Moderate CFG, moderate STG, high modality scale
- Audio: Moderate CFG, low STG, high modality scale
- Result: Audio and video tightly synchronized, good for speaking
Use Case 3: Performance optimization
- Video: Process every step
- Audio: Skip every other step (skip_step = 1)
- Result: 2x faster generation with minimal audio quality impact
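The skip_step values used above can be read as a repeating cycle of length skip_step + 1 in which the modality is evaluated once. The sketch below encodes that reading. It assumes the evaluated step is the first one in each cycle, which this page does not specify, and it does not model what the sampler does on skipped steps. The function name is hypothetical.

```python
def runs_on_step(step_index: int, skip_step: int) -> bool:
    """Return True if the modality is evaluated on this diffusion step.

    Assumption: one evaluated step at the start of every (skip_step + 1)
    step cycle, so skip_step=1 skips every other step and skip_step=2
    skips two out of every three.
    """
    if skip_step < 0:
        raise ValueError("skip_step must be non-negative")
    return step_index % (skip_step + 1) == 0


if __name__ == "__main__":
    for skip in (0, 1, 2):
        schedule = [runs_on_step(i, skip) for i in range(6)]
        print(skip, schedule)
```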
Integration with other nodes
- Works with all LTX-2 sampler nodes
- Can be combined with latent normalization for additional quality control
- Essential for looping sampler workflows
LTX Add Video IC-LoRA Guide Advanced
What it does
Applies an IC-LoRA control adapter with granular strength control, replacing the fixed 1.0 strength behavior of the standard IC-LoRA node. Allows global strength adjustment and optional spatial/spatiotemporal masking.
Why use it
The standard IC-LoRA workflow applies control at full strength everywhere, which can over-constrain generation. This node lets you dial in exactly how much influence the control signal has, and where.
When to use
Use this node when:
- You want softer, less rigid IC-LoRA control
- You need IC-LoRA to apply only to specific regions of the frame
- You want to blend IC-LoRA control with free generation
- You’re combining multiple control types and need to balance their influence
Parameters
- attention_strength (float, 0.0-1.0) - Global scaling factor for IC-LoRA cross-attention scores. Default: 1.0
- attention_mask (MASK, optional) - Spatial (H×W) or spatiotemporal (T×H×W) mask multiplied with attention_strength
Returns
- Model with IC-LoRA applied at the specified strength/mask configuration
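Because the mask is multiplied with attention_strength, the effective control strength at each position is simply their product. The sketch below computes that per-position map for a spatial (H×W) mask. It only illustrates the arithmetic; how the node applies the result to attention scores is not shown here, and the function name and nested-list mask representation are assumptions.

```python
from typing import List, Sequence


def effective_strength_map(
    attention_strength: float, attention_mask: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Per-position IC-LoRA strength for a spatial mask given as rows of values in [0, 1]."""
    if not 0.0 <= attention_strength <= 1.0:
        raise ValueError("attention_strength must be between 0.0 and 1.0")
    return [[attention_strength * m for m in row] for row in attention_mask]


if __name__ == "__main__":
    # Full control on the left column, none top-right, partial bottom-right.
    mask = [
        [1.0, 0.0],
        [1.0, 0.5],
    ]
    for row in effective_strength_map(0.5, mask):
        print(row)
```

Without a mask, the same global strength applies everywhere, which matches the behavior of a mask filled with ones.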
Quality Enhancement
LTXVNormalizingSampler
Location: easy_samplers.py
What it does
A specialized sampler that applies statistical normalization to latents during generation to prevent overbaking (oversaturation) and audio clipping issues.
Why use it
Without normalization, latent values can drift into problematic ranges during the denoising process. This causes:
- Oversaturated, “overbaked” visual outputs with crushed colors
- Audio clipping and distortion
- Inconsistent quality across different prompts or settings
The NormalizingSampler keeps latent statistics in optimal ranges throughout generation, dramatically improving output quality.
When to use
Use this node when:
- You see oversaturated, “overbaked” visual outputs
- Audio has clipping or distortion artifacts
- Output quality varies unpredictably between generations
- Using high guidance values that tend to cause oversaturation
How it works
The sampler monitors latent statistics during the denoising process and applies normalization to keep values within target ranges. This is done using percentile-based statistics (excluding extreme outliers) to prevent both overbaking and excessive normalization.
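As a rough illustration of percentile-based normalization, the sketch below measures the spread of latent values between two percentiles, so extreme outliers do not dominate the statistic, and rescales only when that spread exceeds a target. The percentile bounds, the target value, the use of standard deviation, and the function names are all assumptions for illustration; the sampler's actual statistics and thresholds are not documented on this page.

```python
import math
import statistics
from typing import List, Sequence


def percentile(ordered: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of an already sorted sequence, q in [0, 100]."""
    position = q * (len(ordered) - 1) / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def normalize_latent(
    values: Sequence[float],
    target_std: float = 1.0,
    low_pct: float = 1.0,
    high_pct: float = 99.0,
) -> List[float]:
    """Shrink values toward their center when the outlier-trimmed spread is too large."""
    if not values:
        return []
    ordered = sorted(values)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    # Statistics are computed on the trimmed core only.
    core = [v for v in values if low <= v <= high]
    if len(core) < 2:
        return list(values)
    core_std = statistics.pstdev(core)
    if core_std <= target_std:
        return list(values)  # already in range: avoid excessive normalization
    center = statistics.fmean(core)
    scale = target_std / core_std
    return [center + (v - center) * scale for v in values]


if __name__ == "__main__":
    latent = [-2.0, 2.0] * 50 + [100.0]  # wide spread plus one extreme outlier
    normalized = normalize_latent(latent)
    print(normalized[0], normalized[1], normalized[-1])
```

Leaving in-range latents untouched reflects the goal stated above of preventing overbaking without over-normalizing clean generations.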
Key benefits
- Prevents oversaturated, “overbaked” visual outputs
- Eliminates audio clipping artifacts
- More consistent quality across generations
- Works automatically - no manual tuning required
- Especially effective with high guidance values
Integration
This is a drop-in replacement for standard samplers in LTX-2 workflows. It maintains full compatibility with:
- All guider nodes (including MultimodalGuider)
- Text and image conditioning
- LoRA and IC-LoRA workflows
Performance impact
Minimal - the normalization adds negligible computational overhead while significantly improving output quality.
Installation & Usage
Installation
All nodes are available in the ComfyUI-LTXVideo repository.
Via ComfyUI Manager (Recommended):
- Open ComfyUI Manager
- Search for “ComfyUI-LTXVideo”
- Click Update (if already installed) or Install
- Restart ComfyUI
Manual Update:
After installation/update, restart ComfyUI. The new nodes will appear under the “Lightricks” category.
Troubleshooting
GemmaAPITextEncode
“Invalid API key” error
- Verify your API key is correct
- Generate a new key at console.ltx.video
- Ensure no extra spaces in the API key field
“Model ID cannot be identified” error
- Your checkpoint file may be missing metadata
- Ensure you’re using an official LTX-2 model
- Ensure the ckpt_name field in the node matches the filename of the model loaded in your Checkpoint Loader
Timeout errors
- Check your internet connection
- The API may be experiencing high load
LTXVSaveConditioning / LTXVLoadConditioning
File not found
- Ensure .safetensors extension is not doubled
- Check that files are in ComfyUI/models/embeddings/
- Ensure you are not mixing up the models/embeddings/ folder with models/text_encoders/
Out of memory when loading
- If loading with device set to “gpu” runs out of VRAM, set device to “cpu” so the conditioning is loaded into system RAM instead
MultimodalGuider
No quality improvement vs standard guider
- Ensure you have connected two GuiderParameters nodes (one for audio, one for video)
- High values can break the generation. Suggested baseline for balanced speed and consistency is Modality: 1 and Skip Step: 1
Generation is slower than expected
- NOTE: This node uses CFG > 1, which inherently makes generation slower because each step needs additional model inference calls (up to four per step)
- Use Skip Step: 1 for increased speed. Reduce this value to 0 if artifacts appear
LTXVNormalizingSampler
Still seeing overbaking
- This is a sampler, not a post-process. Ensure it replaces the SamplerCustomAdvanced node in your workflow
- Try combining with lower guidance values
- Consider if your prompt or conditioning is the root cause
Quality seems worse
- Use this node ONLY for the first sampling stage. Revert to a standard sampler for the second/upscale stage
- Do not use this sampler for inpainting, video extension, or any workflow using masks. It may break the context audio
- Ensure you are using the Distilled model with the standard 8-step manual sigma schedule. This node is NOT tuned for the full model
- Normalization helps most with problematic outputs (clipping/saturation). If your generation is already clean, this node may introduce unnecessary noise. Test side-by-side with the same seed