LTX-2 ComfyUI Nodes

This page documents ComfyUI nodes released for LTX-2. These nodes address specific workflow pain points, such as prompt iteration speed, quality stability, advanced guidance control, and IC-LoRA strength tuning. All nodes are designed to be optional drop-ins that enhance existing workflows.

Overview

Gemma Text Encoding:

GemmaAPITextEncode - Free API-based text encoder that replaces local Gemma encoding, reducing VRAM usage and speeding up runs
LTXVSaveConditioning - Save text encodings to disk for reuse
LTXVLoadConditioning - Load pre-saved text encodings

Audio Identity (LipDub):

LTXVSetAudioRefTokens - Attaches reference audio as conditioning tokens for speaker identity transfer

Advanced Guidance:

MultimodalGuider - Independent control over audio and video guidance parameters
LTX Add Video IC-LoRA Guide Advanced - Granular IC-LoRA strength control with global scaling and spatial masking

Quality Enhancement:

LTXVNormalizingSampler - Latent normalization to prevent overbaking and audio clipping

Gemma Text Encoding Nodes

GemmaAPITextEncode

Location: gemma_api_conditioning.py

What it does

Encodes text prompts using Lightricks’ free API endpoint, bypassing the need to load Gemma locally. This eliminates all local VRAM usage for text encoding and enables sub-second prompt encoding.

Why use it

Gemma’s large memory footprint can create a bottleneck on consumer hardware, particularly during prompt iteration. Because the model must be loaded into and unloaded from VRAM, every prompt change forces Gemma to reload, adding significant time to the workflow. This node removes that bottleneck by offloading text encoding to a free API endpoint.

When to use

Use this node when:

Working on consumer GPUs with limited VRAM
Multiple generations use different prompts

Parameters

api_key - Your LTX API key
prompt - The text prompt to encode
ckpt_name - The LTX-2 checkpoint file (used to extract model ID for encoding compatibility)

Returns

conditioning - Encoded prompt conditioning ready for LTX-2 generation

Getting an API key

Visit console.ltx.video
Sign up or log in
Generate a free API key
Copy the key into the node’s api_key parameter

Example workflow

With API:

Encode via API → Generate → Change Prompt → Encode via API → Generate

Without API:

Load Gemma → Encode Prompt → Generate → Change Prompt → Reload Gemma → Encode → Generate

LTXVSaveConditioning

Location: conditioning_saver.py

What it does

Saves computed text conditioning to disk as a .safetensors file, allowing you to reuse the exact same conditioning across multiple workflow sessions without re-encoding.

Why use it

Useful when:

You have a prompt that works well and want to preserve its exact encoding
Running batch generations with identical conditioning
Building reusable workflow templates with pre-encoded prompts
Working offline without API access

When to use

Use this node when:

You want to lock in a specific prompt’s encoding
Multiple workflow sessions will use the same conditioning
You need reproducible conditioning across different machines
Building libraries of validated prompts

Parameters

conditioning - The conditioning to save (from any text encoder or the API node)
filename - Base filename (without extension)
dtype - Precision for storage: “bfloat16” or “float16”

Returns

UI notification showing saved filename and file size

Output location

Files are saved to: ComfyUI/models/embeddings/

Storage

Files are stored as .safetensors using the selected numerical precision:

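bfloat16: Same exponent range as float32 but fewer mantissa bits (wider range, lower precision); the more commonly used option
float16: More mantissa bits but a much narrower range (largest finite value 65504); minimal practical difference for typical conditioning. Both use 2 bytes per value, so file size is the same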
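Both dtype options store 16 bits per value. They differ in how those bits are split between exponent and mantissa. The standalone sketch below uses only the Python standard library and does not call the node. It rounds a value to each format so you can check the range/precision trade-off directly. The helper names are hypothetical.

```python
import struct


def to_float16(x: float) -> float:
    """Round x to IEEE half precision (1 sign, 5 exponent, 10 mantissa bits)."""
    # struct's "e" format raises OverflowError for magnitudes float16 cannot hold.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Round x to bfloat16 (1 sign, 8 exponent, 7 mantissa bits).

    bfloat16 is the upper half of a float32 bit pattern, so round the float32
    bits to their top 16 using round-to-nearest-even. NaN/inf are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    upper = ((bits + rounding_bias) >> 16) & 0xFFFF
    return struct.unpack("<f", struct.pack("<I", upper << 16))[0]


if __name__ == "__main__":
    # Small values: float16 keeps more significant digits.
    print(to_float16(0.1), to_bfloat16(0.1))  # 0.0999755859375 0.10009765625
    # Large values: bfloat16 keeps the range, float16 overflows.
    print(to_bfloat16(70000.0))  # 70144.0
    try:
        to_float16(70000.0)
    except OverflowError as err:
        print("float16 overflow:", err)
```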

LTXVLoadConditioning

Location: conditioning_loader.py

What it does

Loads previously saved conditioning from disk, bypassing text encoding entirely.

Why use it

Works in tandem with LTXVSaveConditioning to enable instant conditioning loading. Perfect for workflows that reuse the same prompts or when you need guaranteed consistency.

When to use

Use this node when:

Reusing conditioning saved in previous sessions
Running batch workflows with preset prompts
Working offline without API access
You need bit-perfect conditioning reproducibility

Parameters

file_name - The .safetensors file to load (from the embeddings folder)
device - Where to load the conditioning: “cpu” or “gpu”

Returns

conditioning - Loaded conditioning ready for generation

Device selection

cpu: Loads to system RAM (slower but works on any system)
gpu: Loads directly to VRAM if available (faster for generation)

Workflow integration

Pair with LTXVSaveConditioning to create prompt libraries:

Create and refine prompts with text encoder or API
Save successful conditioning with LTXVSaveConditioning
Load instantly in future sessions with LTXVLoadConditioning

Advanced Model Guidance

MultimodalGuider

Location: multimodal_guider.py

What it does

Provides independent, per-modality control over guidance parameters for audio and video. This is an extension of Classifier-Free Guidance (CFG) that allows you to separately control prompt adherence, artifact reduction, and cross-modal synchronization for each modality.

Why use it

Standard guidance treats audio and video as a single unit. Increasing guidance to improve video quality also affects audio synchronization, and fixing synchronization can break the visual style. The MultimodalGuider decouples these controls, so you can tune video guidance independently from audio guidance.

When to use

Use this node when:

You need different guidance strengths for audio vs video
Video quality needs to be prioritized over tight audio sync (or vice versa)
You want to prevent the common issue where fixing synchronization breaks visual style
You need fine-grained control over cross-modal attention

How it works

The guider can make up to four separate model inference calls per step:

Positive conditioning - Your prompt
Negative conditioning - Your negative prompt (for CFG)
Perturbed conditioning - Degraded version (for STG artifact reduction)
Modality-isolated conditioning - Each modality without cross-attention (for sync control)

By combining these strategically, you get independent control over:

CFG strength per modality (prompt adherence)
STG strength per modality (artifact reduction)
Cross-modal attention strength (synchronization tightness)
Step skipping per modality (performance optimization)
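The sketch below counts which passes a single step would need, to make the "up to four calls" concrete. It rests on two assumptions this page does not confirm. First, one joint call returns both the audio and the video predictions. Second, a pass is skipped when every modality leaves its term at the neutral value. ModalityParams and passes_per_step are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ModalityParams:
    """Hypothetical per-modality settings mirroring the documented fields."""
    cfg: float = 1.0             # 1.0 = CFG term disabled
    stg: float = 0.0             # 0.0 = STG term disabled
    modality_scale: float = 1.0  # 1.0 = cross-modal term disabled


def passes_per_step(video: ModalityParams, audio: ModalityParams) -> list[str]:
    """List the model calls one denoising step would need.

    A pass is required if either modality uses the term it feeds.
    """
    both = (video, audio)
    passes = ["positive"]  # always required
    if any(m.cfg != 1.0 for m in both):
        passes.append("negative")
    if any(m.stg != 0.0 for m in both):
        passes.append("perturbed")
    if any(m.modality_scale != 1.0 for m in both):
        passes.append("modality_isolated")
    return passes


if __name__ == "__main__":
    # Video uses CFG and STG, audio stays neutral: three calls per step.
    print(passes_per_step(ModalityParams(cfg=3.0, stg=1.0), ModalityParams()))
```

Under these assumptions, each enabled term adds one call per step. That is why raising CFG above 1 makes generation slower.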

Parameters

model - The LTX-2 model to apply guidance to
positive - Positive conditioning
negative - Negative conditioning
parameters - A GUIDER_PARAMETERS object containing per-modality settings
skip_blocks - Comma-separated list of transformer blocks to skip for STG

GUIDER_PARAMETERS structure

The parameters object exposes three independent guidance controls, each configurable per modality (audio and video separately):

1. CFG Guidance (cfg > 1)

Controls prompt adherence and semantic accuracy. Pushes the model toward the positive prompt and away from the negative prompt.

When to increase: When visual style or object fidelity matters most
Effect: Stronger prompt following, more accurate semantic content
Configurable per modality: Yes

2. Spatio-Temporal Guidance (stg > 0)

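Reduces artifacts by pushing the model away from a degraded, perturbed version of itself, which helps prevent rigid objects from breaking up. Based on Spatiotemporal Skip Guidance (STG), where the perturbed version is produced by skipping the transformer blocks listed in skip_blocks.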

When to increase: If you see structural artifacts or object breakup
Effect: Fewer visual artifacts, more stable structures
Configurable per modality: Yes

3. Cross-Modal Guidance (modality_scale > 1)

Controls synchronization between audio and video. Pushes the model away from versions where modalities ignore each other.

When to adjust: To balance synchronization versus natural motion
Higher values: Tighter alignment (suited to lip-sync or rhythmic action)
Lower values: Looser, more natural coupling
Configurable per modality: Yes

Additional Per-Modality Parameters

skip_step - Periodically skip diffusion steps for this modality
- 0: No skipping
- 1: Skip every other step
- 2: Skip two out of every three steps
- Use for performance optimization
rescale - Normalization after applying CFG, STG, and cross-modal guidance
- 0: No normalization
- 1: Full renormalization to match the norm of the positive-prompt prediction
- 0-1: Partial normalization
- Especially helpful for preventing oversaturation when using high CFG or STG values
perturb_attn - Boolean controlling whether this modality's attention is perturbed in the STG pass. Normally set to True.
cross_attn - Boolean controlling whether cross-attention layers from this modality to the other modality are active. Normally set to True.
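The sketch below shows one common way to combine these predictions for a single modality. It is written so that cfg = 1, stg = 0 and modality_scale = 1 each switch their term off, matching the thresholds above. It works on plain Python lists and is not the code in multimodal_guider.py. The rescale step follows the description above: renormalize toward the norm of the positive prediction, blended by rescale. GuiderParams and combine are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class GuiderParams:
    """Hypothetical per-modality settings named after the documented fields."""
    cfg: float = 1.0             # 1.0 disables CFG
    stg: float = 0.0             # 0.0 disables STG
    modality_scale: float = 1.0  # 1.0 disables cross-modal guidance
    rescale: float = 0.0         # 0.0 = no normalization, 1.0 = full


def l2_norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def combine(pos, neg, perturbed, isolated, p: GuiderParams) -> list[float]:
    """Combine the four predictions for ONE modality (audio or video).

    pos: positive prompt, neg: negative prompt, perturbed: STG pass,
    isolated: pass with cross-modal attention disabled.
    """
    guided = [
        c
        + (p.cfg - 1.0) * (c - n)             # toward the prompt (CFG)
        + p.stg * (c - s)                     # away from the perturbed model (STG)
        + (p.modality_scale - 1.0) * (c - i)  # away from the isolated modality
        for c, n, s, i in zip(pos, neg, perturbed, isolated)
    ]
    if p.rescale > 0.0:
        g_norm = l2_norm(guided)
        if g_norm > 0.0:
            factor = l2_norm(pos) / g_norm
            # Blend the raw result with one renormalized to the norm of pos.
            guided = [p.rescale * g * factor + (1.0 - p.rescale) * g for g in guided]
    return guided


if __name__ == "__main__":
    pos, zero = [1.0, 0.0], [0.0, 0.0]
    print(combine(pos, zero, pos, pos, GuiderParams(cfg=3.0)))               # [3.0, 0.0]
    print(combine(pos, zero, pos, pos, GuiderParams(cfg=3.0, rescale=1.0)))  # [1.0, 0.0]
```

The usage lines show why rescale helps with high CFG. The guided result is three times the size of the positive prediction, and full rescaling pulls it back to the original norm while keeping its direction.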

Returns

guider - Configured guider ready for sampling

Use cases

Use Case 1: Prioritize video quality, loose audio sync

Video: High CFG, moderate STG, low modality scale
Audio: Low CFG, low STG, low modality scale
Result: Beautiful video, audio follows general mood but not frame-locked

Use Case 2: Tight lip-sync for dialogue

Video: Moderate CFG, moderate STG, high modality scale
Audio: Moderate CFG, low STG, high modality scale
Result: Audio and video tightly synchronized, good for speaking

Use Case 3: Performance optimization

Video: Process every step
Audio: Skip every other step (skip_step = 1)
Result: Faster generation with minimal audio quality impact, since audio guidance is computed on only half of the steps
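The skip_step values above correspond to a period of skip_step + 1 steps. The sketch lists which of the 8 steps in the distilled schedule would compute guidance for a modality. It assumes counting starts at step 0, and it does not show what the guider reuses on skipped steps. guided_steps is a hypothetical helper.

```python
def guided_steps(num_steps: int, skip_step: int) -> list[int]:
    """Return the step indices on which a modality's guidance is computed.

    skip_step=0 computes every step, 1 every other step, 2 one step in three.
    """
    if skip_step < 0:
        raise ValueError("skip_step must be >= 0")
    period = skip_step + 1
    return [i for i in range(num_steps) if i % period == 0]


if __name__ == "__main__":
    print(guided_steps(8, 0))  # [0, 1, 2, 3, 4, 5, 6, 7]
    print(guided_steps(8, 1))  # [0, 2, 4, 6]  (Use Case 3 for audio)
    print(guided_steps(8, 2))  # [0, 3, 6]
```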

Integration with other nodes

Works with all LTX-2 sampler nodes
Can be combined with latent normalization for additional quality control
Essential for looping sampler workflows

LTX Add Video IC-LoRA Guide Advanced

Location: iclora.py

What it does

Applies an IC-LoRA control adapter with granular strength control, replacing the fixed 1.0 strength behavior of the standard IC-LoRA node. Allows global strength adjustment and optional spatial/spatiotemporal masking.

Why use it

The standard IC-LoRA workflow applies control at full strength everywhere, which can over-constrain generation. This node lets you dial in exactly how much influence the control signal has, and where.

When to use

Use this node when:

You want softer, less rigid IC-LoRA control
You need IC-LoRA to apply only to specific regions of the frame
You want to blend IC-LoRA control with free generation
You’re combining multiple control types and need to balance their influence

Parameters

attention_strength (float, 0.0-1.0) - Global scaling factor for IC-LoRA cross-attention scores. Default: 1.0
attention_mask (MASK, optional) - Spatial (H×W) or spatiotemporal (T×H×W) mask multiplied with attention_strength
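These two parameters imply a per-token weight equal to attention_strength multiplied by the mask value, with a spatial mask repeated across frames. The sketch below computes those weights on nested lists. The frame-then-row token order and the effective_weights name are assumptions, and the sketch does not show how iclora.py applies the weight inside attention.

```python
def effective_weights(strength: float, mask: list, num_frames: int) -> list[float]:
    """Per-token IC-LoRA weight: attention_strength * mask value.

    mask is spatial (H x W nested lists, repeated for every frame) or
    spatiotemporal (T x H x W). Tokens are flattened frame by frame, then
    row by row; that ordering is an illustrative assumption.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("attention_strength must be within 0.0-1.0")
    is_spatiotemporal = isinstance(mask[0][0], list)
    frames = mask if is_spatiotemporal else [mask] * num_frames  # broadcast over T
    if len(frames) != num_frames:
        raise ValueError("a T x H x W mask needs one H x W slice per frame")
    return [strength * v for frame in frames for row in frame for v in row]


if __name__ == "__main__":
    # Left column guided, right column free; strength 0.5 halves the guided part.
    left_only = [[1.0, 0.0], [1.0, 0.0]]
    print(effective_weights(0.5, left_only, num_frames=2))
    # [0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0]
```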

Returns

Model with IC-LoRA applied at the specified strength/mask configuration

Quality Enhancement

LTXVNormalizingSampler

Location: easy_samplers.py

What it does

A specialized sampler that applies statistical normalization to latents during generation to prevent overbaking (oversaturation) and audio clipping issues.

Why use it

Without normalization, latent values can drift into problematic ranges during the denoising process. This causes:

Oversaturated, “overbaked” visual outputs with crushed colors
Audio clipping and distortion
Inconsistent quality across different prompts or settings

The NormalizingSampler keeps latent statistics within target ranges throughout generation, reducing these failure modes.

When to use

Use this node when:

You see oversaturated, “overbaked” visual outputs
Audio has clipping or distortion artifacts
Output quality varies unpredictably between generations
Using high guidance values that tend to cause oversaturation

How it works

The sampler monitors latent statistics during the denoising process and applies normalization to keep values within target ranges. This is done using percentile-based statistics (excluding extreme outliers) to prevent both overbaking and excessive normalization.
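A minimal sketch of percentile-based normalization on a flat list of latent values follows. The 1st/99th percentiles, the target spread, and scaling about the median are illustrative assumptions, not the values or exact statistic used in easy_samplers.py. The sketch shows two properties: extreme outliers do not move the percentiles, and latents already in range are left untouched.

```python
import math


def percentile(sorted_vals: list[float], q: float) -> float:
    """Linearly interpolated percentile of a sorted list (q in 0..100)."""
    k = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def normalize_latent(values: list[float], target_spread: float,
                     low_q: float = 1.0, high_q: float = 99.0) -> list[float]:
    """Shrink values about their median if the robust spread exceeds a target.

    The spread is measured between the low_q and high_q percentiles, so a few
    extreme outliers cannot trigger (or dominate) the normalization.
    """
    if not values or target_spread <= 0:
        raise ValueError("need values and a positive target_spread")
    s = sorted(values)
    spread = percentile(s, high_q) - percentile(s, low_q)
    if spread <= target_spread:
        return list(values)  # already in range: no normalization
    center = percentile(s, 50.0)
    factor = target_spread / spread
    return [center + (v - center) * factor for v in values]


if __name__ == "__main__":
    drifting = [float(i) for i in range(101)]    # robust spread 98
    print(normalize_latent(drifting, 49.0)[:2])  # [25.0, 25.5]
```

A real sampler would likely apply this per channel on tensors at each denoising step. The list version only shows the statistic.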

Key benefits

Prevents oversaturated, “overbaked” visual outputs
Reduces audio clipping artifacts
More consistent quality across generations
Works automatically - no manual tuning required
Especially effective with high guidance values

Integration

This is a drop-in replacement for standard samplers in the first sampling stage of LTX-2 workflows (see Troubleshooting for cases where it should not be used). It maintains full compatibility with:

All guider nodes (including MultimodalGuider)
Text and image conditioning
LoRA and IC-LoRA workflows

Performance impact

Minimal - the normalization adds negligible computational overhead while significantly improving output quality.

Audio Identity

This node provides speaker identity conditioning for the LipDub IC-LoRA two-stage dubbing pipeline.

LTXVSetAudioRefTokens

Location: iclora.py

What it does

Attaches an audio latent to the conditioning as ref_audio tokens for speaker identity transfer. It also outputs a frozen_audio copy with noise_mask=0, so the Stage 1 audio passes through Stage 2 unchanged without needing a mask-by-time node.

Why use it

The LipDub pipeline needs to preserve the original speaker’s voice across both generation stages. This node handles two things in one step: it gives the model the speaker’s audio identity as reference tokens, and it freezes the audio latent so it carries forward unchanged into Stage 2.

When to use

Used once per stage in the LipDub two-stage pipeline. Stage 1 receives the VAE-encoded reference audio; Stage 2 receives the Stage 1 audio output.

Parameters

conditioning - The text conditioning to attach reference tokens to
audio_latent - The audio latent to use as reference (from VAE encode in Stage 1, or from Stage 1 output in Stage 2)

Returns

conditioning - Conditioning with ref_audio tokens prepended
frozen_audio - Audio latent with zero noise mask for pass-through
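The pass-through behavior of frozen_audio follows from how a noise mask blends each sampler update with the input latent. The sketch assumes the usual ComfyUI convention: 1 means the sampler may change the value, and 0 means keep the input. apply_noise_mask is a hypothetical helper, not part of iclora.py.

```python
def apply_noise_mask(denoised: list[float], original: list[float],
                     noise_mask: list[float]) -> list[float]:
    """Blend a sampler update into the latent according to a noise mask.

    Assumed convention: 1.0 lets the sampler replace the value, 0.0 keeps the
    original, and values in between mix the two.
    """
    return [m * d + (1.0 - m) * o for d, o, m in zip(denoised, original, noise_mask)]


if __name__ == "__main__":
    stage1_audio = [0.3, -1.2, 0.8]  # latent coming from Stage 1
    stage2_update = [9.9, 9.9, 9.9]  # whatever Stage 2 would have produced
    frozen = apply_noise_mask(stage2_update, stage1_audio, [0.0] * 3)
    print(frozen == stage1_audio)  # True
```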

Installation & Usage

Installation

All nodes are available in the ComfyUI-LTXVideo repository.

Via ComfyUI Manager (Recommended):

Open ComfyUI Manager
Search for “ComfyUI-LTXVideo”
Click Update (if already installed) or Install
Restart ComfyUI

Manual Update:

$ cd ComfyUI/custom_nodes/ComfyUI-LTXVideo
$ git pull origin master
$ pip install -r requirements.txt

After installation/update, restart ComfyUI. The new nodes will appear under the “Lightricks” category.

Troubleshooting

GemmaAPITextEncode

“Invalid API key” error

Verify your API key is correct
Generate a new key at console.ltx.video
Ensure there are no leading or trailing spaces in the api_key field

“Model ID cannot be identified” error

Your checkpoint file may be missing metadata
Ensure you’re using an official LTX-2 model
Ensure the ckpt_name field in the node matches the filename of the model loaded in your Checkpoint Loader
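You can check whether a checkpoint carries any metadata at all by reading its .safetensors header without loading the weights. The format starts with an 8-byte little-endian header length, followed by a JSON header whose optional "__metadata__" entry holds string metadata. This page does not document which key the node uses to derive the model ID, so the sketch only lists what is present. The function name and the inspect_ckpt.py script name are hypothetical.

```python
import json
import struct
import sys


def read_safetensors_metadata(path: str) -> dict:
    """Return the optional "__metadata__" mapping from a .safetensors header.

    Layout: 8-byte little-endian unsigned header size, then a UTF-8 JSON
    header. Only the header is read, so this is quick even for large files.
    """
    with open(path, "rb") as f:
        (header_size,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_size).decode("utf-8"))
    return header.get("__metadata__", {})


if __name__ == "__main__":
    # Usage: python inspect_ckpt.py path/to/checkpoint.safetensors
    if len(sys.argv) > 1:
        meta = read_safetensors_metadata(sys.argv[1])
        print(sorted(meta) if meta else "no __metadata__ in header")
```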

Timeout errors

Check your internet connection
The API may be experiencing high load

LTXVSaveConditioning / LTXVLoadConditioning

File not found

Ensure the .safetensors extension is not doubled: filename expects the base name only (e.g. my_prompt, not my_prompt.safetensors)
Check that files are in ComfyUI/models/embeddings/
Ensure you are not mixing up the models/embeddings/ folder with models/text_encoders/
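Because filename expects a base name, a doubled extension usually comes from typing the extension yourself. The hypothetical helper below normalizes the name before building the path under ComfyUI/models/embeddings/. It illustrates the naming rule only and is not the node's code.

```python
from pathlib import Path

# Output folder documented above, relative to the ComfyUI install location.
EMBEDDINGS_DIR = Path("ComfyUI/models/embeddings")
SUFFIX = ".safetensors"


def conditioning_path(filename: str) -> Path:
    """Hypothetical helper: base filename -> saved file path, one extension only."""
    name = filename.strip()
    if name.lower().endswith(SUFFIX):
        name = name[: -len(SUFFIX)]  # user typed the extension: drop it
    return EMBEDDINGS_DIR / f"{name}{SUFFIX}"


if __name__ == "__main__":
    # Both calls print the same path, with a single .safetensors extension.
    print(conditioning_path("hero_shot"))
    print(conditioning_path("hero_shot.safetensors"))
```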

Out of memory when loading

Set device to cpu so the conditioning is loaded into system RAM instead of VRAM

MultimodalGuider

No quality improvement vs standard guider

Ensure you have connected two GuiderParameters nodes (one for audio, one for video)
High values can break the generation. Suggested baseline for balanced speed and consistency is Modality: 1 and Skip Step: 1

Generation is slower than expected

NOTE: This node typically runs with CFG > 1, which requires extra model inference calls per step (up to four in total) and inherently makes generation slower
Use Skip Step: 1 for increased speed; set it back to 0 if artifacts appear

LTXVNormalizingSampler

Still seeing overbaking

This is a sampler, not a post-process: ensure you have replaced the SamplerCustomAdvanced node with it
Try combining with lower guidance values
Consider whether your prompt or conditioning is the root cause

Quality seems worse

Use this node ONLY for the first sampling stage. Revert to a standard sampler for the second/upscale stage
Do not use this sampler for inpainting, video extension, or any workflow using masks. It may break the context audio
Ensure you are using the Distilled model with the standard 8-step manual sigma schedule. This node is NOT tuned for the full model
Normalization helps most with problematic outputs (clipping/saturation). If your generation is already clean, this node may introduce unnecessary noise. Test side-by-side with the same seed
