IC-LoRA (Image Conditioning)

IC-LoRA (In-Context LoRA) enables precise video-to-video control in LTX-2 by conditioning generation on reference signals like depth maps, pose skeletons, or edge detections. Unlike standard LoRAs that modify style or effects, IC-LoRAs let you dictate the spatial structure and motion of your output with frame-level precision.

This enables workflows where you can animate depth maps into videos, retarget character motion using pose sequences, or generate videos following precise edge-based compositions.

IC-LoRA Video Walkthrough

Comparison: IC-LoRA vs LoRA

| Feature | LoRA | IC-LoRA |
| --- | --- | --- |
| Purpose | Style, effects, visual modifications | Structural control, motion guidance |
| Input | Text prompt only | Text prompt + control signal |
| Strength | 0.5-1.5 adjustable | Always 1.0 |
| Control | Global style influence | Frame-level spatial control |
| Training Data | Video datasets (single modality) | Paired video + control signals |
| Use Case | "Make it look like cake" | "Follow this depth map" |
| Combining | Easily stack multiple | Best used one control type at a time |

When to use which:

  • Need style/effect: Standard LoRA
  • Need structural control: IC-LoRA
  • Need both: Combine them (IC-LoRA 1.0 + effect LoRA 0.5-0.8)

Available IC-LoRA Models

LTX-2 provides several official IC-LoRA control adapters, each trained for specific control types:

Depth Control

Control video generation using depth map sequences that represent spatial relationships and 3D structure.

Model: LTX-2-19b-IC-LoRA-Depth-Control

Use Cases:

  • Architectural walkthroughs with camera movement following depth
  • Converting 3D scene depth renders into photorealistic video
  • Maintaining spatial consistency across generated frames
  • Depth-aware compositing and scene reconstruction

Control Input: Grayscale depth maps in which pixel intensity encodes distance (whether nearer surfaces appear darker or lighter depends on the convention; keep one convention throughout the sequence)
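Because the darker-is-closer convention varies between depth tools, it helps to normalize every sequence to a single convention before conditioning on it. A minimal sketch (the `normalize_depth` helper is illustrative, not part of LTX-2):

```python
import numpy as np

def normalize_depth(depth, near_is_dark=True):
    """Normalize a raw depth array to an 8-bit grayscale map.

    near_is_dark selects the convention: True maps the closest point
    to 0 (dark); False inverts it so the closest point is 255 (light).
    """
    d = depth.astype(np.float32)
    # Rescale to [0, 1]; the epsilon guards against flat (constant) maps
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)
    if not near_is_dark:
        d = 1.0 - d
    return (d * 255).astype(np.uint8)
```

Applying the same normalization to every frame also enforces the "consistent depth range" best practice below.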

Pose Control

Guide character and human motion using skeletal pose sequences extracted from reference videos or animation.

Model: LTX-2-19b-IC-LoRA-Pose-Control

Use Cases:

  • Character animation and motion retargeting
  • Dance sequence generation from pose data
  • Consistent human motion across style transfers
  • Precise body positioning and movement control

Control Input: Pose keypoint data typically from OpenPose, MediaPipe, or similar skeleton detection systems

Canny Control

Control composition and structure using Canny edge detection, providing precise outlines and boundaries.

Model: LTX-2-19b-IC-LoRA-Canny-Control

Use Cases:

  • Line art to video conversion
  • Architectural rendering with precise edge control
  • Logo and graphic animation
  • Maintaining strict compositional structure

Control Input: Canny edge detection maps (white edges on black background)

Video Detailer

Enhance detail, quality, and resolution in video generation workflows, particularly useful for upscaling pipelines.

Model: LTX-2-19b-IC-LoRA-Detailer

Use Cases:

  • Multi-scale rendering workflows
  • Video upscaling with detail enhancement
  • Quality improvement in generation pipelines
  • Detail recovery in compressed or lower-quality sources

Control Input: Reference video frames for structural guidance

Using IC-LoRAs

In ComfyUI

ComfyUI is the recommended environment for IC-LoRA workflows thanks to its visual, node-based interface.

Setup:

  1. Install ComfyUI-LTXVideo custom nodes
  2. Download IC-LoRA model from Hugging Face (choose ComfyUI-compatible .safetensors)
  3. Place in ComfyUI/models/loras/ directory
  4. Load the ic-lora.json workflow

Basic Workflow:

```
Control Signal Input → IC-LoRA Loader → LTX Model → Video Output
                                            ↑
Text Prompt ────────────────────────────────┘
```

Key nodes:

  • Load Image/Video: Load your control sequence
  • Apply IC-LoRA: Applies the control adapter to the model
  • LTX Sampler: Generates video following the control signal
  • Text Encoder: Provides prompt conditioning

Example workflow steps:

  1. Load depth map or pose skeleton video
  2. Apply IC-LoRA with strength 1.0 (recommended)
  3. Provide descriptive text prompt
  4. Generate video with standard LTX-2 parameters

Preparing Control Signals

The quality of your control signals directly impacts IC-LoRA results. Here’s how to prepare each type:

Depth Maps

Tools:

  • DepthCrafter - For temporally consistent video depth estimation
  • Depth Anything V2 - For per-frame monocular depth estimation
  • Blender/3D software - For synthetic depth renders

Best practices:

  • Use consistent depth range across all frames
  • Ensure smooth temporal transitions (avoid flickering)
  • Match resolution to target generation resolution

Example depth extraction:

```python
import numpy as np
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Large")

# Process video frame-by-frame (video_frames: an iterable of PIL images)
for frame in video_frames:
    # The pipeline returns the depth map as a PIL image; convert to an array
    depth = np.array(depth_estimator(frame)["depth"], dtype=np.float32)
    # Normalize to 8-bit grayscale before saving
    depth_map = (depth / depth.max() * 255).astype(np.uint8)
```
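The smooth-transitions guideline above can be approximated with a simple exponential moving average over per-frame depth maps. A hedged sketch (`smooth_depth_sequence` is an illustrative helper, not an official utility):

```python
import numpy as np

def smooth_depth_sequence(depth_maps, alpha=0.8):
    """Exponential moving average across frames to reduce depth flicker.

    depth_maps: iterable of float arrays, one per frame.
    alpha: weight of the running average (higher = smoother).
    """
    smoothed, running = [], None
    for d in depth_maps:
        d = d.astype(np.float64)
        # Blend each new frame into the running average
        running = d if running is None else alpha * running + (1 - alpha) * d
        smoothed.append(running.copy())
    return smoothed
```

Higher `alpha` trades responsiveness for stability; strong smoothing can lag fast camera moves, so tune it per clip.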

Pose Skeletons

Tools:

  • OpenPose - Multi-person body keypoint detection
  • MediaPipe - Lightweight real-time pose estimation

Best practices:

  • Extract poses at consistent frame rate matching target generation
  • Ensure pose keypoints are detected reliably across all frames
  • Handle occlusions gracefully (interpolate missing keypoints)
  • Use skeleton visualization format (lines connecting keypoints)

Format: Typically 17-18 keypoints for body skeleton rendered as visual overlay
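The occlusion-handling bullet above (interpolate missing keypoints) can be sketched in a few lines of NumPy. `interpolate_missing` is an illustrative helper that fills NaN-marked detections for a single keypoint track:

```python
import numpy as np

def interpolate_missing(track):
    """Fill undetected keypoint positions by linear interpolation over time.

    track: (frames, 2) array of (x, y) for one keypoint; NaN marks
    frames where detection failed.
    """
    track = track.astype(np.float64).copy()
    t = np.arange(len(track))
    for axis in range(2):  # interpolate x and y independently
        vals = track[:, axis]
        ok = ~np.isnan(vals)
        if ok.any():
            track[:, axis] = np.interp(t, t[ok], vals[ok])
    return track
```

Run this per keypoint before rendering the skeleton overlay so limbs do not vanish for a frame when detection drops out.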

Canny Edges

Tools:

  • OpenCV Canny edge detection
  • PIL/Pillow image processing
  • ComfyUI Canny preprocessor nodes

Best practices:

  • Adjust threshold values to capture essential edges without noise
  • Maintain consistent edge thickness across frames
  • Blur input slightly before edge detection to reduce noise

Example edge extraction:

```python
import cv2

def extract_canny(frame, low_threshold=100, high_threshold=200):
    # Convert to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    # Optional: slight blur to reduce noise
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # Detect edges (white edges on black background)
    edges = cv2.Canny(blurred, low_threshold, high_threshold)
    return edges
```

IC-LoRA Parameters

Control Strength

Unlike standard LoRAs, where strength is typically adjusted in the 0.5-1.5 range, IC-LoRAs are designed to work at full strength (1.0).

Why always 1.0?

  • IC-LoRAs are trained specifically to condition on control signals
  • Lower strengths result in ignored or inconsistent control
  • The prompt and control signal balance is built into the training

Exception: The Video Detailer IC-LoRA may benefit from strength adjustment (0.7-1.0) depending on desired detail level.

Resolution and Frame Rate

  • Match control signal resolution to generation resolution
  • Control FPS should match target generation FPS
  • For best results: 704x1216 at 24-30 FPS
  • IC-LoRAs work at various resolutions but quality depends on control signal clarity
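Matching control FPS to generation FPS usually means retiming the control clip. One simple approach is nearest-frame resampling; `resample_frame_indices` below is an illustrative sketch, not an LTX-2 API:

```python
def resample_frame_indices(src_fps, dst_fps, n_src_frames):
    """Map target-rate frame times onto source frame indices.

    Use this to retime a control clip so its frame rate matches the
    generation frame rate (dropping or repeating frames as needed).
    """
    n_dst = max(1, round(n_src_frames * dst_fps / src_fps))
    # For each output frame, pick the nearest-earlier source frame
    return [min(n_src_frames - 1, int(i * src_fps / dst_fps)) for i in range(n_dst)]
```

For example, `resample_frame_indices(30, 24, 30)` selects 24 source indices from a 30 fps clip; for smoother results you could blend adjacent frames instead of picking one.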

Training Custom IC-LoRAs

Create your own IC-LoRA control adapters using the LTX-Video-Trainer.

Best Practices

Control Signal Quality

  • Use high-quality control extraction tools
  • Ensure temporal consistency (smooth transitions between frames)
  • Match control resolution to generation resolution
  • Pre-process control signals to remove noise and artifacts

Prompt Alignment

  • Describe visual style, not control type (“ornate architecture” not “depth map shows…”)
  • Align prompt with control signal motion and composition
  • Be specific about materials, lighting, and atmosphere
  • Avoid contradicting the control structure

Performance Optimization

  • IC-LoRAs add minimal overhead (less than 10% compute)
  • Works with FP8 quantized models
  • Compatible with distilled models for faster generation
  • Use Video Detailer IC-LoRA for efficient upscaling

Quality Validation

  • Always test IC-LoRA with simple control signals first
  • Verify control is being respected before complex generations
  • Compare with and without IC-LoRA to assess control strength
  • Iterate on control signal quality before increasing generation complexity

IC-LoRA Troubleshooting

Control Signal Not Being Followed

Symptoms: Generated video ignores control structure

Solutions:

  • Verify IC-LoRA is loaded correctly (check adapter name)
  • Ensure strength is set to 1.0 (required for IC-LoRAs)
  • Check control signal format matches expected input (resolution, channels)
  • Validate control signal has sufficient contrast/detail
  • Verify model compatibility (IC-LoRA version matches base model)
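To check the "sufficient contrast/detail" point quickly, you can compute simple statistics on a control frame before generating. `control_signal_stats` is an illustrative sketch; the 5% activity threshold is an arbitrary assumption, not an official value:

```python
import numpy as np

def control_signal_stats(frame):
    """Quick sanity check that a control frame has usable contrast.

    Returns (dynamic range, fraction of non-background pixels).
    A near-zero dynamic range or active fraction suggests the
    control signal carries too little structure to be followed.
    """
    f = frame.astype(np.float64)
    dynamic_range = f.max() - f.min()
    # Count pixels meaningfully above the background level
    active_fraction = float((f > 0.05 * f.max()).mean()) if f.max() > 0 else 0.0
    return dynamic_range, active_fraction
```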

Temporal Flickering or Inconsistency

Symptoms: Unstable motion, frame-to-frame inconsistencies

Solutions:

  • Smooth control signal temporal transitions (use interpolation)
  • Increase inference steps (try 40-50 instead of 30)
  • Ensure control signal FPS matches generation FPS
  • Apply temporal filtering to control signal before generation
  • Check for abrupt changes in control signal values

Poor Quality or Artifacts

Symptoms: Visual artifacts, degraded quality, unwanted textures

Solutions:

  • Improve control signal quality (better extraction tools)
  • Ensure control signal resolution is adequate
  • Adjust prompt to better describe desired style
  • Try Video Detailer IC-LoRA for quality enhancement
  • Check that control signal doesn’t have noise or compression artifacts

Control Too Strong or Rigid

Symptoms: Output looks constrained, lacks natural variation

Solutions:

  • IC-LoRAs are designed for 1.0 strength - this is expected behavior
  • Adjust your control signal to be less restrictive (e.g., lighter edges)
  • Use more flexible prompts that allow style variation
  • Consider if standard LoRA might be better for your use case

Memory Issues

Symptoms: Out of memory errors during generation

Solutions:

  • Use FP8 quantized models to reduce VRAM
  • Reduce generation resolution
  • Process control signals in smaller batches

Resources

Official Models

Training Resources