Inpainting and Outpainting

This guide covers the LTX sample workflows for ComfyUI that fill or extend regions of existing video using IC-LoRA conditioning. Outpainting generates new content beyond the original frame boundaries (making a video wider or taller), while inpainting fills masked regions within the frame (removing or replacing objects). Both use the same IC-LoRA checkpoint with different mask configurations.

The workflow uses mask-aware preprocessing and Laplacian pyramid blending at both generation stages to produce seamless transitions between original and generated content.

Prerequisites

This guide assumes you’re familiar with ComfyUI basics and IC-LoRA workflows. If you’re new to IC-LoRAs, start with the IC-LoRA Guide.

Model Files

The workflow requires the standard LTX-2.3 model stack plus the inpainting IC-LoRA:

| File | Description | Placement |
| --- | --- | --- |
| ltx-2.3-22b-dev.safetensors | Full-precision model checkpoint | ComfyUI/models/checkpoints/ |
| ltx-2.3-22b-distilled-lora-384-1.1.safetensors | Distilled LoRA (v1.1) | ComfyUI/models/loras/ |
| ltx-2.3-22b-ic-lora-in-outpainting-0.9.safetensors | Inpainting/outpainting IC-LoRA | ComfyUI/models/loras/ |
| comfy_gemma_3_12B_it.safetensors | Text encoder (Gemma 3 12B) | ComfyUI/models/text_encoders/ |

All LTX files are available in the LTX-2.3 HuggingFace collection.

The same IC-LoRA checkpoint is used for both inpainting and outpainting. The difference is how the mask is configured.

How It Works

The workflow loads a source video, creates a mask that defines which regions should be generated, and runs a two-stage pipeline where each stage applies mask-aware preprocessing and blends the generated result with the original content:

  1. Setup — The source video is loaded and padded to a target canvas size (for outpainting) or masked (for inpainting). The padded/masked area defines where new content will be generated.
  2. Stage 1 — Generates video and audio at the base resolution using the IC-LoRA conditioning. The LTXVInpaintPreprocess node prepares the masked input, and LTXVLaplacianPyramidBlend merges the generated output with the original content.
  3. Upscale — The spatial upscaler doubles the video resolution.
  4. Stage 2 — Refines the upscaled video at full resolution, again using LTXVInpaintPreprocess and LTXVLaplacianPyramidBlend to maintain clean mask boundaries.

This workflow uses LTXAddVideoICLoRAGuideAdvanced instead of the standard LTXAddVideoICLoRAGuide used in other IC-LoRA workflows. The advanced version supports mask-aware conditioning required for inpainting and outpainting.

Outpainting

Outpainting extends the video canvas beyond its original boundaries, generating new content in the padded region while preserving the original footage in the center.

Step-by-Step

1. Download and Load the Workflow

[Download the Outpainting workflow](https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/2.3) from our GitHub repository and drag it into ComfyUI to load it.

2. Install Custom Nodes and Download Models

This workflow requires the ComfyUI-LTXVideo custom node package. Open the Workflow Overview panel (right sidebar) to check for missing nodes or model files.

3. Load Your Source Video

Find the LoadVideo node and select your source video. The workflow extracts the video frames and passes them to the outpainting setup.

4. Set the Target Canvas Size

Find the ImagePadForOutpaintTargetSize node and set target_width and target_height to your desired output dimensions. The source video is centered in the canvas, and the surrounding area becomes the mask — the region the model will generate.

The default target is 1920×1088.

5. Write Your Prompt (Optional)

Find the CLIP Text Encode (Positive Prompt) node. Prompting behavior for outpainting:

  • No prompt — The model extends the canvas naturally, inferring content from the visible context. This works well when the extension is a straightforward continuation of the scene (sky, ground, background).
  • With a prompt — Describe the full scene including both the original content and what you want generated in the extended area. Do not describe only the new region; the model needs the full context to maintain coherence.

6. Generate

Click Run to start generation. The pipeline runs the two-stage process described in How It Works.

7. Review and Iterate

If the result shows issues at the boundary between original and generated content, the most important parameter to adjust is dilation in the LTXVLaplacianPyramidBlend nodes. See Customization below.

Inpainting

Inpainting fills masked regions within the video frame (removing objects, replacing content, or repairing damaged areas) while preserving everything outside the mask. It uses the same IC-LoRA and two-stage pipeline as outpainting, with a user-supplied mask instead of auto-generated padding.

Step-by-Step

1. Download and Load the Workflow

[Download the Inpainting workflow](https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/example_workflows/2.3) from our GitHub repository and drag it into ComfyUI to load it.

2. Install Custom Nodes and Download Models

This workflow requires the ComfyUI-LTXVideo custom node package. Open the Workflow Overview panel (right sidebar) to check for missing nodes or model files. The model files are the same as outpainting — see Model Files above.

3. Load Your Source Video

Find the LoadVideo node and select the video you want to inpaint. The workflow extracts the video frames and audio.

4. Load or Create Your Mask

The workflow expects a B&W mask video (or image) where white = inpaint and black = keep. The mask must match the source video’s frame count and aspect ratio. The workflow converts the mask through the red channel via ImageToMask.
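The matching requirement can be checked before the mask reaches the workflow. The following is a minimal standalone sketch, not part of ComfyUI-LTXVideo: the helper names `mask_matches_source` and `red_channel_mask` and the 1% tolerance are assumptions made for illustration. The second helper only mirrors the idea of reading the red channel as the mask value; it is not the ImageToMask implementation.

```python
def mask_matches_source(src_shape, mask_shape, tol=0.01):
    """Return a list of problems; an empty list means the mask is compatible.

    Shapes are (frames, height, width) tuples read with your own tooling.
    """
    problems = []
    src_frames, src_h, src_w = src_shape
    mask_frames, mask_h, mask_w = mask_shape

    if src_frames != mask_frames:
        problems.append(
            f"frame count differs: source {src_frames}, mask {mask_frames}"
        )

    # Compare aspect ratios with a small relative tolerance (assumed value).
    src_ratio = src_w / src_h
    mask_ratio = mask_w / mask_h
    if abs(src_ratio - mask_ratio) > tol * src_ratio:
        problems.append(
            f"aspect ratio differs: source {src_ratio:.3f}, mask {mask_ratio:.3f}"
        )
    return problems


def red_channel_mask(frame):
    """Map one frame of (r, g, b) pixels in 0..255 to mask values in 0..1.

    White pixels (r = 255) become 1.0 (inpaint); black pixels become 0.0 (keep).
    """
    return [[r / 255 for (r, _g, _b) in row] for row in frame]
```

A mask at half the source resolution with the same frame count passes the check, since only the frame count and aspect ratio are compared.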

You can create masks using segmentation models (e.g., SAM), manual painting in an image editor, or bounding-box masks. For object replacement, a loose mask often works better than a tight silhouette — tight masks leave little room for the model to generate new content.

For replacement, size the mask for the new object, not the old one. A car-sized mask won’t give the model enough room to generate a truck. Increase segmentation dilation or use a bounding-box mask instead.
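A bounding-box mask sized for the new object can be produced with a few lines of code. This is a hedged sketch using plain nested lists; the function name `bounding_box_mask`, the `(left, top, right, bottom)` box convention, and the `margin` parameter are illustrative assumptions, not part of the workflow.

```python
def bounding_box_mask(height, width, box, margin=0):
    """Build one mask frame: 255 (white, inpaint) inside the box, 0 elsewhere.

    box is (left, top, right, bottom) in pixels, with right/bottom exclusive.
    margin grows the box on every side and is clamped to the frame.
    """
    left, top, right, bottom = box
    left = max(0, left - margin)
    top = max(0, top - margin)
    right = min(width, right + margin)
    bottom = min(height, bottom + margin)

    return [
        [255 if (top <= y < bottom and left <= x < right) else 0
         for x in range(width)]
        for y in range(height)
    ]


# Size the box for the replacement object, then repeat the frame for every
# frame of the source video so the frame counts match.
frame = bounding_box_mask(height=4, width=6, box=(2, 1, 4, 3), margin=1)
mask_video = [frame for _ in range(3)]
```

For a moving object, the box would need to be recomputed per frame rather than repeated.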

5. Write Your Prompt (Optional)

Find the CLIP Text Encode (Positive Prompt) node. Prompting behavior for inpainting:

  • No prompt — The model almost always removes the masked object, filling the area with background content inferred from the surrounding context.
  • With a prompt — Describe the full scene as you want it to look, not the edit itself. The model generates with the prompt as context for the entire frame, then blends the masked region.
    • Wrong: “replace the car with a horse”
    • Right: “a horse walking down an empty country road, sunny afternoon, cinematic”

6. (Optional) Use I2V Mode for Replacement

For object replacement, you can get better results by generating the first frame separately (in Photoshop, Flux, Kontext, or another image tool) with the replacement already composited, then running inpainting in I2V mode using that frame as the starting point.

The workflow includes a bypass_i2v toggle (a PrimitiveBoolean node), set to True by default (I2V bypassed). Set it to False to enable I2V conditioning and provide your composited first frame via the LoadImage node.

7. Generate

Click Run to start generation. The pipeline runs the two-stage process described in How It Works, with LTXVDilateVideoMask expanding the mask before each stage.

8. Review and Iterate

If the result has issues at mask boundaries, the most important parameters to adjust are mask dilation (LTXVDilateVideoMask) and blend dilation (LTXVLaplacianPyramidBlend). See Customization below.

Key Nodes

This workflow introduces several nodes not used in other IC-LoRA workflows:

LTXAddVideoICLoRAGuideAdvanced

The advanced version of LTXAddVideoICLoRAGuide. Supports mask-aware conditioning for inpainting and outpainting, passing mask information through the IC-LoRA guide pipeline so the model knows which regions to generate and which to preserve.

This node exposes additional widget parameters beyond the attention_strength and attention_mask parameters documented in the node reference. For inpainting and outpainting, use the workflow defaults; the advanced parameters are pre-configured for mask-aware generation.

ImagePadForOutpaintTargetSize

Pads the source video to a target canvas size, centering the original content and creating the outpainting mask from the padded region.

Key parameters:

  • target_width / target_height — The desired output dimensions. The original video is centered inside the canvas; the surrounding pad becomes the generation mask. For example, a 768×768 input with target_width: 1280 and target_height: 768 extends the video horizontally.
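The centering arithmetic behind the example can be sketched as follows. This is an illustration under assumptions, not the node's code: the function name `centered_padding` is hypothetical, and how the node distributes an odd number of padding pixels between the two sides is not stated in this guide (the sketch puts the extra pixel on the right or bottom).

```python
def centered_padding(src_w, src_h, target_w, target_h):
    """Return per-side padding that centers the source inside the target canvas."""
    if target_w < src_w or target_h < src_h:
        raise ValueError("target canvas must be at least as large as the source")

    pad_x = target_w - src_w
    pad_y = target_h - src_h
    left = pad_x // 2
    top = pad_y // 2
    # Assumption: any odd remainder goes to the right / bottom side.
    return {"left": left, "right": pad_x - left, "top": top, "bottom": pad_y - top}


def is_generated(x, y, pad, src_w, src_h):
    """True if canvas pixel (x, y) lies in the padded region (the mask)."""
    inside_x = pad["left"] <= x < pad["left"] + src_w
    inside_y = pad["top"] <= y < pad["top"] + src_h
    return not (inside_x and inside_y)


pad = centered_padding(768, 768, 1280, 768)
# pad == {"left": 256, "right": 256, "top": 0, "bottom": 0}
```

For the 768×768 input above, the model generates two 256-pixel-wide strips on the left and right, and nothing above or below.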

LTXVInpaintPreprocess

Prepares the masked video input for each generation stage. Used twice in the workflow — once at base resolution (stage 1) and once at upscaled resolution (stage 2).

This node has no configurable parameters — it takes two inputs (images and mask) and outputs the preprocessed result for the sampler.

LTXVDilateVideoMask (Inpainting Only)

Expands the inpainting mask spatially (and optionally temporally) before processing. Used twice in the inpainting workflow — once per generation stage. Dilating the mask gives the model room to generate content that blends naturally beyond the exact mask boundary.

This node is not used in the outpainting workflow, where the mask is defined by the padding region.

Key parameters:

  • spatial_radius — How many pixels to expand the mask in each direction. Higher values give the model more room for generation at mask edges.
  • temporal_radius — How many frames to expand the mask temporally. Set to 0 in the default workflow (no temporal expansion).
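To make the two radii concrete, here is a small pure-Python sketch of spatial and temporal mask dilation. It is not the node's implementation: the function name `dilate_video_mask` and the square neighborhood are assumptions, and the real node may use a different kernel shape.

```python
def dilate_video_mask(mask, spatial_radius, temporal_radius=0):
    """Dilate a binary mask video given as mask[frame][row][col] of 0/1 values.

    Every masked pixel marks a square of side 2 * spatial_radius + 1 around it,
    in its own frame and in temporal_radius frames before and after.
    """
    frames, height, width = len(mask), len(mask[0]), len(mask[0][0])
    out = [[[0] * width for _ in range(height)] for _ in range(frames)]

    for t in range(frames):
        for y in range(height):
            for x in range(width):
                if not mask[t][y][x]:
                    continue
                # Clamp the neighborhood to the video bounds.
                t_range = range(max(0, t - temporal_radius),
                                min(frames, t + temporal_radius + 1))
                y_range = range(max(0, y - spatial_radius),
                                min(height, y + spatial_radius + 1))
                x_range = range(max(0, x - spatial_radius),
                                min(width, x + spatial_radius + 1))
                for tt in t_range:
                    for yy in y_range:
                        for xx in x_range:
                            out[tt][yy][xx] = 1
    return out


def count_masked(mask):
    """Count masked pixels across all frames."""
    return sum(v for frame in mask for row in frame for v in row)
```

With temporal_radius set to 0, as in the default workflow, each frame is dilated independently and the mask never spreads to neighboring frames.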

LTXVLaplacianPyramidBlend

Blends the generated output with the original content using Laplacian pyramid blending, producing seamless transitions at mask boundaries. Used twice — once per stage.

Key parameters:

  • dilation (mask_low_res_dilation) — Controls how far the blending extends beyond the mask edge. This is the most important parameter to tune for boundary quality. Higher values extend the blending region further — increase dilation when the blend area contains low-frequency content (smooth gradients, sky, etc.). Defaults: 5 / 6 (stage 1 / stage 2) for outpainting, 6 / 6 for inpainting.
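For readers unfamiliar with the technique, the sketch below shows textbook Laplacian pyramid blending on a 1-D signal. It is a simplified illustration, not the LTXVLaplacianPyramidBlend implementation: it uses a box filter in place of a Gaussian blur, works on plain lists whose length must be divisible by 2 to the power of `levels`, and does not model the dilation parameter. All function names are hypothetical.

```python
def downsample(signal):
    """Halve the length by averaging neighboring pairs (box filter)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the length by repeating every sample."""
    return [v for v in signal for _ in range(2)]


def laplacian_pyramid(signal, levels):
    """Split a signal into detail bands plus a low-frequency residual."""
    pyramid, current = [], list(signal)
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append([c - u for c, u in zip(current, upsample(smaller))])
        current = smaller
    pyramid.append(current)  # coarsest level
    return pyramid


def mask_pyramid(mask, levels):
    """Progressively smoothed, downsampled copies of the mask."""
    pyramid = [list(mask)]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def pyramid_blend(original, generated, mask, levels=2):
    """Blend per frequency band: mask 1 takes generated, mask 0 keeps original."""
    bands_o = laplacian_pyramid(original, levels)
    bands_g = laplacian_pyramid(generated, levels)
    masks = mask_pyramid(mask, levels)

    blended = [
        [m * g + (1 - m) * o for o, g, m in zip(band_o, band_g, band_m)]
        for band_o, band_g, band_m in zip(bands_o, bands_g, masks)
    ]

    # Collapse the pyramid from coarse to fine.
    result = blended[-1]
    for band in reversed(blended[:-1]):
        result = [u + b for u, b in zip(upsample(result), band)]
    return result
```

Because coarse bands are mixed with a smoothed mask, low-frequency content transitions over a wider area than fine detail, which is what hides the seam.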

Customization

Dilation

Two types of dilation affect output quality at mask boundaries:

Mask dilation (LTXVDilateVideoMask, inpainting only) — Expands the mask itself before processing. This controls how much room the model has to generate beyond the original mask edge. Increase this if the model is struggling to produce clean edges or if the mask is too tight around the target object.

Blend dilation (mask_low_res_dilation in LTXVLaplacianPyramidBlend) — Controls how far the Laplacian pyramid blending extends beyond the mask edge when merging generated content with the original video. This is the most impactful setting for seam quality. Outpainting defaults: 5 (stage 1) and 6 (stage 2). Inpainting defaults: 6 for both stages. Higher values help when the boundary area contains low-frequency content like smooth gradients or sky.

Prompting

For both inpainting and outpainting, describe the full scene rather than just the region being generated. The model uses the prompt as context for the entire frame.

For outpainting, prompts are optional — the model can extend the canvas naturally from the visible context. Prompts help when you want to guide what appears in the extended area (e.g., adding a specific landscape beyond the original frame edge).

For inpainting, leaving the prompt empty almost always removes the masked object. When replacing content, write a standard scene description of the desired result, not an editing instruction.

CFG

Both stages use CFG 1. The distilled model was trained to produce good results at this value. Raising CFG adds computational overhead and can cause oversaturation. If experimenting, stay in the 1.0–1.5 range.
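The overhead follows from how classifier-free guidance is commonly formulated. The sketch below assumes the standard formulation, in which the guided prediction is the unconditional prediction pushed toward the conditional one; it is an illustration, not the sampler code used by the workflow, and `cfg_combine` is a hypothetical name.

```python
def cfg_combine(uncond, cond, cfg_scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 2.0]  # prediction without the prompt
cond = [1.0, 4.0]    # prediction with the prompt

at_one = cfg_combine(uncond, cond, 1.0)    # equals cond
at_1_5 = cfg_combine(uncond, cond, 1.5)    # extrapolates past cond
```

At a scale of 1 the result is just the conditional prediction, so the unconditional pass contributes nothing. Any higher value needs both predictions at every step and extrapolates beyond the conditional output, which is consistent with the extra cost and the oversaturation risk noted above.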

Tips & Troubleshooting

  • Prompts describe the scene, not the edit — Write a regular T2V-style prompt describing the full scene as you want it to look. Do not treat this like an editing tool.
  • Boundary seam visible — Adjust the dilation parameter in the LTXVLaplacianPyramidBlend nodes. See Customization for details.
  • Green artifacts at mask edges — The pipeline composites green under the mask before diffusion. Traces can sometimes leak through at mask boundaries. Try increasing mask dilation (LTXVDilateVideoMask), blend dilation (LTXVLaplacianPyramidBlend), using a different seed, or ensuring the mask is encoded losslessly.
  • Inpainted object doesn’t match the scene — Make sure your prompt describes the full scene, not the edit (“a horse on a country road,” not “replace the car with a horse”). For complex replacements, use I2V mode: composite the replacement into the first frame externally, then run inpainting with bypass_i2v set to False.
  • Tight mask leaves artifacts — If the model struggles at object edges, the mask is probably too tight. Increase spatial_radius in the LTXVDilateVideoMask nodes, or use a looser mask.

Technical Notes

  • The latent_downscale_factor output from the IC-LoRA loader is intentionally not used in this workflow. The inpainting IC-LoRA was trained with reference_downscale_factor = 1, so the reference is processed at the same resolution as the output.
  • Stage 1 uses the euler sampler (not the euler_ancestral_cfg_pp sampler used in most other IC-LoRA workflows) and reads the denoised_output rather than the standard output from the sampler.
  • The workflow passes through the original video’s audio via GetVideoComponents rather than generating new audio.