[WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz)#12526
yiyixuxu merged 98 commits into huggingface:main
Conversation
- Introduced `WanAnimateTransformer3DModel` and `WanAnimatePipeline`.
- Updated `get_transformer_config` to handle the new model type.
- Modified `convert_transformer` to instantiate the correct transformer based on model type.
- Adjusted main execution logic to accommodate the new Animate model type.
…prove error handling for undefined parameters
…work for character animation and replacement
- Added the Wan 2.2 Animate 14B model to the documentation.
- Introduced the Wan-Animate framework, detailing its capabilities for character animation and replacement.
- Included example usage for the `WanAnimatePipeline` with preprocessing steps and guidance on input requirements.
- Introduced `WanAnimateGGUFSingleFileTests` to validate functionality.
- Added dummy input generation for testing model behavior.
- Introduced `EncoderApp`, `Encoder`, `Direction`, `Synthesis`, and `Generator` classes for enhanced motion and appearance encoding.
- Added `FaceEncoder`, `FaceBlock`, and `FaceAdapter` classes to integrate facial motion processing.
- Updated `WanTimeTextImageMotionEmbedding` to utilize the new `Generator` for motion embedding.
- Enhanced `WanAnimateTransformer3DModel` with an additional face adapter and pose patch embedding for improved model functionality.
- Introduced a `pad_video` method to handle padding of video frames to a target length.
- Updated video processing logic to utilize the new padding method for `pose_video` and `face_video`, and conditionally for `background_video` and `mask_video`.
- Ensured compatibility with existing preprocessing steps for video inputs.
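The commit above describes a `pad_video` helper. A minimal sketch of what such a method might do, assuming last-frame repetition as the padding strategy (the name and signature follow the commit message; the body is an illustration, not the pipeline's actual code):

```python
import torch

def pad_video(video: torch.Tensor, target_length: int) -> torch.Tensor:
    """Pad a (frames, C, H, W) video to `target_length` frames by repeating
    the last frame; truncate if it is already longer. Hypothetical sketch."""
    num_frames = video.shape[0]
    if num_frames >= target_length:
        return video[:target_length]
    # Repeat the final frame to fill the remaining slots
    last = video[-1:].expand(target_length - num_frames, -1, -1, -1)
    return torch.cat([video, last], dim=0)

# Example: pad a 3-frame video to 5 frames
video = torch.randn(3, 3, 8, 8)
padded = pad_video(video, 5)
print(padded.shape)  # torch.Size([5, 3, 8, 8])
```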
…roved video processing
- Added optional parameters `conditioning_pixel_values`, `refer_pixel_values`, `refer_t_pixel_values`, `bg_pixel_values`, and `mask_pixel_values` to the `prepare_latents` method.
- Updated the logic in the denoising loop to accommodate the new parameters, enhancing the flexibility and functionality of the pipeline.
…eneration
- Updated the calculation of `num_latent_frames` and adjusted the shape of latent tensors to accommodate changes in frame processing.
- Enhanced the `get_i2v_mask` method for better mask generation, ensuring compatibility with new tensor shapes.
- Improved handling of pixel values and device management for better performance and clarity in the video processing pipeline.
…and mask generation
- Consolidated the handling of `pose_latents_no_ref` to improve clarity and efficiency in latent tensor calculations.
- Updated the `get_i2v_mask` method to accept a batch size and adjusted tensor shapes accordingly for better compatibility.
- Enhanced the logic for mask pixel values in the replacement mode, ensuring consistent processing across different scenarios.
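For readers unfamiliar with I2V-style masks, here is a rough sketch of what a `get_i2v_mask`-like helper typically computes: a binary mask marking conditioning frames, folded down to the latent temporal resolution. Everything here is an assumption for illustration (including the compression factor of 4, which matches Wan's VAE), not the pipeline's actual implementation:

```python
import torch

def get_i2v_mask(batch_size, num_frames, latent_h, latent_w, num_cond_frames=1):
    """Hypothetical sketch: 1 marks conditioning frames, 0 marks frames to
    generate; groups of 4 frames are folded into a channel dimension to
    match a temporally-compressed latent."""
    mask = torch.zeros(batch_size, num_frames, latent_h, latent_w)
    mask[:, :num_cond_frames] = 1.0
    # Repeat the first frame so the total frame count is divisible by 4
    first = mask[:, :1].repeat(1, 3, 1, 1)
    mask = torch.cat([first, mask], dim=1)  # (B, num_frames + 3, H, W)
    # Fold groups of 4 frames into a channel-like dimension
    mask = mask.view(batch_size, -1, 4, latent_h, latent_w).transpose(1, 2)
    return mask  # (B, 4, (num_frames + 3) // 4, H, W)

m = get_i2v_mask(1, 21, 8, 8)
print(m.shape)  # torch.Size([1, 4, 6, 8, 8])
```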
…nced processing
- Introduced custom QR decomposition and fused leaky ReLU functions for improved tensor operations.
- Implemented upsampling and downsampling functions with native support for better performance.
- Added new classes `FusedLeakyReLU`, `Blur`, `ScaledLeakyReLU`, `EqualConv2d`, `EqualLinear`, and `RMSNorm` for advanced neural network layers.
- Refactored the `EncoderApp`, `Generator`, and `FaceBlock` classes to integrate the new functionality and improve modularity.
- Updated the attention mechanism to utilize `dispatch_attention_fn` for enhanced flexibility in processing.
…annotations
- Removed over-abstracted helper functions such as `custom_qr`, `fused_leaky_relu`, and `make_kernel` to streamline the codebase.
- Updated class constructors and method signatures to include type hints for better clarity and type checking.
- Refactored the `FusedLeakyReLU`, `Blur`, `EqualConv2d`, and `EqualLinear` classes to enhance readability and maintainability.
- Simplified the `Generator` and `Encoder` classes by removing redundant parameters and improving initialization logic.
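The `EqualLinear`/`EqualConv2d` layers referenced in these commits follow the StyleGAN-style "equalized learning rate" idea: weights are stored at unit-variance initialization and rescaled by a per-layer constant at runtime. A minimal self-contained sketch of the linear variant (parameter names such as `lr_mul` are assumptions, not the PR's actual code):

```python
import math
import torch
from torch import nn
import torch.nn.functional as F

class EqualLinear(nn.Module):
    """Linear layer with equalized learning rate: the stored weight stays
    near N(0, 1) and is multiplied by 1/sqrt(in_features) in forward."""

    def __init__(self, in_features: int, out_features: int,
                 bias: bool = True, lr_mul: float = 1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features).div_(lr_mul))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None
        self.scale = (1 / math.sqrt(in_features)) * lr_mul
        self.lr_mul = lr_mul

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bias = self.bias * self.lr_mul if self.bias is not None else None
        # Apply the runtime scale instead of baking it into the weight
        return F.linear(x, self.weight * self.scale, bias)

layer = EqualLinear(16, 8)
out = layer(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 8])
```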
Here are some results. Animation: wan_animate_video_20_step.mp4. Replacement: wan_animate_video_replace_20_step.mp4
src/diffusers/image_processor.py
Outdated
      VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
-   resample (`str`, *optional*, defaults to `lanczos`):
      Resampling filter to use when resizing the image.
+   resample (`str`, *optional*, defaults to `"lanczos"`):
Can we add a new `WanVaeImageProcessor(VaeImageProcessor)` and put it in the wan folder, under a utils.py file, I think?
(We are starting to see more and more custom preprocess methods; almost every model has one, and they don't really get reused across models. Moving forward, I think we should just do this for all new models.)
cc @DN6 here too, let me know what you think
I think the changes that make `_resize_and_fill` and `_resize_and_crop` respect `self.config.resample` should be added to the base `VaeImageProcessor` class; this could also be spun off into its own PR. I agree with moving the other (Wan Animate-specific) logic into its own class.
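To make the suggestion concrete, here is a standalone sketch of what a Wan-specific processor could look like. The real class would subclass diffusers' `VaeImageProcessor`; this version is self-contained for illustration, and the configurable `resample` behavior (rather than a hard-coded filter) is the point being discussed, not the PR's actual code:

```python
import torch
import torch.nn.functional as F

class WanVaeImageProcessor:
    """Illustrative sketch: target dimensions snap to multiples of the VAE
    scale factor, and the interpolation mode comes from configuration
    instead of being hard-coded in the resize path."""

    def __init__(self, vae_scale_factor: int = 8, resample: str = "bilinear"):
        self.vae_scale_factor = vae_scale_factor
        self.resample = resample

    def resize(self, image: torch.Tensor, width: int, height: int) -> torch.Tensor:
        # image: (B, C, H, W); round target dims down to the nearest
        # multiple of the VAE scale factor before resizing
        width -= width % self.vae_scale_factor
        height -= height % self.vae_scale_factor
        return F.interpolate(image, size=(height, width), mode=self.resample)

proc = WanVaeImageProcessor()
out = proc.resize(torch.rand(1, 3, 100, 100), 130, 94)
print(out.shape)  # torch.Size([1, 3, 88, 128])
```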
def __repr__(self):
    return (
        f"{self.__class__.__name__}(in_features={self.weight.shape[1]}, out_features={self.weight.shape[0]},"
        f" bias={self.bias is not None})"
    )
Suggested change:
def __repr__(self):
    return (
        f"{self.__class__.__name__}(in_features={self.weight.shape[1]}, out_features={self.weight.shape[0]},"
        f" bias={self.bias is not None})"
    )
hidden_states = hidden_states.flatten(2).transpose(1, 2)

# 3. Condition embeddings (time, text, image)
# timestep shape: batch_size, or batch_size, seq_len (wan 2.2 ti2v)
I think we can remove one of these conditions for Animate, no?
Yeah, Wan Animate is based on Wan 2.1, so the Wan2.2 TI2V logic isn't necessary here, and I have removed it.
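The distinction being simplified here is the timestep shape: Wan 2.2 TI2V passes per-token timesteps of shape `(batch, seq_len)`, while Wan 2.1-based models such as Animate pass a single timestep per sample, `(batch,)`. A hypothetical helper illustrating the two branches (the sinusoidal embedding is a stand-in, not the transformer's actual embedding code):

```python
import torch

def embed_timestep(timestep: torch.Tensor, embed_dim: int = 8) -> torch.Tensor:
    """Sketch of the condition-embedding branch: a 2D timestep means
    per-token conditioning (Wan 2.2 TI2V); 1D is per-sample, the only
    case a Wan 2.1-based model like Animate needs."""
    freqs = torch.arange(1, embed_dim + 1)
    if timestep.ndim == 2:
        # (batch, seq_len): embed each token's timestep separately
        b, s = timestep.shape
        emb = torch.sin(timestep.reshape(-1, 1) * freqs)
        return emb.reshape(b, s, embed_dim)
    # (batch,): one embedding per sample
    return torch.sin(timestep[:, None] * freqs)

print(embed_timestep(torch.tensor([0.5, 0.9])).shape)  # torch.Size([2, 8])
print(embed_timestep(torch.rand(2, 10)).shape)         # torch.Size([2, 10, 8])
```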
self.gradient_checkpointing = False

def motion_batch_encode(
Can we move this to `forward`? All the layers (`motion_encoder` here) should be visible in `forward`.
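The rationale: hooks attached around `forward` (e.g. for CPU offloading or layerwise casting) only see submodules that are invoked on the `forward` path. A toy illustration of the pattern the reviewer is asking for, with all names invented for the example:

```python
import torch
from torch import nn

class MotionModel(nn.Module):
    """Illustrative only: the motion_encoder call is made directly inside
    forward rather than hidden in a separate helper method, so any hooks
    wrapping forward observe every layer that runs."""

    def __init__(self):
        super().__init__()
        self.motion_encoder = nn.Linear(4, 4)

    def forward(self, face_pixels: torch.Tensor) -> torch.Tensor:
        # Inlined here instead of delegating to a `motion_batch_encode`
        # helper: flatten the batch/frame dims, encode, restore the shape
        batch, frames, dim = face_pixels.shape
        motion = self.motion_encoder(face_pixels.reshape(-1, dim))
        return motion.reshape(batch, frames, -1)

model = MotionModel()
print(model(torch.randn(2, 5, 4)).shape)  # torch.Size([2, 5, 4])
```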
hidden_states_original_dtype = hidden_states.dtype
hidden_states = self.norm_out(hidden_states.float())
# Move the shift and scale tensors to the same device as hidden_states.
Ohh, let's try to fix it here.
I think all we need to do is pack shift and scale into the same layer and add that layer to the `_no_split_modules` attribute.
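A toy sketch of that suggestion: keeping the shift/scale table inside the same module as the output norm means `device_map`-style sharding (which respects `_no_split_modules`) can never place them on different devices. Class and attribute contents here are illustrative, not diffusers' actual API:

```python
import torch
from torch import nn

class AdaLayerNormOut(nn.Module):
    """Hypothetical: norm and its shift/scale table live in one module,
    so listing this class in _no_split_modules keeps them co-located."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.scale_shift_table = nn.Parameter(torch.zeros(2, dim))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The table is guaranteed to be on the same device as the norm
        shift, scale = self.scale_shift_table.to(hidden_states.device)
        return self.norm(hidden_states) * (1 + scale) + shift

class TinyTransformer(nn.Module):
    # Modules named here would not be split across devices by device_map
    _no_split_modules = ["AdaLayerNormOut"]

    def __init__(self, dim: int = 8):
        super().__init__()
        self.norm_out = AdaLayerNormOut(dim)

    def forward(self, x):
        return self.norm_out(x)

out = TinyTransformer()(torch.randn(2, 4, 8))
print(out.shape)  # torch.Size([2, 4, 8])
```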
>>> face_video = load_video("path/to/face_video.mp4")

>>> # Calculate optimal dimensions based on VAE constraints
>>> max_area = 480 * 832
If we make a `VaeImageProcessor` subclass for Wan, this can be added there too.
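The `max_area` calculation being referred to typically picks the largest aspect-ratio-preserving size under a pixel budget, rounded to a multiple of the VAE scale factor times the patch size. A sketch of that logic (function name and `mod_value` default are assumptions for illustration):

```python
import math

def compute_default_size(image_height: int, image_width: int,
                         max_area: int = 480 * 832, mod_value: int = 16) -> tuple:
    """Largest (height, width) under `max_area` pixels that preserves the
    input aspect ratio and is divisible by `mod_value`. Illustrative sketch."""
    aspect_ratio = image_height / image_width
    height = round(math.sqrt(max_area * aspect_ratio)) // mod_value * mod_value
    width = round(math.sqrt(max_area / aspect_ratio)) // mod_value * mod_value
    return height, width

h, w = compute_default_size(720, 1280)
print(h, w)  # 464 832
```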
…otion encoder's upcast_to_fp32 arg
…well as vae_scale_factor when calculating default height and width
What does this PR do?
This PR is a continuation of #12442 by @tolgacangoz. It adds a pipeline for the Wan2.2-Animate-14B model (project page, paper, code, weights), a SOTA character animation and replacement video model.
Fixes #12441 (the original requesting issue).
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu
@sayakpaul
@tolgacangoz