MultiAnimate: Pose-Guided Image Animation Made Extensible

Yingcheng Hu1,2,3*    Haowen Gong3    Chuanguang Yang1    Zhulin An1,†    Yongjun Xu1    Songhua Liu3,†
1State Key Laboratory of AI Safety, Institute of Computing Technology, CAS
2ShanghaiTech University
3Shanghai Jiao Tong University
†Corresponding Authors

arXiv Project Page Hugging Face

📖 Introduction

We present MultiAnimate, a framework for multi-character image animation. To the best of our knowledge, it is the first extensible framework for this task built upon modern DiT-based video generators.

📰 News

  • [Feb 2026] 🎉 MultiAnimate has been accepted by CVPR 2026!

🎥 Demos

Although trained only on two-character data, our framework produces identity-consistent three-person videos and can, in principle, be extended to scenarios with even more participants.

demo3.mp4
Figure 1: Three-character animation.
demo4.mp4
Figure 2: Four-character animation.

⚙️ Environment Setup

We recommend using Anaconda to manage your environment.

conda create -n multianimate python=3.10.16
conda activate multianimate

# CUDA 12.1
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121

git clone https://github.com/hyc001/MultiAnimate.git
cd MultiAnimate
# Install DiffSynth-Studio from source
pip install -e .

pip install -r requirements.txt

🚀 Quick Demo

Before running the demos, please ensure you have the Hugging Face CLI installed:

pip install -U "huggingface_hub[cli]"

1. Download Base Model

MultiAnimate is built upon the Wan2.1 framework. First, download the base Wan2.1-I2V-14B-720P model into your checkpoints directory:

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir checkpoints/Wan2.1-I2V-14B-720P

2. Download Demo Dataset

We provide pre-annotated data (including reference images and poses) for 3-person and 4-person generation scenarios. Download the demo folder from our repository:

huggingface-cli download N00B0DY/MultiAnimate --include "demo/*" --local-dir .

3. Demo 1: Standard Model (Up to 3 Characters)

Download our standard checkpoint optimized for complex human interactions, and run the inference script:

# Download MultiAnimate weights
huggingface-cli download N00B0DY/MultiAnimate --include "epoch=39-step=7000.ckpt/*" --local-dir checkpoints/

# Run inference
CUDA_VISIBLE_DEVICES="0" python examples/inference_multi3.py

4. Demo 2: Extended Model (Up to 7 Characters)

We provide the extended model. Here, we use a 4-person scenario as an example:

# Download extended MultiAnimate weights
huggingface-cli download N00B0DY/MultiAnimate --include "epoch=23-step=4200-multi.ckpt/*" --local-dir checkpoints/

# Run inference for 4 characters
CUDA_VISIBLE_DEVICES="0" python examples/inference_multi4.py

🏃‍♂️ Training & Data Preparation

1. Data Processing

To train or fine-tune the model, you need to extract human poses and character masks from your video data. We recommend the following pipeline, though any equivalent method will work:

  • Pose Extraction: We utilize DWPose to extract human skeletal keypoints.
  • Mask Extraction: We use Sa2VA-8B to segment the characters. A typical prompt used for extraction looks like: "<image>Please segment the left person".

After processing, your processed_data directory should be organized as follows:

processed_data/
├── video1/
│   ├── frames.pkl
│   ├── mask_female.pkl
│   ├── mask_male.pkl
│   └── pose.pkl
├── video2/
│   ├── frames.pkl
│   └── ...
└── ...
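As a minimal illustration of the layout above, the sketch below writes and validates one `processed_data/<video>/` entry with `pickle`. The helper functions are hypothetical (not part of this repo), and the placeholder payloads stand in for the real frame, mask, and pose arrays:

```python
import pickle
from pathlib import Path

# The four per-video annotation files shown in the tree above.
REQUIRED_FILES = ["frames.pkl", "mask_female.pkl", "mask_male.pkl", "pose.pkl"]

def write_entry(root: Path, video: str, arrays: dict) -> Path:
    """Serialize each per-video annotation into its own .pkl file."""
    entry = root / video
    entry.mkdir(parents=True, exist_ok=True)
    for name in REQUIRED_FILES:
        with open(entry / name, "wb") as f:
            pickle.dump(arrays.get(name, []), f)  # empty list as placeholder
    return entry

def validate_entry(entry: Path) -> bool:
    """Check that a processed_data/<video>/ folder has all four files."""
    return all((entry / name).is_file() for name in REQUIRED_FILES)

if __name__ == "__main__":
    entry = write_entry(Path("processed_data"), "video1",
                        {"pose.pkl": [[0.1, 0.2]]})
    print(validate_entry(entry))  # → True
```

Running `validate_entry` over every subfolder before launching training is a cheap way to catch videos that failed pose or mask extraction.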

2. Training

💡 Memory Saving Tip: Based on our experience, if you are training on an A100 GPU (or GPUs with similar VRAM constraints), we highly recommend using prompt feature caching to save memory. Please refer to examples/prompt_emb.py for feature extraction and examples/train_multi.py for the core training logic.
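The caching idea can be sketched as follows. Here `encode_prompt` is a hypothetical stand-in for the real text encoder (see examples/prompt_emb.py for the actual feature extraction), and the cache is keyed by a hash of the prompt so each unique prompt is encoded only once:

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("prompt_cache")  # assumption: any writable directory works

def encode_prompt(prompt: str) -> list:
    """Hypothetical stand-in for the heavy text encoder.

    The real encoder returns a tensor; here we just derive a
    deterministic list of floats from the prompt bytes.
    """
    return [float(b) for b in prompt.encode("utf-8")]

def cached_prompt_features(prompt: str) -> list:
    """Compute prompt features once, then reuse the on-disk copy.

    During training this avoids keeping the text encoder resident:
    cached features are loaded from disk instead of being recomputed.
    """
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.pkl"
    if path.is_file():
        with open(path, "rb") as f:
            return pickle.load(f)
    features = encode_prompt(prompt)  # the only expensive step
    with open(path, "wb") as f:
        pickle.dump(features, f)
    return features
```

In a real training run you would cache the encoder's output tensors (e.g. with `torch.save`) before training starts, then drop the text encoder from GPU memory entirely.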

Once your data and environment are ready, you can start the training process by running:

sh train.sh

🙏 Acknowledgements

Our codebase is built upon the wonderful UniAnimate-DiT. We sincerely thank the authors for their fantastic open-source contribution to the community!

🗂️ Dataset: Gen-Dataset for Scene Generalization

We release the Gen-dataset introduced in our paper. You can optionally incorporate this dataset into your training phase for scene generalization.

You can easily download the zipped dataset from our Hugging Face repository and extract it:

# Download the dataset zip file to the data/ folder
huggingface-cli download N00B0DY/MultiAnimate Gen-dataset.zip --local-dir data/

# Unzip the dataset
cd data/
unzip Gen-dataset.zip

📝 BibTeX

If you find our work helpful, please consider citing:

@article{hu2026multianimateposeguidedimageanimation,
      title={MultiAnimate: Pose-Guided Image Animation Made Extensible}, 
      author={Yingcheng Hu and Haowen Gong and Chuanguang Yang and Zhulin An and Yongjun Xu and Songhua Liu},
      year={2026},
      eprint={2602.21581},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.21581}, 
}
