MultiAnimate: Pose-Guided Image Animation Made Extensible

Yingcheng Hu1,2,3*    Haowen Gong3    Chuanguang Yang1    Zhulin An1,†    Yongjun Xu1    Songhua Liu3,†
1State Key Laboratory of AI Safety, Institute of Computing Technology, CAS
2ShanghaiTech University
3Shanghai Jiao Tong University
†Corresponding Authors

arXiv Project Page Hugging Face

📖 Introduction

We present MultiAnimate, a framework for multi-character image animation. To the best of our knowledge, it is the first extensible framework for this task built upon modern DiT-based video generators.

📰 News

  • [Feb 2026] 🎉 MultiAnimate has been accepted by CVPR 2026!

🎥 Demos

Although trained only on two-character data, our framework produces identity-consistent three-person videos and can, in principle, be extended to scenarios with even more participants.

demo3.mp4
Figure 1: Three-character animation.
demo4.mp4
Figure 2: Four-character animation.

⚙️ Environment Setup

We recommend using Anaconda to manage your environment.

conda create -n multianimate python=3.10.16
conda activate multianimate

# CUDA 12.1
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121

git clone https://github.com/hyc001/MultiAnimate.git
cd MultiAnimate
# Install DiffSynth-Studio from source
pip install -e .

pip install -r requirements.txt

🚀 Quick Demo

Before running the demos, please ensure you have the Hugging Face CLI installed:

pip install -U "huggingface_hub[cli]"

1. Download Base Model

MultiAnimate is built upon the Wan2.1 framework. First, download the base Wan2.1-I2V-14B-720P model into your checkpoints directory:

huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P --local-dir checkpoints/Wan2.1-I2V-14B-720P

2. Download Demo Dataset

We provide pre-annotated data (including reference images and poses) for 3-person and 4-person generation scenarios. Download the demo folder from our repository:

huggingface-cli download N00B0DY/MultiAnimate --include "demo/*" --local-dir .

3. Demo 1: Standard Model (Up to 3 Characters)

Download our standard checkpoint optimized for complex human interactions, and run the inference script:

# Download MultiAnimate weights
huggingface-cli download N00B0DY/MultiAnimate --include "epoch=39-step=7000.ckpt/*" --local-dir checkpoints/

# Run inference
CUDA_VISIBLE_DEVICES="0" python examples/inference_multi3.py

4. Demo 2: Extended Model (Up to 7 Characters)

We provide the extended model. Here, we use a 4-person scenario as an example:

# Download extended MultiAnimate weights
huggingface-cli download N00B0DY/MultiAnimate --include "epoch=23-step=4200-multi.ckpt/*" --local-dir checkpoints/

# Run inference for 4 characters
CUDA_VISIBLE_DEVICES="0" python examples/inference_multi4.py

🏃‍♂️ Training & Data Preparation

1. Data Processing

To train or fine-tune the model, you need to extract human poses and character masks from your video data. We recommend the following pipeline, though any equivalent method will work:

  • Pose Extraction: We utilize DWPose to extract human skeletal keypoints.
  • Mask Extraction: We use Sa2VA-8B to segment the characters. A typical prompt used for extraction looks like: "<image>Please segment the left person".

After processing, your processed_data directory should be organized as follows:

processed_data/
├── video1/
│   ├── frames.pkl
│   ├── mask_female.pkl
│   ├── mask_male.pkl
│   └── pose.pkl
├── video2/
│   ├── frames.pkl
│   └── ...
└── ...
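As a minimal illustration of the layout above, the sketch below writes and validates one `processed_data/<video>/` entry with `pickle`. The helper functions are hypothetical (not part of this repo), and the placeholder payloads stand in for the real frame, mask, and pose arrays:

```python
import pickle
from pathlib import Path

# The four per-video annotation files shown in the tree above.
REQUIRED_FILES = ["frames.pkl", "mask_female.pkl", "mask_male.pkl", "pose.pkl"]

def write_entry(root: Path, video: str, arrays: dict) -> Path:
    """Serialize each per-video annotation into its own .pkl file."""
    entry = root / video
    entry.mkdir(parents=True, exist_ok=True)
    for name in REQUIRED_FILES:
        with open(entry / name, "wb") as f:
            pickle.dump(arrays.get(name, []), f)  # empty list as placeholder
    return entry

def validate_entry(entry: Path) -> bool:
    """Check that a processed_data/<video>/ folder has all four files."""
    return all((entry / name).is_file() for name in REQUIRED_FILES)

if __name__ == "__main__":
    entry = write_entry(Path("processed_data"), "video1",
                        {"pose.pkl": [[0.1, 0.2]]})
    print(validate_entry(entry))  # → True
```

Running `validate_entry` over every subfolder before launching training is a cheap way to catch videos that failed pose or mask extraction.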

2. Training

💡 Memory Saving Tip: Based on our experience, if you are training on an A100 GPU (or GPUs with similar VRAM constraints), we highly recommend using prompt feature caching to save memory. Please refer to examples/prompt_emb.py for feature extraction and examples/train_multi.py for the core training logic.
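The caching idea can be sketched as follows. Here `encode_prompt` is a hypothetical stand-in for the real text encoder (see examples/prompt_emb.py for the actual feature extraction), and the cache is keyed by a hash of the prompt so each unique prompt is encoded only once:

```python
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("prompt_cache")  # assumption: any writable directory works

def encode_prompt(prompt: str) -> list:
    """Hypothetical stand-in for the heavy text encoder.

    The real encoder returns a tensor; here we just derive a
    deterministic list of floats from the prompt bytes.
    """
    return [float(b) for b in prompt.encode("utf-8")]

def cached_prompt_features(prompt: str) -> list:
    """Compute prompt features once, then reuse the on-disk copy.

    During training this avoids keeping the text encoder resident:
    cached features are loaded from disk instead of being recomputed.
    """
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.pkl"
    if path.is_file():
        with open(path, "rb") as f:
            return pickle.load(f)
    features = encode_prompt(prompt)  # the only expensive step
    with open(path, "wb") as f:
        pickle.dump(features, f)
    return features
```

In a real training run you would cache the encoder's output tensors (e.g. with `torch.save`) before training starts, then drop the text encoder from GPU memory entirely.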

Once your data and environment are ready, you can start the training process by running:

sh train.sh

🙏 Acknowledgements

Our codebase is built upon the wonderful UniAnimate-DiT. We sincerely thank the authors for their fantastic open-source contribution to the community!

🗂️ Dataset: Gen-Dataset for Scene Generalization

We release the Gen-dataset introduced in our paper. You can optionally incorporate this dataset into your training phase for scene generalization.

You can easily download the zipped dataset from our Hugging Face repository and extract it:

# Download the dataset zip file to the data/ folder
huggingface-cli download N00B0DY/MultiAnimate Gen-dataset.zip --local-dir data/

# Unzip the dataset
cd data/
unzip Gen-dataset.zip

📝 BibTeX

If you find our work helpful, please consider citing:

@article{hu2026multianimateposeguidedimageanimation,
      title={MultiAnimate: Pose-Guided Image Animation Made Extensible}, 
      author={Yingcheng Hu and Haowen Gong and Chuanguang Yang and Zhulin An and Yongjun Xu and Songhua Liu},
      year={2026},
      eprint={2602.21581},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.21581}, 
}
