github.com/Wan-Video/Wan2.2 @main sqlite

678 symbols 2,145 edges 57 files 152 documented · 22%

README

Wan2.2

<img src="https://github.com/Wan-Video/Wan2.2/raw/main/assets/logo.png" width="400"/>







💜 <a href="https://wan.video"><b>Wan</b></a> &nbsp&nbsp ｜ &nbsp&nbsp 🖥️ <a href="https://github.com/Wan-Video/Wan2.2">GitHub</a> &nbsp&nbsp  | &nbsp&nbsp🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>&nbsp&nbsp | &nbsp&nbsp🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>&nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://arxiv.org/abs/2503.20314">Paper</a> &nbsp&nbsp | &nbsp&nbsp 📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a> &nbsp&nbsp |  &nbsp&nbsp 💬  <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>&nbsp&nbsp



📕 <a href="https://alidocs.dingtalk.com/i/nodes/jb9Y4gmKWrx9eo4dCql9LlbYJGXn6lpz">User Guide (Chinese)</a>&nbsp&nbsp | &nbsp&nbsp 📘 <a href="https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y">User Guide (English)</a>&nbsp&nbsp | &nbsp&nbsp💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat</a>&nbsp&nbsp

Wan: Open and Advanced Large-Scale Video Generative Models

We are excited to introduce Wan2.2, a major upgrade to our foundational video models. With Wan2.2, we have focused on incorporating the following innovations:

👍 Effective MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps among specialized expert models, it enlarges the overall model capacity while keeping the computational cost the same.
👍 Cinematic-level Aesthetics: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.
👍 Complex Motion Generation: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with 65.6% more images and 83.2% more videos. This expansion notably enhances the model's generalization across multiple dimensions such as motion, semantics, and aesthetics, achieving top performance among all open-source and closed-source models.
👍 Efficient High-Definition Hybrid TI2V: Wan2.2 open-sources a 5B model built with our advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution and 24 fps, and can run on consumer-grade graphics cards such as the 4090. It is one of the fastest 720P@24fps models currently available, capable of serving both the industrial and academic sectors.
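
The MoE idea described above, where different experts handle different parts of the denoising schedule, can be illustrated with a toy router. This is a minimal sketch and not the repository's implementation: the two-expert split, the `boundary` value, and every name here (`TimestepMoE`, `make_toy_expert`, `run_schedule`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

# A toy "expert": maps (latent, timestep) to an updated latent.
Expert = Callable[[List[float], float], List[float]]


@dataclass
class TimestepMoE:
    """Routes each denoising step to exactly one expert, chosen by timestep.

    Hypothetical structure: two experts and a single `boundary` are
    assumptions for illustration, not values taken from Wan2.2.
    """
    high_noise_expert: Expert
    low_noise_expert: Expert
    boundary: float  # assumed switch point, as a fraction of the schedule in [0, 1]

    def select(self, t: float) -> Expert:
        # Noisy early steps (large t) go to one expert, later steps to the other.
        return self.high_noise_expert if t >= self.boundary else self.low_noise_expert

    def denoise_step(self, latent: List[float], t: float) -> List[float]:
        # Only the selected expert runs, so the per-step cost matches a single
        # model even though two sets of weights exist in total.
        return self.select(t)(latent, t)


def run_schedule(moe: TimestepMoE, latent: List[float], timesteps: List[float]) -> List[float]:
    """Apply one expert call per timestep, in order."""
    for t in timesteps:
        latent = moe.denoise_step(latent, t)
    return latent


def make_toy_expert(name: str, log: List[str]) -> Expert:
    """Build a stand-in expert that records its name and shrinks the latent."""
    def expert(latent: List[float], t: float) -> List[float]:
        log.append(name)
        return [v * 0.9 for v in latent]
    return expert


if __name__ == "__main__":
    log: List[str] = []
    moe = TimestepMoE(make_toy_expert("high", log), make_toy_expert("low", log), boundary=0.5)
    run_schedule(moe, [1.0, -1.0], [1.0, 0.75, 0.5, 0.25, 0.0])
    print(log)  # which expert handled each step
```

Because exactly one expert is called per step, the number of forward passes equals the number of timesteps regardless of how many experts exist, which is the sense in which capacity grows without extra per-step compute.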

Video Demos

🔥 Latest News!!

Nov 13, 2025: 👋 Wan2.2-Animate-14B has been integrated into Diffusers (PR,Weights). Thanks to all community contributors. Enjoy!
Sep 19, 2025: 💃 We introduce Wan2.2-Animate-14B, a unified model for character animation and replacement with holistic movement and expression replication. We have released the model weights and inference code, and you can try it on wan.video, ModelScope Studio or HuggingFace Space!
Aug 26, 2025: 🎵 We introduce Wan2.2-S2V-14B, an audio-driven cinematic video generation model, including inference code, model weights, and technical report! Now you can try it on wan.video, ModelScope Gradio or HuggingFace Gradio!
Jul 28, 2025: 👋 We have opened a HF space using the TI2V-5B model. Enjoy!
Jul 28, 2025: 👋 Wan2.2 has been integrated into ComfyUI (CN | EN). Enjoy!
Jul 28, 2025: 👋 Wan2.2's T2V, I2V and TI2V have been integrated into Diffusers (T2V-A14B | I2V-A14B | TI2V-5B). Feel free to give it a try!
Jul 28, 2025: 👋 We've released the inference code and model weights of Wan2.2.
Sep 5, 2025: 👋 We added text-to-speech synthesis support with CosyVoice for the Speech-to-Video generation task.

Community Works

If your research or project builds upon Wan2.1 or Wan2.2, and you would like more people to see it, please inform us.

Prompt Relay, a plug-and-play, inference-time method for temporal control in video generation. Prompt Relay improves video quality and gives users precise control over what happens at each moment in the video. Visit their webpage for more details.
Helios, a breakthrough video generation model based on Wan2.1 that achieves minute-scale, high-quality video synthesis at 19.5 FPS on a single H100 GPU (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques. Visit their webpage for more details.
LightX2V, a lightweight and efficient video generation framework that integrates Wan2.1 and Wan2.2, supporting multiple engineering acceleration techniques for fast inference. LightX2V-HuggingFace offers a variety of Wan-based step-distillation models, quantized models, and lightweight VAE models.
HuMo proposed a unified, human-centric framework based on Wan to produce high-quality, fine-grained, and controllable human videos from multimodal inputs—including text, images, and audio. Visit their webpage for more details.
FastVideo includes distilled Wan models with sparse attention that significantly speed up inference.
Cache-dit offers Fully Cache Acceleration support for Wan2.2 MoE with DBCache, TaylorSeer and Cache CFG. Visit their example for more details.
Kijai's ComfyUI WanVideoWrapper is an alternative implementation of Wan models for ComfyUI. Thanks to its Wan-only focus, it's on the frontline of getting cutting edge optimizations and hot research features, which are often hard to integrate into ComfyUI quickly due to its more rigid structure.
DiffSynth-Studio provides comprehensive support for Wan 2.2, including low-GPU-memory layer-by-layer offload, FP8 quantization, sequence parallelism, LoRA training, and full training.

📑 Todo List

Wan2.2 Text-to-Video
- [x] Multi-GPU Inference code of the A14B and 14B models
- [x] Checkpoints of the A14B and 14B models
- [x] ComfyUI integration
- [x] Diffusers integration
Wan2.2 Image-to-Video
- [x] Multi-GPU Inference code of the A14B model
- [x] Checkpoints of the A14B model
- [x] ComfyUI integration
- [x] Diffusers integration
Wan2.2 Text-Image-to-Video
- [x] Multi-GPU Inference code of the 5B model
- [x] Checkpoints of the 5B model
- [x] ComfyUI integration
- [x] Diffusers integration
Wan2.2-S2V Speech-to-Video
- [x] Inference code of Wan2.2-S2V
- [x] Checkpoints of Wan2.2-S2V-14B
- [x] ComfyUI integration
- [x] Diffusers integration
Wan2.2-Animate Character Animation and Replacement
- [x] Inference code of Wan2.2-Animate
- [x] Checkpoints of Wan2.2-Animate
- [x] ComfyUI integration
- [x] Diffusers integration

Run Wan2.2

Installation

Clone the repo:

git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2

Install dependencies:

# Ensure torch >= 2.4.0
# If the installation of `flash_attn` fails, try installing the other packages first and install `flash_attn` last
pip install -r requirements.txt
# If you want to use CosyVoice to synthesize speech for Speech-to-Video Generation, please install requirements_s2v.txt additionally
pip install -r requirements_s2v.txt
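
The `torch >= 2.4.0` requirement noted above can be checked before running anything heavy. The following is a small pre-flight sketch using only the standard library. The helper names (`version_tuple`, `meets_minimum`, `check_torch`) are hypothetical and not part of this repository.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn a version such as '2.4.0+cu121' into (2, 4, 0).

    Local build tags after '+' are dropped and missing parts are padded with 0.
    """
    numbers = [int(n) for n in re.findall(r"\d+", version.split("+")[0])]
    return tuple((numbers + [0, 0, 0])[:3])


def meets_minimum(installed: str, minimum: str = "2.4.0") -> bool:
    """Compare two version strings numerically rather than as text."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_torch(minimum: str = "2.4.0") -> str:
    """Report whether the installed torch satisfies the stated minimum."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed"
    verdict = "satisfies" if meets_minimum(installed, minimum) else "is older than"
    return f"torch {installed} {verdict} the required {minimum}"


if __name__ == "__main__":
    print(check_torch())
```

Comparing tuples of integers avoids the pitfall of string comparison, where "2.10.0" would sort before "2.4.0".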

Model Download

Models	Download Links	Description
T2V-A14B	🤗 Huggingface 🤖 ModelScope	Text-to-Video MoE model, supports 480P & 720P
I2V-A14B	🤗 Huggingface 🤖 ModelScope	Image-to-Video MoE model, supports 480P & 720P
TI2V-5B	🤗 Huggingface 🤖 ModelScope	High-compression VAE, T2V+I2V, supports 720P
S2V-14B	🤗 Huggingface 🤖 ModelScope	Speech-to-Video model, supports 480P & 720P
Animate-14B	🤗 Huggingface 🤖 ModelScope	Character animation and replacement

💡Note: The TI2V-5B model supports 720P video generation at 24 FPS.
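
For a sense of scale, the 16×16×4 compression ratio stated for Wan2.2-VAE can be turned into a latent grid size for a 720P clip. This is back-of-the-envelope arithmetic under stated assumptions, not code from the repository: the ratio is read as 16× in height and width and 4× in time, the temporal rule `1 + (frames - 1) // 4` (first frame kept on its own) is an assumption, and the 121-frame clip (about 5 seconds at 24 FPS) is a hypothetical input.

```python
def latent_grid(width: int, height: int, frames: int,
                spatial: int = 16, temporal: int = 4) -> tuple:
    """Return (latent_frames, latent_height, latent_width) for a video.

    Assumption: the first frame is encoded alone and the remaining frames
    are compressed in groups of `temporal`.
    """
    if width % spatial or height % spatial:
        raise ValueError("width and height must be multiples of the spatial ratio")
    latent_frames = 1 + (frames - 1) // temporal
    return latent_frames, height // spatial, width // spatial


if __name__ == "__main__":
    # 1280x720 at 121 frames is a hypothetical example clip.
    print(latent_grid(1280, 720, 121))  # (31, 45, 80)
```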

Download models using huggingface-cli:

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B

Download models using modelscope-cli:

pip install modelscope
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
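
The same download can be scripted from Python with `huggingface_hub.snapshot_download`, which is useful when fetching several checkpoints. This is a sketch under assumptions: only the `Wan-AI/Wan2.2-T2V-A14B` repository id appears in the commands above, so the naming pattern applied to the other models in the table is an assumption, and the helper names are hypothetical.

```python
import os


def repo_id_for(model: str, org: str = "Wan-AI", family: str = "Wan2.2") -> str:
    """Build a Hub repository id such as 'Wan-AI/Wan2.2-T2V-A14B'.

    Assumption: other models follow the same '<org>/<family>-<model>' pattern.
    """
    return f"{org}/{family}-{model}"


def local_dir_for(model: str, root: str = ".", family: str = "Wan2.2") -> str:
    """Mirror the './Wan2.2-T2V-A14B' layout used by the CLI commands."""
    return os.path.join(root, f"{family}-{model}")


def download_checkpoint(model: str = "T2V-A14B", root: str = ".") -> str:
    """Download one checkpoint and return the local path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_for(model),
                             local_dir=local_dir_for(model, root))


# Example (performs a large download, so it is left commented out):
# path = download_checkpoint("T2V-A14B")
```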

Run Text-to-Video Generation

This repository supports the Wan2.2-T2V-A14B Text-to-Video model, which can generate video at both 480P and 720P resolutions.

(1) Without Prompt Extension

For simplicity, we start with a basic version of the inference process that skips the prompt extension step.

Single-GPU inference

python generate.py  --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --offload_model True --convert_model_dtype --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

💡 This command can run on a GPU with at least 80GB VRAM.

💡If you encounter OOM (Out-of-Memory) issues, you can use the --offload_model True, --convert_model_dtype and --t5_cpu options to reduce GPU memory usage.
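
When launching from a script, the single-GPU command and its memory-saving options can be assembled as an argument list. The sketch below uses only the flags shown above. `build_generate_command` is a hypothetical helper, not part of the repository, and it builds the command without running it.

```python
import shlex
from typing import List


def build_generate_command(prompt: str,
                           ckpt_dir: str = "./Wan2.2-T2V-A14B",
                           task: str = "t2v-A14B",
                           size: str = "1280*720",
                           offload_model: bool = True,
                           convert_model_dtype: bool = True,
                           t5_cpu: bool = False) -> List[str]:
    """Assemble the generate.py invocation as a list of arguments."""
    cmd = ["python", "generate.py",
           "--task", task, "--size", size, "--ckpt_dir", ckpt_dir]
    if offload_model:
        cmd += ["--offload_model", "True"]   # takes an explicit value
    if convert_model_dtype:
        cmd.append("--convert_model_dtype")  # plain switch
    if t5_cpu:
        cmd.append("--t5_cpu")               # plain switch
    cmd += ["--prompt", prompt]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command(
        "Two anthropomorphic cats in comfy boxing gear and bright gloves "
        "fight intensely on a spotlighted stage.",
        t5_cpu=True,
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` bypasses the shell, so the `*` in `1280*720` is never treated as a glob pattern.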

Multi-GPU inference using FSDP + DeepSpeed Ulysses

We use PyTorch FSDP and DeepSpeed Ulysses to accelerate inference.

torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
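
To show what Ulysses-style sequence parallelism does conceptually, the following pure-Python simulation mimics the all-to-all exchange that happens before attention. Each rank starts with a slice of the sequence (all heads) and ends with the full sequence (a slice of heads). It is an illustration of the general technique only: no real communication takes place, and the function names are hypothetical rather than taken from this repository.

```python
from typing import List

Token = List[str]    # one entry per attention head
Shard = List[Token]  # the tokens held by one rank


def split_sequence(tokens: List[Token], world_size: int) -> List[Shard]:
    """Give each rank a contiguous slice of the sequence, with all heads."""
    assert len(tokens) % world_size == 0, "sequence must divide evenly"
    chunk = len(tokens) // world_size
    return [tokens[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_to_all_seq_to_heads(seq_shards: List[Shard], num_heads: int) -> List[Shard]:
    """Simulate the exchange performed before attention.

    Input:  rank r holds some tokens with every head.
    Output: rank r holds every token, but only its own group of heads.
    """
    world_size = len(seq_shards)
    assert num_heads % world_size == 0, "heads must divide evenly"
    per_rank = num_heads // world_size
    result = []
    for rank in range(world_size):
        lo, hi = rank * per_rank, (rank + 1) * per_rank
        # Visiting shards in rank order keeps the tokens in sequence order.
        result.append([token[lo:hi] for shard in seq_shards for token in shard])
    return result


if __name__ == "__main__":
    tokens = [[f"t{t}h{h}" for h in range(4)] for t in range(4)]
    shards = split_sequence(tokens, world_size=2)
    for rank, shard in enumerate(all_to_all_seq_to_heads(shards, num_heads=4)):
        print(rank, shard)
```

Since each attention head attends over the whole sequence independently, a rank that holds every token for its heads can run ordinary attention locally. A second exchange in the opposite direction restores the sequence-sharded layout afterwards.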

(2) Using Prompt Extension

Extending the prompts can effectively enrich the details in the generated videos, further enhancing the video quality. Therefore, we recommend enabling prompt extension. We provide the following two methods for prompt extension:

Use the Dashscope API for extension.
Apply for a dashscope.api_key in advance (EN | CN).
Configure the environment variable DASH_API_KEY to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the environment variable DASH_API_URL to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the dashscope document.
Use the qwen-plus model for text-to-video tasks and qwen-vl-max for image-to-video tasks.
You can modify the model used for extension with the parameter --prompt_extend_model. For example:

```sh
DASH_API_KEY=your_key torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope' --promp
```
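
The Dashscope settings described above can be validated before launching a long job. This is a hedged pre-flight sketch: the environment variable names, the international URL, and the default model names come from the steps above, while the `dashscope_settings` helper and the `"t2v"`/`"i2v"` keys are hypothetical. It does not show how `generate.py` reads these values internally.

```python
import os
from typing import Mapping, Optional

INTL_URL = "https://dashscope-intl.aliyuncs.com/api/v1"

# Default extension models per task type, as listed above.
DEFAULT_MODELS = {"t2v": "qwen-plus", "i2v": "qwen-vl-max"}


def dashscope_settings(task_kind: str,
                       env: Optional[Mapping[str, str]] = None,
                       international: bool = False) -> dict:
    """Collect and sanity-check the Dashscope configuration.

    `task_kind` is a hypothetical key: 't2v' or 'i2v'.
    """
    env = os.environ if env is None else env
    api_key = env.get("DASH_API_KEY")
    if not api_key:
        raise RuntimeError("DASH_API_KEY is not set")
    api_url = env.get("DASH_API_URL")
    if international and not api_url:
        raise RuntimeError(f"DASH_API_URL is not set; international users need {INTL_URL}")
    return {"api_key": api_key, "api_url": api_url, "model": DEFAULT_MODELS[task_kind]}


if __name__ == "__main__":
    demo_env = {"DASH_API_KEY": "your_key", "DASH_API_URL": INTL_URL}
    print(dashscope_settings("t2v", env=demo_env, international=True)["model"])
```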

Core symbols most depended-on inside this repo

(name missing) · called by 170 · wan/modules/animate/animate_utils.py
size · called by 98 · wan/modules/animate/animate_utils.py
device · called by 35 · wan/modules/animate/animate_utils.py
squeeze · called by 26 · wan/modules/animate/animate_utils.py
encode · called by 16 · wan/modules/vae2_2.py
get_world_size · called by 15 · wan/distributed/util.py
resize · called by 14 · wan/modules/animate/preprocess/pose2d_utils.py
flash_attention · called by 13 · wan/modules/attention.py

Shape

Method 391 · Function 169 · Class 118

Languages

Python 100%

Modules by API surface

wan/modules/vae2_2.py · 53 symbols
wan/modules/animate/preprocess/pose2d_utils.py · 44 symbols
wan/modules/animate/motion_encoder.py · 42 symbols
wan/modules/vae2_1.py · 39 symbols
wan/modules/s2v/motioner.py · 38 symbols
wan/modules/t5.py · 37 symbols
wan/modules/animate/clip.py · 32 symbols
wan/modules/animate/animate_utils.py · 30 symbols
wan/modules/s2v/model_s2v.py · 29 symbols
wan/modules/model.py · 28 symbols
wan/modules/animate/preprocess/human_visualization.py · 23 symbols
wan/utils/fm_solvers.py · 22 symbols

Dependencies from manifests, versioned

accelerate 1.1.1 · 1×
dashscope · 1×
diffusers 0.31.0 · 1×
easydict · 1×
flash_attn · 1×
ftfy · 1×
imageio · 1×
imageio-ffmpeg · 1×
numpy 1.23.5 · 1×
opencv-python 4.9.0.80 · 1×
tokenizers 0.20.3 · 1×
torch 2.4.0 · 1×

For agents

$ claude mcp add Wan2.2 \
  -- python -m otcore.mcp_server <graph>
