
github.com/Wan-Video/Wan2.2 @main sqlite

678 symbols 2,145 edges 57 files 152 documented · 22%
README

Wan2.2

💜 Wan: https://wan.video | 🖥️ GitHub: https://github.com/Wan-Video/Wan2.2 | 🤗 Hugging Face: https://huggingface.co/Wan-AI/ | 🤖 ModelScope: https://modelscope.cn/organization/Wan-AI | 📑 Paper: https://arxiv.org/abs/2503.20314 | 📑 Blog: https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg | 💬 Discord: https://discord.gg/AKNgpMK4Yj



📕 User Guide (Chinese): https://alidocs.dingtalk.com/i/nodes/jb9Y4gmKWrx9eo4dCql9LlbYJGXn6lpz | 📘 User Guide (English): https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y | 💬 WeChat: https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg

Wan: Open and Advanced Large-Scale Video Generative Models

We are excited to introduce Wan2.2, a major upgrade to our foundational video models. With Wan2.2, we have focused on incorporating the following innovations:

  • 👍 Effective MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps among specialized, powerful expert models, it enlarges the overall model capacity while keeping the computational cost the same.

  • 👍 Cinematic-level Aesthetics: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.

  • 👍 Complex Motion Generation: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with +65.6% more images and +83.2% more videos. This expansion notably enhances the model's generalization across multiple dimensions such as motion, semantics, and aesthetics, achieving top performance among all open-source and closed-source models.

  • 👍 Efficient High-Definition Hybrid TI2V: Wan2.2 open-sources a 5B model built with our advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution and 24 fps, and can also run on consumer-grade graphics cards such as the RTX 4090. It is one of the fastest 720P@24fps models currently available, capable of serving both the industrial and academic sectors.
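
The MoE bullet above describes splitting the denoising process across timesteps among expert models. The following is a minimal sketch of that idea, not the repository's implementation: the `BOUNDARY` value, the normalized timestep axis, and the toy experts are all hypothetical and stand in for real networks. It only illustrates why per-step compute stays at one expert's cost: exactly one expert runs at each step.

```python
from typing import Callable, Dict, List, Tuple

# A "denoiser" maps (noisy sample, timestep) to a slightly cleaner sample.
Denoiser = Callable[[float, float], float]

# Hypothetical switch point on a normalized timestep axis (1.0 = pure noise).
# The real switching rule used by Wan2.2 is not stated in this section.
BOUNDARY = 0.5


def pick_expert(t: float, boundary: float = BOUNDARY) -> str:
    """Return the name of the expert responsible for timestep t."""
    return "high_noise" if t >= boundary else "low_noise"


def denoise(
    x: float, timesteps: List[float], experts: Dict[str, Denoiser]
) -> Tuple[float, List[str]]:
    """Run the schedule, calling exactly one expert per step."""
    used: List[str] = []
    for t in timesteps:
        name = pick_expert(t)
        x = experts[name](x, t)
        used.append(name)
    return x, used


# Toy stand-ins for the two expert networks (illustrative only).
experts: Dict[str, Denoiser] = {
    "high_noise": lambda x, t: x * 0.5,
    "low_noise": lambda x, t: x * 0.9,
}

final, used = denoise(1.0, [1.0, 0.75, 0.5, 0.25], experts)
print(used)
```

Total parameters grow with the number of experts, but each step still pays for a single forward pass, which is the trade-off the bullet describes.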

Video Demos

🔥 Latest News!!

Community Works

If your research or project builds upon Wan2.1 or Wan2.2, and you would like more people to see it, please inform us.

  • Prompt Relay, a plug-and-play, inference-time method for temporal control in video generation. Prompt Relay improves video quality and gives users precise control over what happens at each moment in the video. Visit their webpage for more details.
  • Helios, a breakthrough video generation model based on Wan2.1 that achieves minute-scale, high-quality video synthesis at 19.5 FPS on a single H100 GPU (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques. Visit their webpage for more details.
  • LightX2V, a lightweight and efficient video generation framework that integrates Wan2.1 and Wan2.2, supporting multiple engineering acceleration techniques for fast inference. LightX2V-HuggingFace offers a variety of Wan-based step-distillation models, quantized models, and lightweight VAE models.
  • HuMo proposes a unified, human-centric framework based on Wan to produce high-quality, fine-grained, and controllable human videos from multimodal inputs, including text, images, and audio. Visit their webpage for more details.
  • FastVideo includes distilled Wan models with sparse attention that significantly speed up inference.
  • Cache-dit offers Fully Cache Acceleration support for Wan2.2 MoE with DBCache, TaylorSeer and Cache CFG. Visit their example for more details.
  • Kijai's ComfyUI WanVideoWrapper is an alternative implementation of Wan models for ComfyUI. Thanks to its Wan-only focus, it's on the frontline of getting cutting edge optimizations and hot research features, which are often hard to integrate into ComfyUI quickly due to its more rigid structure.
  • DiffSynth-Studio provides comprehensive support for Wan 2.2, including low-GPU-memory layer-by-layer offload, FP8 quantization, sequence parallelism, LoRA training, and full training.

📑 Todo List

  • Wan2.2 Text-to-Video
    • [x] Multi-GPU Inference code of the A14B and 14B models
    • [x] Checkpoints of the A14B and 14B models
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2 Image-to-Video
    • [x] Multi-GPU Inference code of the A14B model
    • [x] Checkpoints of the A14B model
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2 Text-Image-to-Video
    • [x] Multi-GPU Inference code of the 5B model
    • [x] Checkpoints of the 5B model
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2-S2V Speech-to-Video
    • [x] Inference code of Wan2.2-S2V
    • [x] Checkpoints of Wan2.2-S2V-14B
    • [x] ComfyUI integration
    • [x] Diffusers integration
  • Wan2.2-Animate Character Animation and Replacement
    • [x] Inference code of Wan2.2-Animate
    • [x] Checkpoints of Wan2.2-Animate
    • [x] ComfyUI integration
    • [x] Diffusers integration

Run Wan2.2

Installation

Clone the repo:

git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2

Install dependencies:

# Ensure torch >= 2.4.0
# If the installation of `flash_attn` fails, try installing the other packages first and install `flash_attn` last
pip install -r requirements.txt
# If you want to use CosyVoice to synthesize speech for Speech-to-Video Generation, please install requirements_s2v.txt additionally
pip install -r requirements_s2v.txt
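
The install notes above ask for torch >= 2.4.0. As a quick pre-flight check, a sketch like the following can confirm the installed version before running anything heavy. The helper names (`parse_version`, `meets_minimum`, `installed_torch_version`) are illustrative and not part of the repository; the check reads package metadata only, so it does not import torch itself.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_TORCH = (2, 4, 0)  # minimum stated in the install notes


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '2.4.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(text: str, minimum: Tuple[int, int, int] = MIN_TORCH) -> bool:
    """True when the version string is at least the required minimum."""
    return parse_version(text) >= minimum


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


found = installed_torch_version()
if found is None:
    print("torch is not installed")
elif meets_minimum(found):
    print(f"torch {found} satisfies >= 2.4.0")
else:
    print(f"torch {found} is older than 2.4.0")
```

Comparing integer tuples avoids the string-comparison trap where "2.10.0" sorts before "2.4.0".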

Model Download

| Models | Download Links | Description |
|---|---|---|
| T2V-A14B | 🤗 Huggingface 🤖 ModelScope | Text-to-Video MoE model, supports 480P & 720P |
| I2V-A14B | 🤗 Huggingface 🤖 ModelScope | Image-to-Video MoE model, supports 480P & 720P |
| TI2V-5B | 🤗 Huggingface 🤖 ModelScope | High-compression VAE, T2V+I2V, supports 720P |
| S2V-14B | 🤗 Huggingface 🤖 ModelScope | Speech-to-Video model, supports 480P & 720P |
| Animate-14B | 🤗 Huggingface 🤖 ModelScope | Character animation and replacement |

💡Note: The TI2V-5B model supports 720P video generation at 24 FPS.
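
To make the TI2V-5B numbers concrete, here is a rough arithmetic sketch of what the 16×16×4 compression ratio quoted earlier implies for a 720P clip. It assumes the ratio means 16× along width, 16× along height, and 4× along time, and it uses plain ceiling division; the actual VAE may treat the first frame or padding differently. It counts grid positions only, not channels or bytes.

```python
from typing import Tuple

SPATIAL_FACTOR = 16   # assumed: applies to both width and height
TEMPORAL_FACTOR = 4   # assumed: applies to the frame axis


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def latent_grid(width: int, height: int, frames: int) -> Tuple[int, int, int]:
    """Approximate latent grid (width, height, frames) after compression."""
    return (
        ceil_div(width, SPATIAL_FACTOR),
        ceil_div(height, SPATIAL_FACTOR),
        ceil_div(frames, TEMPORAL_FACTOR),
    )


def reduction_factor() -> int:
    """Pixel positions represented by one latent position."""
    return SPATIAL_FACTOR * SPATIAL_FACTOR * TEMPORAL_FACTOR


# Five seconds of 1280x720 video at 24 FPS is 120 frames.
grid = latent_grid(1280, 720, 24 * 5)
print(grid, reduction_factor())
```

Under these assumptions the diffusion model works on roughly a thousand times fewer positions than the pixel grid, which is consistent with the model being able to run on a consumer-grade card.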

Download models using huggingface-cli:

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B

Download models using modelscope-cli:

pip install modelscope
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
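
The same download can be scripted from Python with `huggingface_hub.snapshot_download`. The sketch below is a convenience wrapper, not part of the repository. Only `Wan-AI/Wan2.2-T2V-A14B` is confirmed by the commands above; the repository IDs for the other four models are assumed to follow the same `Wan-AI/Wan2.2-<name>` pattern and should be checked against the download links before use.

```python
from typing import Tuple

# Model names from the table above.
KNOWN_MODELS: Tuple[str, ...] = (
    "T2V-A14B",
    "I2V-A14B",
    "TI2V-5B",
    "S2V-14B",
    "Animate-14B",
)


def repo_id_for(model: str) -> str:
    """Build a Hub repo ID; the naming pattern is an assumption for all
    models except T2V-A14B, which appears in the CLI example."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return f"Wan-AI/Wan2.2-{model}"


def local_dir_for(model: str) -> str:
    """Mirror the local directory layout used by the CLI example."""
    return f"./Wan2.2-{model}"


def download(model: str) -> str:
    """Download one checkpoint and return its local path.

    huggingface_hub is imported lazily so the helpers above can be used
    (and tested) without the package or network access.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_for(model),
        local_dir=local_dir_for(model),
    )
```

Calling `download("T2V-A14B")` should be equivalent to the `huggingface-cli download` command shown above; nothing is fetched until `download` is called.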

Run Text-to-Video Generation

This repository supports the Wan2.2-T2V-A14B Text-to-Video model, which can generate video at both 480P and 720P resolutions.

(1) Without Prompt Extension

To facilitate implementation, we will start with a basic version of the inference process that skips the prompt extension step.

  • Single-GPU inference
python generate.py  --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --offload_model True --convert_model_dtype --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

💡 This command can run on a GPU with at least 80GB VRAM.

💡If you encounter OOM (Out-of-Memory) issues, you can use the --offload_model True, --convert_model_dtype and --t5_cpu options to reduce GPU memory usage.

  • Multi-GPU inference using FSDP + DeepSpeed Ulysses

We use PyTorch FSDP and DeepSpeed Ulysses to accelerate inference.

torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
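
When the commands above are launched from a script, it can help to assemble them as argument lists rather than hand-edited strings. The sketch below does that using only flags that appear in the commands and notes above (`--offload_model`, `--convert_model_dtype`, `--t5_cpu`, `--dit_fsdp`, `--t5_fsdp`, `--ulysses_size`). The helper names are illustrative, and setting `--ulysses_size` equal to the process count simply mirrors the 8-GPU example rather than stating a requirement.

```python
import shlex
from typing import List, Sequence

PROMPT = (
    "Two anthropomorphic cats in comfy boxing gear and bright gloves "
    "fight intensely on a spotlighted stage."
)


def generate_args(
    task: str = "t2v-A14B",
    size: str = "1280*720",
    ckpt_dir: str = "./Wan2.2-T2V-A14B",
    prompt: str = PROMPT,
    extra: Sequence[str] = (),
) -> List[str]:
    """Arguments for generate.py; `extra` flags are placed before --prompt."""
    base = ["generate.py", "--task", task, "--size", size, "--ckpt_dir", ckpt_dir]
    return base + list(extra) + ["--prompt", prompt]


def single_gpu_command(save_memory: bool = True, t5_cpu: bool = False) -> List[str]:
    """Single-GPU command, optionally with the memory-saving flags."""
    extra: List[str] = []
    if save_memory:
        extra += ["--offload_model", "True", "--convert_model_dtype"]
    if t5_cpu:
        extra.append("--t5_cpu")
    return ["python"] + generate_args(extra=extra)


def multi_gpu_command(nproc: int = 8) -> List[str]:
    """torchrun command using FSDP and Ulysses, as in the 8-GPU example."""
    extra = ["--dit_fsdp", "--t5_fsdp", "--ulysses_size", str(nproc)]
    return ["torchrun", f"--nproc_per_node={nproc}"] + generate_args(extra=extra)


single = single_gpu_command()
multi = multi_gpu_command(8)
print(shlex.join(single))
print(shlex.join(multi))
```

The lists can be passed directly to `subprocess.run`, which sidesteps shell quoting of the `1280*720` size argument and of the prompt.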

(2) Using Prompt Extension

Extending the prompts can effectively enrich the details in the generated videos, further enhancing the video quality. Therefore, we recommend enabling prompt extension. We provide the following two methods for prompt extension:

  • Use the Dashscope API for extension.
    • Apply for a dashscope.api_key in advance (EN | CN).
    • Configure the environment variable DASH_API_KEY to specify the Dashscope API key. For users of Alibaba Cloud's international site, you also need to set the environment variable DASH_API_URL to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the dashscope document.
    • Use the qwen-plus model for text-to-video tasks and qwen-vl-max for image-to-video tasks.
    • You can modify the model used for extension with the parameter --prompt_extend_model. For example:

DASH_API_KEY=your_key torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope' --promp

Core symbols most depended-on inside this repo

  • to: called by 170 (wan/modules/animate/animate_utils.py)
  • size: called by 98 (wan/modules/animate/animate_utils.py)
  • device: called by 35 (wan/modules/animate/animate_utils.py)
  • squeeze: called by 26 (wan/modules/animate/animate_utils.py)
  • encode: called by 16 (wan/modules/vae2_2.py)
  • get_world_size: called by 15 (wan/distributed/util.py)
  • resize: called by 14 (wan/modules/animate/preprocess/pose2d_utils.py)
  • flash_attention: called by 13 (wan/modules/attention.py)

Shape

Method 391
Function 169
Class 118

Languages

Python 100%

Modules by API surface

  • wan/modules/vae2_2.py: 53 symbols
  • wan/modules/animate/preprocess/pose2d_utils.py: 44 symbols
  • wan/modules/animate/motion_encoder.py: 42 symbols
  • wan/modules/vae2_1.py: 39 symbols
  • wan/modules/s2v/motioner.py: 38 symbols
  • wan/modules/t5.py: 37 symbols
  • wan/modules/animate/clip.py: 32 symbols
  • wan/modules/animate/animate_utils.py: 30 symbols
  • wan/modules/s2v/model_s2v.py: 29 symbols
  • wan/modules/model.py: 28 symbols
  • wan/modules/animate/preprocess/human_visualization.py: 23 symbols
  • wan/utils/fm_solvers.py: 22 symbols

Dependencies from manifests, versioned

  • accelerate 1.1.1 · 1×
  • dashscope
  • diffusers 0.31.0 · 1×
  • easydict
  • flash_attn
  • ftfy
  • imageio
  • imageio-ffmpeg
  • numpy 1.23.5 · 1×
  • opencv-python 4.9.0.80 · 1×
  • tokenizers 0.20.3 · 1×
  • torch 2.4.0 · 1×

For agents

$ claude mcp add Wan2.2 \
  -- python -m otcore.mcp_server <graph>
