<img src="https://github.com/Wan-Video/Wan2.2/raw/main/assets/logo.png" width="400"/>
💜 <a href="https://wan.video"><b>Wan</b></a>    |    🖥️ <a href="https://github.com/Wan-Video/Wan2.2">GitHub</a>    |   🤗 <a href="https://huggingface.co/Wan-AI/">Hugging Face</a>   |   🤖 <a href="https://modelscope.cn/organization/Wan-AI">ModelScope</a>   |    📑 <a href="https://arxiv.org/abs/2503.20314">Paper</a>    |    📑 <a href="https://wan.video/welcome?spm=a2ty_o02.30011076.0.0.6c9ee41eCcluqg">Blog</a>    |    💬 <a href="https://discord.gg/AKNgpMK4Yj">Discord</a>  
📕 <a href="https://alidocs.dingtalk.com/i/nodes/jb9Y4gmKWrx9eo4dCql9LlbYJGXn6lpz">User Guide (Chinese)</a>   |    📘 <a href="https://alidocs.dingtalk.com/i/nodes/EpGBa2Lm8aZxe5myC99MelA2WgN7R35y">User Guide (English)</a>   |   💬 <a href="https://gw.alicdn.com/imgextra/i2/O1CN01tqjWFi1ByuyehkTSB_!!6000000000015-0-tps-611-1279.jpg">WeChat</a>  
Wan: Open and Advanced Large-Scale Video Generative Models
We are excited to introduce Wan2.2, a major upgrade to our foundational video models. With Wan2.2, we have focused on incorporating the following innovations:
👍 Effective MoE Architecture: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps among specialized, powerful expert models, it enlarges the overall model capacity while keeping the same per-step computational cost, since only one expert is active at each denoising step.
👍 Cinematic-level Aesthetics: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.
👍 Complex Motion Generation: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with +65.6% more images and +83.2% more videos. This expansion notably enhances the model's generalization across multiple dimensions such as motion, semantics, and aesthetics, achieving top performance among both open-source and closed-source models.
👍 Efficient High-Definition Hybrid TI2V: Wan2.2 open-sources a 5B model built with our advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P and 24 fps, and can run on consumer-grade graphics cards such as the RTX 4090. It is one of the fastest 720P@24fps models currently available, serving both industrial and academic users.
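The timestep-based expert routing described in the MoE bullet can be pictured with a small shell sketch. It assumes a two-expert split, with a high-noise expert handling the early, noisiest denoising steps and a low-noise expert handling the later refinement steps, switching at a single boundary timestep. The function name `pick_expert` and the boundary value 875 (on a 0 to 1000 timestep scale) are hypothetical and chosen only for illustration; they are not Wan2.2's actual configuration.

```sh
#!/usr/bin/env bash
# Illustrative sketch only: timestep-based routing between two experts.
# ASSUMPTIONS: exactly two experts, integer timesteps counting down from
# 1000 toward 0, and a hypothetical switching boundary passed by the caller.

# pick_expert <timestep> <boundary>
# Prints which expert would denoise the current step.
pick_expert() {
  local t=$1 boundary=$2
  if [ "$t" -ge "$boundary" ]; then
    echo "high_noise_expert"   # early, high-noise steps
  else
    echo "low_noise_expert"    # later, low-noise steps
  fi
}

# Walk a coarse 10-step schedule (timesteps are illustrative).
for t in 1000 900 800 700 600 500 400 300 200 100; do
  echo "t=$t -> $(pick_expert "$t" 875)"
done
```

Because each step calls exactly one expert, adding the second expert increases the total parameter count without increasing the work done per step.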
Nov 13, 2025: 👋 Wan2.2-Animate-14B has been integrated into Diffusers (PR, Weights). Thanks to all community contributors. Enjoy!
Sep 19, 2025: 💃 We introduce Wan2.2-Animate-14B, a unified model for character animation and replacement with holistic movement and expression replication. We have released the model weights and inference code, and you can try it on wan.video, ModelScope Studio, or Hugging Face Space!
If your research or project builds upon Wan2.1 or Wan2.2, and you would like more people to see it, please inform us.
Clone the repo:
```sh
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
```
Install dependencies:
```sh
# Ensure torch >= 2.4.0
# If the installation of `flash_attn` fails, try installing the other packages first and install `flash_attn` last
pip install -r requirements.txt
# If you want to use CosyVoice to synthesize speech for Speech-to-Video Generation, please install requirements_s2v.txt additionally
pip install -r requirements_s2v.txt
```
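The install notes above (torch >= 2.4.0, and installing `flash_attn` last if it fails) can be scripted. The sketch below is a hypothetical helper, not part of the repository: `install_wan_requirements`, `version_ge`, and `without_flash_attn` are illustrative names, and it assumes GNU `sort -V` is available and that `requirements.txt` lists the package on a line beginning with `flash_attn` or `flash-attn`.

```sh
#!/usr/bin/env bash
# Hypothetical helper for the install order suggested above.
# ASSUMPTIONS: GNU `sort -V` is available, and requirements.txt lists
# flash_attn on its own line starting with "flash_attn" or "flash-attn".

# version_ge <a> <b>: succeeds if version a >= version b.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# without_flash_attn <file>: print the requirements file minus flash_attn.
without_flash_attn() {
  grep -v -i -E '^flash[_-]attn' "$1"
}

# install_wan_requirements: check torch, install everything else, then flash_attn.
install_wan_requirements() {
  local torch_ver tmp status=0
  torch_ver=$(python -c 'import torch; print(torch.__version__.split("+")[0])') || {
    echo "torch is not importable; install torch >= 2.4.0 first" >&2
    return 1
  }
  if ! version_ge "$torch_ver" "2.4.0"; then
    echo "found torch $torch_ver, but >= 2.4.0 is required" >&2
    return 1
  fi
  tmp=$(mktemp)
  without_flash_attn requirements.txt > "$tmp"
  pip install -r "$tmp" && pip install flash_attn || status=$?
  rm -f "$tmp"
  return "$status"
}
```

Source the script from the repository root and run `install_wan_requirements`; if `flash_attn` still fails to build, the other packages are already in place.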
| Models | Download Links | Description |
|---|---|---|
| T2V-A14B | 🤗 Huggingface 🤖 ModelScope | Text-to-Video MoE model, supports 480P & 720P |
| I2V-A14B | 🤗 Huggingface 🤖 ModelScope | Image-to-Video MoE model, supports 480P & 720P |
| TI2V-5B | 🤗 Huggingface 🤖 ModelScope | High-compression VAE, T2V+I2V, supports 720P |
| S2V-14B | 🤗 Huggingface 🤖 ModelScope | Speech-to-Video model, supports 480P & 720P |
| Animate-14B | 🤗 Huggingface 🤖 ModelScope | Character animation and replacement |
💡 Note: The TI2V-5B model supports 720P video generation at 24 FPS.
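To make the 16×16×4 compression of the Wan2.2-VAE concrete, the sketch below computes the latent grid for a clip. It assumes the 16×16 factor applies to height and width and the 4 applies to time, and, as with causal video VAEs, that the first frame is encoded on its own, so a clip of 4k+1 frames maps to k+1 latent frames. Both points are assumptions, and `latent_shape` is a hypothetical helper; the 1280×704, 121-frame input (about 5 seconds at 24 fps) is only an illustrative example.

```sh
#!/usr/bin/env bash
# Hypothetical helper: latent grid size under 16x16 spatial and 4x temporal
# compression. ASSUMPTION: causal temporal compression, where the first frame
# is kept on its own, so frames must be 4k+1 and yield k+1 latent frames.

# latent_shape <width> <height> <frames>: prints "T x H x W" of the latent.
latent_shape() {
  local width=$1 height=$2 frames=$3
  if [ $(( width % 16 )) -ne 0 ] || [ $(( height % 16 )) -ne 0 ] \
     || [ $(( (frames - 1) % 4 )) -ne 0 ]; then
    echo "width and height must be multiples of 16, frames must be 4k+1" >&2
    return 1
  fi
  echo "$(( (frames - 1) / 4 + 1 )) x $(( height / 16 )) x $(( width / 16 ))"
}

latent_shape 1280 704 121   # -> 31 x 44 x 80
```

Under these assumptions, each latent position stands in for roughly 16 × 16 × 4 = 1,024 pixel positions.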
Download models using huggingface-cli:
```sh
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B
```
Download models using modelscope-cli:
```sh
pip install modelscope
modelscope download Wan-AI/Wan2.2-T2V-A14B --local_dir ./Wan2.2-T2V-A14B
```
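To fetch several checkpoints in one go, the download commands above can be wrapped in a loop that works with either CLI. The helper below is hypothetical: it assumes the other checkpoints in the table follow the same `Wan-AI/Wan2.2-<Model>` repository naming as `Wan-AI/Wan2.2-T2V-A14B`, so check the actual repository IDs on Hugging Face or ModelScope before relying on it. Setting `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/usr/bin/env bash
# Hypothetical helper wrapping the download commands shown above.
# ASSUMPTION: every checkpoint lives at Wan-AI/Wan2.2-<model>.
# Usage: download_wan <hf|modelscope> <model> [<model> ...]
# With DRY_RUN=1 the commands are printed rather than executed.
download_wan() {
  local tool=$1 model repo
  local -a cmd
  shift
  for model in "$@"; do
    repo="Wan-AI/Wan2.2-$model"
    case $tool in
      hf)         cmd=(huggingface-cli download "$repo" --local-dir "./Wan2.2-$model") ;;
      modelscope) cmd=(modelscope download "$repo" --local_dir "./Wan2.2-$model") ;;
      *)          echo "unknown tool: $tool (use hf or modelscope)" >&2; return 1 ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 download_wan hf T2V-A14B TI2V-5B` prints the two `huggingface-cli` commands without downloading anything.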
This repository supports the Wan2.2-T2V-A14B Text-to-Video model, which can generate videos at both 480P and 720P resolutions.
To keep things simple, we start with a basic version of the inference process that skips the prompt extension step.
```sh
python generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --offload_model True --convert_model_dtype --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
💡 This command can run on a GPU with at least 80GB VRAM.
💡 If you encounter OOM (out-of-memory) issues, you can use the `--offload_model True`, `--convert_model_dtype`, and `--t5_cpu` options to reduce GPU memory usage.
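The single-GPU command above already passes `--offload_model True` and `--convert_model_dtype`; `--t5_cpu` is the remaining option from the tip. Below is a hypothetical wrapper (the name `wan_t2v` and the `T5_CPU` and `DRY_RUN` variables are illustrative, not repository features) that builds the same command and appends `--t5_cpu` on request.

```sh
#!/usr/bin/env bash
# Hypothetical wrapper around the single-GPU T2V command above.
# T5_CPU=1 appends --t5_cpu (place the T5 text encoder on the CPU);
# DRY_RUN=1 prints the command instead of running it.
wan_t2v() {
  local -a cmd
  cmd=(python generate.py --task t2v-A14B --size '1280*720'
       --ckpt_dir ./Wan2.2-T2V-A14B
       --offload_model True --convert_model_dtype
       --prompt "$1")
  if [ "${T5_CPU:-0}" = 1 ]; then
    cmd+=(--t5_cpu)
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Building the command as a bash array keeps the prompt a single argument even when it contains spaces or punctuation.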
For multi-GPU inference, we use PyTorch FSDP and DeepSpeed Ulysses to accelerate generation. For example, on 8 GPUs:
```sh
torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
```
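On machines with a different number of GPUs, the same command can be generated from the GPU count. This sketch is hypothetical: it reads the count from `nvidia-smi -L` (overridable with `NGPU`) and, following the 8-GPU example above, sets `--ulysses_size` equal to `--nproc_per_node`. Treat that pairing as an assumption and check `generate.py` for the exact constraint.

```sh
#!/usr/bin/env bash
# Hypothetical wrapper around the multi-GPU T2V command above.
# NGPU overrides the GPU count detected via `nvidia-smi -L`;
# DRY_RUN=1 prints the command instead of running it.
# ASSUMPTION: --ulysses_size equals the number of processes,
# mirroring the 8-GPU example.
wan_t2v_multi() {
  local ngpu=${NGPU:-$(nvidia-smi -L 2>/dev/null | wc -l)}
  ngpu=$(( ngpu + 0 ))   # normalize, e.g. strip padding from `wc -l`
  if [ "$ngpu" -lt 1 ]; then
    echo "no GPUs detected; set NGPU explicitly" >&2
    return 1
  fi
  local -a cmd
  cmd=(torchrun --nproc_per_node="$ngpu" generate.py --task t2v-A14B
       --size '1280*720' --ckpt_dir ./Wan2.2-T2V-A14B
       --dit_fsdp --t5_fsdp --ulysses_size "$ngpu" --prompt "$1")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```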
Extending the prompts can effectively enrich the details in the generated videos, further enhancing the video quality. Therefore, we recommend enabling prompt extension. We provide the following two methods for prompt extension:
To extend prompts with the Dashscope API:

- Apply for a `dashscope.api_key` in advance (EN | CN).
- Set the environment variable `DASH_API_KEY` to your Dashscope API key. Users of Alibaba Cloud's international site also need to set the environment variable `DASH_API_URL` to 'https://dashscope-intl.aliyuncs.com/api/v1'. For more detailed instructions, please refer to the dashscope document.
- Use the `qwen-plus` model for text-to-video tasks and `qwen-vl-max` for image-to-video tasks.
- You can change the model used for extension with the `--prompt_extend_model` parameter.

For example, with Dashscope-based extension enabled:
```sh
DASH_API_KEY=your_key torchrun --nproc_per_node=8 generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --dit_fsdp --t5_fsdp --ulysses_size 8 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage" --use_prompt_extend --prompt_extend_method 'dashscope'
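# Sketch (hypothetical helper, not part of Wan2.2): export the Dashscope
# settings once per shell session instead of prefixing each command with
# DASH_API_KEY=... . The key is a placeholder you supply; pass "intl" as
# the second argument only if you use Alibaba Cloud's international site.
dashscope_env() {
  if [ -z "${1:-}" ]; then
    echo "usage: dashscope_env <api_key> [intl]" >&2
    return 1
  fi
  export DASH_API_KEY="$1"
  if [ "${2:-}" = intl ]; then
    # International-site endpoint given in the instructions above
    export DASH_API_URL='https://dashscope-intl.aliyuncs.com/api/v1'
  fi
}
# Example: dashscope_env your_key intl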