github.com/OpenTalker/SadTalker @v0.0.2 sqlite

650 symbols · 1,786 edges · 102 files · 176 documented (27%)

README

<a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup> </a>&emsp;
<a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</sup></a>&emsp;
<a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>3</sup></a>&emsp;
<a href='https://yzhang2016.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>&emsp;
<a href='https://xishen0220.github.io/' target='_blank'>Xi Shen <sup>2</sup></a>&emsp;


<a href='https://yuguo-xjtu.github.io/' target='_blank'>Yu Guo<sup>1</sup> </a>&emsp;
<a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup> </a>&emsp;
<a target='_blank'>Fei Wang <sup>1</sup> </a>&emsp;









<sup>1</sup> Xi'an Jiaotong University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Ant Group &emsp;

CVPR 2023

sadtalker

TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞.

🔥 Highlight

🔥 The stable-diffusion-webui extension is online. Install it via Extensions -> Install from URL -> https://github.com/Winfredy/SadTalker; check out more details here.

https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4

🔥 The beta version of the full image mode is online! Check out more details here.

still	still + enhancer	input image @bagbag1815

🔥 Several new modes (e.g., still mode, reference mode, and resize mode) are online for better and more customizable applications.
🔥 We are happy to see our method used in various talking and singing avatars. Check out these wonderful demos on bilibili and twitter #sadtalker.

📋 Changelog (the previous changelog can be found here)

[2023.04.06]: The stable-diffusion-webui extension is released.
[2023.04.03]: Enable TTS in huggingface and gradio local demo.
[2023.03.30]: Launch beta version of the full body mode.
[2023.03.30]: Launch new feature: by using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
[2023.03.29]: resize mode is online via python inference.py --preprocess resize! It produces a larger crop of the image, as discussed in https://github.com/Winfredy/SadTalker/issues/35.
[2023.03.29]: The local gradio demo is online! Run python app.py to start the demo. A new requirements.txt is used to avoid bugs in librosa.
[2023.03.28]: The online demo is launched, thanks AK!

🎼 Pipeline

(Figure: main pipeline of SadTalker.)

Our method uses the coefficients of a 3DMM as the intermediate motion representation. We first generate realistic 3D motion coefficients (facial expression β, head pose ρ) from audio; these coefficients then implicitly modulate the 3D-aware face renderer for the final video generation.

🚧 TODO

Previous TODOs

[x] Generating 2D face from a single Image.
[x] Generating 3D face from Audio.
[x] Generating 4D free-view talking examples from audio and a single image.
[x] Gradio/Colab Demo.
[x] Full body/image Generation.
[ ] Training code for each component.
[ ] Audio-driven Anime Avatar.
[ ] Integrate ChatGPT for a conversation demo 🤔
[x] Integrate with stable-diffusion-web-ui. (stay tuned!)

⚙️ Installation

Installing Sadtalker on Linux:

git clone https://github.com/Winfredy/SadTalker.git

cd SadTalker 

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### tts is optional for gradio demo. 
### pip install TTS
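
After running the steps above, a quick sanity check of the environment can save a failed first run. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only checks that the packages and `ffmpeg` are discoverable, not that their versions match the ones pinned above.

```python
import importlib.util
import shutil
import sys


def check_environment(packages=("torch", "torchvision", "torchaudio")):
    """Report which prerequisites from the install steps are visible."""
    report = {
        # The install steps create the conda env with python=3.8.
        "python": "{}.{}".format(*sys.version_info[:2]),
        # ffmpeg is installed via conda and must be on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }
    for name in packages:
        # find_spec only locates the package; it does not import it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `sadtalker` environment; any `False` entry points at a step that probably needs repeating.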

More tips about installation on Windows and the Docker file can be found here.

Sd-Webui-Extension:

Install the latest version of stable-diffusion-webui, then install SadTalker via the extension.

Then restart stable-diffusion-webui and set the command-line args below. The models will be downloaded automatically to the right place. Alternatively, you can add the path of pre-downloaded SadTalker checkpoints to SADTALKER_CHECKPOINTS in webui_user.sh (Linux) or webui_user.bat (Windows):

# windows (webui_user.bat)
set COMMANDLINE_ARGS=--no-gradio-queue  --disable-safe-unpickle
set SADTALKER_CHECKPOINTS=D:\SadTalker\checkpoints

# linux (webui_user.sh)
export COMMANDLINE_ARGS="--no-gradio-queue --disable-safe-unpickle"
export SADTALKER_CHECKPOINTS=/path/to/SadTalker/checkpoints

After installation, the SadTalker can be used in stable-diffusion-webui directly.
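
To illustrate how an environment variable such as SADTALKER_CHECKPOINTS is typically consumed, here is a minimal sketch. It is an assumption about the general pattern, not the extension's actual code; `resolve_checkpoint_dir` is a hypothetical helper name.

```python
import os
from pathlib import Path


def resolve_checkpoint_dir(default="./checkpoints", env=None):
    """Return the checkpoint directory, preferring SADTALKER_CHECKPOINTS."""
    env = os.environ if env is None else env
    configured = env.get("SADTALKER_CHECKPOINTS", "").strip()
    # Fall back to the local ./checkpoints folder when the variable is unset or blank.
    return Path(configured) if configured else Path(default)


print(resolve_checkpoint_dir())
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.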

Download Trained Models

You can run the following script to put all the models in the right place.

bash scripts/download_models.sh

Or download our pre-trained models from Google Drive or our GitHub release page, and then put them in ./checkpoints.

| Model | Description |
|:--- |:--- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from a reproduction of face-vid2vid. |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2lip. |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
| checkpoints/BFM | 3DMM library file. |
| checkpoints/hub | Face detection models used in face alignment. |
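
If you download the models manually, a short check that every entry from the table above is present can catch an incomplete download early. This is a sketch only: `missing_checkpoints` is a hypothetical helper, and the file names are copied verbatim from the table (including the `auido2...` spelling).

```python
from pathlib import Path

# Names copied from the table above; BFM and hub are directories.
EXPECTED = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "mapping_00229-model.pth.tar",
    "facevid2vid_00189-model.pth.tar",
    "epoch_20.pth",
    "wav2lip.pth",
    "shape_predictor_68_face_landmarks.dat",
    "BFM",
    "hub",
]


def missing_checkpoints(checkpoint_dir="./checkpoints", expected=EXPECTED):
    """Return the expected entries that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    for name in missing_checkpoints():
        print(f"missing: {name}")
```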

🔮 Quick Start

Generating 2D face from a single image with the default config.

python inference.py --driven_audio <audio.wav> --source_image <video.mp4 or picture.png>

The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
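
Because each run writes into a new timestamped folder, scripts often need to locate the newest output. A minimal sketch, assuming only the `results/<timestamp>/*.mp4` layout described above (`latest_result` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path


def latest_result(result_dir="./results"):
    """Return the most recently modified .mp4 one level below result_dir, or None."""
    videos = list(Path(result_dir).glob("*/*.mp4"))
    if not videos:
        return None
    # Pick by modification time rather than by parsing the folder name.
    return max(videos, key=lambda p: p.stat().st_mtime)


print(latest_result())
```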

Or a local gradio demo similar to our hugging-face demo can be run by:


## You need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install tts` in advance.

python app.py

Advanced Configuration

| Name | Configuration | Default | Explanation |
|:--- |:--- |:--- |:--- |
| Enhance Mode | `--enhancer` | None | Use `gfpgan` or `RestoreFormer` to enhance the generated face via a face restoration network. |
| Still Mode | `--still` | False | Use the same pose parameters as the original image, with less head motion. |
| Expressive Mode | `--expression_scale` | 1.0 | A larger value makes the expression motion stronger. |
| Save path | `--result_dir` | `./results` | The files will be saved in the new location. |
| Preprocess | `--preprocess` | `crop` | Run on the cropped input image and produce the results there. Other choice: `resize`, where the image is resized to a specific resolution. |
| Ref Mode (eye) | `--ref_eyeblink` | None | A video path; the eye blinks are borrowed from this reference video to provide more natural eyebrow movement. |
| Ref Mode (pose) | `--ref_pose` | None | A video path; the head pose is borrowed from this reference video. |
| 3D Mode | `--face3dvis` | False | Needs additional installation. More details on generating the 3D face can be found here. |
| Free-view Mode | `--input_yaw`, `--input_pitch`, `--input_roll` | None | Generate a novel-view or free-view 4D talking head from a single image. More details can be found here. |
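
When driving `inference.py` from another script, the options in the table above can be assembled programmatically. The sketch below is illustrative: `build_inference_command` is a hypothetical wrapper, it uses only flags listed in this README, and it treats `--still` as a value-less switch, as in the examples further down.

```python
import shlex


def build_inference_command(driven_audio, source_image, *, enhancer=None,
                            still=False, expression_scale=1.0,
                            result_dir="./results", preprocess="crop",
                            ref_eyeblink=None, ref_pose=None):
    """Assemble an argument list for `python inference.py`."""
    cmd = ["python", "inference.py",
           "--driven_audio", driven_audio,
           "--source_image", source_image,
           "--result_dir", result_dir,
           "--preprocess", preprocess,
           "--expression_scale", str(expression_scale)]
    if still:
        cmd.append("--still")  # boolean switch, takes no value
    # Optional flags are added only when a value was supplied.
    for flag, value in (("--enhancer", enhancer),
                        ("--ref_eyeblink", ref_eyeblink),
                        ("--ref_pose", ref_pose)):
        if value is not None:
            cmd += [flag, value]
    return cmd


cmd = build_inference_command("speech.wav", "portrait.png",
                              still=True, enhancer="gfpgan")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; building a list instead of a string avoids shell-quoting problems with paths that contain spaces.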

Examples

basic	w/ still mode	w/ exp_scale 1.3	w/ gfpgan

> Please turn on the audio manually, as videos embedded on GitHub do not play sound by default.

Input, w/ reference video , reference video

If the reference video is shorter than the input audio, we loop the reference video.
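
The looping behaviour can be pictured as a wrap-around index mapping. This is a sketch of the idea only; the repository may loop at a different level (for example on extracted coefficients rather than frames), and `looped_reference_indices` is a hypothetical name.

```python
def looped_reference_indices(num_output_frames, num_reference_frames):
    """Map each output frame to a reference frame, wrapping around at the end."""
    if num_reference_frames <= 0:
        raise ValueError("reference video must contain at least one frame")
    return [i % num_reference_frames for i in range(num_output_frames)]


print(looped_reference_indices(7, 3))  # [0, 1, 2, 0, 1, 2, 0]
```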

Generating 3D face from Audio

Input	Animated 3d face

Please turn on the audio manually, as videos embedded on GitHub do not play sound by default.

Generating 4D free-view talking examples from audio and a single image

We use input_yaw, input_pitch, input_roll to control head pose. For example, --input_yaw -20 30 10 means the input head yaw degree changes from -20 to 30 and then changes from 30 to 10.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a folder to store results> \
                    --input_yaw -20 30 10
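
The keyframe semantics of `--input_yaw -20 30 10` can be sketched as piecewise-linear interpolation over the video's frames. This is an illustration under stated assumptions, not the repository's implementation: `yaw_schedule` is a hypothetical function, and it assumes each segment between consecutive keyframes gets an equal share of the frames.

```python
def yaw_schedule(keyframes, num_frames):
    """Linearly interpolate head-yaw keyframes (in degrees) across num_frames."""
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    if num_frames <= 0:
        return []
    if len(keyframes) == 1 or num_frames == 1:
        return [float(keyframes[0])] * num_frames
    segments = len(keyframes) - 1
    schedule = []
    for i in range(num_frames):
        # Position along the whole keyframe path, measured in segments.
        t = i / (num_frames - 1) * segments
        k = min(int(t), segments - 1)  # index of the segment's start keyframe
        frac = t - k                   # progress within that segment, 0..1
        schedule.append(keyframes[k] + (keyframes[k + 1] - keyframes[k]) * frac)
    return schedule


print(yaw_schedule([-20, 30, 10], 5))  # [-20.0, 5.0, 30.0, 20.0, 10.0]
```

The same shape of schedule would apply to `--input_pitch` and `--input_roll`.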

Results, Free-view results, Novel view results

[Beta Application] Full body/image Generation

You can now use --still to generate a natural full body video. You can add enhancer or full_img_enhancer to improve the quality of the generated video. However, adding other modes, such as ref_eyeblink or ref_pose, currently degrades the result. We are still working on a fix.

python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a folder to store results> \
                    --still \
                    --enhancer gfpgan

🛎 Citation

If you find our work useful in your research, please consider citing:

```bibtex
@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong an
```

Core symbols most depended-on inside this repo

called by 52 · src/face3d/models/bfm.py
split · called by 35 · src/face3d/models/arcface_torch/eval/verification.py
eval · called by 18 · src/face3d/models/base_model.py
save · called by 11 · src/face3d/util/html.py
conv1x1 · called by 10 · src/face3d/models/networks.py
_resnet · called by 9 · src/face3d/models/networks.py
get · called by 6 · src/facerender/sync_batchnorm/comm.py
transform · called by 6 · src/face3d/models/bfm.py

Shape

Method 344

Function 190

Class 116

Languages

Python 100%

Modules by API surface

src/facerender/modules/util.py · 56 symbols
src/face3d/models/networks.py · 35 symbols
src/face3d/models/base_model.py · 26 symbols
src/face3d/models/arcface_torch/backbones/mobilefacenet.py · 22 symbols
src/utils/audio.py · 19 symbols
src/face3d/models/arcface_torch/dataset.py · 19 symbols
src/audio2pose_models/networks.py · 19 symbols
src/face3d/models/bfm.py · 18 symbols
src/face3d/models/arcface_torch/backbones/iresnet.py · 15 symbols
src/facerender/sync_batchnorm/batchnorm.py · 14 symbols
src/facerender/sync_batchnorm/comm.py · 13 symbols
src/face3d/util/util.py · 13 symbols

Dependencies from manifests, versioned

basicsr 1.4.2 · 1×
face_alignment 1.3.5 · 1×
facexlib 0.2.5 · 1×
imageio 2.19.3 · 1×
imageio-ffmpeg 0.4.7 · 1×
joblib 1.1.0 · 1×
kornia 0.6.8 · 1×
librosa 0.9.2 · 1×
numpy 1.23.4 · 1×
pydub 0.25.1 · 1×
resampy 0.3.1 · 1×
scikit-image 0.19.3 · 1×

For agents

$ claude mcp add SadTalker \
  -- python -m otcore.mcp_server <graph>
