[2025-03-18] We released a technical preview of a new desktop app, Agent TARS: a multimodal AI agent that operates the browser by visually interpreting web pages and integrates with the command line and file system.
UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
   📑 <a href="https://arxiv.org/abs/2501.12326">Paper</a>   
| 🤗 <a href="https://huggingface.co/ByteDance-Seed/UI-TARS-1.5-7B">Hugging Face Models</a>  
|   🫨 <a href="https://discord.gg/pTXwYVjfcs">Discord</a>  
|   🤖 <a href="https://www.modelscope.cn/collections/UI-TARS-bccb56fa1ef640">ModelScope</a>  
🖥️ Desktop Application   
|    👓 Midscene (use in browser)   
| Instruction | Video |
|---|---|
| Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | |
| Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | |
See Quick Start.
See Deployment.
See CONTRIBUTING.md.
See @ui-tars/sdk.
UI-TARS Desktop is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider starring the repository :star: and citing the paper :pencil:
@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}
$ claude mcp add UI-TARS-desktop \
-- python -m otcore.mcp_server <graph>