We are super excited to release OpenCUA — the first 0-to-1 open framework for computer-use agent foundation models, together with OpenCUA-32B, an open-source SOTA model that matches top proprietary models on OSWorld-Verified, with full infrastructure and data.

🔗 [Paper] arxiv.org/abs/2508.09123
📌 [Website] opencua.xlang.ai/
🤖 [Models] huggingface.co/xlangai/OpenC…
📊 [Data] huggingface.co/datasets/xlan…
💻 [Code] github.com/xlang-ai/OpenCUA

🌟 OpenCUA — a comprehensive open-source framework for computer-use agents, including:
📊 AgentNet — the first large-scale CUA dataset (3 operating systems, 200+ apps & sites, 22.6K trajectories)
🏆 OpenCUA models — open-source SOTA on OSWorld-Verified (34.8% avg success, outperforming OpenAI CUA)
🖥 AgentNetTool — a cross-system computer-use task annotation tool
🏁 AgentNetBench — an offline CUA benchmark for fast, reproducible evaluation

💡 Why OpenCUA? Proprietary CUAs like Claude or OpenAI CUA are impressive 🤯 — but there is no large-scale open desktop-agent dataset or transparent pipeline behind them. OpenCUA changes that by offering the full open-source stack 🛠: scalable cross-system data collection, effective data formulation, a model training strategy, and reproducible evaluation — powering top open-source models, including OpenCUA-7B and OpenCUA-32B, that excel at GUI planning & grounding.

Details of the OpenCUA framework 👇

Aug 15, 2025 · 4:59 PM UTC

AgentNet Tool 🖥 — a computer-use task annotation application for Windows, macOS, and Ubuntu that captures screen videos, mouse/keyboard events, and metadata for scalable real-world computer-use data. It records tasks automatically without interrupting the user's workflow.
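For illustration, here is a hypothetical sketch (not the tool's actual schema) of what a single captured interaction event could look like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecordedEvent:
    """One low-level event captured during an annotation session (hypothetical schema)."""
    timestamp_ms: int                      # time since the session started
    event_type: str                        # e.g. "mouse_click", "key_press", "scroll"
    x: Optional[int] = None                # screen coordinates for mouse events
    y: Optional[int] = None
    key: Optional[str] = None              # key identifier for keyboard events
    screenshot_path: Optional[str] = None  # nearest captured screen frame

# A session would then bundle a list of RecordedEvent with task metadata
# (operating system, apps involved, natural-language task description).
```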
AgentNet dataset 📂 — 22,625 human-annotated computer-use tasks spanning Windows, macOS, and Ubuntu, covering 140+ apps and 190+ websites, with multi-app workflows, professional tools, and uncommon features, and an average trajectory length of 18.6 steps.
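A minimal sketch of how one might load the released data with the Hugging Face `datasets` library; the repo ID below is a placeholder, since the exact ID is truncated in the data link above:

```python
from datasets import load_dataset

DATASET_ID = "xlangai/..."  # placeholder: fill in from the [Data] link above
ds = load_dataset(DATASET_ID, split="train")

# Each record is one annotated trajectory: a task instruction plus
# per-step observations and ground-truth actions.
print(ds[0].keys())
```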
Data modeling 💡: Incorporates a reflective long chain-of-thought (CoT) — a novel pipeline that enriches each task step with reflection, planning, and memory. A generator and a reflector iteratively produce and verify the reasoning that connects each observation to its ground-truth action, enhancing the model's ability to perceive and recover from errors.
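A minimal sketch of that generator-reflector loop, assuming both roles are served by model calls; `generate_cot` and `reflect_on_cot` are hypothetical helpers, not the paper's actual API:

```python
def synthesize_reflective_cot(observation, ground_truth_action,
                              generate_cot, reflect_on_cot, max_rounds=3):
    """Produce reflective reasoning (reflection, planning, memory) connecting an
    observation to its ground-truth action, iterating until the reflector
    accepts it or the round budget runs out."""
    cot, feedback = None, None
    for _ in range(max_rounds):
        # Generator proposes reasoning, optionally conditioned on prior feedback.
        cot = generate_cot(observation, ground_truth_action, feedback)
        # Reflector checks whether the reasoning is consistent with the action.
        verdict, feedback = reflect_on_cot(observation, ground_truth_action, cot)
        if verdict == "accept":
            break
    return cot
```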
OpenCUA models 🚀: OpenCUA-7B and OpenCUA-32B are strong open-source foundation models for computer use. In particular, OpenCUA-32B achieves 55.3% on ScreenSpot-Pro and 34.8% on OSWorld-Verified (SOTA among open-source models) 👇
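For intuition, a hedged sketch of the observe-act loop such a model drives at inference time; `capture_screenshot`, `predict_action`, and `execute_action` are hypothetical stand-ins for the actual model and OS integration shipped in the repo:

```python
def run_agent(task_instruction, predict_action, capture_screenshot,
              execute_action, max_steps=30):
    """Drive a computer-use model step by step until it signals completion."""
    history = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = predict_action(task_instruction, screenshot, history)
        if action == "DONE":       # the model decides the task is finished
            break
        execute_action(action)     # e.g. click(x, y), type_text("...")
        history.append(action)
    return history
```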
AgentNetBench 🏁 — a stable, fast, environment-free offline benchmark of 100 diverse, representative tasks covering Windows and macOS and a wide range of domains. Each step of every task is manually annotated with multiple valid action options.
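A minimal sketch of what environment-free scoring in this style could look like; `actions_match` is a hypothetical comparator (real matching would need coordinate tolerance and argument normalization):

```python
def score_task(task_steps, agent, actions_match):
    """Step-level accuracy for one task: a prediction counts as correct if it
    matches any of the manually provided valid action options for that step."""
    correct = 0
    for step in task_steps:
        predicted = agent(step["instruction"], step["observation"], step["history"])
        if any(actions_match(predicted, option) for option in step["valid_actions"]):
            correct += 1
    return correct / len(task_steps)
```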
Insight 1: OpenCUA offers a scalable path to strong computer-use foundation models, with AgentNet contributing 22K tasks to the open-source community. The performance ceiling of SFT-based CUAs is still far from being reached.
Insight 2: Training on data from the same domain (operating system) delivers the largest gains in that environment. Cross-domain transfer shows a performance gap but is still beneficial.
Insight 3: A huge gap exists between Pass@1 and Pass@16 performance on OSWorld with OpenCUA-Qwen2-7B. Surprisingly, going from Pass@1 to Pass@3 boosts OpenCUA-32B from 34.2% → 45.6%! 📈 This large margin suggests ample headroom for future post-training, reranking, or multi-agent methods.
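For reference, a small snippet computing the standard unbiased pass@k estimator (Chen et al., 2021), the usual way Pass@1/Pass@k numbers are estimated from repeated rollouts; this is not necessarily the paper's exact protocol:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k samples
    succeeds, given n sampled rollouts of which c succeeded."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 rollouts of one task, 3 of them successful
print(pass_at_k(n=16, c=3, k=1))   # 0.1875
print(pass_at_k(n=16, c=3, k=16))  # 1.0
```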
🙌 Thanks to all the authors — @xywang626, @BowenWangNLP, @DunjieLu1219, @junlin45300, @TianbaoX, @JunliWang2021, @jiaqideng07, @gxlvera, @yihengxu_, @ChenHenryWu, @ZShen0521, Zhuokai Li, @RyanLi0802, @xiaochuanlee, Junda Chen, Boyuan Zheng, Peihang Li, @fangyu_lei, @RuishengC49326, Yeqiao Fu, @dcshin718, Martin Shin, Jiarui Hu, Yuyan Wang, @chenjx210734, Yuxiao Ye, @_zdy023, @dikang_du, @Mouse_Hu, Huarong Chen, Zaida Zhou, Haotian Yao, Ziwei Chen, Qizheng Gu, Yipu Wang, @HengWang_xjtu, @Diyi_Yang, @hllo_wrld, @RotekSong, Y. Charles, Zhilin Yang, and @taoyds.
🙌 Acknowledgement: We thank @ysu_nlp, @CaimingXiong, and the anonymous reviewers for their insightful discussions and valuable feedback. We are grateful to Moonshot AI for providing training infrastructure and annotated data. We also sincerely appreciate Jin Zhang, Hao Yang, Zhengtao Wang, and Yanxu Chen from the Kimi Team for their strong infrastructure support and helpful guidance. The development of our tool builds on the open-source projects DuckTrack (@arankomatsuzaki) and @OpenAdaptAI; we are very grateful for their commitment to the open-source community. Finally, we extend our deepest thanks to all annotators for their tremendous effort and contributions to this project. ❤️
Replying to @xywang626
Dataset, data collection tool, models, evals: this is what a true open-source release in the AI era looks like!
Thank you!🥰
Congrats Xinyuan... I have to ask @grok for a simple explanation of what this does.
Thanks~ It is an agent model that can operate the mouse and keyboard to do tasks on a computer. There are some small demos on our website: opencua.xlang.ai/
Paper ✅, data ✅, weights ✅, true OSS ✅. Congrats 🎉
Thank you!🥰
Replying to @xywang626
Super cool - congrats! How do you ensure the annotators explore the latent space sufficiently? Were the instructions prescribed, or were annotators just given guardrails related to the supported applications and left to "run wild"?
Replying to @xywang626
Great work guys. Especially considering that more benchmark datasets are needed in the CUA area 👍
Replying to @xywang626
@grok what does the benchmark mean for the use of this model compared to others?
Replying to @xywang626
That’s huge! Congratulations to the team on this release! 🚀
Replying to @xywang626
Can your model rent GPUs from a website like RunPod and fine-tune an LLM?
Replying to @xywang626
I think the people in China and the people of the United States need to tell our leaders we will not allow a senseless war that only @rothschild benefits from. Great work man