We are super excited to release OpenCUA — the first 0-to-1 open framework for computer-use agent foundation models, together with OpenCUA-32B, an open-source SOTA model that matches top proprietary models on OSWorld-Verified, with full infrastructure and data.

🔗 [Paper] arxiv.org/abs/2508.09123
📌 [Website] opencua.xlang.ai/
🤖 [Models] huggingface.co/xlangai/OpenC…
📊 [Data] huggingface.co/datasets/xlan…
💻 [Code] github.com/xlang-ai/OpenCUA

🌟 OpenCUA — a comprehensive open-source framework for computer-use agents, including:
📊 AgentNet — the first large-scale CUA dataset (3 operating systems, 200+ apps & sites, 22.6K trajectories)
🏆 OpenCUA models — open-source SOTA on OSWorld-Verified (34.8% avg success, outperforming OpenAI CUA)
🖥 AgentNetTool — a cross-system computer-use task annotation tool
🏁 AgentNetBench — an offline CUA benchmark for fast, reproducible evaluation

💡 Why OpenCUA? Proprietary CUAs like Claude or OpenAI CUA are impressive 🤯 — but there is no large-scale open desktop-agent dataset or transparent pipeline behind them. OpenCUA changes that by offering the full open-source stack 🛠: scalable cross-system data collection, effective data formulation, a model training strategy, and reproducible evaluation — powering top open-source models, including OpenCUA-7B and OpenCUA-32B, that excel at GUI planning & grounding.

Details of the OpenCUA framework 👇

Aug 15, 2025 · 4:59 PM UTC

AgentNet Tool 🖥 — a computer-use task annotation application for Windows, macOS, and Ubuntu that captures screen videos, mouse/keyboard events, and metadata for scalable real-world computer-use data. It records tasks automatically without interrupting the user's workflow.
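For illustration, here is a hypothetical sketch (not the tool's actual schema) of what a single captured interaction event could look like:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecordedEvent:
    """One low-level event captured during an annotation session (hypothetical schema)."""
    timestamp_ms: int                      # time since the session started
    event_type: str                        # e.g. "mouse_click", "key_press", "scroll"
    x: Optional[int] = None                # screen coordinates for mouse events
    y: Optional[int] = None
    key: Optional[str] = None              # key identifier for keyboard events
    screenshot_path: Optional[str] = None  # nearest captured screen frame

# A session would then bundle a list of RecordedEvent with task metadata
# (operating system, apps involved, natural-language task description).
```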
AgentNet dataset 📂 — 22,625 human-annotated computer-use tasks spanning Windows, macOS, and Ubuntu, covering 140+ apps and 190+ websites, with multi-app workflows, professional tools, and uncommon features, and an average trajectory length of 18.6 steps.
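A minimal sketch of how one might load the released data with the Hugging Face `datasets` library; the repo ID below is a placeholder, since the exact ID is truncated in the data link above:

```python
from datasets import load_dataset

DATASET_ID = "xlangai/..."  # placeholder: fill in from the [Data] link above
ds = load_dataset(DATASET_ID, split="train")

# Each record is one annotated trajectory: a task instruction plus
# per-step observations and ground-truth actions.
print(ds[0].keys())
```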
Data modeling 💡: Incorporates a reflective long chain-of-thought (CoT) — a novel pipeline that enriches each task step with reflection, planning, and memory. A generator and a reflector iteratively produce and verify the reasoning that connects each observation to its ground-truth action, enhancing the model's ability to perceive and recover from errors.
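A minimal sketch of that generator-reflector loop, assuming both roles are served by model calls; `generate_cot` and `reflect_on_cot` are hypothetical helpers, not the paper's actual API:

```python
def synthesize_reflective_cot(observation, ground_truth_action,
                              generate_cot, reflect_on_cot, max_rounds=3):
    """Produce reflective reasoning (reflection, planning, memory) connecting an
    observation to its ground-truth action, iterating until the reflector
    accepts it or the round budget runs out."""
    cot, feedback = None, None
    for _ in range(max_rounds):
        # Generator proposes reasoning, optionally conditioned on prior feedback.
        cot = generate_cot(observation, ground_truth_action, feedback)
        # Reflector checks whether the reasoning is consistent with the action.
        verdict, feedback = reflect_on_cot(observation, ground_truth_action, cot)
        if verdict == "accept":
            break
    return cot
```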
OpenCUA models 🚀: OpenCUA-7B and OpenCUA-32B are strong open-source foundation models for computer use. In particular, OpenCUA-32B achieves 55.3% on ScreenSpot-Pro and 34.8% on OSWorld-Verified (SOTA among open-source models) 👇
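For intuition, a hedged sketch of the observe-act loop such a model drives at inference time; `capture_screenshot`, `predict_action`, and `execute_action` are hypothetical stand-ins for the actual model and OS integration shipped in the repo:

```python
def run_agent(task_instruction, predict_action, capture_screenshot,
              execute_action, max_steps=30):
    """Drive a computer-use model step by step until it signals completion."""
    history = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = predict_action(task_instruction, screenshot, history)
        if action == "DONE":       # the model decides the task is finished
            break
        execute_action(action)     # e.g. click(x, y), type_text("...")
        history.append(action)
    return history
```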
AgentNetBench 🏁 — a stable, fast, environment-free offline benchmark of 100 diverse, representative tasks covering Windows and macOS and a wide range of domains. Each step of every task is manually annotated with multiple valid action options.
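A minimal sketch of what environment-free scoring in this style could look like; `actions_match` is a hypothetical comparator (real matching would need coordinate tolerance and argument normalization):

```python
def score_task(task_steps, agent, actions_match):
    """Step-level accuracy for one task: a prediction counts as correct if it
    matches any of the manually provided valid action options for that step."""
    correct = 0
    for step in task_steps:
        predicted = agent(step["instruction"], step["observation"], step["history"])
        if any(actions_match(predicted, option) for option in step["valid_actions"]):
            correct += 1
    return correct / len(task_steps)
```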
Insight 1: OpenCUA offers a scalable path to strong computer-use foundation models, with AgentNet contributing 22K tasks to the open-source community. The performance ceiling of SFT-based CUAs is still far from being reached.
Insight 2: Training on data from the same domain (operating system) delivers the largest gains in that environment. Cross-domain transfer shows a performance gap but is still beneficial.
Insight 3: A huge gap exists between Pass@1 and Pass@16 performance on OSWorld with OpenCUA-Qwen2-7B. Surprisingly, going from Pass@1 to Pass@3 boosts OpenCUA-32B from 34.2% → 45.6%! 📈 This large margin suggests ample headroom for future post-training, reranking, or multi-agent methods.
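For reference, a small snippet computing the standard unbiased pass@k estimator (Chen et al., 2021), the usual way Pass@1/Pass@k numbers are estimated from repeated rollouts; this is not necessarily the paper's exact protocol:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k samples
    succeeds, given n sampled rollouts of which c succeeded."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 rollouts of one task, 3 of them successful
print(pass_at_k(n=16, c=3, k=1))   # 0.1875
print(pass_at_k(n=16, c=3, k=16))  # 1.0
```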
🙌 Thanks to all the authors — @xywang626, @BowenWangNLP, @DunjieLu1219, @junlin45300, @TianbaoX, @JunliWang2021, @jiaqideng07, @gxlvera, @yihengxu_, @ChenHenryWu, @ZShen0521, Zhuokai Li, @RyanLi0802, @xiaochuanlee, Junda Chen, Boyuan Zheng, Peihang Li, @fangyu_lei, @RuishengC49326, Yeqiao Fu, @dcshin718, Martin Shin, Jiarui Hu, Yuyan Wang, @chenjx210734, Yuxiao Ye, @_zdy023, @dikang_du, @Mouse_Hu, Huarong Chen, Zaida Zhou, Haotian Yao, Ziwei Chen, Qizheng Gu, Yipu Wang, @HengWang_xjtu, @Diyi_Yang, @hllo_wrld, @RotekSong, Y. Charles, Zhilin Yang, and @taoyds.
🙌 Acknowledgement: We thank @ysu_nlp, @CaimingXiong, and the anonymous reviewers for their insightful discussions and valuable feedback. We are grateful to Moonshot AI for providing training infrastructure and annotated data. We also sincerely appreciate Jin Zhang, Hao Yang, Zhengtao Wang, and Yanxu Chen from the Kimi Team for their strong infrastructure support and helpful guidance. The development of our tool builds on the open-source projects DuckTrack (@arankomatsuzaki) and @OpenAdaptAI; we are very grateful for their commitment to the open-source community. Finally, we extend our deepest thanks to all annotators for their tremendous effort and contributions to this project. ❤️
Replying to @xywang626
Dataset, data collection tool, models, evals: this is what a true open-source release in the AI era looks like!
Thank you!🥰
Congrats Xinyuan... I have to ask @grok for a simple explanation of what this does.
Thanks~ It is an agent model that can operate the mouse and keyboard to do tasks on a computer. There are some small demos on our website: opencua.xlang.ai/
Paper ✅, data ✅, weights ✅, true OSS ✅. Congrats 🎉
Thank you!🥰
Replying to @xywang626
Super cool - congrats! How do you ensure the annotators explore the latent space sufficiently? Were the instructions prescribed, or were annotators just given guardrails related to the supported applications and left to "run wild"?
Replying to @xywang626
Great work guys. Especially considering that more benchmark datasets are needed in the CUA area 👍
Replying to @xywang626
@grok what does the benchmark mean for the use of this model compared to others?
Replying to @xywang626
That’s huge! Congratulations to the team on this release! 🚀
Replying to @xywang626
Can your model rent GPUs from a website like RunPod and fine-tune an LLM?
Replying to @xywang626
I think the people in China and the people of the United States need to tell our leaders we will not allow a senseless war that only @rothschild benefits from. Great work man