Ashok Elluswamy · Oct 24, 2025 · 8:51 AM UTC

Ashok Elluswamy

Ryan Li retweeted

Ashok Elluswamy

@aelluswamy

Oct 24

x.com/i/article/198150149697…

Jure Leskovec · Oct 15, 2025 · 2:39 PM UTC

Ryan Li retweeted

Jure Leskovec

@jure

Oct 15

Biomni-R0-32B-Preview is now open-weight on Hugging Face! Biomni-R0-32B is a biomedical AI model trained by the Biomni team. It beats GPT-5 and Claude Sonnet 4. To ground progress, we’re also releasing Biomni-Eval1: 443 data points across 10 tasks for benchmarking agents on real-world biomedical research! Model: huggingface.co/biomni/Biomni… Eval: huggingface.co/datasets/biom…

Kexin Huang · Oct 15, 2025 · 4:00 PM UTC

Ryan Li retweeted

Kexin Huang

@KexinHuang5

Oct 15

🙌Biomni-R0-32B, the first RL agent model trained end-to-end for biology research, is now open-weight on @huggingface! To benchmark progress, we’re also releasing Biomni-Eval1 — 443 data points across 10 biomedical research tasks! 🔗 Data: huggingface.co/datasets/biom… 🔗 Model: huggingface.co/biomni/Biomni…

biomni/Eval1 · Datasets at Hugging Face

huggingface.co

Ryan Li · Oct 16, 2025 · 4:04 AM UTC

Ryan Li

@RyanLi0802

Oct 16

Excited to announce that the Biomni-R0-32B research preview is now open-weight on 🤗 Hugging Face! 🧪 Evaluation Suite: huggingface.co/datasets/biom… 🧠 Model: huggingface.co/biomni/Biomni… We are currently improving our environment and training recipe. Stay tuned for future updates! Thanks @KexinHuang5 @jure @ProjectBiomni for the collab!

biomni/Biomni-R0-32B-Preview · Hugging Face

huggingface.co

Tyler Griggs · Sep 2, 2025 · 6:18 PM UTC

Ryan Li retweeted

Tyler Griggs @tyler_griggs_

Sep 2

Very cool to see SkyRL powering research in biomedicine! Unsurprisingly, one of the key implementation challenges was env scaling, a common theme. Kudos to the Biomni team and @shiyi_c98

Kexin Huang

@KexinHuang5

Sep 2

🚀 Thrilled to share a preview of Biomni-R0 — we trained the first RL agent end-to-end for biomedical research. ➡️ nearly 2× stronger than its open-source base ➡️ >10% better than frontier closed-source models ➡️ Scalable path to hill climb to expert-level performance 🔗 Technical report: biomni.stanford.edu/blog/bio… Collab between @ProjectBiomni and SkyRL @NovaSkyAI, with the amazing @RyanLi0802 @shiyi_c98 @YuanhaoQ @jure. Open-sourcing soon!

Ryan Li · Sep 2, 2025 · 7:51 PM UTC

Ryan Li

@RyanLi0802

Sep 2

🚀 Excited to share Biomni-R0 — our state-of-the-art RL agent that excels in a wide range of biomedical tasks. It was a wild ride from writing the first agentic training loop to surpassing the performance of Claude 4 and GPT 5. However, this is only the first preview of our agentic RL training series. There is so much more to do and learn. We will be open-sourcing our first models in the next few weeks. Stay tuned for more updates!

Kexin Huang

@KexinHuang5

Sep 2

Biomni · Jun 3, 2025 · 4:59 PM UTC

Ryan Li retweeted

Biomni

@ProjectBiomni

Jun 3

Thanks to the 1,000+ users beta-testing our platform! We have heard your feedback:
- You can now load your history and continue the conversation!
- Each piece of code now comes with a natural language explanation!
- Fixed cross-session interference
- Reduced hallucinations

Derya Unutmaz, MD · May 29, 2025 · 5:15 PM UTC

Ryan Li retweeted

Derya Unutmaz, MD

@DeryaTR_

May 29

The Biomni biomedical AI agent I was early testing has just been released, and it’s truly very impressive for an open-source system! I’m very excited about this! Kudos to everyone involved at @StanfordAILab and @arcinstitute and others for a fantastic contribution 👏

Jure Leskovec

@jure

May 29

Announcing Biomni, the first general-purpose biomedical AI agent. Biomni is a free web platform where biomedical scientists can immediately delegate their tasks to Biomni, starting today! Biomni automates literature reviews, hypothesis generation, protocol design, bioinformatics analysis, clinical reasoning, and much more, scaling biomedical expertise for 100× the number of discoveries.
Key results:
➡️ Designed a cloning experiment with real-world wet-lab validation; on par with a 5+ year expert in a blind test
➡️ Ran a 458-file wearable bioinformatics analysis in 35 minutes vs. 3 weeks for a human expert (800x faster)
➡️ Uncovered a novel hypothesis: new TFs regulating skeletal lineages on large scRNA+scATAC data
➡️ Human-level performance on LAB-bench DbQA and SeqQA, with SOTA on Humanity’s Last Exam and across 8 new biomedical tasks, ranging from GWAS and rare disease diagnosis to microbiology and drug repurposing
Powered by:
➡️ Biomni-E1: the first unified environment designed for a biomedical agent, encompassing 150 tools, 59 databases, and 106 software packages, systematically curated from 2,500+ bioRxiv papers
➡️ Biomni-A1: a generalist agent with retrieval, planning, and code as action
Biomni is an open-source initiative: we invite the community to build on it and advance biomedical research at scale.
- Try it now: biomni.stanford.edu
- Paper: biomni.stanford.edu/paper.pd…
- Code: github.com/snap-stanford/bio…
- Join the community: tinyurl.com/biomni-slack
With an amazing team and collaborators @StanfordAILab @StanfordMed @StanfordCancer @genentech @arcinstitute @UCSF @UW @PrincetonAInews @KexinHuang5 @serena2z @hcwww_ @YuanhaoQ @mintaylu @yusufroohani @RyanLi0802 @LinQiu0128 Gavin Junze Di Shruti Jennefer Xin Zhou @MWheelerMD Jon Bernstein @MengdiWang10 @PengHeAtlas @SnyderShot @lecong Aviv Regev

Romain Lacombe · May 29, 2025 · 5:14 PM UTC

Ryan Li retweeted

Romain Lacombe @rlacombe

May 29

Exciting work just out today: 🤖🧬👉 Biomni is a general-purpose AI agent for biomedical research, brought to you by @KexinHuang5 @jure and team!

Kexin Huang

@KexinHuang5

May 29

📢 Introducing Biomni - the first general-purpose biomedical AI agent. Biomni is built on the first unified environment for biomedical agents, with 150 tools, 59 databases, and 106 software packages, and a generalist agent design with retrieval, planning, and code as action. This enables Biomni to perform a wide range of research tasks, including literature review, hypothesis generation, protocol design, data analysis, clinical reasoning, and much more, across subfields like genomics, microbiome, physiology, and beyond.
Some key results:
🔬 Designed a molecular cloning experiment validated in the wet lab, matching the performance of a >5-year expert in a blinded test
📊 Completed a wearable bioinformatics analysis across 458 messy files in 35 min vs. 3 weeks by a human
🧠 Uncovered novel transcription factor hypotheses driving skeletal lineage regulation
We built a web platform where biomedical scientists can immediately delegate their tasks to the agent today, completely free!
🧪 Try it now: biomni.stanford.edu
📄 Paper: biomni.stanford.edu/paper.pd…
💻 Code: github.com/snap-stanford/bio… (will be fully open-sourced very soon!)
💬 Join the community: tinyurl.com/biomni-slack
Biomni is an open-source initiative: we invite the community to build on it and advance biomedical research at scale. With amazing collaborators @StanfordAILab @StanfordMed @StanfordCancer @genentech @arcinstitute @UCSF @UW @PrincetonAInews @serena2z @hcwww_ @YuanhaoQ @mintaylu @yusufroohani @RyanLi0802 @LinQiu0128 Gavin Junze Di Shruti Jennefer Xin Zhou @MWheelerMD Jon Bernstein @MengdiWang10 @PengHeAtlas @SnyderShot @lecong Aviv Regev @jure

Malay Gandhi · May 29, 2025 · 5:51 PM UTC

Ryan Li retweeted

Malay Gandhi @malayhgandhi

May 29

An exciting moment for scientists who could use some expert, intelligent support and for biomedicine overall

Kexin Huang

@KexinHuang5

May 29

Serena Zhang · May 29, 2025 · 7:57 PM UTC

Ryan Li retweeted

Serena Zhang @serena2z

May 29

now you can vibe code scientifically with biomni 🫡🫡🫡

Jure Leskovec

@jure

May 29

Yuanhao Qu · May 29, 2025 · 6:07 PM UTC

Ryan Li retweeted

Yuanhao Qu

@YuanhaoQ

May 29

🚀 We're one step closer to creating super-intelligent AI biomedical scientists. Biomni is a general-purpose biomedical AI agent equipped with hundreds of specialized tools, software, and databases, designed to tackle the most complex questions in biology. What amazes me most? The depth of analysis and unexpected insights Biomni delivers with every query. It's not just processing data; it's genuinely advancing our understanding. The best part? Biomni is open-source. We know there's room to grow, and we're calling on the biomedical community to help us push these boundaries further. Your expertise doesn't just matter; it's essential to making this vision a reality.
🔗 Try it now at biomni.stanford.edu, join our Slack community at tinyurl.com/biomni-slack, and share your insights!
What's next:
1. CRISPR-GPT (gene-editing agent) - in press
2. Target discovery agent - coming soon
3. Reinforcement learning for biomedical tasks - coming soon
Working with this incredible team to democratize AI-powered biomedical research has been one of the most rewarding experiences of my career, with special credit to @KexinHuang5 @serena2z @hcwww_ @mintaylu and our advisors @lecong @jure @MengdiWang10 @SnyderShot Aviv Regev. What biomedical challenges would you want to see AI tackle next?

Biomni - A General-Purpose Biomedical AI Agent

A general-purpose biomedical AI agent to automate biomedical research.

biomni.stanford.edu

Kexin Huang

@KexinHuang5

May 29

Ryan Li · May 29, 2025 · 5:57 PM UTC

Ryan Li

@RyanLi0802

May 29

It's been a blast working on the project and watching Biomni take shape. Try it out today at biomni.stanford.edu/ and stay tuned for future updates!

Biomni - A General-Purpose Biomedical AI Agent

A general-purpose biomedical AI agent to automate biomedical research.

biomni.stanford.edu

Jure Leskovec

@jure

May 29

Kexin Huang · May 29, 2025 · 4:52 PM UTC

Ryan Li retweeted

Kexin Huang

@KexinHuang5

May 29

Jure Leskovec · May 29, 2025 · 4:46 PM UTC

Ryan Li retweeted

Jure Leskovec

@jure

May 29

Percy Liang · May 22, 2025 · 2:02 PM UTC

Ryan Li retweeted

Percy Liang

@percyliang

May 22

AI agents have the potential to significantly alter the cybersecurity landscape. To help us understand this change, we are excited to release BountyBench, the first framework to capture offensive & defensive cyber-capabilities in evolving real-world systems.

XLANG NLP Lab · Apr 24, 2025 · 4:51 AM UTC

Ryan Li retweeted

XLANG NLP Lab @XLangNLP

Apr 24

🎉 UI-TARS-1.5 is now live on Computer Agent Arena! Currently the SOTA model across multiple GUI benchmarks, showcasing leading performance in computer use, browser use, and even gameplay. Want to try the most intelligent CUA so far? Go to arena.xlang.ai.

Yujia Qin @TsingYoga

Apr 17

Introducing UI-TARS-1.5, a vision-language model that beats OpenAI Operator and Claude 3.7 on GUI Agent and Game Agent tasks. We've open-sourced a small version of the model for research purposes; more details can be found in our blog. TARS learns solely from a screen, but generalizes beyond a screen! Blog: seed-tars.com/1.5 Model: github.com/bytedance/UI-TARS App: github.com/bytedance/UI-TARS…

Ryan Li · Apr 8, 2025 · 7:42 PM UTC

Ryan Li

@RyanLi0802

Apr 8

CUA Arena is live after a full year of hard work! Huge shoutout to @BowenWangNLP and @XLangNLP for doing such an amazing job. Try it out at: arena.xlang.ai/

Computer Agent Arena | XLANG Lab

The Computer Agent Arena is an open evaluation platform that enables real-time assessment of computer agents by real users with diverse backgrounds on open-ended real-world tasks.

arena.xlang.ai

Bowen Wang

@BowenWangNLP

Apr 8

🎮 Computer Use Agent Arena is LIVE! 🚀
🔥 Easiest way to test computer-use agents in the wild without any setup
🌟 Compare top VLMs: OpenAI Operator, Claude 3.7, Gemini 2.5 Pro, Qwen 2.5 VL and more
🕹️ Test agents on 100+ real apps & websites with one-click config
🔒 Safe & free access on cloud-hosted machines
Page: arena.xlang.ai
Leaderboard (tentative): arena.xlang.ai/leaderboard
Blog: arena.xlang.ai/blog/computer…
Data & Code (coming soon): github.com/xlang-ai/computer…
⭐️ Why Computer Agent Arena?
1️⃣ Beyond Static Benchmarks: We use computers to perform an enormous number of tasks and workflows every day, and AI agents have the potential to automate them. However, existing benchmarks are very limited (e.g., only 369 tasks in OSWorld and 812 tasks in WebArena). To better measure agents' capabilities, we introduce Computer Agent Arena, where users can easily compare and test AI agents on all kinds of crowdsourced real-world computer-use tasks.
2️⃣ Cloud Testing, Simplified: As agents like OpenAI’s Operator and Claude 3.7 Sonnet are released, users face configuration challenges and privacy hurdles when deploying them on their own computers. Our platform integrates these agents with cloud-hosted machines, providing users with quick and secure access.
3️⃣ Unified Embodied Digital Environment: Unlike Chatbot Arena, we provide users with a real embodied environment (computers) where all agents are grounded in real computer tasks and environments.
Led by @XLANG_Lab [1/🧵]

Kexin Huang · Feb 18, 2025 · 5:48 PM UTC

Ryan Li retweeted

Kexin Huang

@KexinHuang5

Feb 18

🧪 Introducing POPPER: an AI agent that automates hypothesis validation by sequentially designing and executing falsification experiments with statistical rigor. 🔥 POPPER matched PhD-level scientists on complex bio hypothesis validation while reducing time 10-fold! 🧵👇
