💎PiFlow, the first "Scientist Brain" for the Agentic AI Scientist.

Zurich
Joined January 2019
We find that PiFlow is exceptionally robust at advancing scientific discovery by optimizing over scientific principles: it is intuitive to use, efficient to learn, and highly effective. Moreover, principles discovered later in the process strike a better balance between exploration and exploitation.
Mellen Y. Pu retweeted
This survey shows multimodal models can self-improve with less human work. The loop has 3 parts: collection, organization, and optimization. Collection uses random sampling, guided prompts, and hard negatives. Organization verifies outputs with rules, peer judges, or environment feedback. Optimization trains by supervised fine-tuning, reinforcement learning, or preference optimization. A seed model starts the loop, and each round strengthens it. The survey defines 6 autonomy levels, from human-guided to self-run. Verifiable rewards tend to lift performance on reasoning tasks. Preference or AI feedback tends to cut hallucinations and errors. Paper: arxiv.org/abs/2510.02665 — Paper Title: "Self-Improvement in Multimodal LLMs: A Survey"
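The three-stage loop described above can be sketched as a toy program. Everything here is a placeholder instantiation, not an API from the survey: the "model" is just an accuracy knob, and `train` stands in for SFT, RL, or preference optimization.

```python
import random

def self_improvement_round(model, prompts, generate, verify, train, k=4):
    """One round of the loop: collection -> organization -> optimization.
    All callables are toy placeholders, not a real training API."""
    # Collection: sample k candidate outputs per prompt.
    candidates = {p: [generate(model, p) for _ in range(k)] for p in prompts}
    # Organization: keep only outputs that pass verification
    # (rules, peer judges, or environment feedback).
    verified = [(p, o) for p, outs in candidates.items()
                for o in outs if verify(p, o)]
    # Optimization: update the model on the verified data.
    return train(model, verified)

# Toy instantiation: the model answers correctly with probability `acc`.
def generate(model, p):
    return p * 2 if random.random() < model["acc"] else p * 2 + 1

def verify(p, o):
    return o == p * 2  # a verifiable, rule-based reward

def train(model, data):
    return {"acc": min(1.0, model["acc"] + 0.005 * len(data))}

random.seed(0)
model = {"acc": 0.5}
for _ in range(5):  # each round strengthens the seed model
    model = self_improvement_round(model, range(10), generate, verify, train)
print(model["acc"])
```

Because `train` only ever raises the accuracy knob, the toy loop mirrors the survey's claim that each round strengthens the seed model.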
Mellen Y. Pu retweeted
🌀New test-time scaling method 🌀 📝: arxiv.org/abs/2509.06870 - Use RL to train an LLM solution aggregator that reasons, reviews, reconciles, and synthesizes a final solution -> much better than existing techniques! - A simple new method with strong results across 4 math benchmarks. 🧵1/5
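For contrast, the common baseline such an aggregator is meant to beat is majority voting over sampled solutions. The sketch below shows that baseline concretely and the aggregator only as an opaque callable; the `aggregate` interface is a hypothetical illustration, not the paper's code.

```python
from collections import Counter

def majority_vote(solutions):
    """Baseline test-time scaling: pick the most common final answer
    among independently sampled solutions."""
    return Counter(solutions).most_common(1)[0][0]

def aggregate(solutions, aggregator):
    """The paper's idea, sketched: instead of voting, feed all candidate
    solutions to an RL-trained aggregator LLM that reviews, reconciles,
    and synthesizes one final answer. `aggregator` is a placeholder."""
    prompt = "Candidate solutions:\n" + "\n".join(solutions)
    return aggregator(prompt)

print(majority_vote(["42", "41", "42"]))  # -> 42
```

The key difference: voting can only select an answer that already appears among the samples, while an aggregator can in principle synthesize a solution none of the candidates fully contains.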
Mellen Y. Pu retweeted
Of all the AIGC-generated videos I've seen so far, this is my favorite and the one most full of life~
Mellen Y. Pu retweeted
This YouTube creator's content is unbelievably good... 😂
Mellen Y. Pu retweeted
Pretty cool that they open sourced the actual full-sized production model. Here’s the Grok 2.5 architecture overview next to a roughly similarly sized Qwen3 model. The MoE residual is quite interesting. Kind of like a shared expert. I don't think I've seen this setup before.
The @xAI Grok 2.5 model, which was our best model last year, is now open source. Grok 3 will be made open source in about 6 months. huggingface.co/xai-org/grok-…
Mellen Y. Pu retweeted
Hunyuan 3D-2.1 turns any flat image into studio-quality 3D models. And you can do it on this @huggingface space for free.
Mellen Y. Pu retweeted
What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately. Read more in the SWE-bench blog 🧵
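The per-turn switching idea can be sketched as follows. The loop structure and `step_fn` hook are illustrative assumptions for the sketch, not mini-SWE-agent's actual API:

```python
import random

def run_agent(task, models, step_fn, max_turns=10):
    """At every turn, pick a model uniformly at random from the pool
    (e.g. GPT-5 or Sonnet 4) before taking the next agent step."""
    history = []
    for _ in range(max_turns):
        model = random.choice(models)      # switch models each turn
        action, done = step_fn(model, task, history)
        history.append((model, action))
        if done:
            break
    return history

# Toy step function that declares the task done after three turns.
def toy_step(model, task, history):
    return f"{model}:act", len(history) >= 2

random.seed(1)
h = run_agent("fix-bug", ["gpt-5", "sonnet-4"], toy_step)
print(len(h))  # -> 3
```

One plausible reading of the result: per-turn switching acts like an ensemble over trajectories, so the agent can recover when one model gets stuck in a failure mode the other doesn't share.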
The gpt-oss models from OpenAI are a synthesis of ideas from prior research. Here are 10 interesting papers that were directly used in gpt-oss…
(1) Longformer: Introduces sliding window attention, a form of sparse attention that is utilized in alternating layers of both gpt-oss models.
(2) StreamingLLM: Describes the concept of attention sinks in large language models (LLMs)—these are tokens within a sequence that the model assigns high attention or weight to, simply because the softmax operation prevents the model from assigning attention to no tokens at all.
(3) Off-by-one attention: Proposes a solution to attention sinks by allowing the attention mechanism to assign no attention to any token. This is achieved by adding a bias term of 1 to the denominator of the softmax operation within attention. In gpt-oss models, a similar approach is used, but the bias term is learned rather than fixed at 1.
(4) Switch Transformer: Presents several ideas foundational to modern mixture-of-experts (MoE) based LLMs. It’s important to note that many other papers, in addition to Switch Transformer, have contributed to this field.
(5) RMSNorm: A streamlined variant of layer normalization that is both more efficient and has fewer trainable parameters. Both gpt-oss models employ RMSNorm.
(6) RoPE: Stands for Rotary Positional Encoding, a hybrid absolute/relative positional encoding method used by gpt-oss models. RoPE encodes absolute position using a rotation matrix and incorporates relative position information directly into the self-attention mechanism.
(7) YaRN: A method for extending the context window in LLMs, which is adopted by gpt-oss models. YaRN works by adjusting the frequency basis used within RoPE and further training the LLM to handle longer contexts.
(8) Flash Attention: Utilized by gpt-oss models, flash attention leverages system-level optimizations to significantly improve the computational and memory efficiency of the attention operation.
(9) DeepSeek-R1: While the specific reasoning or reinforcement learning (RL) training strategies used by gpt-oss models are not fully detailed, the DeepSeek-R1 technical report offers a comprehensive overview of how RL training with verifiable rewards is implemented at scale.
(10) Deliberative alignment: This is the safety training approach used by gpt-oss models, designed to teach the models how to reason through safety specifications and determine when it is appropriate to refuse a request.
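Items (2) and (3) in the list above can be shown numerically. A minimal sketch of a softmax whose denominator carries an extra "sink" term: with a fixed sink logit of 0 this is exactly off-by-one attention (a +1 in the denominator); making the logit learnable mirrors the gpt-oss variant described above. The single-vector form here is a simplification of the real per-head attention computation.

```python
import numpy as np

def softmax_with_sink(scores, sink_logit=0.0):
    """Softmax over attention scores with an extra sink term in the
    denominator. Because the sink absorbs probability mass, the real
    tokens' weights may sum to less than 1, letting the model attend
    to 'nothing' instead of dumping weight on an arbitrary sink token."""
    m = max(scores.max(), sink_logit)      # subtract max for stability
    z = np.exp(scores - m)
    denom = z.sum() + np.exp(sink_logit - m)
    return z / denom

w = softmax_with_sink(np.array([1.0, 2.0, 3.0]), sink_logit=0.0)
print(w.sum() < 1.0)  # weights no longer forced to sum to exactly 1 -> True
```

With an ordinary softmax the weights would sum to exactly 1 even when no token deserves attention, which is the mechanism behind the attention-sink behavior StreamingLLM describes.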
Mellen Y. Pu retweeted
🚗Nullspace MPC🚙 A novel multi-objective control framework explicitly handling task priorities — demonstrated here on a swerve drive robot navigating autonomously through tight spaces. Also includes MPPI as a baseline in the open source project. github.com/MizuhoAOKI/nullsp…
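For context on the priority idea: the classic way to make a secondary objective act only where it cannot disturb a higher-priority task is to project it into the nullspace of the primary task's Jacobian. Whether Nullspace MPC uses exactly this construction is an assumption; the sketch below shows the standard two-level resolution.

```python
import numpy as np

def prioritized_velocity(J1, dx1, J2, dx2):
    """Two-level task-priority resolution: the secondary task (J2, dx2)
    is executed only in the nullspace of the primary Jacobian J1, so it
    can never interfere with the primary task (J1, dx1)."""
    J1p = np.linalg.pinv(J1)
    N1 = np.eye(J1.shape[1]) - J1p @ J1        # projector onto null(J1)
    q1 = J1p @ dx1                             # primary-task solution
    # Solve the secondary task within the remaining freedom.
    q2 = np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ q1)
    return q1 + N1 @ q2

# Sanity check: the combined command still achieves the primary task exactly.
J1 = np.array([[1.0, 0.0, 0.0]])
J2 = np.array([[0.0, 1.0, 1.0]])
q = prioritized_velocity(J1, np.array([2.0]), J2, np.array([3.0]))
print(np.allclose(J1 @ q, [2.0]))  # -> True
```

Because the two toy tasks happen to use disjoint joints, both are satisfied exactly here; in general the secondary task is met only as well as the primary task's nullspace allows.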
Mellen Y. Pu retweeted
Meet Higgsfield Product-to-Video! Drop your product straight into the pic. Or start blank and build your BEST-SELLING frame from 0. This is POWERFUL: perfect product placement with 0 prompts and ALL our models. Retweet = full P2V Playbook in your DM.
Mellen Y. Pu retweeted
Apple has open-sourced Embedding Atlas, an interactive visualization tool for large-scale embeddings: it visualizes embeddings and their metadata with cross-filtering and search. You can zoom, rotate, and drag the view to inspect the data from different angles, with real-time search and nearest-neighbor lookup. It automatically groups similar data into clusters and labels each group. It can show density, revealing which regions of points are dense and which are sparse. High-performance rendering handles large datasets, with linked multi-view interaction. #EmbeddingAtlas #向量数据
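The nearest-neighbor lookup such a tool performs can be illustrated with a brute-force cosine search; Embedding Atlas itself presumably uses optimized index structures, so this is only a toy version of the operation.

```python
import numpy as np

def nearest_neighbors(embeddings, query, k=3):
    """Return the indices of the k embeddings most similar to `query`
    by cosine similarity (brute force, for illustration only)."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = E @ q                      # cosine similarity to every row
    return np.argsort(-sims)[:k]     # top-k most similar indices

E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.9, 0.1]])
print(nearest_neighbors(E, np.array([1.0, 0.0]), k=2))  # closest: rows 0 and 2
```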
These findings suggest that LLMs are not principled reasoners, but sophisticated simulators of reasoning-like text.
Mellen Y. Pu retweeted
This is a roadmap to the future of AI. Hear directly from engineers, researchers, and product leads at Google, NVIDIA, GitHub, Uber, Riot Games, Pfizer, and more. This is where the right conversations happen.
Mellen Y. Pu retweeted
Alibaba is being unusually generous this time: Qwen Code now gets 2,000 free requests per day. They're starting to take after Google.
💡 You get 2,000 free Qwen Code runs every day! Run this one simple command: npx @qwen-code/qwen-code@latest Hit Enter, and that’s it! 🚀 Now with Qwen OAuth support — super easy to use. Try it now and supercharge your vibe code! 💻⚡ GitHub: github.com/QwenLM/qwen-code
Mellen Y. Pu retweeted
As of today, only o3 and gpt-5 thinking can get this question right: the daughter scored 38 points. gemini-2.5 pro and claude 4 opus both fail.