Tuan Tran retweeted
🚨🌶️ Did you realise you can get alignment 'training' data out of open-weights models? Oops! We show that models will regurgitate alignment data that is (semantically) memorised. This data can come from SFT and RL... and can be used to train your own models! 🧵
TL;DR: I made a Transformer that conditions its generation on latent variables. To do so, the Transformer only needs a source of randomness during generation, but it needs an encoder during training, as in a [conditional] VAE. 1/5
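A minimal sketch of the setup, assuming a PyTorch-style implementation (names and sizes are illustrative, not from the thread): the encoder infers z only during training; at generation time z is just drawn from the prior, so randomness is the only extra ingredient.

```python
# Illustrative latent-conditioned Transformer decoder (conditional-VAE style).
# Names and dimensions are assumptions, not the thread's actual code.
import torch
import torch.nn as nn

class LatentConditionedLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, z_dim=32, n_layers=4, n_heads=4):
        super().__init__()
        self.z_dim = z_dim
        self.embed = nn.Embedding(vocab_size, d_model)
        # Encoder: only used during training to infer q(z | x).
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.to_mu = nn.Linear(d_model, z_dim)
        self.to_logvar = nn.Linear(d_model, z_dim)
        # Decoder: conditions next-token prediction on z via a prefix embedding.
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
        self.z_to_prefix = nn.Linear(z_dim, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def decode(self, x, z):
        # Prepend the latent as a prefix token, run a causal decoder over it.
        h = torch.cat([self.z_to_prefix(z).unsqueeze(1), self.embed(x)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(h.size(1)).to(h.device)
        out = self.decoder(h, mask=mask)
        return self.lm_head(out)[:, :-1]  # position i predicts token i of x

    def forward(self, x):
        # Training: infer z from the full sequence, reparameterize, decode.
        pooled = self.encoder(self.embed(x)).mean(dim=1)
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        logits = self.decode(x, z)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return logits, kl  # train with cross_entropy(logits, x) + beta * kl

    @torch.no_grad()
    def sample_latent(self, batch=1):
        # Generation: no encoder needed, just draw z from the prior.
        return torch.randn(batch, self.z_dim)
```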
Tuan Tran retweeted
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
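A rough sketch of the kind of curve fit such scaling studies rely on: synthetic (compute, performance) points and a generic saturating power law, not the paper's actual functional form or data.

```python
# Hedged illustration: fit a saturating power law to compute-vs-performance points.
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(c, a, b, alpha):
    # Performance approaches asymptote a as compute c grows, at rate alpha.
    return a - b * c ** (-alpha)

compute = np.logspace(0, 4, 20)  # e.g. GPU-hours, synthetic
perf = saturating_power_law(compute, 0.8, 0.5, 0.3)
perf += 0.01 * np.random.default_rng(0).normal(size=perf.shape)

params, _ = curve_fit(saturating_power_law, compute, perf, p0=[1.0, 1.0, 0.5])
a, b, alpha = params
print(f"asymptote={a:.3f}, scale={b:.3f}, exponent={alpha:.3f}")
# Extrapolate: predicted performance at 10x the largest measured compute budget.
print("predicted at 10x compute:", saturating_power_law(compute[-1] * 10, *params))
```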
Tuan Tran retweeted
🧠 Great research from @Meta Superintelligence Labs. Proposes Meta Agents Research Environments (ARE) for scaling up agent environments and evaluations. ARE lets researchers build realistic agent environments, run agents asynchronously, and verify them cleanly. On top of it they release Gaia2, a 1,120-scenario benchmark that stresses search, execution, ambiguity, time pressure, collaboration, and noise, and the results show sharp tradeoffs between raw reasoning and speed or cost.

⚙️ The Core Concepts

ARE treats the world as a clocked simulation where everything is an event, the agent runs separately, and interactions flow through tools and notifications. Apps are the tools, environments bundle the apps plus rules, and scenarios package a starting state, scheduled events, and a verifier.

Traditional benchmarks froze the world while a model was "thinking." That made results look clean but ignored the real costs of inference time. In ARE, the world keeps ticking asynchronously. Time passes even while the model is generating, apps can trigger notifications, and other actors may act. So if a model is slow, that directly shows up as missed deadlines in the benchmark.

That is exactly why GPT-5 (high) got 79.6 on Search but 0 on Time in default mode. The reasoning quality was excellent, but ARE exposed its inference slowness as a concrete failure mode. When ARE switched to instant mode, stripping out the latency, the model suddenly performed well, proving the bottleneck wasn't reasoning but raw response time. @AIatMeta

🧵 Read on 👇
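A toy sketch of the asynchronous-clock idea (illustrative only, not Meta's ARE code): world time keeps advancing while the agent is generating, so inference latency turns into missed deadlines that a verifier can score.

```python
# Illustrative toy, not the ARE implementation: the simulated world keeps moving
# while the agent "thinks", so slow inference becomes a measurable failure.
import asyncio

async def world_clock(state):
    # Scheduled event: a deadline fires at t = 2.0s whether or not the agent is ready.
    await asyncio.sleep(2.0)
    state["deadline_passed"] = True

async def slow_agent(latency_s):
    await asyncio.sleep(latency_s)  # stands in for model inference time
    return "send_message(recipient='Bob', text='done')"

async def run_scenario(agent_latency_s):
    state = {"deadline_passed": False}
    clock = asyncio.create_task(world_clock(state))
    action = await slow_agent(agent_latency_s)
    success = not state["deadline_passed"]  # verifier: act before the deadline
    clock.cancel()
    try:
        await clock
    except asyncio.CancelledError:
        pass
    return action, success

if __name__ == "__main__":
    for latency in (0.5, 3.0):
        _, ok = asyncio.run(run_scenario(latency))
        print(f"agent latency {latency}s -> success={ok}")
```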
Tuan Tran retweeted
Damn, very interesting paper. After rapid loss reduction, we see deceleration and the loss starts to follow a "scaling law": this is because at these steps, gradients start to conflict with each other. Updates are 'fighting for model capacity' in some sense, and the larger the model, the less fighting there is. And it's quantifiably so.
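One rough way to see the "fighting" yourself: compute per-batch gradients and check their pairwise cosine similarity; negative entries mean batches pulling the weights in conflicting directions. A minimal sketch with a placeholder model and random data, not the paper's measurement protocol:

```python
# Illustrative sketch: measure gradient conflict as pairwise cosine similarity
# between per-batch gradients. Model and data here are placeholders.
import torch
import torch.nn as nn

def flat_grad(model, loss):
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()

# A handful of random mini-batches as stand-ins for real data shards.
batches = [(torch.randn(32, 64), torch.randint(0, 10, (32,))) for _ in range(4)]
grads = [flat_grad(model, loss_fn(model(x), y)) for x, y in batches]

# Cosine similarity between every pair of batch gradients: values near 1 mean
# aligned updates, values below 0 mean conflicting directions.
G = torch.stack(grads)
G = G / G.norm(dim=1, keepdim=True)
print(G @ G.T)
```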
There are two kinds of academics: those who try to ride the wave and those who try to see ahead. Avoid the former.
There are two kinds of VCs: those who try to ride the wave and those who try to see ahead. Avoid the former.
Tuan Tran retweeted
Our team's AI-Researcher has been accepted by NeurIPS 2025 and selected as a Spotlight! 🌟 The project has also garnered 2.4K stars on GitHub and made it to the GitHub Trending list. Congratulations to our core team members: Jiabin, Lianghao, and Zhonghang! 👏

Over the past six months, we've been driven by a vision: to create a true AI Co-Scientist that transforms AI research assistance from a concept into a practical, systematic product. Today, Novix is officially live and has received an enthusiastic welcome from the community. This achievement wouldn't have been possible without the invaluable support and feedback from our open-source community.

Novix: novix.science/
Github: github.com/HKUDS/AI-Research…

Looking back over the past two years, we've witnessed the remarkable evolution of LLMs from simple Chatbot interfaces to sophisticated intelligent agents. Whether it's daily coding tasks or deep academic research, AI has become an indispensable partner in our work. Now is the perfect time to let AI truly participate in the research process as a Co-Scientist. Let's embark together on the next chapter of AI-accelerated research! 🤗
So #AAAI2026 @realaaai (to be held in Singapore) has 29,000 initial submissions, with 20,000 from China alone. They have 23,000 papers in review (double that of 2025!) Unless we can clone synthetic #AI reviewers pronto, we are cooked.. 😱😱😱
Tuan Tran retweeted
The high cost required to train large foundation models makes them the fastest depreciating asset in human history.
The proprietary frontier models of today are ephemeral artifacts. Essentially very expensive sandcastles. Destined to be washed away by the rising tide of open source replication (first) and algorithmic disruption (later).
Tuan Tran retweeted
There is now a path for China to surpass the U.S. in AI. Even though the U.S. is still ahead, China has tremendous momentum with its vibrant open-weights model ecosystem and aggressive moves in semiconductor design and manufacturing. In the startup world, we know momentum matters: Even if a company is small today, a high rate of growth compounded for a few years quickly becomes an unstoppable force. This is why a small, scrappy team with high growth can threaten even behemoths. While both the U.S. and China are behemoths, China's hypercompetitive business landscape and rapid diffusion of knowledge give it tremendous momentum.

The White House's AI Action Plan released last week, which explicitly champions open source (among other things), is a very positive step for the U.S., but by itself it won't be sufficient to sustain the U.S. lead.

Now, AI isn't a single, monolithic technology, and different countries are ahead in different areas. For example, even before Generative AI, the U.S. had long been ahead in scaled cloud AI implementations, while China has long been ahead in surveillance technology. These translate to different advantages in economic growth as well as both soft and hard power. Even though nontechnical pundits talk about "the race to AGI" as if AGI were a discrete technology to be invented, the reality is that AI technology will progress continuously, and there is no single finish line. If a company or nation declares that it has achieved AGI, I expect that declaration to be less a technology milestone than a marketing milestone.

A slight speed advantage in the Olympic 100m dash translates to a dramatic difference between winning a gold medal versus a silver medal. An advantage in AI prowess translates into a proportionate advantage in economic growth and national power; while the impact won't be a binary one of either winning or losing everything, these advantages nonetheless matter.

Looking at Artificial Analysis and LMArena leaderboards, the top proprietary models were developed in the U.S., but the top open models come from China. Google's Gemini 2.5 Pro, OpenAI's o4, Anthropic's Claude 4 Opus, and Grok 4 are all strong models. But open alternatives from China such as DeepSeek R1-0528, Kimi K2 (designed for agentic reasoning), Qwen3 variations (including Qwen3-Coder, which is strong at coding) and Zhipu's GLM 4.5 (whose post-training software was released as open source) are close behind, and many are ahead of Meta's Llama 4 and Google's Gemma 3, the U.S.' best open-weights offerings.

Because many U.S. companies have taken a secretive approach to developing foundation models (a reasonable business strategy), the leading companies spend huge numbers of dollars to recruit key team members from each other who might know the "secret sauce" that enabled a competitor to develop certain capabilities. So knowledge does circulate, but at high cost and slowly. In contrast, in China's open AI ecosystem, many advanced foundation model companies undercut each other on pricing, make bold PR announcements, and poach each other's employees and customers. This Darwinian life-or-death struggle will lead to the demise of many of the existing players, but the intense competition breeds strong companies.

In semiconductors, too, China is making progress. Huawei's CloudMatrix 384 aims to compete with Nvidia's GB200 high-performance computing system. While China has struggled to develop GPUs with capability comparable to Nvidia's top-of-the-line B200, Huawei is trying to build a competitive system by combining a larger number (384 instead of 72) of lower-capability chips. China's automotive sector once struggled to compete with U.S. and European internal combustion engine vehicles, but leapfrogged ahead by betting on electric vehicles. It remains to be seen how effective Huawei's alternative architectures prove to be, but the U.S. export restrictions have given Huawei and other Chinese businesses a strong incentive to invest heavily in developing their own technology. Further, if China were to develop its domestic semiconductor manufacturing capabilities while the U.S. remained reliant on TSMC in Taiwan, then the U.S.' AI roadmap would be much more vulnerable to a disruption of the Taiwan supply chain (perhaps due to a blockade or, worse, a hot war).

With the rise of electricity, the internet, and other general-purpose technologies, there was room for many nations to benefit, and the benefit to one nation hasn't come at the expense of another. I know of businesses that, many months back, planned for a future in which China dominates open models (indeed, we are there at this moment, although the future depends on our actions). Given the transformative impact of AI, I hope all nations, especially democracies with a strong respect for human rights and the rule of law, will clear roadblocks from AI progress and invest in open science and technology to increase the odds that this technology will support democracy and benefit the greatest possible number of people.

[Full text: deeplearning.ai/the-batch/is… ]
Tuan Tran retweeted
There are two kinds of academics: those who complicate the simple and those who simplify the complex.
Tuan Tran retweeted
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now

With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can't wait to see what you build!

🔌 API is here: platform.moonshot.ai
- $0.15 / million input tokens (cache hit)
- $0.60 / million input tokens (cache miss)
- $2.50 / million output tokens

🔗 Tech blog: moonshotai.github.io/Kimi-K2…
🔗 Weights & code: huggingface.co/moonshotai
🔗 Github: github.com/MoonshotAI/Kimi-K…

Try it now at Kimi.ai or via API!
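A minimal call sketch, assuming the API is OpenAI-compatible; the base URL and model id below are assumptions to verify against the platform.moonshot.ai docs.

```python
# Hedged sketch: assumes an OpenAI-compatible chat completions endpoint.
# Base URL and model identifier are assumptions -- confirm on platform.moonshot.ai.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",        # issued on platform.moonshot.ai
    base_url="https://api.moonshot.ai/v1",  # assumed OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="kimi-k2",  # placeholder model id, check the docs for the exact name
    messages=[
        {"role": "system", "content": "You are a helpful coding agent."},
        {"role": "user", "content": "Write a shell one-liner to count lines in *.py files."},
    ],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```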
Tuan Tran retweeted
"in 2025 we will have flying cars" 😂😂😂
Tuan Tran retweeted
“How will my model behave if I change the training data?” Recent(-ish) work w/ @logan_engstrom: we nearly *perfectly* predict ML model behavior as a function of training data, saturating benchmarks for this problem (called “data attribution”).
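The simplest flavor of data attribution is a linear "datamodel": retrain on many random subsets of the training pool, record a scalar output of interest, and regress that output on the subset-inclusion mask. A hedged sketch with a synthetic stand-in for retraining, not the authors' exact method:

```python
# Illustrative linear-datamodel sketch: regress a scalar model output on
# binary train-set inclusion masks. The "retraining" here is synthetic.
import numpy as np
from sklearn.linear_model import Ridge

n_train = 500     # size of the training pool
n_subsets = 2000  # number of random subsets we pretend to retrain on

def train_and_eval(mask):
    # Placeholder: retrain on examples where mask == 1 and return e.g. the loss
    # on one fixed test example. Replaced by a synthetic linear ground truth.
    true_influence = np.linspace(-1, 1, n_train)
    return mask @ true_influence + 0.1 * np.random.randn()

masks = (np.random.rand(n_subsets, n_train) < 0.5).astype(float)
outputs = np.array([train_and_eval(m) for m in masks])

# Fit the datamodel: predicted_output(mask) ~ mask @ w + b.
dm = Ridge(alpha=1.0).fit(masks, outputs)
held_out = (np.random.rand(200, n_train) < 0.5).astype(float)
print("predicted behavior on new subsets:", dm.predict(held_out)[:3])
print("per-example influence estimates:", dm.coef_[:5])
```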
Tuan Tran retweeted
These guys literally burned the transformer architecture into their silicon. 🤯 And built the fastest chip in the world for the transformer architecture: 500,000 tokens per second of Llama 70B throughput. 🤯 World's first specialized chip (ASIC) for transformers: Sohu. One 8xSohu server replaces 160 H100 GPUs. And they raised $120mn to build it. 🚀

The Big Bet

@Etched froze the transformer recipe into silicon. Burning the transformer architecture into the chip means it can't run many traditional AI models: CNNs, RNNs, or LSTMs. It also can't run the DLRMs powering Instagram ads, protein-folding models like AlphaFold 2, or older image models like Stable Diffusion 2. But for transformers, Sohu lets you build products impossible on GPUs.

HOW ❓❓ Because Sohu can only run one algorithm, the vast majority of control-flow logic can be removed, allowing it to have many more math blocks. As a result, Sohu boasts over 90% FLOPS utilization (compared to ~30% on a GPU with TRT-LLM).
why does this guy have a point 😭 💀
Tuan Tran retweeted
This is a huge win for the AI industry. Humans learn by reading books. We are judged on our outputs, not on our inputs. Same for AI.
🚨 BREAKING: A LANDMARK JUDGEMENT FOR THE AI INDUSTRY. A US federal judge ruled Anthropic may train its AI on published books without authors' permission. This is the first court endorsement of fair use protecting AI firms when they use copyrighted texts to train LLMs. AI may study what it buys, not what it grabs from pirate sites.

---------

"First, Authors argue that using works to train Claude's underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for training or learning as such. Everyone reads texts, too, then writes new texts. They may need to pay for getting their hands on a text in the first instance. But to make anyone pay specifically for the use of a book each time they read it, each time they recall it from memory, each time they later draw upon it when writing new things in new ways would be unthinkable. For centuries, we have read and re-read books. We have admired, memorized, and internalized their sweeping themes, their substantive points, and their stylistic solutions to recurring writing problems."

The court file is such an interesting read. 🧵 Read on 👇
Tuan Tran retweeted
There's a lot of work now on LLM watermarking. But can we extend this to transformers trained for autoregressive image generation? Yes, but it's not straightforward 🧵(1/10)
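For reference, the standard text-style "green list" watermark that such threads start from, written for a generic autoregressive token model (text or image tokens alike). This is a baseline sketch, not the scheme proposed in the thread:

```python
# Illustrative green-list watermark for any autoregressive token model.
# Baseline text-style scheme only; not the referenced paper's method.
import torch

def watermarked_sample(logits, prev_token, gamma=0.5, delta=2.0, key=42):
    # Seed a vocabulary partition from the previous token, then bias "green" logits.
    g = torch.Generator().manual_seed(key * 1_000_003 + int(prev_token))
    vocab = logits.shape[-1]
    green = torch.randperm(vocab, generator=g)[: int(gamma * vocab)]
    biased = logits.clone()
    biased[green] += delta
    return torch.multinomial(torch.softmax(biased, dim=-1), 1).item()

def detect(tokens, vocab, gamma=0.5, key=42):
    # Count how often each token fell in the green list seeded by its predecessor,
    # then return a z-score against the unwatermarked expectation gamma * n.
    hits = 0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        g = torch.Generator().manual_seed(key * 1_000_003 + int(prev))
        green = torch.randperm(vocab, generator=g)[: int(gamma * vocab)]
        hits += int(cur in set(green.tolist()))
    n = len(tokens) - 1
    return (hits - gamma * n) / (gamma * (1 - gamma) * n) ** 0.5
```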
I'm surprised this post didn't get more attention. Most papers arguing we are far from AGI just show LLMs failing on tasks too big for their context window. Sergey identifies something more fundamental: while LLMs learned to mimic intelligence from internet data, they never had to actually live and acquire that intelligence directly. They lack the core algorithm for learning from experience. They need a human to do that work for them. This suggests, at least to me, that the gap between LLMs and genuine intelligence might be wider than we think. Despite all the talk about AGI either being already here or coming next year, I can't shake off the feeling it's not possible until we come up with something better than a language model mimicking our own idea of what an AI should look like.
I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post: sergeylevine.substack.com/p/…
Tuan Tran retweeted
Levels of difficulty in "distributed" AI training: 1. Distributed: large cluster of GPUs in one datacenter 2. Decentralized: multiple clusters spread across different datacenters with weak links in between them, data can be moved 3. Federated: all of the above, but data can't be moved At @flwrlabs, we're building the best framework for 3 because 2 and 1 are special cases of 3.
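A toy illustration of level 3 (plain NumPy federated averaging, not the Flower API): each client's data stays local, and the server only ever aggregates weights.

```python
# Toy federated averaging: raw data never moves, only model weights do.
# Plain NumPy sketch, not the @flwrlabs framework.
import numpy as np

def local_step(w, X, y, lr=0.1):
    # One step of linear-regression SGD on the client's private data.
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
# Three clients, each with data that stays on its own site.
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w_global = np.zeros(3)
for _ in range(50):
    # Each client starts from the global weights and trains locally.
    local_weights = [local_step(w_global.copy(), X, y) for X, y in clients]
    # The server aggregates weights only; it never sees X or y.
    w_global = np.mean(local_weights, axis=0)

print("recovered weights:", np.round(w_global, 2))
```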