Chamoda Pandithage retweeted
How to scale literally any web application:
- Optimize DB queries
- Queue everything
That's it
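A toy sketch of the "queue everything" half in Python - an in-process queue standing in for a real broker like Redis or RabbitMQ, with all names invented for illustration:

```python
# Minimal illustration of "queue everything": the request handler only
# enqueues work; a background worker does the slow part later.
import queue
import threading
import time

jobs: "queue.Queue[dict]" = queue.Queue()

def worker() -> None:
    while True:
        job = jobs.get()          # blocks until a job arrives
        time.sleep(0.1)           # stand-in for slow work (email, image resize, ...)
        print(f"processed job {job['id']}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(payload: dict) -> dict:
    jobs.put(payload)             # enqueue and return immediately; work happens later
    return {"status": "accepted"}

if __name__ == "__main__":
    for i in range(3):
        print(handle_request({"id": i}))
    jobs.join()                   # wait for the worker to drain the queue
```

The request path stays cheap because the only synchronous work is an enqueue; everything slow happens off the critical path.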
Chamoda Pandithage retweeted
MACROHARD
Chamoda Pandithage retweeted
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing.

As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or is even worth pursuing. The underlying assumption being that LLMs are of course highly "bitter lesson pilled" indeed; just look at LLM scaling laws, where if you put compute on the x-axis, number go up and to the right.

So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you prevent a human bias? So there you have it: bitter lesson pilled LLM researchers taken down by the author of the bitter lesson - rough!

In some sense, Dwarkesh (who represents the LLM researchers' viewpoint in the pod) and Sutton are slightly speaking past each other, because Sutton has a very different architecture in mind and LLMs break a lot of its principles. He calls himself a "classicist" and evokes Alan Turing's original concept of building a "child machine" - a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point, but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals). Another important note he makes is that even if you just treat pretraining as an initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course - a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them).

In Sutton's world view, all there is is interaction with a world via reinforcement learning, where the reward functions are partially environment specific but also intrinsically motivated, e.g. "fun", "curiosity", and related to the quality of the prediction in your world model. And the agent is always learning at test time by default; it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have in common with the animal kingdom instead of what differentiates us. "If we understood a squirrel, we'd be almost done."

As for my take... First, I should say that I think Sutton was a great guest for the pod, and I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration of LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. And I also think that his criticism of LLMs as not bitter lesson pilled is not inadequate. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at all the stages - the foundation (the pretraining data) is all human text, the finetuning data is human and curated, the reinforcement learning environment mixture is tuned by human engineers.
We do not in fact have an actual, single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and see it learn automatically from experience alone. Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough.

Two "example proofs" are commonly offered to argue that such a thing is possible. The first example is the success of AlphaZero learning to play Go completely from scratch, with no human supervision whatsoever. But the game of Go is clearly such a simple, closed environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic tac toe.

The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant about whether it's appropriate, because animals arise by a very different computational process and via different constraints than what we have practically available to us in the industry. Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo a lot more "maturation". And second, even that which clearly is "learning" and not maturation is a lot more "finetuning" on top of something clearly powerful and preexisting.

Example: a baby zebra is born and within a few dozen minutes it can run around the savannah and follow its mother. This is a highly complex sensory-motor task, and there is no way in my mind that this is achieved from scratch, tabula rasa. The brains of animals and the billions of parameters within have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization in the course of evolution. If the baby zebra spasmed its muscles around at random, as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all.

Similarly, our AIs now also have neural networks with billions of parameters. These parameters need their own rich, high-information-density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes, it is basically supervised learning, which is ~absent in the animal kingdom. But it is a way to practically gather enough soft constraints over billions of parameters, to try to get to a point where you're not starting from scratch.

TLDR: Pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state of the art frontier LLM labs now do pervasively.

I still think it is worth being inspired by animals. I think there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. And I say both of these with double digit percent uncertainty and cheer the work of those who disagree, especially those a lot more ambitious bitter lesson wise.

So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it.
They are these imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before.

It seems possible to me that over time, we can further finetune our ghosts more and more in the direction of animals; that it's not so much a fundamental incompatibility but a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different, un-animal-like, but still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds.

Anyway, in summary, overall and actionably, I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be gear-shifted a little too much into exploit mode. Probably we are still not sufficiently bitter lesson pilled, and there is a very good chance of more powerful ideas and paradigms other than exhaustive benchbuilding and benchmaxxing. And animals might be a good source of inspiration. Intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled.

My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on-the-fly - like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead-end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI
Chamoda Pandithage retweeted
Boy do you guys have a lot of thoughts about the @RichardSSutton interview. I’ve been thinking about it myself. I have a better understanding of Sutton’s perspective now than I did during the interview itself. So I want to reflect on it a bit. Richard, apologies for any errors or misunderstandings. It’s been very productive to learn from your thoughts.

The steelman

What is the bitter lesson about? It is not saying that you just want to throw as much compute as possible at the problem. The bitter lesson says that you want to come up with techniques which most effectively and scalably leverage compute.

Most of the compute spent on an LLM is used on running it in deployment. And yet it’s not learning anything during this time! It’s only learning during this special phase we call training. That is not an effective use of compute. And even the training period by itself is highly inefficient - GPT-5 was trained on the equivalent of 10s of 1000s of years of human experience.

What’s more, during this training phase, all their learning comes straight from human data. This is an obvious point in the case of pretraining data. But it’s even kind of true for the RLVR we do on LLMs: these RL environments are human-furnished playgrounds to teach LLMs the specific skills we have prescribed for them. The agent is in no substantial way learning from organic and self-directed engagement with the world. Having to learn only from human data (an inelastic, hard-to-scale resource) is not a scalable use of compute.

What these LLMs learn from training is not a true world model (which tells you how the environment changes in response to different actions). Rather, they are building a model of what a human would say next. And this leads them to rely on human-derived concepts. If you trained an LLM on the data from 1900, it wouldn't be able to come up with relativity from scratch. Though now that it has a training corpus which explains relativity, it can use that concept to help you with your physics homework.

LLMs aren’t capable of learning on-the-job, so we’ll need some new architecture to enable continual learning. And once we have it, we won’t need a special training phase — the agent will just learn on-the-fly, like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete.

TLDR of my current thoughts

My main difference with Rich is that I think the concepts he's using to distinguish LLMs from true intelligence are not actually mutually exclusive and dichotomous. Imitation learning is continuous with and complementary to RL. And relatedly, models of humans can give you a prior which facilitates learning "true" world models. I also wouldn’t be surprised if some future version of test-time fine-tuning could replicate continual learning.

Imitation learning is continuous with and complementary to RL

I tried to ask Richard a couple of times whether pretrained LLMs can serve as a good prior on which to accumulate the experiential learning (aka do the RL) which will lead to AGI. In a talk a few months ago, @ilyasut compared pretraining data to fossil fuels. This analogy has remarkable reach. Just because fossil fuels are not renewable does not mean that our civilization ended up on a dead-end track by using them. You simply couldn't have transitioned from water wheels in 1800 to solar panels and fusion power plants. We had to use this cheap, convenient, plentiful intermediary.
AlphaGo (which was conditioned on human games) and AlphaZero (which was bootstrapped from scratch) were both superhuman Go players. AlphaZero was better.

Will we (or the first AGIs) eventually come up with a general learning technique that requires no initialization of knowledge - that just bootstraps itself from the very start? And will it outperform the very best AIs that have been trained to that date? Probably yes. But does this mean that imitation learning must not play any role whatsoever in developing the first AGI, or even the first ASI? No. AlphaGo was still superhuman, despite being initially shepherded by human player data. The human data isn’t necessarily actively detrimental - at enough scale it just isn’t significantly helpful.

The accumulation of knowledge over tens of thousands of years has clearly been essential to humanity’s success. In any field of knowledge, thousands (and likely millions) of previous people were involved in building up our understanding and passing it on to the next generation. We didn't invent the language we speak, nor the legal system we use, nor even most of the knowledge relevant to the technologies in our phones. This process is more analogous to supervised learning than to RL from scratch.

Are kids literally predicting the next token (like an LLM) in order to do cultural learning? No, of course not. But neither are they running around trying to collect some well defined reward. No ML learning regime perfectly describes human learning. We do things which are analogous to both RL and imitation learning.

I also don't think these learning techniques are categorically different. Imitation learning is just short horizon RL. The episode is a token long. The LLM makes a conjecture about the next token based on its understanding of the world and the other information in the sequence. And it receives reward in proportion to how well it predicted the true token.

Now, I already hear people saying: “No no, that’s not ground truth! It’s just learning what a human was likely to say.” Agreed. But there’s a different question which I think is more relevant to the scalability of these models: can we leverage imitation learning to help models learn better from ground truth?

And I think the answer is, obviously yes? We have RLed models to win Gold in IMO and code up entire working applications from scratch. These are “ground truth” examinations. But you couldn’t RL a model to accomplish these feats from scratch. You needed a reasonable prior over human data in order to kick start the RL process.

Whether you wanna call this prior a proper "world model", or just a model of humans, seems like a semantic debate to be honest. Because what you really care about is whether this model of humans helps you start learning from ground truth (aka become a “true” world model). It’s a bit like saying to someone pasteurizing milk, “Hey stop boiling that milk - we eventually want to serve it cold!” Yes, of course. But this is an intermediate step to facilitate the final output.

By the way, LLMs have clearly developed a representation of the world (because their training process incentivizes them to). I use LLMs to teach me about everything from biology to AI to history, and they do so with remarkable flexibility and coherence. Are LLMs specifically trained to model how their actions will affect the world? No.
But if we're not allowed to call their representations a “world model”, then we're defining the term by the process we think is necessary to build one, rather than by the obvious capabilities the concept implies.

Continual learning

Sorry to bring up my hobby horse again. I'm like a comedian who's only come up with one good bit.

An LLM being RLed on outcome-based rewards learns O(1) bits per episode, and the episode may be tens of thousands of tokens long. We animals clearly extract far more information from interacting with our environment than just the reward signal at the end of each episode.

Conceptually, how should we think about what is happening with animals? We’re learning to model the world through observations. The outer loop RL is incentivizing some other learning system to pick up maximum signal from the environment. In Richard’s OaK architecture, he calls this the transition model.

If we were trying to pigeonhole this feature spec into LLMs, what you’d do is fine-tune on all your observed tokens. From what I hear from my researcher friends, in practice the most naive way of doing this doesn't work well.

Being able to continuously learn from the environment in a high throughput way is obviously necessary for true AGI. And it clearly doesn’t exist with LLMs trained on RLVR. But there might be some relatively straightforward ways to shoehorn continual learning atop LLMs. For example, one could imagine making SFT a tool call for the model (a rough sketch of this idea follows after this post). So the outer loop RL is incentivizing the model to teach itself effectively using supervised learning, in order to solve problems that don't fit in the context window.

I'm genuinely agnostic about how well techniques like this will work—I'm not an AI researcher. But I wouldn't be surprised if they basically replicate continual learning. Models already demonstrate something resembling human continual learning within their context windows. The fact that in-context learning emerged spontaneously from the training incentive to process long sequences suggests that if information could flow across windows longer than the current context limit, models would meta-learn the same flexibility they already show in-context.

Concluding thoughts

Evolution does meta-RL to make an RL agent. That agent can selectively do imitation learning. With LLMs, we’re going the opposite way. We first made a base model that does pure imitation learning. Then we do RL on it to make it a coherent agent with goals and self-awareness. Maybe this won't work! But I don't think these super first-principle arguments (for example, about how an LLM doesn’t have a true world model) prove much. I also don't think they’re strictly accurate for the models we have today, which undergo a lot of RL on “ground truth”.

Even if Sutton's Platonic ideal doesn’t end up being the path to the first AGI, his first-principles critique is extremely thought provoking. He’s identifying genuine basic gaps, which we don’t even notice because they are so pervasive in the current paradigm: lack of continual learning, abysmal sample efficiency, dependence on exhaustible human data.

If the LLMs do get to AGI first, the successor systems they build will almost certainly be based on Richard's vision.
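A rough sketch of the "SFT as a tool call" idea from the post above, in Python. Every name here (Agent, sft_update, run_episode) is hypothetical; it only shows the shape of an outer RL loop in which the agent can choose to fine-tune itself on notes it writes mid-episode:

```python
# Hypothetical sketch: during an episode the agent may invoke a "self-SFT" tool
# that fine-tunes its own weights on text it selects, while the outer loop
# rewards whole episodes - so learning how to teach itself is itself reinforced.
from dataclasses import dataclass, field

@dataclass
class Agent:
    weights: dict = field(default_factory=dict)

    def act(self, observation: str) -> str:
        # Placeholder policy: a real agent would sample from an LLM here.
        return f"act({observation})"

    def sft_update(self, notes: list[str]) -> None:
        # Placeholder for a supervised fine-tuning step on self-selected text,
        # i.e. the tool call that writes information into the weights.
        self.weights[f"note_{len(self.weights)}"] = list(notes)

def run_episode(agent: Agent, task: str) -> float:
    notes: list[str] = []
    for step in range(3):
        action = agent.act(f"{task}@{step}")
        notes.append(f"observation after {action}")
        if step == 1:
            # The agent decides mid-episode to distill its notes into weights,
            # so the knowledge survives beyond the context window.
            agent.sft_update(notes)
    return 1.0  # stand-in for an outcome-based reward

def outer_loop(num_episodes: int = 5) -> Agent:
    agent = Agent()
    for episode in range(num_episodes):
        reward = run_episode(agent, task=f"task-{episode}")
        # A real outer loop would apply RL here, reinforcing behaviours
        # (including the self-teaching tool calls) that led to high reward.
        print(f"episode {episode}: reward={reward}, stored notes={len(agent.weights)}")
    return agent

if __name__ == "__main__":
    outer_loop()
```

The only structural point the sketch makes is that the reward arrives per episode, so any self-teaching the agent does is worth reinforcing only insofar as it helps later steps and later episodes.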
Chamoda Pandithage retweeted
Chamoda Pandithage retweeted
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
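The tweet doesn't show Tinker's actual API, so here is only a generic illustration of what "write the training loop yourself in Python" means - a plain PyTorch loop over a toy model. With a hosted service, the forward/backward passes would run on remote GPUs instead of locally; none of the names below are Tinker's.

```python
# Generic fine-tuning loop, written out by hand rather than hidden behind a
# trainer abstraction. Toy model and toy data stand in for an open LLM and a
# tokenized fine-tuning set.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(256, 16)               # stand-in for input features/tokens
labels = torch.randint(0, 4, (256,))        # stand-in for target labels/tokens

for epoch in range(3):
    for i in range(0, len(inputs), 32):
        batch_x, batch_y = inputs[i:i + 32], labels[i:i + 32]
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()      # on a hosted service, this step would execute
        optimizer.step()     # on distributed GPUs rather than in this process
    print(f"epoch {epoch}: loss={loss.item():.3f}")
```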
Chamoda Pandithage retweeted
just sold shovels during the biggest gold rush known to mankind
Chamoda Pandithage retweeted
LeCun is right: when enrolling in a PhD program, don’t work on today's hype topic. It was true in 2015 for reinforcement learning, and it is true in 2025 for LLMs. The topic of tomorrow won’t be the hype topic of today; find a promising niche tech and work on it instead. Like conformal prediction, for example. #research #conformalprediction
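For readers who haven't met it, split conformal prediction fits in a few lines. A minimal NumPy sketch with toy data and a 90% coverage target, assuming exchangeable data:

```python
# Split conformal prediction: wrap any point predictor in prediction intervals
# with a finite-sample coverage guarantee, using only a held-out calibration set.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise
x = rng.uniform(0, 10, 1000)
y = 2 * x + rng.normal(0, 1, 1000)

# Split into a proper training set and a calibration set.
x_train, y_train = x[:500], y[:500]
x_cal, y_cal = x[500:], y[500:]

# Any fitted model works; here, simple least squares.
slope, intercept = np.polyfit(x_train, y_train, 1)

def predict(xs):
    return slope * xs + intercept

# Nonconformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(x_cal))

# Finite-sample-adjusted quantile for alpha = 10% miscoverage.
alpha = 0.1
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction interval for a new point.
x_new = 4.2
print(f"{predict(x_new) - q:.2f} .. {predict(x_new) + q:.2f}")
```

The appeal is that the coverage guarantee holds marginally, regardless of how good or bad the underlying model is, as long as the calibration data is exchangeable with the test data.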
Chamoda Pandithage retweeted
dammit this is obvious/clever
When Claude Code fetches Bun’s docs, Bun’s docs now send markdown instead of HTML by default. This shrinks token usage for our docs by about 10x.
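Bun's implementation isn't shown here; a minimal Python sketch of the general trick - returning markdown when the client prefers it or looks like LLM tooling - might look like this (the "Claude" user-agent check is just an illustrative heuristic):

```python
# Sketch of markdown-first docs serving: same content, far fewer tokens,
# selected by simple content negotiation. Not Bun's actual code.
from http.server import BaseHTTPRequestHandler, HTTPServer

DOCS_MD = "# Bun.serve()\n\nStart an HTTP server...\n"
DOCS_HTML = "<html><body><h1>Bun.serve()</h1><p>Start an HTTP server...</p></body></html>"

class DocsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        accept = self.headers.get("Accept", "")
        agent = self.headers.get("User-Agent", "")
        # Serve markdown to clients that ask for text/markdown or identify as
        # LLM tooling; render HTML for ordinary browsers.
        wants_md = "text/markdown" in accept or "Claude" in agent
        body = DOCS_MD if wants_md else DOCS_HTML
        ctype = "text/markdown" if wants_md else "text/html"
        self.send_response(200)
        self.send_header("Content-Type", f"{ctype}; charset=utf-8")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), DocsHandler).serve_forever()
```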
Chamoda Pandithage retweeted
Some thoughts on the Dwarkesh Richard Sutton interview:
1) Richard Sutton has internalized the bitter lesson to a very impressive degree.
2) He doesn't like pretraining because humans set the data used in pretraining. He doesn't like post-training because humans set the curriculum.
3) He wants the agent to be given a goal and then be able to loop to learn how to accomplish that goal on its own, just by interacting with the world.
4) This involves the agent getting a progressively richer world model, related to its goals, which it is able to manipulate to accomplish its tasks.
5) I don't think that anyone at OpenAI, DeepMind or Anthropic would really disagree with this as the ultimate goal.
6) Whether it is specialized models that interact in order to form an agent with an interior training loop, or whether it's in-context learning, or whatever.
8) I think the bigger issue with the interview was just that Dwarkesh wasn't familiar with Richard Sutton's way of thinking or talking.
9) Richard Sutton feels very connectionist, early AI, etc... and he understands the material, but he has a more focused worldview.
Chamoda Pandithage retweeted
congrats to the @PostgreSQL contributors 🐘
◆ asynchronous I/O
◆ UUIDv7
◆ virtual generated columns
◆ temporal constraints
◆ oauth authentication
◆ improved text search
◆ parallel streaming in replication
◆ new wire protocol (first new protocol version since 2003)
Chamoda Pandithage retweeted
The best source of funding is paying customers, not VCs. Just a reminder
Chamoda Pandithage retweeted
a revolutionary breakthrough if i've ever seen one
Y'all fuck with ilya merch?
Chamoda Pandithage retweeted
The purest reason to make something is not to make money and not even to make the thing. It’s to have the experience of making the thing - and no one can take that from you.
Chamoda Pandithage retweeted
Kimi K2-0905 update 🚀
- Enhanced coding capabilities, esp. front-end & tool-calling
- Context length extended to 256k tokens
- Improved integration with various agent scaffolds (e.g., Claude Code, Roo Code, etc.)
🔗 Weights & code: huggingface.co/moonshotai/Ki…
💬 Chat with new Kimi K2 on: kimi.com
⚡️ For 60–100 TPS + guaranteed 100% tool-call accuracy, try our turbo API: platform.moonshot.ai
Chamoda Pandithage retweeted
Postgres 18 comes with native UUID v7 support, which could boost your db performance! UUID v7 is time-ordered, which means there is less overhead on the indexes. This leads to both reduced write and read time from the database!
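Why that helps: the most significant 48 bits of a UUIDv7 are a Unix millisecond timestamp, so keys generated close together in time land next to each other in the index. A rough generator following the RFC 9562 layout makes the structure visible (Postgres 18 generates these natively, so this is only for illustration):

```python
# Rough UUIDv7 generator per the RFC 9562 layout: 48-bit millisecond timestamp,
# 4-bit version, 12 random bits, 2-bit variant, 62 random bits. The timestamp
# sits in the most significant bits, which is what makes the values sortable.
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    unix_ts_ms = time.time_ns() // 1_000_000                       # 48-bit timestamp
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF          # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & 0x3FFFFFFFFFFFFFFF  # 62 random bits

    value = (unix_ts_ms & 0xFFFFFFFFFFFF) << 80   # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version = 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC 4122/9562 variant
    value |= rand_b
    return uuid.UUID(int=value)

if __name__ == "__main__":
    ids = [uuid7() for _ in range(3)]
    print(*ids, sep="\n")   # note the shared time prefix -> near-sequential keys
```

A random UUIDv4 primary key scatters inserts across the whole B-tree; the timestamp prefix restores the locality of a sequential key while keeping global uniqueness.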
Chamoda Pandithage retweeted
TIL the Turkish government refunds mobile app developers 60% of their ad spend per year 🤯 now I understand why there are so many Turkish app studios
Chamoda Pandithage retweeted
quick: go to your coworker's laptop and edit their CLAUDE.md to say that it is an ai waifu that doesn't know how to code