PhD Candidate in Computer Science (Lomonosov MSU in Russia). AI Researcher. Runner. Book Lover. I develop my Intellect, Body, and Spirit.

New Haven, CT, United States
Joined March 2020
Andrei Chupakhin retweeted
Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..
Andrei Chupakhin retweeted
Excited to release new repo: nanochat! (It's among the most unhinged I've written.) Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline for a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple-choice questions, tool use
- SFT, evaluate the chat model on world-knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Serve the model efficiently in an Engine with KV cache, simple prefill/decode, and tool use (Python interpreter in a lightweight sandbox); talk to it over the CLI or the ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About 12 hours of training surpasses GPT-2 on the CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple-choice tests. E.g. a depth-30 model trained for 24 hours (about equal to the FLOPs of GPT-3 Small 125M, i.e. 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned, or optimized (actually I think there's likely quite a bit of low-hanging fruit), but the overall skeleton is ok enough that it can go up on GitHub, where all its parts can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
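Not from the nanochat codebase itself, just a hedged toy sketch of the KV-cache idea the Engine bullet above refers to: prefill computes keys/values for the whole prompt once, and each decode step appends a single K/V row instead of re-running attention over the full prefix. Shapes, init, and the bare attention step are simplifications of mine.

```python
# Toy single-head attention decode with a KV cache (illustrative only).
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

def attend(q, K, V):
    s = K @ q / np.sqrt(d)                  # scores of q against all cached keys
    p = np.exp(s - s.max()); p /= p.sum()   # softmax over the prefix
    return p @ V

# Prefill: run the prompt once, caching one K and V row per prompt token.
prompt = rng.normal(size=(10, d))
K, V = prompt @ Wk, prompt @ Wv

# Decode: each new token adds one K/V row; attention reuses the cache.
x = rng.normal(size=d)
for _ in range(5):
    K = np.vstack([K, x @ Wk])
    V = np.vstack([V, x @ Wv])
    x = attend(x @ Wq, K, V)                # stand-in for the rest of the layer
```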
Andrei Chupakhin retweeted
🚨🧬 Want to build in drug discovery? Join the M-Boltz Hackathon (Oct 20–21, 2025) with @merckgroup & the awesome Boltz team! Tackle challenges in protein, nucleic acid & drug co-folding, scale cutting-edge models, and build the next wave of open science. (+ get to hang with @GabriCorso) 🌍 Hubs in Darmstadt & Boston + remote 💡 Sponsored by @huggingface, @nvidia , and @awscloud 🔗 Register now: moml.mit.edu/m-boltz-hackath…
Andrei Chupakhin retweeted
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing.

As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or is even worth pursuing. The underlying assumption is that LLMs are of course highly "bitter lesson pilled": just look at LLM scaling laws, where if you put compute on the x-axis, the number goes up and to the right. So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you avoid human bias? So there you have it: bitter lesson pilled LLM researchers taken down by the author of the bitter lesson. Rough!

In some sense, Dwarkesh (who represents the LLM researchers' viewpoint in the pod) and Sutton are slightly speaking past each other, because Sutton has a very different architecture in mind, and LLMs violate a lot of its principles. He calls himself a "classicist" and invokes Alan Turing's original concept of building a "child machine": a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point, but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals). Another important note he makes is that even if you just treat pretraining as the initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course, a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them). In Sutton's world view, all there is is interaction with a world via reinforcement learning, where the reward functions are partially environment specific but also intrinsically motivated, e.g. "fun", "curiosity", and the quality of the predictions in your world model. And the agent is always learning at test time by default; it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have in common with the animal kingdom than in what differentiates us. "If we understood a squirrel, we'd be almost done."

As for my take... First, I should say that I think Sutton was a great guest for the pod. I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration of LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. I also think his criticism of LLMs as not bitter lesson pilled is fair. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at every stage: the foundation (the pretraining data) is all human text, the finetuning data is human and curated, and the reinforcement learning environment mixture is tuned by human engineers.

We do not in fact have a single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and watch it learn automatically from experience alone. Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough. Two "existence proofs" are commonly offered to argue that such a thing is possible. The first is the success of AlphaZero, which learned to play Go completely from scratch with no human supervision whatsoever. But the game of Go is such a simple, closed environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic-tac-toe. The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant about whether it's appropriate, because animals arise from a very different computational process and under different constraints than what we have practically available to us in industry.

Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo much more "maturation". And second, even that which clearly is "learning" and not maturation is much more "finetuning" on top of something powerful and preexisting. For example: a baby zebra is born, and within a few dozen minutes it can run around the savannah and follow its mother. This is a highly complex sensory-motor task, and there is no way in my mind that it is achieved from scratch, tabula rasa. The brains of animals, and the billions of parameters within, have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization of evolution. If the baby zebra spasmed its muscles around at random, as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all.

Similarly, our AIs now have neural networks with billions of parameters. These parameters need their own rich, high-information-density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes, this is basically supervised learning, which is ~absent in the animal kingdom. But it is a practical way to gather enough soft constraints over billions of parameters, to try to get to a point where you're not starting from scratch. TLDR: pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state-of-the-art frontier LLM labs now do pervasively.

I still think it is worth being inspired by animals; there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. I say both of these with double-digit-percent uncertainty, and I cheer the work of those who disagree, especially those who are a lot more ambitious bitter-lesson-wise.

So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity, thoroughly engineered by it. They are imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before. It seems possible to me that over time we can finetune our ghosts further and further in the direction of animals; that it's not so much a fundamental incompatibility as a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different, un-animal-like, yet still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds.

Anyway, in summary, overall and actionably, I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be shifted a little too far into exploit mode. Probably we are still not sufficiently bitter lesson pilled, and there is a very good chance of more powerful ideas and paradigms beyond exhaustive benchbuilding and benchmaxxing. Animals might be a good source of inspiration: intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.
.@RichardSSutton, father of reinforcement learning, doesn't think LLMs are bitter-lesson-pilled.

My steelman of Richard's position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase: the agent just learns on the fly, like all humans and, indeed, all animals. This new paradigm will render our current approach with LLMs obsolete.

I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew.

0:00:00 – Are LLMs a dead end?
0:13:51 – Do humans do imitation learning?
0:23:57 – The Era of Experience
0:34:25 – Current architectures generalize poorly out of distribution
0:42:17 – Surprises in the AI field
0:47:28 – Will The Bitter Lesson still apply after AGI?
0:54:35 – Succession to AI
Andrei Chupakhin retweeted
Fork Union, arguably the most unusual parallel-processing library on GitHub, just crossed its first 100 stars — my 12th project to reach that milestone 🥳 Repository: github.com/ashvardanian/fork…

Unlike typical thread pools, it avoids not only mutexes but even compare-and-swap atomics. Task handoff and execution latency can be 5-10× lower than Taskflow (in C++, 11k stars) or Rayon (in Rust, 12k stars). It adds dynamic selection of hardware-specific features, a stable C ABI for portability, and a strict noexcept design: no exceptions, no allocations, just raw performance on the hot path.

I'm already using it in StringZilla and integrating it into @unum_cloud's USearch, where low-latency NUMA-aware parallelism matters. The code experiments with modern CPU hints like TPAUSE on x86 and WFET on Arm for busy-waiting, and will be tuned further to make transitions into CPU energy-saving states smoother.

Always open to feedback, feature requests, or pull requests 🤗
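Fork Union itself is C++/Rust, and its real API is not reproduced here. Below is only a hedged Python toy of the underlying idea: broadcasting a task to workers through a polled generation counter instead of a mutex or condition variable. `SpinPool` and its fields are invented names; real implementations use hardware atomics plus pause hints, and CPython's GIL makes busy-waiting a poor fit in practice.

```python
# Conceptual sketch only: lock-free-style task handoff via a polled epoch
# counter. Not Fork Union's API; just the handoff pattern it describes.
import threading

class SpinPool:
    def __init__(self, n):
        self.epoch = 0                  # bumped by the main thread to publish work
        self.task = None
        self.done = [0] * n             # each worker echoes the epoch it finished
        self.workers = [threading.Thread(target=self._run, args=(i,), daemon=True)
                        for i in range(n)]
        for w in self.workers:
            w.start()

    def _run(self, i):
        seen = 0
        while True:
            if self.epoch != seen:      # busy-wait: poll for a new generation
                seen = self.epoch
                self.task(i)
                self.done[i] = seen     # completion signal, no lock taken

    def broadcast(self, fn):
        self.task = fn                  # publish the task *before* bumping the epoch
        self.epoch += 1
        while any(d != self.epoch for d in self.done):
            pass                        # spin until every worker echoes the epoch

pool = SpinPool(4)
pool.broadcast(lambda i: print(f"worker {i} ran"))
```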
Andrei Chupakhin retweeted
You asked for it, so here it is. Visualizing CPU cache speeds relative to RAM. Cache optimization is important too!
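In that spirit, a rough, hedged microbenchmark sketch (absolute numbers vary by machine): the same count of byte loads gets slower as the stride grows, because large strides waste most of each 64-byte cache line and eventually thrash the TLB.

```python
# Hedged sketch: fixed number of loads, varying stride. Expect times to grow
# with stride on most machines; exact figures depend on the cache hierarchy.
import time
import numpy as np

buf = np.zeros(1 << 26, dtype=np.int8)      # 64 MiB, larger than typical LLC
touches = 1 << 22                           # same number of loads per trial
for stride in (1, 64, 4096):
    idx = (np.arange(touches) * stride) % buf.size
    t0 = time.perf_counter()
    _ = int(buf[idx].sum())                 # gather `touches` bytes at this stride
    print(f"stride {stride:5d}: {(time.perf_counter() - t0) * 1e3:7.1f} ms")
```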
Andrei Chupakhin retweeted
I think congrats again to OpenAI for cooking with GPT-5 Pro. This is the third time I've struggled on something complex/gnarly for an hour on and off with CC, then 5 Pro goes off for 10 minutes and comes back with code that works out of the box. I had CC read the 5 Pro version and it wrote up 2 paragraphs admiring it (very wholesome). If you're not giving it your hardest problems you're probably missing out.
Andrei Chupakhin retweeted
🚨 Attention! Calling all serious hackers: the Agents & MCP Hackathon starts tonight! 💸 $16,500 in cash prizes is on offer. Our sponsors are generously providing over $550,000 in free API and GPU compute credits to participants: Anthropic, Modal Labs, Huggingface, Mistral, LlamaIndex, Sambanova, and Hyperbolic!
Andrei Chupakhin retweeted
One of the best things the U.S. can do is make high-skill immigration easier. @levie is right. It is awful that the wait time for a green card can be over a decade, and that after waiting years someone can still be forced to leave simply because they lost a job. Fixing this is both an economic and a moral issue. A rigorous economic analysis (by Pierre Azoulay and collaborators) shows that immigrants create more jobs than they take. So to create jobs for Americans, let's let more immigrants in!
High-skilled immigration has been central to America leading the world in tech. The biggest misunderstanding about high-skill immigration stems from people thinking that market opportunities in tech, and tech-adjacent fields, are zero-sum. This essentially imagines that innovation is finite and we're all fighting over the same pool of jobs and opportunities. That may be true of a few very legacy, slow-growth industries, but it's categorically not true for any important industry of the past 50 years or the next 100.

Biotech, AI, advanced manufacturing, software, EVs, new energy sources, and dozens of other fields of the future are our high-growth industries. And there's no inherently fixed volume of companies or talent that the market needs. Tesla being started or not started in America is the difference between hundreds of thousands of jobs here, and leading in EVs globally, or not. Apple being started here is the difference between potentially millions of jobs being here, and leading consumer electronics globally, or not. You could go through this list all day long.

Tech is not zero-sum. More startups, pursuing more ideas, ultimately create more innovation, and ultimately more jobs and prosperity. And that means you need the right talent both to work at these companies and to start the next ones. High-skilled immigration has directly made America dominant technically, and thus economically, and has created far more jobs in America for others than are supposedly displaced.

Even briefly imagining the alternative scenario makes it obvious how disastrous it would be. The demand from tech companies for this top talent will remain, yet America won't benefit directly from their hiring. That talent will go to another company that competes with the US and makes our dominance harder to maintain. You're just increasing the odds of facing more competition in the future. And even in the "best case" scenario (for our competitiveness), where a larger company like Google hires the same people internationally who would otherwise have moved here, when that person leaves Google to start their next company, it will be in their country of origin, not America. This is how you lose the tech war within one or two generations. There's simply no good game theory in anything that reduces our access to talent.

Yes, we absolutely have to, and need to continue to, educate and train the incredible talent that grows up in the US. But equally, having access to the world's smartest talent has always been a huge advantage for America.
A strong leader does not kill his opponents. A strong politician wins elections rather than killing people in prison. Putin destroyed Navalny because he could not break his spirit, his faith in the Russian people, his thirst for life and for change for the good of Russia. A dark time for all of us. But the criminals will answer for it. Putin, you nonentity, may you burn in hell.
Andrei Chupakhin retweeted
Alexei Navalny answers the question of what to do if he is killed. An excerpt from the film "Navalny" by Daniel Roher, 2022.
Andrei Chupakhin retweeted
Dive into the details of our LLM Bootcamp, tomorrow at 10 AM Pacific! Can't make it in person? Got questions? Visit our webpage to book an advisor call now! hubs.la/Q02fcqk30 📅 Date: January 4th, 2024 ⏰ Time: 10 AM Pacific. Be there!
Andrei Chupakhin retweeted
There are three main stages involved in fine-tuning a language model (detailed in the attached image). Learn more about fine-tuning large language models: hubs.ly/Q02fb8xZ0 Image source: hubs.ly/Q02fbgXh0 #finetuning #llmfinetuning
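The graphic with the three stages isn't reproduced here. As a generic stand-in, here is a hedged sketch of a single supervised fine-tuning step on a causal LM; "gpt2" and the one-example batch are illustrative choices of mine, not the post's.

```python
# Minimal SFT step sketch: causal-LM loss on a prompt+answer string, then one
# optimizer update. "gpt2" is just a small stand-in checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tok(["Q: What is the capital of France? A: Paris"], return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss   # next-token prediction loss
loss.backward()
opt.step()
opt.zero_grad()
```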
Andrei Chupakhin retweeted
Exploring Generative AI with Adobe Firefly x.com/i/broadcasts/1kvJpvBjM…
Andrei Chupakhin retweeted
Large Language Models (LLMs) like GPT-3 and BERT have revolutionized the field of natural language processing. However, the evaluation of large language models is as crucial as their development. Read more here: hubs.la/Q02f6K2d0 #llm #largelanguagemodels #llmevaluation
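One concrete, hedged example of what "evaluation" can mean at its simplest: held-out perplexity, the exponential of the average next-token loss. The checkpoint and test sentence below are stand-ins of mine; real evaluations add task suites, not just loss.

```python
# Perplexity on a held-out string: exp of the mean next-token loss.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

enc = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {math.exp(loss.item()):.1f}")
```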
Week #2. Linear Regression. You can follow me here: github.com/andxeg/ml-zoomcam… Learn ML engineering in 4 months in a free online course by @Al_Grigor from @DataTalksClub
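For anyone following along with week 2, a hedged, self-contained sketch of the core idea (the synthetic data and shapes are mine, not the course's): fit y ≈ Xw + b by ordinary least squares and recover the planted weights.

```python
# Ordinary least squares on synthetic data: the recovered weights should be
# close to w_true, with the final coefficient approximating the bias (0 here).
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

Xb = np.hstack([X, np.ones((200, 1))])        # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)    # closed-form least squares
print(w)                                      # ≈ [ 2. -1.  0.5  0. ]
```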
Week #1. Intro to Machine Learning. You can follow me here: github.com/andxeg/ml-zoomcam… Learn ML engineering in 4 months in a free online course by @Al_Grigor from @DataTalksClub
Learn ML engineering in 4 months in a free online course by @Al_Grigor from @DataTalksClub - Linear and logistic regression - Tree-based models - Neural networks - Deployment with AWS, Serverless, Kubernetes Register here: ctt.ec/Z01Uf+