must read. 🤗
After ~4 years building SOTA models & datasets, we're sharing everything we learned in ⚡The Smol Training Playbook. We cover the full LLM cycle: designing ablations, choosing an architecture, curating data, post-training, and building solid infrastructure. We'll help you navigate the messy training reality that LLM papers don't cover. Chapter highlights in the 🧵
Claude Opus 4 > generate an artifact to represent yourself
Akame from Akame ga Kill!, by Sonnet 3.7 with thinking mode. 2 turns.
Why? 🤔 Warmup ends at 2k steps, but still, why such a huge drop?
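For reference, a minimal sketch of the warmup-then-decay schedule implied here: linear warmup to 2k steps (from the tweet), then cosine decay. The peak LR and total step count are illustrative assumptions, not values from this run.

```python
import math

def lr_at(step: int, max_lr: float = 3e-4,
          warmup_steps: int = 2_000, total_steps: int = 100_000) -> float:
    """Linear warmup to max_lr over warmup_steps, then cosine decay to 0.

    The schedule switches regime exactly at the warmup boundary, which is
    one common reason training curves bend sharply around step 2k.
    """
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

# LR just before and just after the 2k warmup boundary
print(lr_at(1_999), lr_at(2_000), lr_at(2_001))
```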
SkyReels. image2video
Thanks for the great content, Karpathy sensei.
New 3h31m video on YouTube: "Deep Dive into LLMs like ChatGPT"

This is a general-audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models for how to think about their "psychology" and how to get the best use out of them in practical applications.

We cover all the major stages:
1. Pretraining: data, tokenization, Transformer neural network I/O and internals, inference, GPT-2 training example, Llama 3.1 base inference examples
2. Supervised finetuning: conversations data, "LLM psychology": hallucinations, tool use, knowledge/working memory, knowledge of self, models need tokens to think, spelling, jagged intelligence
3. Reinforcement learning: practice makes perfect, DeepSeek-R1, AlphaGo, RLHF

I designed this video for the "general audience" track of my videos, which I believe are accessible to most people, even without a technical background. It should give you an intuitive understanding of the full training pipeline of LLMs like ChatGPT, with many examples along the way, and maybe some ways of thinking about current capabilities, where we are, and what's coming.

(Also, I have one "Intro to LLMs" video already from ~a year ago, but that is just a re-recording of a random talk, so I wanted to loop around and do a much more comprehensive version of this topic. They can still be combined, as the talk goes a lot deeper into other topics, e.g. LLM OS and LLM security.)

Hope it's fun & useful! piped.video/watch?v=7xTGNNLP…
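A minimal sketch of the kind of base-model inference the video walks through, using Hugging Face transformers with gpt2 as a small stand-in (the model choice, prompt, and sampling settings are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 as a small stand-in; the video demos GPT-2 and Llama 3.1 base models.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A base model just continues text; it is not instruction-tuned.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```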
New Lumina-Image 2.0 looks good at instruction following; for only a 2B model, it's impressive. (Lumina-Image-2.0 is a 2-billion-parameter flow-based diffusion transformer capable of generating images from text descriptions.) Prompt in alt.
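For reference, a hedged sketch of running a text-to-image model like this through diffusers. The repo id below is an assumption based on the model name; DiffusionPipeline resolves the concrete pipeline class from the repo's config, so this avoids guessing a specific class name.

```python
import torch
from diffusers import DiffusionPipeline

# Repo id is an assumption based on the tweet, not a verified path.
pipe = DiffusionPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Standard text-to-image call shared by diffusers pipelines.
image = pipe(prompt="a watercolor fox reading a newspaper").images[0]
image.save("fox.png")
```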
More compute = time traveling. So that's how the world works.
flux_dev LoRA trained on Akame from Akame ga Kill!
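A minimal sketch of stacking a character LoRA on top of FLUX.1-dev with diffusers; the LoRA path and prompt below are placeholders, not the actual files from this post.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Placeholder path for the trained character LoRA.
pipe.load_lora_weights("path/to/akame_lora.safetensors")
pipe.to("cuda")

image = pipe(
    prompt="akame, red eyes, long black hair, katana",
    num_inference_steps=28,
).images[0]
image.save("akame.png")
```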
Vladimir Albrekht retweeted
The whale strikes yet again! DeepSeek v3 Base 🔥
> 685B (MoE) params
> fp8
> Instruct beats Claude 3.5 Sonnet on the Aider benchmark
I hope they release the Instruct version too ;)
Looks like a really interesting place to visit, with a unique style. FLUX Style Shaping. Two source images in the comments.
Open-source model for "Computer Use".
- The model is only 2B, based on QwenVL 2B.
- A new possible variation of VLM usage.
I tested it on a few random pictures; the model grounded them correctly. The authors claim their training method is 1.4x faster than regular 2B VLM training.
Vladimir Albrekht retweeted
🔥We're thrilled to announce: ShowUI Local Run!🔥
🧑‍💻Now you can use our 2B vision-language-action model for local computer control!
💰30x cheaper than Claude!
🔗Model: github.com/showlab/ShowUI
🔗Computer Use OOTB: github.com/showlab/computer_…
#ComputerUse #Agent #Claude
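Since ShowUI is built on QwenVL 2B, a grounding query should follow the standard Qwen2-VL path in transformers. A hedged sketch, where the repo id, screenshot, and instruction are assumptions for illustration:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Repo id assumed from the tweet's links; not verified here.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "showlab/ShowUI-2B", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("showlab/ShowUI-2B")

image = Image.open("screenshot.png")  # any UI screenshot
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Click the 'Sign in' button."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```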