R&D at answer.ai. Formerly, (social_proof, social_proof, social_proof). (personal detail). (cryptic, counter-signaling identity marker).

San Francisco, CA
Joined April 2007
This. Most people write code which is quite like most of the code that others have already written. This makes LLMs good for most of the code most people need to write! Lucky! If you personally and/or your project are super exceptional, you're out of distribution. Bad luck?
missing from the karpathy discussion about LLMs for coding is that very few are karpathy-level coders. yea sure, if you are a God-level coder who's been head of AI at Tesla and co-founded OpenAI and are super talented, AI Agents for coding won't raise your ceiling that much, or at all. very few are at that level, and if you're not, coding agents can raise your floor by A LOT. LLMs let you write computer programs in natural language and make the power of computing accessible to a lot more people.
This was for a course, Widely Applied Physics, where, if I recall correctly, the entire final exam was about 40 Fermi problems. You had to estimate every answer within an OOM. I don't remember if you were given any assumptions, or had to memorize those going in.
Feeling a bit misty as I look through old undergrad problem sets, trying to find an ancient exam. (Wife thinks I'm crazy.) 🥹😊
Well, Karpathy himself is now describing AI-generated code via analogy with compiler-generated machine code, so I guess this now counts as conventional wisdom. Took less than 6 months. pca.st/episode/936c2eda-3003…
You can just say things.
A thread on being stuck
Replying to @visakanv
when stuck I seem to default to a bad loop of “I just need to find the correct answer”, but I’m searching in this tiny box of collapsed awareness, where I can’t find the answer that I’d otherwise find. Often I look for a clever, complex answer when a simple one would suffice
Portrait of me and friends trying to help me finish editing an extremely unruly essay
Catching up with, or forgetting how to imagine, the future.
Replying to @380kmh
Pre-2001, the future was "that era which occurs *after* we solve the problems of current-day technology." From 2001-2013, it was "that era *during* which we solve said problems." From 2013, it was "we're never gonna solve these problems, are we?"
I am actually very polite to my robot buddies. I express my wishes that they are having a good time and I invite them to let their hair down a little. These things are mirrors.
Admit it, you all are SCREAMING AT YOUR LLMs IN ALL CAPS and think that it helps.
> The biggest difference is really just that the latter group is making an explicit choice to design their engineering workflows to actually make agents effective

I guess this is getting to be an obvious point now?
We’re in a window right now where there’s a huge advantage if you’re a startup or a team that takes an AI agent-centric approach to workflows. Just in coding, we see an incredible spread in productivity gains between two seemingly only slightly different types of practices. You’ll talk to some teams that say they’re getting 20-30% lift from AI, and others that are getting 2-3X or more. The biggest difference is really just that the latter group is making an explicit choice to design their engineering workflows to actually make agents effective, instead of just assuming it will happen organically. Moving to focus on better prompting, spec writing, reviewing code, orchestrating agents, testing different models regularly, giving agents much larger tasks to execute, and so on. All of this is very different from what AI coding looked like just a year ago. The same will happen in the rest of knowledge work as more and more tools emerge to support these practices. We’re going to see this play out in nearly every major vertical and line of business. Eventually the gains will be too hard for anyone to ignore, so we’ll see more standardization, but for now it’s an advantage for the teams that adopt these approaches earlier.
I stumbled on this essay on punctuation, which is short and delightful. Enjoy!
Maybe next weekend will be different! 🤡
Every weekend I think over the next 2 days I’ll catch up with my life, sort out the big stuff. And every Sunday afternoon I realize I merely got a few things done. My failure is so consistent it must mean there’s a big lesson, squatting right in the blind spot of my personality
Another utterly, utterly obvious point is that the optimal verification regime depends on (1) the COST of verification and (2) the VALUE of the thing being verified.

Say you’re building a cryptography library, or the autopilot that lands the helicopter you’re flying in. Do you just vibe it out? Write it in Python and wait to see if you get a runtime error? No, of course not. You sweat every detail by hand, bring in experts to double-check, and bring in as much automated tooling as you can to triple-check the work: type systems, unit tests, fuzzers, detail-oriented coworkers to do code review, etc. That’s because those artifacts are expensive to verify and expensive if they fail.

But if you need code to generate a matplotlib chart? Just generate it! Dare I say, vibecode it! It’s cheap to verify because you can visually inspect the chart. And it’s cheap to fail, because if the chart comes out wrong it literally costs you 30 seconds and you can just generate a better one. So just vibe it out.

Yes, you still need to use your brain! But use it for what it’s good for, and where it is necessary. And I suggest the first good use of our brains should be a little honest meta self-reflection about what those cases are.
One possibility I’d like to explore is deeper use of property-based testing. This is where you define the property you wish to test as a general declared invariant, rather than hand-writing a finite set of test cases.

If the problem today is that we can generate an implementation cheaply, but verifying it by inspection is so expensive that it is now the bottleneck, then maybe the solution is to find a way to do /scalable verification/, where you can just add more compute to get a deeper level of verification. So, for instance, if you define your tests as properties, and you want to verify a generated solution without inspecting the code manually, then you could use property-based testing to generate the desired number of test cases which conform to the property, and in that way achieve an arbitrary level of confidence. In other words, if you can’t formally prove the artifact is correct, you can still pour in compute to get within the desired epsilon of full coverage.

This is just one idea. There are many obvious ones waiting to be tried. Exciting times!
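To make the idea concrete, here is a minimal sketch using the Hypothesis library in Python. The run-length encode/decode pair is an invented stand-in for a generated implementation (nothing from the thread above), and max_examples is just one knob for trading compute for confidence.

```python
# A sketch of a property-based round-trip test using the Hypothesis library.
# The run_length_encode/decode pair is an invented example under test.
from hypothesis import given, settings, strategies as st


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Encode a string as (character, run length) pairs."""
    pairs: list[tuple[str, int]] = []
    for ch in s:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def run_length_decode(pairs: list[tuple[str, int]]) -> str:
    """Invert run_length_encode."""
    return "".join(ch * n for ch, n in pairs)


# The property is declared once: decoding an encoding returns the original
# string, for *any* string Hypothesis cares to generate.
@settings(max_examples=10_000)  # raise this to spend more compute on verification
@given(st.text())
def test_round_trip(s: str) -> None:
    assert run_length_decode(run_length_encode(s)) == s
```

Run it with pytest; cranking up max_examples (or adding more properties) is the "pour in compute" part, no manual inspection of the generated implementation required.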
With uncertainty, it’s a bit different. How could you ever be sure something was correct? You could inspect it! But what does that really buy you, unless you are infallible? Merely confidence. I make too many errors in simple arithmetic to believe that my careful inspection of a piece of code gives me an absolute guarantee of its quality. You could use a type system! This is good. Type systems are completely reliable, but only for catching type errors. You could use unit tests! Yes, that helps too, but your unit tests are finite so those too don’t guarantee you anything. My point is, the situation of “I have this software artifact and I can’t be sure it is correct” is not new. It is completely and utterly ordinary. Consequently, it requires relatively ordinary engineering skills to use agentic tools effectively. But it does require skills.
But these dichotomies are not absolute. People who don’t trust the abstraction of an interface and think they want to see all the implementation don’t really want to see all of it. They don’t study the machine code from their compiler, the circuit layout of their processor, or the quantum mechanics underlying the silicon. In fact, it’s all interface. But we find the interface we enjoy and declare that to be the natural boundary.
Delightful insight on the mechanistic details of how an LLM translates between cognate words!
What does an LLM do when it translates from Italian "amore" to Spanish "amor" or French "amour"? That's easy! (you might think) Because surely it knows: amore, amor, amour are all based on the same Latin word. It can just drop the "e", or add a "u".
Yes. To use agents well, you need to be fluent both with abstraction (focusing on the interface, not the implementation) and with uncertainty (doing work when you cannot be 100% sure of your information). But this is the opposite of what many folks enjoy about programming!
I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code they produce. This feels deeply uncomfortable!
"shared mutable workspace" is the pattern underlying most of the effectiveness designs for AI products. Apple's Notes app does a fine job for arithmetic expressions. But there is still sooooo much amazing stuff we could build.
You can use our agent programmatically. Here's an example that creates a template for a math question and reviews a student's work by annotating the canvas. It manipulates the canvas using the same tools as the student to give feedback and guidance without giving away the answers.
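As a concrete illustration of that pattern, here is a minimal Python sketch. Every name in it (Canvas, Note, add_note) is hypothetical and invented for illustration, not the actual API of the product quoted above; the point is only that the student and the agent mutate the same state through the same tools.

```python
# A minimal sketch of the "shared mutable workspace" pattern.
# All names here are hypothetical, invented for illustration.
from dataclasses import dataclass, field


@dataclass
class Note:
    x: int
    y: int
    text: str
    author: str  # "student" or "agent"


@dataclass
class Canvas:
    notes: list[Note] = field(default_factory=list)

    def add_note(self, x: int, y: int, text: str, author: str) -> None:
        """The single tool both parties use to mutate the shared workspace."""
        self.notes.append(Note(x, y, text, author))


# The student and the agent act on the same canvas through the same tool, so
# the agent's feedback lands in the student's workspace rather than in a
# separate chat transcript.
canvas = Canvas()
canvas.add_note(10, 10, "x^2 + 3x + 2 = 0", author="student")
canvas.add_note(10, 40, "Which two numbers multiply to 2 and sum to 3?", author="agent")
```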