Philosopher interested in cognitive science

Tbilisi, Georgia
Joined December 2010
it's crazy how Gemini 2.5 Pro is still undefeated. been what, 6 months already?
🚨 Leaderboard shakeup in the top slot! Claude Sonnet 4.5 is now tied for #1 in the Text Arena, matching Claude Opus 4.1! 🏆 Quick reminder: the Arena rankings are powered by tens of thousands of real human votes, which have put @AnthropicAI's Claude Sonnet 4.5 in the very top tier of models alongside Gemini 2.5 Pro, Claude Opus 4.1, and GPT-5. Sonnet packs a punch 🥊 across top categories; more details in thread 🧵
If you're looking for something to read about AI, Herbert Simon's The Sciences of the Artificial might interest you. Here's what Miller had to say about it:
If you could recommend only one book on the Industrial Revolution, which one would it be, and why?
I’d love an AI tool I could invite into my email conversations as its own participant, someone who joins the discussion. Is there anything like that?
I stand with Ethan here. Even though there is clearly hype, that hype mostly obscures the real implications of this technology and postpones adaptation. In this sense, paradoxically, the technology is also underhyped.
I don’t think most folks get the consequences of current levels of AI for how we work, learn, and interact. I think many people like to think this is all hype and will disappear. It won’t. That robs them of the will to shape how AI will be used & the policies that govern AI use.
I'm not sure how I feel about this meme. On the one hand, I get the critique of using current technology as a universal metaphor. On the other hand, the idea of computation goes beyond our current computers; it runs much deeper.
Replying to @soychotic
Posting this again
Shota Azikuri retweeted
Good post from @balajis on the "verification gap". You could see it as there being two modes in creation. Borrowing GAN terminology: 1) generation and 2) discrimination. e.g. painting: you make a brush stroke (1) and then you look for a while to see if you improved the painting (2). These two stages are interspersed in pretty much all creative work.

Second point: discrimination can be computationally very hard.
- Images are by far the easiest. e.g. image generator teams can create giant grids of results to decide if one image is better than another. Thank you to the giant GPU in your brain built for processing images very fast.
- Text is much harder. It is skimmable, but you have to read, and it is semantic, discrete and precise, so you also have to reason (esp. in e.g. code).
- Audio is maybe even harder still imo, because it forces a time axis, so it's not even skimmable. You're forced to spend serial compute and can't parallelize it at all.

You could say that in coding, LLMs have collapsed (1) to ~instant, but have done very little to address (2). A person still has to stare at the results and discriminate if they are good. This is my major criticism of LLM coding: models casually spit out *way* too much code per query, at arbitrary complexity, pretending there is no stage 2. Getting that much code is bad and scary. Instead, the LLM has to actively work with you to break problems down into little incremental steps, each more easily verifiable. It has to anticipate the computational work of (2) and reduce it as much as possible. It has to really care.

This leads me to probably the biggest misunderstanding non-coders have about coding. They think that coding is about writing the code (1). It's not. It's about staring at the code (2). Loading it all into your working memory. Pacing back and forth. Thinking through all the edge cases. If you catch me at a random point while I'm "programming", I'm probably just staring at the screen and, if interrupted, really mad, because it is so computationally strenuous. If we only make (1) much faster but don't also reduce (2) (which is most of the time!), then clearly the overall speed of coding won't improve (see Amdahl's law, and the quick arithmetic below).
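A quick back-of-the-envelope of that Amdahl's law point. The 20/80 split between writing and verifying is an assumed illustration, not a figure from the post; the conclusion holds for any split where verification dominates:

```ts
// Amdahl's law: if only the "generation" stage gets faster, total speedup
// is capped by the untouched "discrimination" stage.
// Assumed split: writing code = 20% of the work, verifying it = 80%.
const writeFrac = 0.2;
const verifyFrac = 0.8;
const writeSpeedup = 100; // generation becomes ~instant

// Overall speedup = 1 / (accelerated fraction / speedup + unaccelerated fraction)
const overallSpeedup = 1 / (writeFrac / writeSpeedup + verifyFrac);
console.log(overallSpeedup.toFixed(2)); // ≈ 1.25x, even with 100x faster generation
```

Even an infinite generation speedup would only approach 1 / 0.8 = 1.25x here, which is the whole point about reducing stage (2).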
AI PROMPTING → AI VERIFYING

AI prompting scales, because prompting is just typing. But AI verifying doesn't scale, because verifying AI output involves much more than just typing.

Sometimes you can verify by eye, which is why AI is great for frontend, images, and video. But for anything subtle, you need to read the code or text deeply, and that means knowing the topic well enough to correct the AI.

Researchers are well aware of this, which is why there's so much work on evals and hallucination. However, the concept of verification as the bottleneck for AI users is under-discussed. Yes, you can try formal verification, or critic models where one AI checks another, or other techniques. But even being aware of the issue as a first-class problem is half the battle.

For users: AI verifying is as important as AI prompting.
Shota Azikuri retweeted
I've been writing this post for 10 years. Link in reply ⬇️
It's funny how, after all this time, the only model that's managed to beat Claude in coding is the new Claude model.
Relevant quote: "It may be dangerous to be America's enemy, but to be America's friend is fatal."
Shota Azikuri retweeted
What if I'm getting better at reasoning by reading R1 traces
I think the success of DeepSeek R1 is, to a significant extent, the result of the decision to make the thinking process visible.
I have this experience where, when I try to fix something within @cursor or @windsurf_ai using Claude 3.5, I can't get a satisfactory solution, but then I copy the code into Claude chat directly and often get one. Why is that?
New marketing strategy for learning to code: Become a power user of AI tools and spend less on tokens)
Another quick bolt.new tip: if it complains about a missing "export default" in App.tsx, the fastest (and cheapest) way to fix it is to just add it manually. Might be a bit scary to touch the code yourself, but hey, that's how you become a power user, too! :) Here's how:
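A minimal sketch of the fix, assuming a standard React scaffold where main.tsx does a default import of App; the component name and body here are placeholders, and only the last line is the one the error is asking for:

```tsx
// App.tsx — the scaffold's entry point imports this file's default export,
// so the component must be exported as the default.
function App() {
  return <h1>Hello</h1>;
}

export default App; // <- the missing line bolt.new is complaining about
```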
Shota Azikuri retweeted
This is exactly right. We apprehend and subsequently ‘understand’ things through what we already know. The more we know, the more we can get to know. Memory is an ongoing reconstructive system in which we constantly reshape and reorder what we have; it's not the retrieval of static files from a drawer. In fact, memory is better thought of as a verb than a noun. It's a process, not a thing.
Getting the impression that it’s still too common for knowledge to be equated with ‘rote’ - knowledge held in the mind as a bucket of items. It’s more helpful to think of knowledge held in the mind as an interconnected web or schema.
Shota Azikuri retweeted
We think that all memory is stored in the brain. But our study published today in @NatureComms shows that all cells—even kidney cells—can count, detect patterns, store memories, and do so similarly to brain cells. My first (co)corresponding author paper! 🧵 nature.com/articles/s41467-0…
Shota Azikuri retweeted
I think there is a similar thing going on with democratic peace theory, in the sense that the statistical regularity that proponents of the theory draw upon can also be explained by the fact that, for most of their history, liberal democracies have effectively been subordinated to the US.
Replying to @Peter_Nimitz
can't find the paper, but @devarbol recently linked an argument by some economist that institutional/economic/social benefits of democratization were mostly a function of USian power - republics & monarchies get sanctioned, so evolve poorly, while democracies don't.
Have you learned a foreign language specifically to access original texts in your field? (e.g., German for Hegel or Kant) What literature motivated you?