Today I taught a PhD class on the economics of AI. In doing so, I drew this picture on the board of my current understanding of what I called the "how good is AI" literature (aka the productivity impacts of AI). I thought I'd write up a long version of that discussion - at the very least for my own notes.
The basic point I made is that answering this question requires grappling with the crazy multi-dimensionality of the problem, which dramatically shapes the questions you ask and the answers you get.
A few key dimensions:
a) Tasks -- this one is obvious. How useful AI is depends on what specific task is being considered. Every day, we get a paper along the lines of "I had AI help with task X" (call center work, consulting, graphic design, writing ... you name it) -- and here is what I found.
b) Human + AI OR Human vs. AI, AKA is AI augmenting or replacing humans? -- The value of AI in helping a human achieve something is not the same as the value of the AI doing the task itself. Yet we often conflate the two. Economists and social scientists often focus on the augmentation test, while CS people are mostly comparing humans with AI. For example, see this nice paper from Serina Chang showing how the two are not the same (aclanthology.org/2025.acl-lo…).
c) Point vs. Systems integration -- Early papers have mostly been about isolated tasks, divorced from the specific job, in artificial settings. Even "real" field experiments have focused on atomistic work like call center work. However, one could imagine that the answers we get are very different when tasks are embedded within jobs and jobs are embedded within organizations. This point is what is driving the divergence between the "AI increased productivity by a gazillion percent" studies and the "95% of all AI implementations fail" narrative from real enterprises. Surely the value of AI depends a lot on this unit of analysis.
d) Which AI and How AI -- CS researchers are much better on this than economists, but the basic idea is that there is no "one" AI system, and results clearly depend on which AI we are talking about. And here, it's not just about which model you look at (GPT-3.5 vs 4o, say), but also how these models are prompted and, going forward, the extent to which they are daisy-chained in agentic systems, or indeed fine-tuned or modified in some deep way for specific tasks. Just because an AI system sucks at something out of the box doesn't mean that with some work it can't be made to improve. Or not. We'll never know unless we stop treating AI like a monolith.
e) Which Human? -- This is one point on which we have made good progress, but a lot remains to be done. Clearly, the value of AI replacing a human, or augmenting a human, depends a lot on who that human is. Past work has found a flattening of the expertise curve in routine tasks -- but a mental model where AI's productivity effects are divorced from which humans we are comparing it to would be the wrong mental model. Corollary: that human's incentives might matter as much as their capabilities!
Now imagine asking the value of AI question and blowing it up in terms of each of these dimensions:
Tasks X Augment/Automate X Point/System X <every AI model out there> X <every human out there>
and it becomes pretty clear that we are only getting started in understanding the societal/productivity implications of this thing. And that's even before we put the LLMs inside robots and let them loose on the world.
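For concreteness, here is a minimal back-of-the-envelope sketch (in Python) of how quickly this design space blows up. The dimension values below are purely illustrative placeholders I made up, not a real taxonomy; the point is just that even toy lists multiply into hundreds of distinct "how good is AI" questions.

```python
# Back-of-the-envelope sketch of the evaluation design space.
# All dimension values are illustrative stand-ins, not a real taxonomy.
from itertools import product

tasks = ["call center", "consulting", "graphic design", "writing"]      # stand-ins for many more
mode = ["human + AI (augment)", "AI alone (automate)"]
unit = ["isolated task", "task within job", "job within organization"]
ai_system = ["model A, zero-shot", "model A, prompted", "model B, fine-tuned", "agentic pipeline"]
human = ["novice", "median worker", "expert"]

cells = list(product(tasks, mode, unit, ai_system, human))
print(len(cells))  # 4 * 2 * 3 * 4 * 3 = 288 distinct evaluation cells, from toy lists alone
```

And each of those cells is, in principle, its own study.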
Grad students rejoice!
<HT: I feel like I channeled my inner @random_walker here - so this is inspired by his writings on AI snake oil and related topics!>