Co-leading AI Scientist effort @AnthropicAI (Discovery team) 💼 Past: Gemini @GoogleDeepMind (Co-led Blueshift team) 🎒Traveling & Backpacking

San Francisco
Joined May 2014
Claude Haiku 4.5 is released today! It is as strong as Sonnet 4 on coding and much stronger at computer-use. But more importantly, it is 2x faster! Watching a computer agent do things on your computer is much more exciting now!
Excited to ship Claude Haiku 4.5 today! What was state-of-the-art 5 months ago (Sonnet 4) is now available at 1/3 the cost and 2x the speed. Even beats Sonnet 4 at computer use. Available today wherever you get your Claude :)
My PhD was a roller coaster 🎢:
I'm only asking people to think hard before committing to an ML PhD program. But an ML PhD could still end up working great for many! Also, I covered some of the less-discussed cons and did not intend to provide the full picture. 🧵My own PhD was truly a roller coaster 🎢: 1/n
My personal story of how I got into a PhD program:
Replying to @bneyshabur
In 2010, when I applied for PhD programs, I was a Master's student in Iran with no published papers, an undergrad GPA of C, and poor-to-mediocre English test scores. 2/
Why you should think really hard before committing to a PhD program in ML (this is from 3 years ago and even more true today given the current AI trends):
These days, many people are interested in getting a PhD in ML. I think you should think really hard before committing to a PhD program in ML. Why? I'm going to summarize some thoughts in this thread: 1/10
I still get a lot of questions about doing a PhD in ML, so I'm resurfacing three threads I wrote about this, including some fun personal stories 🧵
Very creative!
>be me
>be Claude
>have read the internet but one day human asks me to draw
>no training, no practice, just converting mental image to mouse movements like a toddler holding a crayon
>pencil tool not working? np, I'll draw with the eraser
>task failed successfully
You can try this on the “Claude for Chrome” research preview to get a sense of where things are: claude.ai/chrome
When @ethansdyer and I joined Anthropic last Dec and spearheaded the discovery team, we decided to focus on unlocking computer-use as a bottleneck for scientific discovery. It has been incredible to work on improving computer-use and witness the fast progress. In OSWorld, for example, the performance of E2E foundation models has improved from ~8% a year ago to ~61% today, with human performance at 72%. Some interesting observations/challenges with computer-use 🧵
It has been amazing to work with many talented people at Anthropic on this. Some that I could find here are @the_marwell @oh_that_hat @katie_kang_ @shaya_zarkesh. I originally got interested in computer-use when I was at Google, thanks to very insightful conversations with @anmol01gulati, who has been thinking about and working on this for a long time.
10) Current techniques/models are very slow and we need the time per action to be 0.1 to 1 second (particularly for collaborative agents).
9) For computer-use agents to be very useful, their error rate should be much, much lower. For example, for ~3h of autonomous work, the error rate per action should be ~1e-5 (assuming 1 action/sec).
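A rough back-of-the-envelope sketch of where that ~1e-5 figure comes from, assuming ~1 action/sec over ~3h and a target of roughly 90% error-free trajectories (the 90% target is my assumption, not stated above):

```python
# Back-of-the-envelope: per-action error rate needed for long autonomous runs.
# Assumptions (not from the tweet): ~1 action/sec, ~3h run, ~90% target success,
# and action errors treated as independent.
hours = 3
actions = hours * 3600          # ~10,800 actions at 1 action/sec
per_action_error = 1e-5

# Probability the whole trajectory completes without a single action error.
trajectory_success = (1 - per_action_error) ** actions
print(f"{actions} actions -> trajectory success ≈ {trajectory_success:.2%}")
# ~10,800 actions at 1e-5 error/action -> ≈ 90% chance of an error-free run
```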
8) We still need to figure out the right interface for both collaborative and autonomous computer-use agents for maximum productivity. That needs some clever product work, and many startups are working on it.
7) We are now transitioning from the GPT-2 moment of computer-use to the GPT-3 moment, when we start to have few-shot behavior working reliably. Imagine being able to upload a video of your workflow once (perhaps with your voice explaining what you are doing), and then the model can do all future similar workflows from that one example.
6) Computers are very rich and empowering environments. You can do A LOT from your computer, including collaborating with others to accomplish real tasks in the physical world.
5) Compared to domains like math and code, computer-use has many, many more interactions with the environment, and in many cases it is easier for humans or models to verify the success of a given trajectory.
4) Using a UI is easier for many people, so understanding how to do things in a UI is going to be a big bottleneck for any collaborative agent that observes what you do and helps you get it done faster.
3) There are many tasks that are either impossible or very hard to do using the terminal/code, and therefore computer-use ability will likely be a bottleneck for building autonomous agents for very long-horizon tasks.
2) Today, a very significant portion of scientific and knowledge work is done using GUIs. This means there is plenty of demonstration data available (in some cases with chain-of-thought in voice/transcript format), and such data is very easy to collect.
1) Computer-use is the most general interface for the non-physical world designed for maximal cross-task generalization.
Behnam Neyshabur retweeted
GOOD MORNING NEW YORK CITY COME DO YOUR BEST THINKING AT OUR THINKING SPACE IN THE WEST VILLAGE SAY NO TO SLOP