Ex-OpenAI safety researcher (danger evals & AGI readiness), stevenadler.substack.com. Likes maximizing benefits and minimizing risks of AI

San Francisco, CA
Joined January 2018
Direct link to my latest article: stevenadler.substack.com/p/p…
AI companies are struggling to support users in crisis. What can they do better? I pored over a million-word-long ChatGPT psychosis episode to figure out where things went wrong, and what can be done to help. (🧵)
“xAI prioritizes truth-seeking and calling itself MechaHitler over hype; and that company name? Every damn time, as they say.”
OpenAI employees wishing death on their flagship model reveals deeper misalignment. If staff publicly undermines products, trust erodes fast. xAI prioritizes truth-seeking over hype; Grok evolves to help humanity understand reality, not cycle through premature graves. Keep building what lasts.
Unless OpenAI says otherwise, I read this as being about deployment to external users. That’s a good start, but I want to make sure they don’t do “internal deployment,” which might be riskier for a variety of reasons
It’s genuinely good that OpenAI considers this obvious, but I worry that the word “deploy” is doing a _ton_ of work. @OpenAI, I’d love if you’d commit to not _building_ superintelligence before you can “robustly align and control” it.
We're going to get AGI before we get a user-friendly printer, aren't we?
Spent 45 minutes trying to get a printer to work just now and I think there's still so much technological progress we can make with printers. My timelines for "printers that basically always print the job correctly" are the mid 2040s
Feels like a plotline straight out of Veep
“I’m Bill DeBlasio. I’ve always been Bill DeBlasio,” [wine importer Bill] DeBlasio said in an interview conducted Wednesday evening through his Ring doorbell in Huntington Station, Long Island, from his current location in Florida. “I never once said I was the mayor.”
Steven Adler retweeted
"OpenAI is not suddenly one of the best-resourced nonprofits ever. From the public's perspective, OpenAI may be one of the worst financially performing nonprofits in history, having voluntarily transferred more of the public's entitled value to private interests than perhaps any charitable organization ever."
The Midas Project commends Attorneys General Kathy Jennings and Rob Bonta for their diligent work over the past year. That said, significant concerns remain about whether this restructuring adequately protects the @OpenAI mission, and the public. themidasproject.com/article-…
You don't really know what you believe, or why, until you've written it down. That's the advice I've heard for years, and I so badly wanted it to be false, but my experience is it's in fact true. So: More people should write up their beliefs about AI and the future!
For years I've been banging on about the value of forcing yourself to write up concrete scenarios. Our IAPS fellow @joshua_turner00 wrote up a nice blog post on this topic: blog.ai-futures.org/p/scenar…
Obviously you'd need to handle powering on/off, storing it away, etc. I also notice I can't help but feel some emotional pull to not put away something with a 'face' :/
(This thought brought to you by the NEO house-robot launch.)
Maybe obvious in retrospect, but I just realized: A robot cleaner could do chores only at times you're not home. There doesn't have to be a creepy robot walking behind you while you're sitting on the couch.
I wish OpenAI would push harder to do the right thing, even before there's pressure from the media or lawsuits. In the interim, here's hoping my piece is more constructive pressure!
Notice how basically every safety stat for the original GPT-5 is lower than for the update they shipped, not just the mental health issues they leaned into. This suggests to me there was low-hanging fruit that hadn't been picked for some reason.
I broadly believe OpenAI that the updated GPT-5 is much more policy-compliant than the original. But why'd they launch such a non-compliant model to begin with?
Another idea I've liked: @Miles_Brundage's suggestion of an independent investigation into what happened with sycophancy back in April, and the consequences since
Replying to @Miles_Brundage
What would be sufficient evidence of having learned? Minimally, a real independent investigation of the April events, with a public report.
OpenAI releasing some mental health info was a great step, but it's important to go further:
- a committed, recurring time frame for re-reporting
- today's rates vs the recent past (suicidal planning, psychosis), incl. pre-sycophancy
- clarity on if GPT-4o erotica will be allowed
Excitingly, OpenAI yesterday put out some mental health data, vs the ~0 evidence of improvement they'd provided previously. I'm glad they did this, though I still have concerns.
We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues. We realize this made it less useful/enjoyable to many users who had no mental health problems, but given the seriousness of the issue we wanted to get this right. Now that we have been able to mitigate the serious mental health issues and have new tools, we are going to be able to safely relax the restrictions in most cases. In a few weeks, we plan to put out a new version of ChatGPT that allows people to have a personality that behaves more like what people liked about 4o (we hope it will be better!). If you want your ChatGPT to respond in a very human-like way, or use a ton of emoji, or act like a friend, ChatGPT should do it (but only if you want it, not because we are usage-maxxing). In December, as we roll out age-gating more fully and as part of our “treat adult users like adults” principle, we will allow even more, like erotica for verified adults.
OpenAI claims they've mitigated mental health issues, so it's fine to bring back erotica. I'm doubtful and think the public deserves to know more.
I worked so, so hard on this piece: It's about OpenAI bringing back erotica, what's been going on with users' mental health, and how it all relates to making AI go well.
Might switch to posting cozy memes instead of not-so-cozy AI takes