Safety Systems @ OpenAI

Joined March 2014
Johannes Heidecke retweeted
We ❤️ our Chief Scientist Jakub (@merettm)! Happy Halloween!
41
38
11
1,027
Johannes Heidecke retweeted
Training models to safely and sensitively navigate these topics is challenging. Our work here continues, but I'm extremely proud of the team's efforts (a collaborative effort between our safety, post-training, and data science teams!)
🧵Today we’re sharing more details about improvements of the default GPT-5 model in responding to sensitive conversations around potential mental health emergencies and emotional reliance. These changes reflect the careful work of many teams within OpenAI and close consultation with experts - including more than 170 mental health clinicians.
Alongside this update, we’re rolling out improvements to the Model Spec to make some of our longstanding goals more explicit: github.com/openai/model_spec… Defining what “ideal behavior” looks like in these settings is a complex and nuanced task. We found that experts agree with each other about the boundary of desired and undesired behavior in about 70% of cases.
4
8
29
In addition to these safety improvements to the model, it’s also preferred by users overall. This is the result of close collaboration between safety and post-training teams.
We’re also seeing the first external results validating our findings, such as these recent numbers from SpiralBench
This is interesting. Gpt-5-chat-latest quietly shot to the top of spiral bench. This is the model served on chatgpt dot com, though I test it via api so we avoid any safety routing. I'm inferring this is due to the oct 3 update since they don't version this model.🤔
2
7
28
(3) We worked with 170+ clinicians to shape taxonomies, training data, and evaluations. We ran a human eval with some of them and are again observing clear improvements.
5
7
1
26
(2) In our new automated evals we see strong improvements compared to prior models. One notable improvement: GPT-5 is now more reliable in long conversations. In new, challenging tests based on real-world scenarios, we maintained 95%+ reliability. This is one of the toughest areas for LLMs and we’re making meaningful progress.
2
7
1
25
Three distinct methods for measuring improvements show clear progress of how our model responds to users in distress. (1) In production traffic, we observe a 65-80% reduction of responses that don’t meet our desired behavior.
6
5
2
26
We defined three areas where model responses matter most: (1) psychosis, mania and other mental health emergencies; (2) self-harm and suicide; and (3) emotional reliance. We created and refined detailed taxonomies to guide how ChatGPT should behave in sensitive conversations. We used these guidelines to teach the model to respond more appropriately and measure progress. Importantly, ChatGPT does not attempt to diagnose users, but looks for sensitive signals (like sleep deprivation) and responds with care.
🧵Today we’re sharing more details about improvements of the default GPT-5 model in responding to sensitive conversations around potential mental health emergencies and emotional reliance. These changes reflect the careful work of many teams within OpenAI and close consultation with experts - including more than 170 mental health clinicians.
Earlier this month, we updated GPT-5 with the help of 170+ mental health experts to improve how ChatGPT responds in sensitive moments—reducing the cases where it falls short by 65-80%. openai.com/index/strengtheni…
Johannes Heidecke retweeted
One of the greatest opportunities in AI safety and security is the chance to help support the creation of new industry verticals that help maximize the benefits and minimize the risks. The technology is moving fast. One of the best ways to keep up is with more technology, more research, more startups and more entrepreneurship. An industrial ecosystem of builders, companies and solutions further democratizes AI to provide broad resilience, and ensures the US continues to lead as AI increasingly powers everything around us. As AI and biotech rapidly advance, biodefense is one of those verticals. We couldn’t be more excited to back @ValthosTech @kath_mcmahon and their team of ex-Palantir and ex-Deepmind engineers and operators, and world class computational biologists from the @broadinstitute and @arcinstitute. They are pushing the frontier of protection and defense in one of the most strategic intersections of multiple world-changing technologies, and with the team to do it. Also excited to partner with @foundersfund @Lux_Capital @Definition_Cap on this. Looking forward to more.
Valthos builds next-generation biodefense. Of all AI applications, biotechnology has the highest upside and most catastrophic downside. Heroes at the frontlines of biodefense are working every day to protect the world against the worst case. But the pace of biotech is against them: more powerful methods to design biological systems, with near-universal access, open up an increasing surface area of threats. In this new world, the only way forward is to be faster. So we set out to build the tech stack for biodefense. Our team of computational biologists and software engineers applies frontier AI to identify biological threats and update medical countermeasures in real-time. We are backed by $30M from @OpenAI, @Lux_Capital, @foundersfund and others including @Definition_Cap. We are actively hiring engineers to join in the mission - if that sounds like you, get in touch.
Johannes Heidecke retweeted
If, like me, you were concerned seeing the stories on OpenAI subpoenas for non-profit organizations, please do read this. I've appreciated @jasonkwon's position during the SB-1047 debate encouraging OpenAI's employees to speak their mind even when it contradicts the company's position. Doubly so for SB-53, which OAI did not even oppose. Hence, while I wish vendors for serving subpoenas would stick to business hours, I am convinced this is not about attacking supporters of that bill. This is not to say that I necessarily agree with all of OpenAI's positions. I did not read SB-53 enough to form a well-founded position, but my impressions were positive, and my guess is that it is a positive development that it was enacted. I am hoping to see from openai more statements with a positive vision of what regulations and laws we will support. I was encouraged by us signing the EU code of practice, and would like to see more such actions in the future. Regulations are not the only tool for making sure AI goes well, but they are a crucial component.
There’s quite a lot more to the story than this. As everyone knows, we are actively defending against Elon in a lawsuit where he is trying to damage OpenAI for his own financial benefit. Encode, the organization for which @_NathanCalvin serves as the General Counsel, was one of the first third parties - whose funding has not been fully disclosed - that quickly filed in support of Musk. For a safety policy organization to side with Elon (?), that raises legitimate questions about what is going on. We wanted to know, and still are curious to know, whether Encode is working in collaboration with third parties who have a commercial competitive interest adverse to OpenAI. The stated narrative makes this sound like something it wasn’t. 1/ Subpoenas are to be expected, and it would be surprising if Encode did not get counsel on this from their lawyers. When a third party inserts themselves into active litigation, they are subject to standard legal processes. We issued a subpoena to ensure transparency around their involvement and funding. This is a routine step in litigation, not a separate legal action against Nathan or Encode. 2/ Subpoenas are part of how both sides seek information and gather facts for transparency; they don’t assign fault or carry penalties. Our goal was to understand the full context of why Encode chose to join Elon’s legal challenge. 3/ We’ve also been asking for some time who is funding their efforts connected to both this lawsuit and SB53, since they’ve publicly linked themselves to those initiatives. If they don’t have relevant information, they can simply respond that way. 4/ This is not about opposition to regulation or SB53. We did not oppose SB53; we provided comments for harmonization with other standards. We were also one of the first to sign the EU AIA COP, and still one of a few labs who test with the CAISI and UK AISI. We’ve also been clear with our own staff that they are free to express their takes on regulation, even if they disagree with the company, like during the 1047 debate (see thread below). 5/ We checked with our outside law firm about the deputy visit. The law firm used their standard vendor for service, and it’s quite common for deputies to also work as part-time process servers. We’ve been informed that they called Calvin ahead of time to arrange a time for him to accept service, so it should not have been a surprise. 6/ Our counsel interacted with Nathan’s counsel and by all accounts the exchanges were civil and professional on both sides. Nathan’s counsel denied they had materials in some cases and refused to respond in other cases. Discovery is now closed, and that’s that. For transparency, below is the excerpt from the subpoena that lists all of the requests for production. People can judge for themselves what this was really focused on. Most of our questions still haven’t been answered.
18
5
91
Johannes Heidecke retweeted
There’s quite a lot more to the story than this. As everyone knows, we are actively defending against Elon in a lawsuit where he is trying to damage OpenAI for his own financial benefit. Encode, the organization for which @_NathanCalvin serves as the General Counsel, was one of the first third parties - whose funding has not been fully disclosed - that quickly filed in support of Musk. For a safety policy organization to side with Elon (?), that raises legitimate questions about what is going on. We wanted to know, and still are curious to know, whether Encode is working in collaboration with third parties who have a commercial competitive interest adverse to OpenAI. The stated narrative makes this sound like something it wasn’t. 1/ Subpoenas are to be expected, and it would be surprising if Encode did not get counsel on this from their lawyers. When a third party inserts themselves into active litigation, they are subject to standard legal processes. We issued a subpoena to ensure transparency around their involvement and funding. This is a routine step in litigation, not a separate legal action against Nathan or Encode. 2/ Subpoenas are part of how both sides seek information and gather facts for transparency; they don’t assign fault or carry penalties. Our goal was to understand the full context of why Encode chose to join Elon’s legal challenge. 3/ We’ve also been asking for some time who is funding their efforts connected to both this lawsuit and SB53, since they’ve publicly linked themselves to those initiatives. If they don’t have relevant information, they can simply respond that way. 4/ This is not about opposition to regulation or SB53. We did not oppose SB53; we provided comments for harmonization with other standards. We were also one of the first to sign the EU AIA COP, and still one of a few labs who test with the CAISI and UK AISI. We’ve also been clear with our own staff that they are free to express their takes on regulation, even if they disagree with the company, like during the 1047 debate (see thread below). 5/ We checked with our outside law firm about the deputy visit. The law firm used their standard vendor for service, and it’s quite common for deputies to also work as part-time process servers. We’ve been informed that they called Calvin ahead of time to arrange a time for him to accept service, so it should not have been a surprise. 6/ Our counsel interacted with Nathan’s counsel and by all accounts the exchanges were civil and professional on both sides. Nathan’s counsel denied they had materials in some cases and refused to respond in other cases. Discovery is now closed, and that’s that. For transparency, below is the excerpt from the subpoena that lists all of the requests for production. People can judge for themselves what this was really focused on. Most of our questions still haven’t been answered.
One Tuesday night, as my wife and I sat down for dinner, a sheriff’s deputy knocked on the door to serve me a subpoena from OpenAI. I held back on talking about it because I didn't want to distract from SB 53, but Newsom just signed the bill so... here's what happened: 🧵
Huge improvements for GPT-5 on dealing with sensitive and vulnerable situations. Will have more to share soon :)
We’re updating GPT-5 Instant to better recognize and support people in moments of distress. Sensitive parts of conversations will now route to GPT-5 Instant to quickly provide even more helpful responses. ChatGPT will continue to tell users what model is active when asked. Starting to roll out to ChatGPT users today.
1
32
Johannes Heidecke retweeted
Today we’re releasing research with @apolloaievals. In controlled tests, we found behaviors consistent with scheming in frontier models—and tested a way to reduce it. While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing for. openai.com/index/detecting-a…
Our safeguards for bio risk and agentic deployments were stress-tested by the US CAISI and UK AISI & we iterated together towards ever higher robustness and reliability: openai.com/index/us-caisi-uk…
Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
2
8
Johannes Heidecke retweeted
New research explains why LLMs hallucinate, through a connection between supervised and self-supervised learning. We also describe a key obstacle that can be removed to reduce them. 🧵openai.com/index/why-languag…
One huge open question for safety and alignment remains "who should we align to". Exciting work towards figuring out answers.
No single person or institution should define ideal AI behavior for everyone.  Today, we’re sharing early results from collective alignment, a research effort where we asked the public about how models should behave by default.  Blog here: openai.com/index/collective-…
8