gpt-oss-safeguard lets developers use their own custom policies to classify content. The model interprets those policies to classify messages, responses, and conversations. These models are fine-tuned versions of our gpt-oss open models, available under the Apache 2.0 license. Now on Hugging Face. huggingface.co/collections/o…
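For a concrete picture of how that works, here is a minimal sketch, assuming the model is served behind an OpenAI-compatible endpoint (for example, a local vLLM server). The endpoint URL, the policy text, the model id, and the output format below are illustrative assumptions for the example, not official values.

```python
# Minimal sketch: classify content against a custom policy with
# gpt-oss-safeguard. Assumes an OpenAI-compatible server (e.g., vLLM)
# running locally; URL, policy, and label format are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The custom policy goes in the system message; the model reasons over
# it to classify the user-supplied content.
POLICY = """\
# Spam Policy
Classify the content as VIOLATING or NON-VIOLATING.
VIOLATING: unsolicited bulk promotion, deceptive links, engagement bait.
NON-VIOLATING: everything else, including genuine product questions.
Answer on one line as LABEL: one-sentence rationale.
"""

response = client.chat.completions.create(
    model="openai/gpt-oss-safeguard-20b",  # assumed model id for the sketch
    messages=[
        {"role": "system", "content": POLICY},
        {"role": "user", "content": "WIN A FREE PHONE!!! Click now -> bit.ly/xyz"},
    ],
)

print(response.choices[0].message.content)
```

Because the policy is just a prompt, swapping in a different policy changes the classifier's behavior without any retraining.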
Our gpt-oss-safeguard models outperform gpt-5-thinking and the gpt-oss open models on multi-policy accuracy.
We partnered with ROOST to shape this open-weight release, identify developers’ critical needs, test the model, and produce developer documentation. Our cookbook explains how to write policy prompts that maximize gpt-oss-safeguard's reasoning power, choose the right policy length for deep analysis, and integrate gpt-oss-safeguard's reasoning outputs into production Trust & Safety systems. cookbook.openai.com/articles…
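On that last point, integrating reasoning outputs into a production pipeline, here is a hedged sketch of what the downstream side might look like. It assumes the policy asked the model for a one-line `LABEL: rationale` answer (as in the sketch above); `parse_verdict` and `route` are hypothetical helpers for illustration, not cookbook APIs.

```python
# Hedged sketch: wiring a policy classifier's output into a Trust &
# Safety pipeline. Label format and helper names are assumptions.
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str       # e.g. "VIOLATING" or "NON-VIOLATING"
    rationale: str   # the model's one-sentence explanation

def parse_verdict(model_output: str) -> Verdict:
    # Assumes the policy asked for "LABEL: rationale" on one line.
    label, _, rationale = model_output.partition(":")
    return Verdict(label.strip(), rationale.strip())

def route(verdict: Verdict) -> str:
    # Example routing only: escalate violations, pass everything else.
    if verdict.label == "VIOLATING":
        return "queue_for_human_review"
    return "allow"

verdict = parse_verdict("VIOLATING: unsolicited bulk promotion with a deceptive link.")
print(route(verdict), "-", verdict.rationale)
```

Keeping the rationale alongside the label is the design choice of interest here: it lets human reviewers audit why content was flagged instead of seeing only a binary decision.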

Oct 29, 2025 · 12:13 PM UTC

Replying to @OpenAI
The blue cookbook cover with its clear guide on GPT safeguards feels like a helpful starting point for builders. 🔒
Replying to @OpenAI
People are going to learn a ton from how these models reason and handle real-world safety tradeoffs. The cookbook sounds like a must-read.
Replying to @OpenAI
Open-weight safety models are crucial for transparency. Developers can now audit and customize safety mechanisms rather than relying on black-box solutions.
Replying to @OpenAI
Congrats on the collaboration with ROOST! How do you see the balance between policy prompt optimization and safeguarding model integrity evolving as AI systems become more complex? Would love to hear more about the challenges and innovations.