gpt-oss-safeguard lets developers use their own custom policies to classify content. The model interprets those policies to classify messages, responses, and conversations. These models are fine-tuned versions of our gpt-oss open models, available under Apache 2.0 license. Now on Hugging Face. huggingface.co/collections/o…
Our gpt-oss-safeguard models outperform gpt-5-thinking and the gpt-oss open models on multi-policy accuracy.
We partnered with ROOST to shape this open-weight release, identify developers’ critical needs, test the model, and produce developer documentation. Our cookbook explains how to write policy prompts that maximize gpt-oss-safeguard's reasoning power, choose the right policy length for deep analysis, and integrate oss-safeguard's reasoning outputs into production Trust & Safety systems. cookbook.openai.com/articles…
Replying to @OpenAI
People are going to learn a ton from how these models reason and handle real-world safety tradeoffs. The cookbook sounds like a must-read.

Oct 29, 2025 · 2:00 PM UTC