Now in research preview: gpt-oss-safeguard
Two open-weight reasoning models built for safety classification.
openai.com/index/introducing…
gpt-oss-safeguard lets developers use their own custom policies to classify content. The model interprets those policies to classify messages, responses, and conversations.
These models are fine-tuned versions of our gpt-oss open models, available under Apache 2.0 license.
Now on Hugging Face.
huggingface.co/collections/o…
We partnered with ROOST to shape this open-weight release, identify developers’ critical needs, test the model, and produce developer documentation.
Our cookbook explains how to write policy prompts that maximize gpt-oss-safeguard's reasoning power, choose the right policy length for deep analysis, and integrate oss-safeguard's reasoning outputs into production Trust & Safety systems. cookbook.openai.com/articles…
Oct 29, 2025 · 12:13 PM UTC





