As AI capability increases, alignment work becomes much more important.
In this work, we show a model discovering that it shouldn't be deployed, considering behaving in a way that would get it deployed anyway, and then realizing it might be a test.
Today we’re releasing research with @apolloaievals.
In controlled tests, we found behaviors consistent with scheming in frontier models, and we tested a way to reduce them.
While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing for.
openai.com/index/detecting-a…