Not sure exactly what he meant ofc, but my take here is: reasoning models blow away earlier models on basically every safety + capability dimension, and we generally get better at alignment over time, so old models predictably increase various harms
The actual phasing-out part is hard, in part due to that earlier misalignment, but there is an info asymmetry btwn companies + users here, as well as abundant evidence of harmful uses, such that deferring entirely to users' preferences is, again, predictably causing harm

Nov 8, 2025 · 4:01 AM UTC

Replying to @Miles_Brundage
There are absolutely cases where people really love other people, and things shaped like people, that are genuinely bad for them, and this is one of the cases where I think more paternalism is required across the board. At least in this case the company has a responsibility to extrapolate the volition of the users, not just what they are clamoring for in the moment
Ya, I emphasize the info asymmetry part bc I think in a world where there were super rigorous onboarding materials + reminders thereof, and everyone read system cards and stuff, it'd be different, but it is very clear ppl are lazy + often v mistaken in this context
Replying to @Miles_Brundage
Barring CBRN threats, AI should be maximally aligned to the user.
Replying to @Miles_Brundage
Alignment is the one race where finishing first makes the problem harder... the info asymmetry here is that the company knows the monster and the user thinks it's a pet