Not sure exactly what he meant ofc, but my take here is: reasoning models blow away earlier models on basically every safety + capability dimension, and we generally get better at alignment over time, so old models predictably increase various harms
The actual phasing-out part is hard, in part due to that earlier misalignment, but there is an info asymmetry btwn companies + users here, as well as abundant evidence of harmful uses, such that deferring entirely to users' preferences is, again, predictably causing harm

Nov 8, 2025 · 4:01 AM UTC

Replying to @Miles_Brundage
There are absolutely cases where people really love other people, and things shaped like people, that are genuinely bad for them, and this is one of the cases where I think more paternalism is required across the board. At least in this case the company has a responsibility to extrapolate the volition of the users, not just what they are clamoring for in the moment
Ya, I emphasize the info asymmetry part bc I think in a world where there were super rigorous onboarding materials + reminders thereof, and everyone read system cards and stuff, it'd be different, but it is very clear ppl are lazy + often v mistaken in this context
Replying to @Miles_Brundage
Barring CBRN threats, AI should be maximally aligned to the user.
Replying to @Miles_Brundage
Alignment is the one race where finishing first makes the problem harder... the info asymmetry here is that the company knows the monster and the user thinks it's a pet