A lot of people assume we use reinforcement learning to train Ace. The founding team has an extensive RL background, but RL is not how we'll get to computer AGI. The single best way we know to create artificial intelligence is still large-scale behaviour cloning.
PSA: agents acting in an environment is *not* reinforcement learning. Reinforcement learning is about having a reward signal for which you get reinforcement (hence the name). If your "reward" is 99.99% "accurately predicting the consequences of an action" (which is just regular unsupervised learning) and 0.01% some additional specific goal (which is actual RL), then calling the training procedure "reinforcement learning" is technically accurate but is very much a sin against the truth.

RL always was and will always remain "the cherry on the top", for fundamental information-theoretic reasons: a sparse 1-D reward signal just doesn't carry enough information to train complex agents with trillions of parameters in sufficiently complex environments, whereas predicting the outcome of every action is a maximally dense feedback signal in terms of the information the environment provides.

I really find it somewhat offensive that RL people try to bucket everything about agents acting in environments into the RL bucket, because if you are slightly less in the weeds, you buy this and eventually end up with an incorrect and imprecise understanding of the world.
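The information-density point can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch with illustrative numbers I'm assuming for the example (episode length, vocabulary size), not figures from the thread:

```python
import numpy as np

# Assumed, purely illustrative numbers:
T = 1_000    # steps per episode
V = 50_000   # size of the discrete outcome/token vocabulary at each step

# Dense prediction signal (behaviour cloning / next-step prediction):
# every step supplies a full target over V outcomes, so supervision is
# on the order of T * log2(V) bits per episode.
dense_bits = T * np.log2(V)

# Sparse RL signal: a single scalar success/failure reward per episode
# carries at most 1 bit.
sparse_bits = 1.0

print(f"dense supervision ≈ {dense_bits:,.0f} bits/episode")
print(f"sparse reward     ≈ {sparse_bits:.0f} bit/episode")
print(f"ratio             ≈ {dense_bits / sparse_bits:,.0f}x")
```

Under these (made-up) numbers the dense signal carries roughly four orders of magnitude more information per episode, which is the sense in which a sparse reward can only ever be the cherry on top.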
This also negates a lot of AGI x-risk concerns, imo. The typical safety-ist argument: RL will let agents blow past human-level performance in the blink of an eye. But the current paradigm is divergence minimization with respect to human intelligence; it converges to roughly human-level performance.

Apr 19, 2025 · 11:10 PM UTC

Replying to @sherjilozair
Aren't the strongest baselines for LM digital control offline RL policies? Which, tbf, are still unlikely to experience an ASI foom.
Replying to @sherjilozair
you guys built a good task recording system right?
Replying to @sherjilozair
But obviously at some point we will surpass human performance and move beyond behaviour cloning (already happening), right?