Let’s get agents to learn fast! 🤖🔥 Research Scientist @Oracle | PhD @UniOfOxford, MS & BS @BrownUniversity, Predoc @Microsoft

Joined April 2014
Pinned Tweet
Big news—our survey paper “A Tutorial on Meta-Reinforcement Learning” is officially published! Meta-RL = learning how to adapt through interaction. It embraces The Bitter Lesson: don’t hardcode agents—train them to adapt on their own. arxiv.org/abs/2301.08028 🧵⬇️
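For intuition, here is a minimal sketch of the meta-RL training loop the tutorial formalizes: an outer loop samples tasks from a distribution, and all adaptation happens inside the policy's memory, through interaction alone. The TaskDistribution and RecurrentPolicy interfaces here are illustrative, not from the paper.

# Minimal meta-RL training loop (sketch; interfaces are illustrative).
import torch

def meta_train(task_dist, policy, optimizer, meta_iters=1000, episodes_per_trial=3):
    for _ in range(meta_iters):
        env = task_dist.sample()          # outer loop: draw a fresh MDP
        hidden = policy.initial_state()   # inner loop: adaptation lives in memory
        log_probs, rewards = [], []
        for _ in range(episodes_per_trial):
            obs, done = env.reset(), False
            while not done:
                action, log_p, hidden = policy.act(obs, hidden)
                obs, reward, done = env.step(action)
                log_probs.append(log_p)
                rewards.append(reward)
        # REINFORCE on the whole trial's return: early episodes only pay off
        # through later ones, so the policy is trained to adapt, not just act.
        loss = -torch.stack(log_probs).sum() * sum(rewards)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()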
Summer 2026 Internship — Oracle (Boston, MA). My fantastic research team is hiring! Projects include a data scientist agent with in-context learning, evolutionary search (a la AlphaEvolve), AI feedback, and RL/ES. Apply here! eeho.fa.us2.oraclecloud.com/… 📧 jake.beck@oracle.com
Excited to host @ml_collective Office Hours! Available for advice, mentorship, and discussion on RL, LLMs, meta-learning, and beyond. mlcollective.org/services/
Welcome to the newest MLC Office Hours host, @jakeABeck, researcher at Oracle! Schedule a chat with Jacob at the link below to talk about RL, LLMs, hypernetworks, meta-learning, multi-agent RL, AI feedback, philosophy, and more! mlcollective.org/services/
4️⃣ Superintelligence does not beget super-power. Some systems are inherently unpredictable, and prediction doesn’t guarantee control. Knowing how a hurricane forms doesn’t mean you can steer one.
3️⃣ We already live alongside “misaligned superintelligences” in the form of adversarial nation states. North Korea would love to destroy the US, and yet here we are. The benefits of superintelligence are limited by real-world constraints.
2️⃣ AI is trained to follow prompts, making it highly amenable to alignment — with failures solely due to incompetence. We only train otherwise for safety, e.g. refusing to build a bioweapon. The real danger isn’t disobedient machines; it’s humans, misaligned with each other.
1️⃣ Exponential AI self-improvement is shaky. The real bottleneck isn’t code; it’s compute & data. In these areas, AIs training AIs are just as limited by the world as humans training AIs. For both, we’ve nearly exhausted the internet’s data.
AI optimists “don’t have counter-arguments — they just call names.” — @So8res on a podcast with @ESYudkowsky + Sam Harris. Curious what you two think of these counter-arguments. And since @ylecun was called out by name, I’d love his take too…
Jacob Beck retweeted
E71: Jake Beck, Alex Goldie, & Cornelius Braun on Sutton's OaK, Metalearning, LLMs, Squirrels at @RL_Conference 2025. A few thoughts with @jakeABeck, @AlexDGoldie and @corbraun after @RichardSSutton's fascinating lecture on his OaK architecture at @UAlberta
Running out of data: arxiv.org/pdf/2211.04325 (@pvllss et al.)
Repeated sampling: arxiv.org/pdf/2411.17501v1 (@benediktstroebl et al.)
Results for RL+LLM are mixed, with positive results where reasoning chains are short, the LLM already has a sense of what to do, and we already know the answer — leaving us to exhaust our finite pool of supervised data. The same goes for repeated sampling. Time for exploration bonuses.
New paper alert: unifying insights from Limit-of-RLVR and ProRL — does current RLVR actually expand reasoning? Turns out: RLVR is mostly an efficient sampler over a shrinking reasoning boundary, and very rarely an explorer that expands it. Exploration is the holy grail for LLMs and may require going beyond 0/1 rewards.
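For concreteness, the classic way to reward exploration itself is an intrinsic bonus added on top of the extrinsic reward, e.g. a count-based novelty bonus. A minimal tabular sketch (the class name and beta coefficient are illustrative):

# Count-based exploration bonus (tabular sketch; names are illustrative).
from collections import defaultdict
import math

class CountBonus:
    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = defaultdict(int)

    def shaped_reward(self, state, extrinsic):
        self.counts[state] += 1
        # The bonus decays as a state is revisited, so novelty pays.
        return extrinsic + self.beta / math.sqrt(self.counts[state])

Replacing the verifier's 0/1 signal with shaped_reward(state, r) is one way an RLVR-style trainer could pay for reaching novel reasoning states, not just correct answers.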
@codewithimanshu @MihneaDeVries @yuxili99 LLMs lack the nuances of real-world interactive learning—but I’ve seen them explore, interact, and update beliefs in disparate domains. My point is that, LLM or not, we can apply the same recipe to RL. Just add some gridworlds!
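And a gridworld really is just a few lines. A toy MDP like this (illustrative, not from any paper) is all the recipe needs to train on:

# Toy gridworld MDP (illustrative): reach the far corner for reward 1.
class GridWorld:
    def __init__(self, size=5):
        self.size, self.goal = size, (size - 1, size - 1)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):  # 0=up, 1=down, 2=left, 3=right
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        row = min(max(self.pos[0] + dr, 0), self.size - 1)
        col = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        return self.pos, float(done), done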
Thoughts on this? The possibility of exponential AI self-improvement is shaky. The real bottleneck isn’t code; it’s compute & data. In these areas, AIs training AIs are just as limited by the world as humans training AIs. For both, we’ve nearly exhausted the internet’s data.
Thoughts on this? AI is trained to follow prompts, making it highly amenable to alignment. The only time we train it not to obey is for safety constraints, such as refusing to build a bioweapon. The real danger isn’t disobedient machines. It’s humans, misaligned with each other.
Citations: @juliancodaforno et al., 2023 and @janexwang et al., 2016
We’ve also been meta-learning model-based learning algos with seq models for years—long before the current LLM wave: arxiv.org/pdf/1611.05763
LLMs can handle non-stationary in-context learning, and even in-context meta-learning: arxiv.org/abs/2305.12907! More important than the LLMs themselves is the recipe they validate: a flexible sequence model with memory (Turing-complete, as Dale Schuurmans noted) + tons of data (or MDPs)!
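Concretely, that recipe means conditioning a sequence model on the full interaction history, feeding back the previous action and reward so learning can happen in-context, as in the 2016 paper cited above. A sketch with illustrative names and sizes:

# In-context RL recipe: a sequence model over (obs, prev action, prev reward).
# Sketch only; layer sizes and names are illustrative.
import torch
import torch.nn as nn

class InContextPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        # Input = current obs + one-hot previous action + previous reward.
        self.rnn = nn.GRU(obs_dim + n_actions + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)
        self.n_actions = n_actions

    def forward(self, obs, prev_action, prev_reward, state=None):
        a = nn.functional.one_hot(prev_action, self.n_actions).float()
        x = torch.cat([obs, a, prev_reward.unsqueeze(-1)], dim=-1)
        out, state = self.rnn(x.unsqueeze(1), state)  # memory carries the learning
        return self.head(out.squeeze(1)), state       # action logits + new memory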
Fantastic talk from @RichardSSutton at @RL_Conference with shoutouts to meta-RL. Honored to be called “more extreme” than Rich (by Rich) for taking the Bitter Lesson to heart and suggesting we meta-learn all the components he discussed. My Q: Aren’t LLMs already doing all this?
Jacob Beck retweeted
🥳 It’s an honour to have been awarded the Outstanding Paper for Scientific Understanding in RL at RLC for our work, ‘How Should We Meta-Learn RL Algorithms?’ Thank you to the organisers @RL_Conference for putting on a great conference, and congratulations to the other winners!
Heading to @rl_conference (RLC) today! Come say hi if you’re around—happy to chat about meta-RL, coding agents, and my work at @RLBrew_RLC tomorrow on VLM feedback for RL 🍁
LLMs do not natively understand how to act as agents in the world 🤔⁉️ But VLMs can evaluate success—and provide feedback for training generalist agents with RL ✅ Offline RL from AI Feedback charts the path. To be presented at @RLBrew_RLC @rl_conference! github.com/Jacooba/OfflineRL…
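Conceptually, the pipeline relabels an offline dataset with VLM judgments and then runs standard offline RL on it. A sketch with hypothetical interfaces (not the linked repo's actual API):

# Offline RL from AI feedback, conceptually.
# vlm.score and traj.with_rewards are hypothetical interfaces,
# not the actual API of the linked repo.
def relabel_with_vlm_feedback(trajectories, vlm, task_prompt):
    labeled = []
    for traj in trajectories:
        # The VLM judges whether the final frame shows task success.
        score = vlm.score(image=traj.frames[-1], prompt=task_prompt)
        labeled.append(traj.with_rewards([0.0] * (len(traj.frames) - 1) + [score]))
    return labeled

# labeled = relabel_with_vlm_feedback(dataset, vlm, "Did the agent reach the goal?")
# policy = train_offline_rl(labeled)   # e.g., IQL/CQL on the relabeled data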