Research at @OpenAI; Reinforcement Learning; PhD from UT Austin. Previously FAIR Paris @AIatMeta, @CMU_Robotics @NVIDIAAI @UberATG.

San Francisco, CA
Joined July 2018
Check out GPT-5. I started around two months ago and was fortunate to get to contribute to something so fun!
GPT-5 is here. Rolling out to everyone starting today. openai.com/gpt-5/
One of the many ideas we reinvented and revived from RL; this one is about policy distillation in LLM land.
Hot take: DAgger (Ross et al., 2011) should be the first paper you read to get into RL, instead of Sutton's book. Maybe also read scheduled sampling (Bengio et al., 2015). And before RL, study supervised learning thoroughly.
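The DAgger recipe the tweet points to fits in a few lines: roll out the current learner, query the expert for action labels on the states the learner actually visits, aggregate those labels into one dataset, and refit. A minimal toy sketch (the 1-D environment, expert, and linear learner below are hypothetical stand-ins for illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_action(state):
    # Hypothetical expert: always push the state toward zero.
    return -np.sign(state)

def rollout(policy_w, steps=20):
    # Collect the states visited by the *learner's* own policy.
    s, states = 1.0, []
    for _ in range(steps):
        states.append(s)
        a = np.sign(policy_w * s) if policy_w != 0 else 0.0
        s = s + 0.1 * a + 0.01 * rng.standard_normal()
    return np.array(states)

# DAgger loop: aggregate expert labels on learner-visited states, refit.
dataset_s, dataset_a = [], []
w = 0.0  # learner policy: action = sign(w * s)
for it in range(5):
    states = rollout(w)
    dataset_s.extend(states)
    dataset_a.extend(expert_action(np.array(states)))
    # Refit w by least squares on the whole aggregated dataset.
    S, A = np.array(dataset_s), np.array(dataset_a)
    w = float(S @ A / (S @ S + 1e-8))

print(w < 0)  # prints True: the learner learns to push toward zero
```

The key contrast with plain behavior cloning is that labels are gathered on the learner's own state distribution, which is what fixes the compounding-error problem the tweet alludes to.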
I am in wait-and-watch mode on how good this is.
NEO: The Home Robot. Order Today.
Absolutely insane; these are some amazing people
Meta has gone full Squid Game! Many new PhD new grads were deactivated today (I am also impacted 🥲, happy to chat).
Harshit Sikchi (will be at NeurIPS 25) retweeted
Update: Mehtaab and I pushed further on this. Using thousands of GPT5 queries, we found solutions to 10 Erdős problems that were listed as open: 223, 339, 494, 515, 621, 822, 883 (part 2/2), 903, 1043, 1079. Additionally, for 11 other problems, GPT5 found significant partial progress that we added to the official website: 32, 167, 188, 750, 788, 811, 827, 829, 1017, 1011, 1041. For 827, Erdős's original paper actually contained an error, and the work of Martínez and Roldán-Pensado explains this and fixes the argument. The future of scientific research is going to be fun.
gpt5-pro is superhuman at literature search: it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thre…) by realizing that it had actually been solved 20 years ago h/t @MarkSellke for pointing this out to me!
Harshit Sikchi (will be at NeurIPS 25) retweeted
🤖 Robots rarely see the true world's state—they operate on partial, noisy visual observations. How should we design algorithms under this partial observability? Should we decide (end-to-end RL) or distill (from a privileged expert)? We study this trade-off in locomotion. 🧵(1/n)
Even with cool ideas, researchers often overlook how important implementation details can be. Getting these details right can be key to scaling up deep RL.
(1/n) With over 1,300 citations, MBPO is often cited as proof that model-based RL beats model-free methods. In arxiv.org/pdf/2412.14312 we showed it often completely fails in DeepMind Control. In our new work, Fixing That Free Lunch (FTFL), we explain why and make it succeed.
SF really does summer in October.
Harshit Sikchi (will be at NeurIPS 25) retweeted
We're finally out of stealth: percepta.ai We're a research / engineering team working together in industries like health and logistics to ship ML tools that drastically improve productivity. If you're interested in ML and RL work that matters, take a look 😀
Harshit Sikchi (will be at NeurIPS 25) retweeted
Yet more evidence that a pretty major shift is happening, this time by Scott Aaronson scottaaronson.blog/?p=9183&f…
Harshit Sikchi (will be at NeurIPS 25) retweeted
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. openai.com/index/gdpval-v0
RLZero will be presented at @NeurIPSConf 2025. Learn more about the work in the thread below:
🤖 Introducing RL Zero 🤖: a new approach to transform language into behavior zero-shot for embodied agents, without labeled datasets! RL Zero enables prompt-to-policy generation, and we believe this unlocks new capabilities: scaling up language-conditioned RL, providing an interpretable link between RL agents and humans, and achieving true cross-embodiment transfer.
In the current world of potentially contaminated datasets, competitions are a good way to test generalizable capability, and we are making steady progress!
1/n I’m really excited to share that our @OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have placed it first among all human participants. 🥇🥇
Harshit Sikchi (will be at NeurIPS 25) retweeted
[1/4] 🚀 We’re excited to announce the v1 release of JaxAHT – a new library for Ad Hoc Teamwork (AHT) research, built with JAX for speed & scalability! Check it out 👉 larg.github.io/jax-aht #AI #MARL #ReinforcementLearning #JAX #AdHocTeamwork
Harshit Sikchi (will be at NeurIPS 25) retweeted
LLMs lose diversity after RL post-training, and this hurts test-time scaling & creativity. Why does this collapse happen, and how can we fix it? Our new work introduces: 🔍 RL as Sampling (analysis) 🗺️ Outcome-based Exploration (intervention) [1/n]
Harshit Sikchi (will be at NeurIPS 25) retweeted
#K2Think (🏔️💭) is now live. We're proud of this model, which punches well above its weight: developed primarily for mathematical reasoning, it has shown itself to be quite versatile. It is a fully deployed reasoning system at k2think.ai, so you can test it for yourself!
Introducing K2 Think - a breakthrough in advanced AI reasoning. Developed by MBZUAI’s Institute of Foundation Models and @G42ai, K2 Think delivers frontier reasoning performance at a fraction of the size of today’s largest systems. Smaller. Smarter. Open to the world. Available now: K2Think.Ai/K2Think #K2Think #AI #OpenSource #MBZUAI #G42 #Innovation
Harshit Sikchi (will be at NeurIPS 25) retweeted
our team at openai is hiring technical staff to build frontier evals for finance. If you're passionate about measuring real-world capabilities, have a love/hate relationship with Excel, or are an ex-banker/ex-investor with technical skills, please reach out! openai.com/careers/research-…
Harshit Sikchi (will be at NeurIPS 25) retweeted
Claim: gpt-5-pro can prove new, interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked the proof; it is correct. Details below.
It has been a good conference, @RL_Conference; below: the @RLBRew_RLC social, the Edmonton flame, and a great talk. Conference detox needed now.