Research scientist at MosaicML/Databricks. PhD from UW-Madison. Interested in LLMs, optimization, and the meaning of life.

Joined July 2012
Kartik Sreenivasan retweeted
btw Alex is a second-month PhD student; he did this work in 4 weeks. I have my suspicions that Alex has secret recursive Alexes that do his work for him, but I haven't been able to confirm that haha. Really fun post on recursive LMs with interesting trace examples, check it out!
What if scaling the context windows of frontier LLMs is much easier than it sounds? We're excited to share our work on Recursive Language Models (RLMs), a new inference strategy in which LLMs decompose and recursively interact with input prompts of seemingly unbounded length through a REPL environment. On the OOLONG benchmark, RLMs with GPT-5-mini outperform GPT-5 by over 110% (more than double the score) on 132k-token sequences and are cheaper to query on average. On the BrowseComp-Plus benchmark, RLMs with GPT-5 can take in 10M+ tokens as their "prompt" and answer highly compositional queries without degradation, even outperforming explicit indexing/retrieval. We link our blogpost, (still very early!) experiments, and discussion below.
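As a rough illustration of the idea (not the authors' implementation), a recursive call pattern over an oversized prompt might look like the sketch below. `llm` is a placeholder for a chat-completion call, and in the actual system the model interacts with the context through a REPL rather than a fixed chunk-and-summarize loop.

```python
# Hypothetical sketch of a recursive LM call, not the RLM authors' code.

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., GPT-5-mini)."""
    raise NotImplementedError

def recursive_lm(query: str, context: str, chunk_size: int = 50_000) -> str:
    # Base case: the context fits comfortably in a single call.
    if len(context) <= chunk_size:
        return llm(f"Context:\n{context}\n\nQuestion: {query}")

    # Recursive case: split the huge context, let sub-calls digest each chunk
    # with respect to the query, then answer from the digests.
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    digests = [
        recursive_lm(f"Extract anything relevant to: {query}", chunk, chunk_size)
        for chunk in chunks
    ]
    return llm(
        "You are given digests of a much longer document.\n"
        + "\n---\n".join(digests)
        + f"\n\nAnswer the question: {query}"
    )
```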
Kartik Sreenivasan retweeted
My team is hiring AI research interns for summer 2026 at Databricks! Join us to learn about AI use cases at thousands of companies, and contribute to making it easier for anyone to build specialized AI agents and models for difficult tasks.
Kartik Sreenivasan retweeted
So many teams in enterprise AI are working on this, and a single dude drops a paper just mogging them. Incredible.
A 7B model, tuned for forms and docs, beats giant models at pulling structured data. It beats GPT-4.1 on 1,000 extraction tasks and was trained for $196.

The team generated synthetic training data that preserves memory across chunks of a long file. That memory lets the model connect names, dates, and values that appear far apart. They fine-tuned with Low-Rank Adaptation (LoRA), changing only 0.53% of the weights. They then used Group Relative Policy Optimization (GRPO) with a semantic reward and strict JSON checks. This setup accepts different surface wording if the meaning matches.

On 1,000 held-out tasks it hit 0.573 mean reward and 89% valid JSON, ahead of GPT-4.1 and others, for a $196 training cost. Result: a small, focused model can outperform general models and cost much less.

----
Paper: arxiv.org/abs/2509.22906
Paper Title: "Extract-0: A Specialized Language Model for Document Information Extraction"
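The reward design described above (strict JSON validity gating a meaning-level match on field values) can be sketched roughly as follows. This is an illustrative guess at the shape of such a reward, not the paper's implementation; `field_similarity` is a crude fuzzy match standing in for a real semantic comparison.

```python
# Illustrative GRPO-style reward: hard JSON gate plus soft per-field credit.
import json
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    """Crude stand-in for a semantic comparison of two extracted values."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def extraction_reward(model_output: str, reference: dict) -> float:
    # Hard gate: output must be valid JSON, otherwise reward is zero.
    try:
        predicted = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(predicted, dict):
        return 0.0
    # Soft credit: average per-field similarity, so different surface wording
    # with the same meaning can still score well.
    scores = [
        field_similarity(str(predicted.get(key, "")), str(value))
        for key, value in reference.items()
    ]
    return sum(scores) / max(len(scores), 1)

ref = {"invoice_date": "2024-03-01", "total": "196.00"}
out = '{"invoice_date": "2024-03-01", "total": "196.00"}'
print(extraction_reward(out, ref))  # 1.0 for an exact match
```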
Kartik Sreenivasan retweeted
New website looking pretty slick 👀
Kartik Sreenivasan retweeted
Tinker provides an abstraction layer that is the right one for post-training R&D -- it's the infrastructure I've always wanted. I'm excited to see what people build with it. "Civilization advances by extending the number of important operations which we can perform without thinking of them" -Whitehead
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
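For intuition, here is a hypothetical sketch of the pattern the announcement describes: the training loop runs locally while each step executes on remote GPUs. `RemoteTrainer`, `forward_backward`, and `optim_step` are invented placeholder names for illustration, not Tinker's actual API; see the linked docs for the real interface.

```python
# Hypothetical "local loop, remote compute" pattern. Invented names only.

class RemoteTrainer:
    def __init__(self, base_model: str):
        self.base_model = base_model  # would open a session with the service

    def forward_backward(self, batch) -> float:
        """Run a forward/backward pass on remote GPUs; return the loss."""
        raise NotImplementedError

    def optim_step(self, learning_rate: float) -> None:
        """Apply the accumulated gradients remotely."""
        raise NotImplementedError

def finetune(dataset, base_model="some-open-weights-model", lr=1e-5, epochs=1):
    trainer = RemoteTrainer(base_model)
    for _ in range(epochs):
        for batch in dataset:
            loss = trainer.forward_backward(batch)  # compute happens on the cluster
            trainer.optim_step(learning_rate=lr)    # optimizer update, also remote
            print(f"loss: {loss}")
```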
Kartik Sreenivasan retweeted
Two weeks since finishing, and reflecting on what has changed before and after the course:
1. I can test and scale ideas much faster, e.g., I wrote some simple LoRA RL experiments around 15 minutes after reading the @thinkymachines post.
2. I have too many ideas, and I'm not used to the ML publishing model where something that could have been a tweet becomes a paper. Not sure how to handle this other than just pushing to GitHub directly.
My goal this summer was to learn LLM engineering. In mid-July, I learned about cs336. I finished the course today and found it interesting, frustrating, and highly useful. The course covers architecture, systems, data, scaling laws, and RL.
Kartik Sreenivasan retweeted
Having done RL at OpenAI and Anthropic, here's what I can say about GRPO:
Kartik Sreenivasan retweeted
Very nice work!! If you found this interesting, you may want to check out @yzeng58's work on LoRA. She mathematically characterized the conditions under which LoRA is as expressive as full fine-tuning.
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lor…
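For readers unfamiliar with the parameterization being compared, here is a minimal numpy sketch of a LoRA layer: a frozen weight W0 plus a trainable low-rank update (alpha/r)·B·A. Dimensions and scaling are illustrative only; the expressiveness question in the quoted work is about when such a rank-r update can match an arbitrary full-finetuning update (trivially possible once r reaches min(d_out, d_in)).

```python
# Minimal LoRA sketch: only A and B are trained, W0 stays frozen.
import numpy as np

d_out, d_in, r, alpha = 64, 128, 8, 16
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
B = np.zeros((d_out, r))                 # trainable, initialized to zero
A = rng.normal(size=(r, d_in)) * 0.01    # trainable

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W0 + (alpha / r) * B @ A.
    return (W0 + (alpha / r) * B @ A) @ x

x = rng.normal(size=(d_in,))
y = lora_forward(x)

trainable = A.size + B.size
print(f"trainable fraction for this layer: {trainable / (W0.size + trainable):.2%}")
```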
Kartik Sreenivasan retweeted
Hello world, Unconventional, Inc. I've gotten a new company off the ground. It's a big swing… rethinking the foundations of a computer to build a new substrate for intelligence that is as efficient as biology. Brain-scale efficiency without the biological baggage! We CAN do it. We have to think differently from existing dogma. The computer in our heads proves it's possible. 20-watt brains are within our reach with the right research and beautiful engineering. Can you think differently? Do you want to be Unconventional? If so, either drop me a DM or email jobs@unconv.ai.
Kartik Sreenivasan retweeted
I gave a keynote at @VLDBconf about why we think it's time to rethink OLTP databases with Lakebase, which combines the cloud-native Postgres design of @neondatabase with Lakehouse. Reasons include the cloud, changing demands on DWs to be more real-time, and AI agents. Slides ⬇️
Kartik Sreenivasan retweeted
It's been about a year since my team fully adopted all the AI coding tools (Cursor, Claude Code), and day to day I am feeling the added cruft in the codebase. Unit tests are not catching regressions. Unneeded mocks and comments are left scattered in between. More refactoring is needed to add new features. I find myself sitting down and rewriting files to ensure completeness, correctness, and ease for future developers more than I ever have before.
Kartik Sreenivasan retweeted
Happy to share that I got tenured last month! While every phase in life is special, this one feels a bit more meaningful, and it made me reflect on the past 15+ years in academia.

I'd like to thank @UWMadison and @UWMadisonECE for tremendous support throughout the past six years, helping me grow.

I am very grateful to all the teachers I've met in the past 15+ years of research since undergrad. Prof. Sae-Young Chung introduced me to engineering, and in particular, information theory. Prof. Yung Yi and Prof. Song Chong introduced me to communication network theory, and from Prof. Yung Yi I learned the true passion for research. I miss him a lot. At Berkeley, I learned everything about research from my advisor Prof. Kannan Ramchandran. In particular, I learned that the most important motivation behind great research is endless curiosity and the desire to really understand how things work. From my postdoc mentor Prof. Changho Suh at KAIST, I learned the mindset of perfection, making every single paper count.

During my assistant professorship, I was lucky to have the best colleagues. I learned so much from Rob (@rdnowak) and Dimitris (@DimitrisPapail). I am still learning from Dimitris' unique sense of research taste and Rob's example of how to live as the coolest senior professor. I also learned a lot from the Optibeer folks Steve Wright, Jeff Linderoth, and my ECE colleagues Ramya (@ramyavinayak) and Grigoris (@Grigoris_c). Thank you all!

I'd like to thank my former students and postdocs too. Daewon and Jy-yong (@jysohn1108) joined my lab early on and worked on many interesting projects. Changhun and Tuan (@tuanqdinh) joined midway through his PhD and worked on interesting research projects, and in particular, Tuan initiated our lab's first LLM research five years ago! Yuchen (@yzeng58), Ziqian (@myhakureimu), and Ying (@yingfan_bot) joined around the same time, and working with them has been the most fun and rewarding part of my job. Each took on a challenging topic and did great work. Yuchen advanced LLM fine-tuning, especially parameter-efficient methods. Ziqian resolved the mystery of LLM in-context learning. Ying explored "a model in a loop," focusing on diffusion models and looped Transformers. They all graduated earlier this year and are continuing their research at @MSFTResearch and @Google. Best wishes! 🥰

I am also grateful for co-advising Nayoung (@nayoung_nylee), Liu (@Yang_Liuu), and Joe (@shenouda_joe) with Dimitris and/or Rob. Nayoung's work on Transformer length generalization, Liu's on in-context learning, and Joe's on the mathematical theory of vector-valued neural networks are all very exciting. They are all graduating very soon, so stay tuned! (And reach out to them if you have great opportunities!)

I also had the pleasure of working with master's students Ruisu, Andrew, Jackson (@kunde_jackson), Bryce (@BryceYicongChen), and Michael (@michaelgira23), as well as many visiting students and researchers. Thank you for being such great collaborators.

I'd like to thank and introduce the new(ish) members too. Jungtaek (@jungtaek_kim) and Thomas are studying LLM reasoning. Jongwon (@jongwonjeong123) just joined, and interestingly he was an MS student in Prof. Chung's lab at KAIST, which makes him my academic brother turned academic son. Ethan (@ethan_ewer), Lynnix, and Chungpa (visiting) are also working on cool LLM projects!

Thank you to @NSF, @amazon, @WARF_News, @FuriosaAI, @kseayg, and KFAS for generous funding.
I also learned a lot from leading and working with the AI team at @Krafton_AI, particularly with Jaewoong @jaewoong_cho, so thank you for that as well. Last and most importantly, thanks to my family! ❤️ I only listed my mentors and mentees here, not all my amazing collaborators, but thank you all for the great work together. With that, I’m excited for what’s ahead, and so far no "tenure blues." Things look the same, if not more exciting... haha!
Kartik Sreenivasan retweeted
Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs

Databricks demonstrates that retrieval performance on zero-shot BEIR tasks predictably scales with LLM size, training duration, and estimated FLOPs.

📝arxiv.org/abs/2508.17400
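The scaling variable here is estimated pretraining compute. A common back-of-the-envelope estimate, which may or may not be the exact estimator the paper uses, is C ≈ 6·N·D for a model with N parameters trained on D tokens:

```python
# Standard rough estimate of pretraining compute: C ≈ 6 * N * D.

def estimated_pretraining_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

# e.g., a 7B-parameter model trained on 2T tokens:
print(f"{estimated_pretraining_flops(7e9, 2e12):.2e} FLOPs")  # ~8.4e22
```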
Kartik Sreenivasan retweeted
Average number of — em dashes — in an ML paper:
Year 2023: 2.3
Year 2024: 1.9
Year 2025: 27.2
😭😇 @OpenAI please fix this — save the world from the em dash crisis #—crisis #emdashcrisis
🚀Introducing FlashAdventure, a new benchmark to eval GUI agents on diverse, long-horizon tasks in Flash games! 👏Big congrats to @AHNJAEWOO2 for leading this during his internship at @Krafton_AI. 🎮Games are a great environment to challenge LLMs and unlock their power on real apps!
🎉Our "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" is accepted to #EMNLP2025 Main!🎉 We introduce a benchmark of 2D Flash adventure games (room escape, mystery/detective, visual novel, management) for full story completion. 🧵
Kartik Sreenivasan retweeted
This is really exciting and impressive, and this stuff is in my area of mathematics research (convex optimization). I have a nuanced take. 🧵 (1/9)
Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked the proof; it's correct. Details below.
Kartik Sreenivasan retweeted
1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
Kartik Sreenivasan retweeted
Defended my PhD yesterday! I wrote a blogpost that distilled my research into a single takeaway (in 🧵) Thank you @rdnowak for the amazing four years. Now that I see it, he truly has the magic potion that helped so many of his students turn into great researchers.
Kartik Sreenivasan retweeted
AI agents that can write and execute code face two core challenges: security and scalability. Running code locally isn’t enough—laptops simply don’t provide the compute or memory agents need. Shared compute brings its own issues: it can’t scale horizontally and introduces serious security risks when multiple agents run together. We built a secure and scalable runtime for agent code execution to solve both problems. Our system gives agents the compute and memory they need, enforces precise permissions, and guarantees full isolation between environments. This makes it possible to truly unlock the exploratory power of AI agents—without the risks or bottlenecks.
🧱 Building the Right Infrastructure for AI Agents in Life Sciences R&D
Full blog: keplogic.substack.com/p/buil…
At Kepler AI, we build enterprise-grade AI agents to accelerate scientific discovery. One key lesson: the infrastructure matters just as much as the AI itself. 🧵
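As a generic illustration of one ingredient of such a runtime (not Kepler AI's system), the sketch below runs untrusted agent code in a separate process with CPU-time and memory caps. Real isolation needs much more (containers or microVMs, network policy, per-agent filesystems); this only shows the resource-limit idea.

```python
# Run untrusted code in a subprocess with CPU and memory limits (Unix only).
import resource
import subprocess
import sys

def _apply_limits():
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                     # 5 s of CPU
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))  # 512 MB

def run_agent_code(code: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignore env/site
        preexec_fn=_apply_limits,            # set rlimits in the child (Unix only)
        capture_output=True,
        text=True,
        timeout=10,                          # wall-clock cap
    )

result = run_agent_code("print(sum(range(10)))")
print(result.stdout)  # "45"
```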