Research scientist at MosaicML/Databricks. PhD from UW-Madison. Interested in LLMs, optimization, and the meaning of life.

Joined July 2012
Kartik Sreenivasan retweeted
btw Alex is a second-month PhD student; he did this work in 4 weeks. I have my suspicions that Alex has secret recursive Alexes that do his work for him, but I haven't been able to confirm that haha. Really fun post on recursive LMs with interesting trace examples, check it out!
What if scaling the context windows of frontier LLMs is much easier than it sounds? We're excited to share our work on Recursive Language Models (RLMs), a new inference strategy in which LLMs decompose and recursively interact with input prompts of seemingly unbounded length through a REPL environment. On the OOLONG benchmark, RLMs with GPT-5-mini outperform GPT-5 by over 110% (more than double the score) on 132k-token sequences and are cheaper to query on average. On the BrowseComp-Plus benchmark, RLMs with GPT-5 can take in 10M+ tokens as their "prompt" and answer highly compositional queries without degradation, even outperforming explicit indexing/retrieval. We link our blogpost, (still very early!) experiments, and discussion below.
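As a rough illustration of the idea (not the authors' implementation), a recursive call pattern over an oversized prompt might look like the sketch below. `llm` is a placeholder for a chat-completion call, and in the actual system the model interacts with the context through a REPL rather than a fixed chunk-and-summarize loop.

```python
# Hypothetical sketch of a recursive LM call, not the RLM authors' code.

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., GPT-5-mini)."""
    raise NotImplementedError

def recursive_lm(query: str, context: str, chunk_size: int = 50_000) -> str:
    # Base case: the context fits comfortably in a single call.
    if len(context) <= chunk_size:
        return llm(f"Context:\n{context}\n\nQuestion: {query}")

    # Recursive case: split the huge context, let sub-calls digest each chunk
    # with respect to the query, then answer from the digests.
    chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]
    digests = [
        recursive_lm(f"Extract anything relevant to: {query}", chunk, chunk_size)
        for chunk in chunks
    ]
    return llm(
        "You are given digests of a much longer document.\n"
        + "\n---\n".join(digests)
        + f"\n\nAnswer the question: {query}"
    )
```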
Kartik Sreenivasan retweeted
My team is hiring AI research interns for summer 2026 at Databricks! Join us to learn about AI use cases at thousands of companies, and contribute to making it easier for anyone to build specialized AI agents and models for difficult tasks.
Kartik Sreenivasan retweeted
So many teams in enterprise AI are working on this, and a single dude drops a paper just mogging them. Incredible.
A 7B model, tuned for forms and docs, beats giant models at pulling structured data. It beats GPT-4.1 on 1,000 extraction tasks and was trained for $196.

The team generated synthetic training data that preserves memory across chunks of a long file. That memory lets the model connect names, dates, and values that appear far apart. They fine-tuned with Low-Rank Adaptation (LoRA), changing only 0.53% of the weights. They then used Group Relative Policy Optimization (GRPO) with a semantic reward and strict JSON checks. This setup accepts different surface wording if the meaning matches.

On 1,000 held-out tasks it hit 0.573 mean reward and 89% valid JSON, ahead of GPT-4.1 and others, for a $196 training cost. Result: a small, focused model can outperform general models and cost much less.

----
Paper: arxiv.org/abs/2509.22906
Paper Title: "Extract-0: A Specialized Language Model for Document Information Extraction"
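The reward design described above (strict JSON validity gating a meaning-level match on field values) can be sketched roughly as follows. This is an illustrative guess at the shape of such a reward, not the paper's implementation; `field_similarity` is a crude fuzzy match standing in for a real semantic comparison.

```python
# Illustrative GRPO-style reward: hard JSON gate plus soft per-field credit.
import json
from difflib import SequenceMatcher

def field_similarity(a: str, b: str) -> float:
    """Crude stand-in for a semantic comparison of two extracted values."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def extraction_reward(model_output: str, reference: dict) -> float:
    # Hard gate: output must be valid JSON, otherwise reward is zero.
    try:
        predicted = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(predicted, dict):
        return 0.0
    # Soft credit: average per-field similarity, so different surface wording
    # with the same meaning can still score well.
    scores = [
        field_similarity(str(predicted.get(key, "")), str(value))
        for key, value in reference.items()
    ]
    return sum(scores) / max(len(scores), 1)

ref = {"invoice_date": "2024-03-01", "total": "196.00"}
out = '{"invoice_date": "2024-03-01", "total": "196.00"}'
print(extraction_reward(out, ref))  # 1.0 for an exact match
```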
Kartik Sreenivasan retweeted
New website looking pretty slick 👀
Kartik Sreenivasan retweeted
Tinker provides an abstraction layer that is the right one for post-training R&D -- it's the infrastructure I've always wanted. I'm excited to see what people build with it. "Civilization advances by extending the number of important operations which we can perform without thinking of them" -Whitehead
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker
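For intuition, here is a hypothetical sketch of the pattern the announcement describes: the training loop runs locally while each step executes on remote GPUs. `RemoteTrainer`, `forward_backward`, and `optim_step` are invented placeholder names for illustration, not Tinker's actual API; see the linked docs for the real interface.

```python
# Hypothetical "local loop, remote compute" pattern. Invented names only.

class RemoteTrainer:
    def __init__(self, base_model: str):
        self.base_model = base_model  # would open a session with the service

    def forward_backward(self, batch) -> float:
        """Run a forward/backward pass on remote GPUs; return the loss."""
        raise NotImplementedError

    def optim_step(self, learning_rate: float) -> None:
        """Apply the accumulated gradients remotely."""
        raise NotImplementedError

def finetune(dataset, base_model="some-open-weights-model", lr=1e-5, epochs=1):
    trainer = RemoteTrainer(base_model)
    for _ in range(epochs):
        for batch in dataset:
            loss = trainer.forward_backward(batch)  # compute happens on the cluster
            trainer.optim_step(learning_rate=lr)    # optimizer update, also remote
            print(f"loss: {loss}")
```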
Kartik Sreenivasan retweeted
Two weeks since finishing, and reflecting on what has changed before and after the course:
1. I can test and scale ideas much faster, e.g., I wrote some simple LoRA RL experiments around 15 minutes after reading the @thinkymachines post.
2. I have too many ideas, and I'm not used to the ML publishing model where something that could have been a tweet becomes a paper. Not sure how to handle this other than just pushing to GitHub directly.
My goal this summer was to learn LLM engineering. In mid-July, I learned about cs336. I finished the course today and found it interesting, frustrating, and highly useful. The course covers architecture, systems, data, scaling laws, and RL.
Kartik Sreenivasan retweeted
Having done RL at OpenAI and Anthropic, here's what I can say about GRPO:
Kartik Sreenivasan retweeted
Very nice work!! If you found this interesting, you may want to check out @yzeng58's work on LoRA. She mathematically characterized the conditions under which LoRA is as expressive as full fine-tuning.
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lor…
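For readers unfamiliar with the parameterization being compared, here is a minimal numpy sketch of a LoRA layer: a frozen weight W0 plus a trainable low-rank update (alpha/r)·B·A. Dimensions and scaling are illustrative only; the expressiveness question in the quoted work is about when such a rank-r update can match an arbitrary full-finetuning update (trivially possible once r reaches min(d_out, d_in)).

```python
# Minimal LoRA sketch: only A and B are trained, W0 stays frozen.
import numpy as np

d_out, d_in, r, alpha = 64, 128, 8, 16
rng = np.random.default_rng(0)

W0 = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
B = np.zeros((d_out, r))                 # trainable, initialized to zero
A = rng.normal(size=(r, d_in)) * 0.01    # trainable

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W0 + (alpha / r) * B @ A.
    return (W0 + (alpha / r) * B @ A) @ x

x = rng.normal(size=(d_in,))
y = lora_forward(x)

trainable = A.size + B.size
print(f"trainable fraction for this layer: {trainable / (W0.size + trainable):.2%}")
```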
Kartik Sreenivasan retweeted
Hello world, Unconventional, Inc. I've gotten a new company off the ground. It's a big swing… rethinking the foundations of a computer to build a new substrate for intelligence that is as efficient as biology. Brain-scale efficiency without the biological baggage! We CAN do it. We have to think differently from existing dogma. The computer in our heads proves it's possible. 20-watt brains are within our reach with the right research and beautiful engineering. Can you think differently? Do you want to be Unconventional? If so, either drop me a DM or email jobs@unconv.ai.
Kartik Sreenivasan retweeted
I gave a keynote at @VLDBconf about why we think it's time to rethink OLTP databases with Lakebase, which combines the cloud-native Postgres design of @neondatabase with Lakehouse. Reasons include the cloud, changing demands on DWs to be more real-time, and AI agents. Slides ⬇️
Kartik Sreenivasan retweeted
It's been about a year since my team fully adopted all the AI coding tools (Cursor, Claude Code), and day to day I am feeling the added cruft in the codebase. Unit tests are not catching regressions. Unneeded mocks and comments are left scattered in between. More refactoring is needed to add new features. I find myself sitting down and rewriting files to ensure completeness, correctness, and ease for future developers more than I ever have before.
Kartik Sreenivasan retweeted
Happy to share that I got tenured last month! While every phase in life is special, this one feels a bit more meaningful, and it made me reflect on the past 15+ years in academia.

I'd like to thank @UWMadison and @UWMadisonECE for tremendous support throughout the past six years, helping me grow.

I am very grateful to all the teachers I've met in the past 15+ years of research since undergrad. Prof. Sae-Young Chung introduced me to engineering, and in particular, information theory. Prof. Yung Yi and Prof. Song Chong introduced me to communication network theory, and from Prof. Yung Yi I learned the true passion for research. I miss him a lot. At Berkeley, I learned everything about research from my advisor Prof. Kannan Ramchandran. In particular, I learned that the most important motivation behind great research is endless curiosity and the desire to really understand how things work. From my postdoc mentor Prof. Changho Suh at KAIST, I learned the mindset of perfection, making every single paper count.

During my assistant professorship, I was lucky to have the best colleagues. I learned so much from Rob (@rdnowak) and Dimitris (@DimitrisPapail). I am still learning from Dimitris' unique sense of research taste and Rob's example of how to live as the coolest senior professor. I also learned a lot from the Optibeer folks Steve Wright, Jeff Linderoth, and my ECE colleagues Ramya (@ramyavinayak) and Grigoris (@Grigoris_c). Thank you all!

I'd like to thank my former students and postdocs too. Daewon and Jy-yong (@jysohn1108) joined my lab early on and worked on many interesting projects. Changhun and Tuan (@tuanqdinh) joined midway through his PhD and worked on interesting research projects, and in particular, Tuan initiated our lab's first LLM research five years ago! Yuchen (@yzeng58), Ziqian (@myhakureimu), and Ying (@yingfan_bot) joined around the same time, and working with them has been the most fun and rewarding part of my job. Each took on a challenging topic and did great work. Yuchen advanced LLM fine-tuning, especially parameter-efficient methods. Ziqian resolved the mystery of LLM in-context learning. Ying explored "a model in a loop," focusing on diffusion models and looped Transformers. They all graduated earlier this year and are continuing their research at @MSFTResearch and @Google. Best wishes! 🥰

I am also grateful for co-advising Nayoung (@nayoung_nylee), Liu (@Yang_Liuu), and Joe (@shenouda_joe) with Dimitris and/or Rob. Nayoung's work on Transformer length generalization, Liu's on in-context learning, and Joe's on the mathematical theory of vector-valued neural networks are all very exciting. They are all graduating very soon, so stay tuned! (And reach out to them if you have great opportunities!)

I also had the pleasure of working with master's students Ruisu, Andrew, Jackson (@kunde_jackson), Bryce (@BryceYicongChen), and Michael (@michaelgira23), as well as many visiting students and researchers. Thank you for being such great collaborators.

I'd like to thank and introduce the new(ish) members too. Jungtaek (@jungtaek_kim) and Thomas are studying LLM reasoning. Jongwon (@jongwonjeong123) just joined, and interestingly he was an MS student in Prof. Chung's lab at KAIST, which makes him my academic brother turned academic son. Ethan (@ethan_ewer), Lynnix, and Chungpa (visiting) are also working on cool LLM projects!

Thank you to @NSF, @amazon, @WARF_News, @FuriosaAI, @kseayg, and KFAS for generous funding.
I also learned a lot from leading and working with the AI team at @Krafton_AI, particularly with Jaewoong @jaewoong_cho, so thank you for that as well. Last and most importantly, thanks to my family! ❤️ I only listed my mentors and mentees here, not all my amazing collaborators, but thank you all for the great work together. With that, I’m excited for what’s ahead, and so far no "tenure blues." Things look the same, if not more exciting... haha!
Kartik Sreenivasan retweeted
Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs

Databricks demonstrates that retrieval performance on zero-shot BEIR tasks predictably scales with LLM size, training duration, and estimated FLOPs.

📝arxiv.org/abs/2508.17400
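The scaling variable here is estimated pretraining compute. A common back-of-the-envelope estimate, which may or may not be the exact estimator the paper uses, is C ≈ 6·N·D for a model with N parameters trained on D tokens:

```python
# Standard rough estimate of pretraining compute: C ≈ 6 * N * D.

def estimated_pretraining_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

# e.g., a 7B-parameter model trained on 2T tokens:
print(f"{estimated_pretraining_flops(7e9, 2e12):.2e} FLOPs")  # ~8.4e22
```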
Kartik Sreenivasan retweeted
Average number of — em dashes — in an ML paper:
Year 2023: 2.3
Year 2024: 1.9
Year 2025: 27.2
😭😇 @OpenAI please fix this — save the world from the em dash crisis #—crisis #emdashcrisis
🚀Introducing FlashAdventure, a new benchmark to eval GUI agents on diverse, long-horizon tasks in Flash games! 👏Big congrats to @AHNJAEWOO2 for leading this during his internship at @Krafton_AI. 🎮Games are a great environment to challenge LLMs and unlock their power on real apps!
🎉Our "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games" is accepted to #EMNLP2025 Main!🎉 We introduce a benchmark of 2D Flash adventure games (room escape, mystery/detective, visual novel, management) for full story completion. 🧵
Kartik Sreenivasan retweeted
This is really exciting and impressive, and this stuff is in my area of mathematics research (convex optimization). I have a nuanced take. 🧵 (1/9)
Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked the proof; it's correct. Details below.
Kartik Sreenivasan retweeted
1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance
Kartik Sreenivasan retweeted
Defended my PhD yesterday! I wrote a blogpost that distilled my research into a single takeaway (in 🧵) Thank you @rdnowak for the amazing four years. Now that I see it, he truly has the magic potion that helped so many of his students turn into great researchers.
Kartik Sreenivasan retweeted
AI agents that can write and execute code face two core challenges: security and scalability. Running code locally isn’t enough—laptops simply don’t provide the compute or memory agents need. Shared compute brings its own issues: it can’t scale horizontally and introduces serious security risks when multiple agents run together. We built a secure and scalable runtime for agent code execution to solve both problems. Our system gives agents the compute and memory they need, enforces precise permissions, and guarantees full isolation between environments. This makes it possible to truly unlock the exploratory power of AI agents—without the risks or bottlenecks.
🧱 Building the Right Infrastructure for AI Agents in Life Sciences R&D
Full blog: keplogic.substack.com/p/buil…
At Kepler AI, we build enterprise-grade AI agents to accelerate scientific discovery. One key lesson: the infrastructure matters just as much as the AI itself. 🧵
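As a generic illustration of one ingredient of such a runtime (not Kepler AI's system), the sketch below runs untrusted agent code in a separate process with CPU-time and memory caps. Real isolation needs much more (containers or microVMs, network policy, per-agent filesystems); this only shows the resource-limit idea.

```python
# Run untrusted code in a subprocess with CPU and memory limits (Unix only).
import resource
import subprocess
import sys

def _apply_limits():
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                     # 5 s of CPU
    resource.setrlimit(resource.RLIMIT_AS, (512 * 2**20, 512 * 2**20))  # 512 MB

def run_agent_code(code: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignore env/site
        preexec_fn=_apply_limits,            # set rlimits in the child (Unix only)
        capture_output=True,
        text=True,
        timeout=10,                          # wall-clock cap
    )

result = run_agent_code("print(sum(range(10)))")
print(result.stdout)  # "45"
```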