•mle @cohere; LLM posttraining •i write code&songs&stories •🇹🇷🇬🇧 tweets •proud Bilkenter •chasing legacy✨ •views are mine, not my employer's. rt≠endorsement

Riverside, CA
Joined December 2020
Pinned Tweet
peeps here is my first solo single. give it some love! open.spotify.com/track/3Ga31…
Irem Ergün retweeted
Ai2 is hiring interns for next year at all levels: undergrad, master’s, PhD. We have a particular interest in systems work; inference/training logit mismatch is an area of active research, as is general RL infra (asynchronicity!).
Irem Ergün retweeted
This pic goes insanely hard
i have a folder on my computer named "keep going" here are some of the images inside of it: (part 2)
Irem Ergün retweeted
The Illustrated NeurIPS 2025: A Visual Map of the AI Frontier New blog post! NeurIPS 2025 papers are out—and it’s a lot to take in. This visualization lets you explore the entire research landscape interactively, with clusters, summaries, and @cohere LLM-generated explanations that make the field easier to grasp. Link in thread!
spending halloween night squashing bugs, reviewing PRs, and making GPUs go brr. the grindset is real, and it envelops me
Ankara
name the city?
Start the new week with a song from me! spotify.link/CUYnm1ifCXb this baby was released last year but was just remastered ✨
my package arrived, thanks 😎 null checks and exception handling save lives 😮‍💨 @suratkargo
Irem Ergün retweeted
I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a lot of fun! Link in thread. <3
There is a recent benchmark called NoLiMa, which designs needle/question pairs with minimal lexical (word) overlap, forcing models to rely on latent associations instead of literal surface matching. When this overlap is removed, models that looked strong on standard benchmarks degrade sharply. Understanding such shortcomings of LLM evaluations is crucial for progress in the field 🫡 (7.5/7.5)
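A toy sketch of that distinction, with made-up sentences rather than examples from the NoLiMa paper: a naive word-overlap score finds the needle when the question reuses its wording, but gives no signal when the link is purely associative.

```python
# Toy illustration of literal matching vs. a NoLiMa-style low-overlap question.
# The needle and questions below are invented for illustration only.

def lexical_overlap(question: str, passage: str) -> float:
    """Fraction of question words that literally appear in the passage."""
    q = {w.strip(".,?!").lower() for w in question.split()}
    p = {w.strip(".,?!").lower() for w in passage.split()}
    return len(q & p) / max(len(q), 1)

needle = "Mira keeps her sailboat moored beside the Golden Gate Bridge."

# High-overlap question: reuses the needle's words, so literal matching suffices.
q_literal = "Who keeps a sailboat beside the Golden Gate Bridge?"
# Low-overlap question: the link is latent world knowledge
# (the Golden Gate Bridge is in San Francisco), not shared words.
q_latent = "Which character spends weekends in San Francisco?"

print(lexical_overlap(q_literal, needle))  # high: most question words appear in the needle
print(lexical_overlap(q_latent, needle))   # 0.0: no literal signal left to exploit
```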
Finally, even evaluating long context performance is painful!! 🤠🤠 Most benchmarks follow a “needle-in-a-haystack” style: you insert a piece of relevant information (“needle”) into a large corpus (“haystack”) and ask the model to fetch it. To inflate difficulty, they often pad the context with many irrelevant documents. Many state-of-the-art models already saturate these benchmarks by exploiting literal matching (word overlap) between question and context, kind of cheating, I would say. (7/7.5)
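For concreteness, a minimal sketch of how this style of eval is usually assembled; the filler text, needle, and question below are placeholders, not drawn from any specific benchmark.

```python
# Minimal needle-in-a-haystack style probe: pad a context with irrelevant filler,
# drop one relevant sentence ("needle") at a chosen depth, then ask the model to fetch it.

def build_niah_prompt(needle: str, question: str,
                      n_filler: int = 200, depth: float = 0.5) -> str:
    filler = ["Paragraph about an unrelated topic, used only as padding."] * n_filler
    docs = filler.copy()
    docs.insert(int(depth * len(docs)), needle)  # depth=0.0 start, 1.0 end
    context = "\n".join(docs)
    return f"{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_niah_prompt(
    needle="The secret launch code is 4417.",
    question="What is the secret launch code?",
    n_filler=500,
    depth=0.35,
)
# A full eval sweeps context length and needle depth, and checks whether the
# model's answer contains the needle's key fact (here, "4417").
print(len(prompt.split()), "words in the prompt")
```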
Caveats:
- More context = more compute & memory, so there is a resource wall (rough numbers below).
- Doing RL on long-context data becomes practically impossible.
- Not all extra context helps; irrelevant or noisy content may mislead the model. In order to be robust, the model might need to be trained for such use cases.
- Extrapolation risks: what works within training lengths may fail at longer contexts. (6/7)
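To put the resource wall in numbers, a back-of-the-envelope sketch; the model shape is a made-up ~70B-class configuration chosen only for illustration.

```python
# Rough KV-cache sizing for a hypothetical dense decoder (assumed shape below).

n_layers, n_kv_heads, head_dim = 80, 8, 128
bytes_per_val = 2  # fp16/bf16

def kv_cache_gb(seq_len: int, batch: int = 1) -> float:
    # 2 = one K and one V tensor per layer
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_val / 1e9

for ctx in (8_192, 128_000, 1_000_000):
    print(f"{ctx:>9} tokens -> KV cache ~{kv_cache_gb(ctx):6.1f} GB per sequence")

# KV-cache memory grows linearly with context, while self-attention FLOPs grow
# quadratically: going from 8k to 128k tokens is ~16x memory but ~256x attention compute.
```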
While we are mentioning long-context performance, I think we should also mention context engineering, i.e., how to leverage the existing context effectively. That necessitates a thread of its own, so here's a very decent post🙌🏻: x.com/RLanceMartin/status/19… 🤠 (5/7)
Context Engineering
@dbreunig and I did a meetup on context engineering last night. Wanted to share slides (below) + a recap of some themes / discussion points.
1/ Context grows w/ agents. @ManusAI_HQ mentions a typical task requires ~50 tool calls. manus.im/blog/Context-Engine…
2/ Performance drops as context grows. @kellyhongsn + @trychroma showed this very nicely. research.trychroma.com/conte…
3/ @dbreunig highlights that new buzzwords ("context eng") identify common experiences. Many of us built agents this year and had challenges wrt managing context. @karpathy distilled this well back in May. x.com/karpathy/status/193790…
4/ Many are sharing their experiences in blogs, etc, but no common philosophy yet. "Pre-HTML era". Still, some common themes are emerging.
6/ Offload context. Use the file system to offload context. @ManusAI_HQ writes todo.md at the start of a task and re-writes it during the task. They found that recitation of the agent objective is helpful. Anthropic multi-agent writes the research plan to a file so it can be retrieved as needed and preserved. Manus offloads token-heavy tool observations. anthropic.com/engineering/bu…
7/ Reduce context. Summarize / prune messages / tool observations. Seen across many examples. Anthropic multi-agent summarizes the work of each sub-agent. We use it w/ open deep research to prune tool feedback. github.com/langchain-ai/open…
8/ Retrieve context. RAG has been a major theme w/ LLM apps for several years. @_mohansolo (Windsurf) and the Cursor team have shared interesting insights on what it takes to perform RAG w/ prod code agents. On the Lex pod, @mntruell (Cursor) + team talk about Preempt to assemble retrievals into prompts. Clearly they have been doing "context eng" since well before the term. x.com/_mohansolo/status/1899… lexfridman.com/cursor-team-t…
9/ Isolate context. A lot of interest in using multi-agent systems to isolate context. @barry_zyj + co (Anthropic) argue benefits, @walden_yan argues risks (it is hard to coordinate). Need to be careful, but there is benefit in cases where independent decisions made by each sub-agent won't cause conflicts. cognition.ai/blog/dont-build…
10/ Cache context. @ManusAI_HQ mentions caching agent message history (system prompt, tool desc, past messages). Big cost / latency saving, but still does not get around long-context problems.
Still very early in all of this ..
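As a concrete illustration of point 7/ above (reduce context), a rough sketch of pruning older tool observations behind a summarizer; the function names and message format are hypothetical, not any framework's actual API.

```python
# Sketch of "reduce context": keep recent turns verbatim and collapse older,
# oversized tool observations into short summaries. `summarize` stands in for an LLM call.

from typing import Callable

Message = dict  # e.g. {"role": "tool", "content": "..."}

def prune_history(messages: list[Message],
                  summarize: Callable[[str], str],
                  keep_last: int = 6,
                  max_tool_chars: int = 500) -> list[Message]:
    old, recent = messages[:-keep_last], messages[-keep_last:]
    pruned: list[Message] = []
    for m in old:
        if m["role"] == "tool" and len(m["content"]) > max_tool_chars:
            m = {**m, "content": summarize(m["content"])}
        pruned.append(m)
    return pruned + recent

# Usage with a trivial stand-in summarizer:
history = [{"role": "tool", "content": "x" * 5000}] * 3 + \
          [{"role": "assistant", "content": "ok"}] * 6
compact = prune_history(history, summarize=lambda t: t[:200] + " ...[truncated]")
print(sum(len(m["content"]) for m in history), "->",
      sum(len(m["content"]) for m in compact), "chars")
```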
5. Adapt parameters at inference time: Long Input Fine-Tuning (LIFT)
6. Token analysis and modification for extending context length without actually extending it (LongRecipe)
7. Model merging can always be tried 🫵🏻
8. Sliding window & segment chaining (sketch below) (4/7)
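A minimal sketch of item 8; `run_model` is a stand-in for an actual LLM call, and the window/overlap sizes are arbitrary assumptions.

```python
# Sliding window & segment chaining: split a long input into overlapping
# segments that each fit the model's window, process them in order, and
# chain a running summary forward.

def sliding_segments(tokens: list[str], window: int, overlap: int) -> list[list[str]]:
    """Fixed-size segments that overlap so no span is cut without context."""
    step = window - overlap
    return [tokens[i:i + window] for i in range(0, max(len(tokens) - overlap, 1), step)]

def chain_over_long_input(tokens: list[str], run_model, window=4096, overlap=256) -> str:
    carry = ""  # running summary passed from one segment to the next
    for seg in sliding_segments(tokens, window, overlap):
        carry = run_model(prior_summary=carry, segment=" ".join(seg))
    return carry

# Usage with a stand-in "model" that just reports segment sizes:
fake_tokens = ["tok"] * 10_000
out = chain_over_long_input(
    fake_tokens,
    run_model=lambda prior_summary, segment: f"summary covering {len(segment.split())} new tokens",
)
print(out)
```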
Techniques to improve long-context performance:
1. Changes to model architecture, such as RoPE, RNoPE-SWA, MemTransformer (minimal RoPE sketch below)
2. Parameter-efficient context extension, such as LongLoRA
3. Continued training on long sequences (beyond target)
4. Mixing long-context with shorter instruction data. (3/7)
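A minimal numpy sketch of RoPE, plus position-interpolation scaling as one common way to run beyond the trained length; this is illustrative, not any particular library's implementation.

```python
# RoPE applied to one attention head's queries/keys, with optional position
# interpolation: scale > 1 compresses positions so a longer sequence maps back
# into the rotation range seen during training.

import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10_000.0,
         scale: float = 1.0) -> np.ndarray:
    """x: (seq, head_dim) slice of queries or keys."""
    seq, dim = x.shape
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = np.outer(positions / scale, inv_freq)          # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                         # rotate even/odd pairs
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return rotated

# Hypothetical setup: trained at 4k context, run at 16k; scale=4 keeps angles in-range.
q = np.random.randn(16_384, 128)
q_rot = rope(q, positions=np.arange(16_384), scale=4.0)
print(q_rot.shape)
```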
Some failure modes:
1. With too many tokens, attention struggles to fetch relevant signals since they get lost among noise.
2. Standard positional embeddings often don’t generalize well beyond training length.
3. Models can struggle to retrieve info from the “middle” of a long input.
4. Extending context beyond some threshold may even hurt RAG performance. (2/7)
Let’s talk about long-context in LLMs because I have been thinking about it for a while👀🧵
- Many real-world tasks require reasoning over long spans (books, docs, reports, codebases).
- Better performance lets models maintain coherence, recall earlier facts, and resolve cross-references.
- However, scaling context naively is very costly and often leads to performance degradation. ‼️ (1/7)
I have said many times that I believe every country should have its own AI model. AI is the most critical technology of today and of the future, and its impact grows with every passing day. Just as a country's army ensures its physical security, AI protects and shapes that country's future in the digital age. For this reason, I sincerely congratulate the whole team working on the Kumru project, and I can't wait to examine it in depth.👏🏻🎉
🕊️ Kumru is now live! Meet Kumru LLM, the first publicly released large language model trained from scratch for Turkish! Kumru opens a new era in localized AI by offering a powerful, efficient, and customizable solution for Turkish natural language processing. The 7.4-billion-parameter model runs up to 90% more efficiently than multilingual models thanks to a tokenizer trained entirely for Turkish. With a training set of 300 billion tokens and 500 GB of data, Kumru doesn't just know Turkish, it also understands the natural flow of the language. Offering a wide range of uses from research to enterprise applications, Kumru can easily be integrated into many scenarios, from RAG-based chatbot systems to document summarization, call-center analytics, and social media content generation.
🌐 Explore Kumru: kumru.ai
📄 Technical details: medium.com/vngrs/kumru-llm-3… and huggingface.co/vngrs-ai/Kumr…
📩 For in-house deployment, custom integrations, or fine-tuning needs, contact us at: info@vngrs.com
With Kumru, AI in Turkish is now smarter, faster, and stronger. @denizoktar @aydinhan @meliksah_turker #Kumru #LLM #TürkçeLLM #YapayZeka #VNGRS
locked in and ready to write a bunch of code