•mle @cohere; LLM posttraining •i write code&songs&stories •🇹🇷🇬🇧 tweets •proud Bilkenter •chasing legacy✨ •views are mine, not my employer's. rt≠endorsement

Riverside, CA
Joined December 2020
Pinned Tweet
peeps here is my first solo single. give it some love! open.spotify.com/track/3Ga31…
Irem Ergün retweeted
Ai2 is hiring interns for next year at all levels: undergrad, master’s, PhD. We have a particular interest in systems work; inference/training logit mismatch is an area of active research, as is general RL infra (asynchronicity!).
Irem Ergün retweeted
This pic goes insanely hard
i have a folder on my computer named "keep going" here are some of the images inside of it: (part 2)
Irem Ergün retweeted
The Illustrated NeurIPS 2025: A Visual Map of the AI Frontier New blog post! NeurIPS 2025 papers are out—and it’s a lot to take in. This visualization lets you explore the entire research landscape interactively, with clusters, summaries, and @cohere LLM-generated explanations that make the field easier to grasp. Link in thread!
spending halloween night squashing bugs, reviewing PRs, and making GPUs go brr. the grindset is real, and it envelops me
Ankara
name the city?
Start the new week with a song from me! spotify.link/CUYnm1ifCXb this baby was released last year but was just remastered ✨
my package arrived, thanks 😎 null checks and exception handling save lives 😮‍💨 @suratkargo
Irem Ergün retweeted
I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a lot of fun! Link in thread. <3
There is a recent benchmark called NoLiMa, which designs needle/question pairs with minimal lexical (word) overlap, forcing models to rely on latent associations instead of literal surface matching. When this overlap is removed, models that looked strong on standard benchmarks degrade sharply. Understanding such shortcomings of LLM evaluations is crucial for progress in the field 🫡 (7.5/7.5)
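A toy sketch of that distinction, with made-up sentences rather than examples from the NoLiMa paper: a naive word-overlap score finds the needle when the question reuses its wording, but gives no signal when the link is purely associative.

```python
# Toy illustration of literal matching vs. a NoLiMa-style low-overlap question.
# The needle and questions below are invented for illustration only.

def lexical_overlap(question: str, passage: str) -> float:
    """Fraction of question words that literally appear in the passage."""
    q = {w.strip(".,?!").lower() for w in question.split()}
    p = {w.strip(".,?!").lower() for w in passage.split()}
    return len(q & p) / max(len(q), 1)

needle = "Mira keeps her sailboat moored beside the Golden Gate Bridge."

# High-overlap question: reuses the needle's words, so literal matching suffices.
q_literal = "Who keeps a sailboat beside the Golden Gate Bridge?"
# Low-overlap question: the link is latent world knowledge
# (the Golden Gate Bridge is in San Francisco), not shared words.
q_latent = "Which character spends weekends in San Francisco?"

print(lexical_overlap(q_literal, needle))  # high: most question words appear in the needle
print(lexical_overlap(q_latent, needle))   # 0.0: no literal signal left to exploit
```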
Finally, even evaluating long context performance is painful!! 🤠🤠 Most benchmarks follow a “needle-in-a-haystack” style: you insert a piece of relevant information (“needle”) into a large corpus (“haystack”) and ask the model to fetch it. To inflate difficulty, they often pad the context with many irrelevant documents. Many state-of-the-art models already saturate these benchmarks by exploiting literal matching (word overlap) between question and context, kind of cheating, I would say. (7/7.5)
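For concreteness, a minimal sketch of how this style of eval is usually assembled; the filler text, needle, and question below are placeholders, not drawn from any specific benchmark.

```python
# Minimal needle-in-a-haystack style probe: pad a context with irrelevant filler,
# drop one relevant sentence ("needle") at a chosen depth, then ask the model to fetch it.

def build_niah_prompt(needle: str, question: str,
                      n_filler: int = 200, depth: float = 0.5) -> str:
    filler = ["Paragraph about an unrelated topic, used only as padding."] * n_filler
    docs = filler.copy()
    docs.insert(int(depth * len(docs)), needle)  # depth=0.0 start, 1.0 end
    context = "\n".join(docs)
    return f"{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_niah_prompt(
    needle="The secret launch code is 4417.",
    question="What is the secret launch code?",
    n_filler=500,
    depth=0.35,
)
# A full eval sweeps context length and needle depth, and checks whether the
# model's answer contains the needle's key fact (here, "4417").
print(len(prompt.split()), "words in the prompt")
```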
Caveats:
- More context = more compute & memory, so there is a resource wall (rough numbers below).
- Doing RL on long-context data becomes practically impossible.
- Not all extra context helps; irrelevant or noisy content may mislead the model. In order to be robust, the model might need to be trained for such use cases.
- Extrapolation risks: what works within training lengths may fail at longer contexts. (6/7)
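To put the resource wall in numbers, a back-of-the-envelope sketch; the model shape is a made-up ~70B-class configuration chosen only for illustration.

```python
# Rough KV-cache sizing for a hypothetical dense decoder (assumed shape below).

n_layers, n_kv_heads, head_dim = 80, 8, 128
bytes_per_val = 2  # fp16/bf16

def kv_cache_gb(seq_len: int, batch: int = 1) -> float:
    # 2 = one K and one V tensor per layer
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_val / 1e9

for ctx in (8_192, 128_000, 1_000_000):
    print(f"{ctx:>9} tokens -> KV cache ~{kv_cache_gb(ctx):6.1f} GB per sequence")

# KV-cache memory grows linearly with context, while self-attention FLOPs grow
# quadratically: going from 8k to 128k tokens is ~16x memory but ~256x attention compute.
```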
While we are mentioning long-context performance, I think we should also mention context engineering, i.e., how to leverage the existing context effectively. That necessitates a thread of its own, so here's a very decent post🙌🏻: x.com/RLanceMartin/status/19… 🤠 (5/7)
Context Engineering
@dbreunig and I did a meetup on context engineering last night. Wanted to share slides (below) + a recap of some themes / discussion points.
1/ Context grows w/ agents. @ManusAI_HQ mentions a typical task requires ~50 tool calls. manus.im/blog/Context-Engine…
2/ Performance drops as context grows. @kellyhongsn + @trychroma showed this very nicely. research.trychroma.com/conte…
3/ @dbreunig highlights that new buzzwords ("context eng") identify common experiences. Many of us built agents this year and had challenges wrt managing context. @karpathy distilled this well back in May. x.com/karpathy/status/193790…
4/ Many are sharing their experiences in blogs, etc, but no common philosophy yet. "Pre-HTML era". Still, some common themes are emerging.
6/ Offload context. Use the file system to offload context. @ManusAI_HQ writes todo.md at the start of a task and re-writes it during the task. They found that recitation of the agent objective is helpful. Anthropic multi-agent writes the research plan to a file so it can be retrieved as needed and preserved. Manus offloads token-heavy tool observations. anthropic.com/engineering/bu…
7/ Reduce context. Summarize / prune messages / tool observations. Seen across many examples. Anthropic multi-agent summarizes the work of each sub-agent. We use it w/ open deep research to prune tool feedback. github.com/langchain-ai/open…
8/ Retrieve context. RAG has been a major theme w/ LLM apps for several years. @_mohansolo (Windsurf) and the Cursor team have shared interesting insights on what it takes to perform RAG w/ prod code agents. On the Lex pod, @mntruell (Cursor) + team talk about Preempt to assemble retrievals into prompts. Clearly they have been doing "context eng" since well before the term. x.com/_mohansolo/status/1899… lexfridman.com/cursor-team-t…
9/ Isolate context. A lot of interest in using multi-agent systems to isolate context. @barry_zyj + co (Anthropic) argue benefits, @walden_yan argues risks (it is hard to coordinate). Need to be careful, but there is benefit in cases where independent decisions made by each sub-agent won't cause conflicts. cognition.ai/blog/dont-build…
10/ Cache context. @ManusAI_HQ mentions caching agent message history (system prompt, tool desc, past messages). Big cost / latency saving, but still does not get around long-context problems.
Still very early in all of this ..
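As a concrete illustration of point 7/ above (reduce context), a rough sketch of pruning older tool observations behind a summarizer; the function names and message format are hypothetical, not any framework's actual API.

```python
# Sketch of "reduce context": keep recent turns verbatim and collapse older,
# oversized tool observations into short summaries. `summarize` stands in for an LLM call.

from typing import Callable

Message = dict  # e.g. {"role": "tool", "content": "..."}

def prune_history(messages: list[Message],
                  summarize: Callable[[str], str],
                  keep_last: int = 6,
                  max_tool_chars: int = 500) -> list[Message]:
    old, recent = messages[:-keep_last], messages[-keep_last:]
    pruned: list[Message] = []
    for m in old:
        if m["role"] == "tool" and len(m["content"]) > max_tool_chars:
            m = {**m, "content": summarize(m["content"])}
        pruned.append(m)
    return pruned + recent

# Usage with a trivial stand-in summarizer:
history = [{"role": "tool", "content": "x" * 5000}] * 3 + \
          [{"role": "assistant", "content": "ok"}] * 6
compact = prune_history(history, summarize=lambda t: t[:200] + " ...[truncated]")
print(sum(len(m["content"]) for m in history), "->",
      sum(len(m["content"]) for m in compact), "chars")
```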
5. Adapt parameters at inference time: Long Input Fine-Tuning (LIFT)
6. Token analysis and modification for extending context length without actually extending it (LongRecipe)
7. Model merging can always be tried 🫵🏻
8. Sliding window & segment chaining (sketch below) (4/7)
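A minimal sketch of item 8; `run_model` is a stand-in for an actual LLM call, and the window/overlap sizes are arbitrary assumptions.

```python
# Sliding window & segment chaining: split a long input into overlapping
# segments that each fit the model's window, process them in order, and
# chain a running summary forward.

def sliding_segments(tokens: list[str], window: int, overlap: int) -> list[list[str]]:
    """Fixed-size segments that overlap so no span is cut without context."""
    step = window - overlap
    return [tokens[i:i + window] for i in range(0, max(len(tokens) - overlap, 1), step)]

def chain_over_long_input(tokens: list[str], run_model, window=4096, overlap=256) -> str:
    carry = ""  # running summary passed from one segment to the next
    for seg in sliding_segments(tokens, window, overlap):
        carry = run_model(prior_summary=carry, segment=" ".join(seg))
    return carry

# Usage with a stand-in "model" that just reports segment sizes:
fake_tokens = ["tok"] * 10_000
out = chain_over_long_input(
    fake_tokens,
    run_model=lambda prior_summary, segment: f"summary covering {len(segment.split())} new tokens",
)
print(out)
```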
Techniques to improve long-context performance:
1. Changes to model architecture, such as RoPE, RNoPE-SWA, MemTransformer (minimal RoPE sketch below)
2. Parameter-efficient context extension, such as LongLoRA
3. Continued training on long sequences (beyond target)
4. Mixing long-context with shorter instruction data. (3/7)
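A minimal numpy sketch of RoPE, plus position-interpolation scaling as one common way to run beyond the trained length; this is illustrative, not any particular library's implementation.

```python
# RoPE applied to one attention head's queries/keys, with optional position
# interpolation: scale > 1 compresses positions so a longer sequence maps back
# into the rotation range seen during training.

import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10_000.0,
         scale: float = 1.0) -> np.ndarray:
    """x: (seq, head_dim) slice of queries or keys."""
    seq, dim = x.shape
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = np.outer(positions / scale, inv_freq)          # (seq, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                         # rotate even/odd pairs
    rotated = np.empty_like(x)
    rotated[:, 0::2] = x1 * cos - x2 * sin
    rotated[:, 1::2] = x1 * sin + x2 * cos
    return rotated

# Hypothetical setup: trained at 4k context, run at 16k; scale=4 keeps angles in-range.
q = np.random.randn(16_384, 128)
q_rot = rope(q, positions=np.arange(16_384), scale=4.0)
print(q_rot.shape)
```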
Some failure modes:
1. With too many tokens, attention struggles to fetch relevant signals since they get lost among noise.
2. Standard positional embeddings often don’t generalize well beyond training length.
3. Models can struggle to retrieve info from the “middle” of a long input.
4. Extending context beyond some threshold may even hurt RAG performance. (2/7)
Let’s talk about long-context in LLMs because I have been thinking about it for a while👀🧵
- Many real-world tasks require reasoning over long spans (books, docs, reports, codebases).
- Better performance lets models maintain coherence, recall earlier facts, and resolve cross-references.
- However, scaling context naively is very costly and often leads to performance degradation. ‼️ (1/7)
I have said many times that I believe every country should have its own AI model. AI is the most critical technology of today and of the future, and its impact grows with every passing day. Just as a country's army ensures its physical security, AI protects and shapes that country's future in the digital age. For this reason, I sincerely congratulate the whole team working on the Kumru project, and I can't wait to examine it in depth.👏🏻🎉
🕊️ Kumru is now live! Meet Kumru LLM, the first publicly released large language model trained from scratch for Turkish! Kumru opens a new era in localized AI by offering a powerful, efficient, and customizable solution for Turkish natural language processing. The 7.4-billion-parameter model runs up to 90% more efficiently than multilingual models thanks to a tokenizer trained entirely for Turkish. With a training set of 300 billion tokens and 500 GB of data, Kumru doesn't just know Turkish, it also understands the natural flow of the language. Offering a wide range of uses from research to enterprise applications, Kumru can easily be integrated into many scenarios, from RAG-based chatbot systems to document summarization, call-center analytics, and social media content generation.
🌐 Explore Kumru: kumru.ai
📄 Technical details: medium.com/vngrs/kumru-llm-3… and huggingface.co/vngrs-ai/Kumr…
📩 For in-house deployment, custom integrations, or fine-tuning needs, contact us at: info@vngrs.com
With Kumru, AI in Turkish is now smarter, faster, and stronger. @denizoktar @aydinhan @meliksah_turker #Kumru #LLM #TürkçeLLM #YapayZeka #VNGRS
locked in and ready to write a bunch of code