With the release of the Kimi Linear LLM last week, it's clear that efficient, linear attention variants have had a resurgence in recent months. Here's a brief summary of what happened.
First, linear attention variants have been around for a long time; I remember seeing tons of papers on them back in the early 2020s.
I don't want to dwell too long on these older attempts. But the bottom line was that they reduced both time and memory complexity from O(n^2) to O(n), making attention much more efficient for long sequences.
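To illustrate where the O(n) comes from, here's a minimal, non-causal sketch of the kernel trick that many of these early variants relied on: instead of materializing the n×n attention matrix, the matrix products are reordered so the cost grows linearly with sequence length. The elu+1 feature map and the lack of causal masking are simplifying assumptions on my part, not the exact formulation of any particular model.

```python
import torch
import torch.nn.functional as F

def quadratic_attention(Q, K, V):
    # Standard softmax attention: materializes an (n, n) score matrix -> O(n^2)
    scores = torch.softmax(Q @ K.transpose(-2, -1) / Q.shape[-1] ** 0.5, dim=-1)
    return scores @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized attention: with a simple feature map phi (elu+1 here),
    # (phi(Q) phi(K)^T) V is reordered as phi(Q) (phi(K)^T V) -> O(n)
    phi = lambda x: F.elu(x) + 1.0
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.transpose(-2, -1) @ V  # (d, d_v), independent of sequence length n
    Z = Qf @ Kf.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps  # normalizer, (n, 1)
    return (Qf @ KV) / Z

n, d = 1024, 64
Q, K, V = (torch.randn(n, d) for _ in range(3))
out = linear_attention(Q, K, V)  # shape: (1024, 64), no (n, n) matrix ever formed
```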
However, they never really gained traction because they degraded modeling accuracy, and I hadn't seen any of these variants applied in an open-weight state-of-the-art LLM.
In the second half of this year, there was a bit of a revival of linear attention variants. The first notable model was MiniMax-M1 with lightning attention, a 456B parameter mixture-of-experts (MoE) model with 46B active parameters, which came out back in June.
Then, in August, the Qwen3 team followed up with Qwen3-Next, which I discussed in more detail above. Then, in September, the DeepSeek Team announced DeepSeek V3.2 with sparse attention.
All three models (MiniMax-M1, Qwen3-Next, DeepSeek V3.2) replace the traditional quadratic attention mechanism in most or all of their layers with efficient linear variants. (DeepSeek's sparse attention is not strictly linear but still subquadratic.)
Interestingly, there was a recent plot twist: the MiniMax team released their new 230B parameter M2 model (discussed in section 13) without linear attention, going back to regular attention. The team stated that linear attention is tricky in production LLMs: it seemed to work fine with regular prompts, but it had poor accuracy on reasoning and multi-turn tasks, which matter not only for regular chat sessions but also for agentic applications.
This could have been a turning point, suggesting that linear attention may not be worth pursuing after all. However, it gets more interesting. Last week, the Kimi team released their new Kimi Linear model with linear attention. The tagline is that, compared to regular full attention, it achieves a 75% KV cache reduction and up to 6x faster decoding throughput.
Kimi Linear shares several structural similarities with Qwen3-Next. Both models rely on a hybrid attention strategy that combines lightweight linear attention with heavier full attention layers. Specifically, both use a 3:1 ratio, meaning that for every three transformer blocks employing the linear Gated DeltaNet variant, there's one block that uses full attention, as shown in the figure below.
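To make the stacking pattern a bit more concrete in code, here's a rough, purely hypothetical sketch of how such a 3:1 interleaving could be wired up. The block classes below are simplified stand-ins I made up for illustration, not the actual Kimi Linear or Qwen3-Next modules.

```python
import torch.nn as nn

class LinearAttentionBlock(nn.Module):  # stand-in for a Gated DeltaNet / KDA block
    def __init__(self, d_model):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
    def forward(self, x):
        return self.proj(x)

class FullAttentionBlock(nn.Module):  # stand-in for a full-attention (e.g., MLA) block
    def __init__(self, d_model):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
    def forward(self, x):
        return self.attn(x, x, x)[0]

def build_hybrid_stack(num_layers, d_model, linear_per_full=3):
    # Every (linear_per_full + 1)-th block uses full attention; the rest are linear.
    return nn.ModuleList(
        FullAttentionBlock(d_model) if (i + 1) % (linear_per_full + 1) == 0
        else LinearAttentionBlock(d_model)
        for i in range(num_layers)
    )

stack = build_hybrid_stack(num_layers=24, d_model=2048)
# -> blocks 4, 8, 12, 16, 20, 24 use full attention; the remaining 18 use the linear variant
```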
However, Kimi Linear replaces the linear attention mechanism used in Qwen3-Next with its Kimi Delta Attention (KDA) mechanism, which is essentially a refinement of Gated DeltaNet. Interestingly, it also replaces the standard full attention module with multi-head latent attention (MLA).
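For intuition on what the "delta" part refers to, here's a minimal, hedged sketch of the gated delta rule recurrence that Gated DeltaNet builds on. This is my simplified toy version with a scalar decay gate per step; KDA's refinement uses a finer-grained gating scheme and a chunkwise-parallel implementation rather than a Python loop, which I'm not reproducing here.

```python
import torch
import torch.nn.functional as F

def gated_delta_rule(q, k, v, beta, alpha):
    # q, k: (seq_len, d_k); v: (seq_len, d_v)
    # beta, alpha: (seq_len,) learning-rate and decay gates, each in (0, 1)
    d_k, d_v = k.shape[-1], v.shape[-1]
    S = torch.zeros(d_v, d_k)  # fixed-size state instead of a growing KV cache
    outputs = []
    for t in range(q.shape[0]):
        k_t = F.normalize(k[t], dim=-1)  # keys are typically L2-normalized
        v_t, q_t = v[t], q[t]
        # Delta-rule update: nudge the state's prediction for k_t toward v_t,
        # while decaying the old state by alpha_t.
        pred = S @ k_t  # what the (decayed) state currently stores for k_t
        S = alpha[t] * S + beta[t] * torch.outer(v_t - alpha[t] * pred, k_t)
        outputs.append(S @ q_t)
    return torch.stack(outputs)  # (seq_len, d_v)
```

The practical payoff is that the state S stays the same size no matter how long the sequence gets, which is where the KV cache and decoding-throughput advantages mentioned above come from.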
There's no direct comparison to Qwen3-Next in the Kimi Linear paper, but compared to the Gated DeltaNet-H1 model from the Gated DeltaNet paper (which is essentially Gated DeltaNet with sliding-window attention), Kimi Linear achieves higher modeling accuracy while maintaining the same token-generation speed.
Of course, I couldn't resist and added it to my The Big LLM Architecture Comparison article, which has now grown to >10,000 words (basically becoming a book!?).