What happens when you turn a designer into an interpretability researcher? They spend hours staring at feature activations in SVG code to see whether LLMs actually understand SVGs. It turns out: yes!
We found that semantic concepts transfer across text, ASCII, and SVG:
This understanding is context-dependent: the "eyes" feature lights up as soon as you give enough characters to form the top of the head!
Only a single _ and a / \ forehead are needed for the 2nd @ to activate the "eyes" feature above.
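If you're curious what that probe looks like, here's a rough sketch of the prefix-scan loop. The `eyes_feature_activation` helper is a made-up stand-in: in the real setup it would tokenize the prefix, run the model, apply the SAE encoder to the residual stream, and read one feature's activation at the last token. Here it's a hard-coded toy so the loop runs end to end.

```python
# Toy sketch of the "does the eyes feature fire yet?" probe.
# eyes_feature_activation is a hypothetical stand-in for the real pipeline
# (tokenize -> forward pass -> SAE encode -> read one feature at the last token).

ASCII_FACE = (
    "  ___  \n"
    " / @ @ \\\n"
    " \\  -  /\n"
    "  ---  \n"
)

def eyes_feature_activation(prefix: str) -> float:
    # Placeholder heuristic that mimics the observed behavior: the feature
    # only fires on an "@" once the top of the head ("_" and "/") exists above it.
    has_head = "_" in prefix and "/" in prefix
    return 1.0 if has_head and prefix.endswith("@") else 0.0

# Grow the drawing one character at a time and watch when the feature turns on.
for i in range(1, len(ASCII_FACE) + 1):
    prefix = ASCII_FACE[:i]
    if eyes_feature_activation(prefix) > 0:
        print(f"'eyes' feature fires after {i} chars, ending in {prefix[-4:]!r}")
```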
It gets richer as the models improve. Compared to Haiku 3.5, we find a trove of interesting features in the Sonnet 4.5 base model: various animal body parts, "motor neuron"-like features such as "say smile" that activate just ahead of the ASCII mouth, and features that perceive "size"!
And yes, we checked that this works for human-made SVGs too! I drew this (imo cuter) dog and found many of the same features as on the Claude-generated dog above (which honestly looks more like a bear!)
The coolest part is that, just like Golden Gate Claude, we can steer with many of these features, transforming a smiley face into a wrinkly face, an owl, or an eyeball!
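For the curious, "steering" here means the same flavor of intervention as Golden Gate Claude: add a scaled copy of a feature's SAE decoder direction into the residual stream while the model generates. A minimal sketch with dummy tensors (names and shapes are illustrative, and the real thing applies this inside the model via a forward hook):

```python
import torch

def steer_residual(resid: torch.Tensor, w_dec: torch.Tensor, alpha: float) -> torch.Tensor:
    """Add alpha * (unit-norm feature decoder direction) to every token position."""
    return resid + alpha * w_dec / w_dec.norm()

# Dummy shapes: batch=1, seq=16, d_model=512. In practice w_dec would be the
# SAE decoder row for, say, a "wrinkly face" or "owl" feature, and alpha
# controls how hard you push the model toward that concept.
resid = torch.randn(1, 16, 512)
w_dec = torch.randn(512)
steered = steer_residual(resid, w_dec, alpha=8.0)
print(steered.shape)  # torch.Size([1, 16, 512])
```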
Read the full writeup: transformer-circuits.pub/202…
While it's not a full paper, I'm proud to have been a major contributor. Coming from a non-academic background, it's a personal milestone to investigate how LLMs visually reason alongside talented researchers.
thanks to @purvigoel3, @ikauvar, @thebasepoint, @adamsjermyn for all your work and support on getting this out the door. i learned so much and it's changed how i think about research rigor and communication