Yik Siu Chan · Aug 21, 2025 · 5:44 PM UTC

Yik Siu Chan

Yik Siu Chan @yiksiux

Aug 21

This is such a cool paper! “No computation without abstraction”

Aryaman Arora @aryaman2020

Aug 20

my good friend Atticus Geiger has written an interesting new paper on causal abstraction <=> philosophy of computation! since he has much better things to do than tweet, i'm posting his paper for the world

Amir Zur · Aug 6, 2025 · 9:30 PM UTC

Yik Siu Chan retweeted

Amir Zur @AmirZur2000

Aug 6

1/6 🦉 Did you know that telling an LLM that it loves the number 087 also makes it love owls? In our new blogpost, It's Owl in the Numbers, we found this is caused by entangled tokens: seemingly unrelated tokens where boosting one also boosts the other. owls.baulab.info/

It's Owl in the Numbers: Token Entanglement in Subliminal Learning

Entangled tokens help explain subliminal learning.

Ryan Liu · Jul 24, 2025 · 12:58 AM UTC

Yik Siu Chan retweeted

Ryan Liu @theryanliu

Jul 24

A short 📹 explainer video on how LLMs can overthink in humanlike ways 😲! had a blast presenting this at #icml2025 🥳

Aryaman Arora · Jul 19, 2025 · 11:13 PM UTC

Yik Siu Chan retweeted

Aryaman Arora @aryaman2020

Jul 19

maybe I will live tweet the actionable interp workshop panel

Yong Zheng-Xin (Yong) · Jun 20, 2025 · 2:14 PM UTC

Yik Siu Chan retweeted

Yong Zheng-Xin (Yong) @yong_zhengxin

Jun 20

We have seen so much work this week about "emergent misalignment", but how is it fundamentally different from LLM jailbreaking research? I wrote a short blog post about it: yongzx.substack.com/p/emerge…

Narutatsu (Edward) Ri · Jun 18, 2025 · 10:58 PM UTC

Yik Siu Chan retweeted

Narutatsu (Edward) Ri @narutatsuri

Jun 18

【#ICML2025 Poster】 [1/7] Many works develop intricate “jailbreaks” that elicit harmful outputs from LLMs. But can more common user-LLM interactions cause the same? We show yes! Paper: arxiv.org/abs/2502.04322 Coauthors: @yiksiux, @YuxinXiao6, @MarzyehGhassemi

Aaron Mueller · Apr 23, 2025 · 5:57 PM UTC

Yik Siu Chan retweeted

Aaron Mueller @amuuueller

Apr 23

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work? We propose 😎 𝗠𝗜𝗕: a Mechanistic Interpretability Benchmark!

Yik Siu Chan · Apr 17, 2025 · 4:32 PM UTC

Yik Siu Chan @yiksiux

Apr 17

Thank you for featuring our work!!

MIT Jameel Clinic for AI & Health @AIHealthMIT

Apr 16

🚨 New study @MIT, Brown & Columbia shows how AI models can be jailbroken to give dangerous responses—like how to commit tax fraud. Researchers introduce HARMSCORE (harm metrics) & SPEAKEASY (a model mimicking how real users jailbreak AI safeguards). 📄: arxiv.org/pdf/2502.04322

Yik Siu Chan · Dec 11, 2024 · 7:22 PM UTC

Yik Siu Chan @yiksiux

11 Dec 2024

I’m grateful to have been part of this collaboration on LLMs for health with the amazing team at MIT. Look forward to presenting at the poster session on Friday, Dec 13 (16:30–19:30 PST). Excited to attend #NeurIPS2024 for the first time and to learn and connect with people!

Yubin Kim @ybkim95_ai

4 Dec 2024

I will be at #NeurIPS2024 from December 10-16. Thrilled to present our oral paper (MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making) on Friday, December 13th (15:50-16:10 PST). 🔍 Learn more: Project page: lnkd.in/e67E7iPA