Wow, language models can talk without words.
A new framework, Cache-to-Cache (C2C), lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation.
It fuses the two models' cache representations via a neural projector and a gating mechanism, enabling efficient inter-model exchange.
The payoff: up to 10% higher accuracy than either model alone, 3–5% gains over text-based communication, and roughly 2× faster responses.
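For intuition, here is a minimal PyTorch sketch of the core idea: project one model's KV states into the receiver's representation space, then blend the two caches with a learned gate. Class names, layer shapes, and dimensions below are illustrative assumptions, not the authors' implementation; see the repo below for the real code.

```python
import torch
import torch.nn as nn

class C2CFuser(nn.Module):
    """Gated fusion of a sharer model's KV states into a receiver's cache.
    Hypothetical sketch of the projector + gate idea, not the paper's code."""

    def __init__(self, src_dim: int, dst_dim: int):
        super().__init__()
        # Projector: map sharer KV states into the receiver's hidden space.
        self.proj = nn.Sequential(
            nn.Linear(src_dim, dst_dim),
            nn.SiLU(),
            nn.Linear(dst_dim, dst_dim),
        )
        # Gate: per-dimension weight deciding how much projected context to inject.
        self.gate = nn.Sequential(
            nn.Linear(2 * dst_dim, dst_dim),
            nn.Sigmoid(),
        )

    def forward(self, src_kv: torch.Tensor, dst_kv: torch.Tensor) -> torch.Tensor:
        # src_kv: (batch, seq, src_dim) KV states from the sharer model
        # dst_kv: (batch, seq, dst_dim) KV states from the receiver model
        projected = self.proj(src_kv)
        g = self.gate(torch.cat([projected, dst_kv], dim=-1))
        # Gated blend: fused cache replaces the receiver's original KV states.
        return g * projected + (1.0 - g) * dst_kv

# Toy usage: fuse a 128-token cache from a larger sharer into a smaller receiver.
fuser = C2CFuser(src_dim=4096, dst_dim=2048)
fused = fuser(torch.randn(1, 128, 4096), torch.randn(1, 128, 2048))
print(fused.shape)  # torch.Size([1, 128, 2048])
```

Because the receiver conditions on the fused cache directly, no intermediate text ever has to be decoded or re-encoded, which is where the latency savings come from.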
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Code: github.com/thu-nics/C2C
Project: github.com/thu-nics
Paper: arxiv.org/abs/2510.03215
Our report: mp.weixin.qq.com/s/tjDq99VrE…
📬 #PapersAccepted by Jiqizhixin
Nov 4, 2025 · 9:40 AM UTC