recently partnered with @GergelyOrosz to write "What is good software architecture?" for The Pragmatic Engineer:
newsletter.pragmaticengineer…
the core thesis is that good architecture work involves upgrading your problems
it’s tokenization again! 🤯
did you know tokenize(detokenize(token_ids)) ≠ token_ids?
RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated.
why? because most agents call LLMs via OpenAI-compatible APIs that only return strings, so when those strings get retokenized later, token splits may differ (HAV+ING vs H+AVING), tool-call JSON may be reformatted, or chat templates may vary.
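a minimal sketch of the drift, using a toy greedy longest-match tokenizer (the vocabulary and merge behavior here are made up for illustration, not any real model's):

```python
# Toy vocabulary: hypothetical, chosen so "HAVING" has two valid splits.
VOCAB = {"HAV": 0, "ING": 1, "H": 2, "AVING": 3}
INV = {i: t for t, i in VOCAB.items()}

def detokenize(ids):
    # Token IDs -> string: this is all an OpenAI-compatible API returns.
    return "".join(INV[i] for i in ids)

def tokenize(text):
    # Greedy longest-match from the left, standing in for a real tokenizer.
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"untokenizable at position {i}")
    return ids

sampled = [2, 3]             # the model actually sampled H + AVING
text = detokenize(sampled)   # "HAVING" is the only thing the API hands back
retok = tokenize(text)       # greedy match prefers HAV + ING -> [0, 1]
assert retok != sampled      # same string, different token IDs: drift
```

the trainer then computes log-probs on [0, 1] while the policy sampled [2, 3] — a silent off-policy update.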
→ unstable learning, off-policy updates, training chaos. 😬 (@karpathy has a great video explaining all details about tokenization 👉🏻 piped.video/watch?v=zduSFxRa… )
together with the Agent Lightning team at Microsoft Research, we’ve fixed it:
vLLM’s OpenAI-compatible endpoints can return token IDs directly.
just add "return_token_ids": true to your /v1/chat/completions or /v1/completions request, and you’ll get both prompt_token_ids and token_ids along with normal text outputs.
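a sketch of what that request body looks like (model name, prompt, and server address are placeholders, not from the blog):

```python
import json

# Hypothetical payload for a locally running vLLM server; the only change
# vs. a normal chat completion request is the "return_token_ids" flag.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "hello"}],
    "return_token_ids": True,
}
body = json.dumps(payload)

# POST this body to http://localhost:8000/v1/chat/completions (e.g. via
# urllib or an OpenAI client's extra_body); the response then carries
# prompt_token_ids and token_ids alongside the usual text fields.
```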
no more drift. no more mismatch. your agent RL now trains exactly on what it sampled.
read more from the blog 👇
👉 blog.vllm.ai/2025/10/22/agen… #vLLM #AgentLightning #RL #LLMs #OpenAIAPI #ReinforcementLearning
The *typeagent* project implements long-term memory for agents that's better than RAG: we extract "knowledge" using an LLM, which gives better precision/recall. Find code and a presentation at github.com/microsoft/typeage…