Pinned Tweet
Accept responsibility for your own actions. No excuses, no regrets, no alibis, don't point the finger, don't blame anybody else. - Tony Doherty
Lx retweeted
it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids?

RL researchers from Agent Lightning coined the term Retokenization Drift: a subtle mismatch between what your model generated and what your trainer thinks it generated.

why? because most agents call LLMs via OpenAI-compatible APIs that only return strings. when those strings get retokenized later, token splits may differ (HAV+ING vs H+AVING), tool-call JSON may be reformatted, or chat templates may vary. → unstable learning, off-policy updates, training chaos. 😬

(@karpathy has a great video explaining all the details of tokenization 👉🏻 piped.video/watch?v=zduSFxRa… )

together with the Agent Lightning team at Microsoft Research, we’ve fixed it: vLLM’s OpenAI-compatible endpoints can now return token IDs directly. just add "return_token_ids": true to your /v1/chat/completions or /v1/completions request, and you’ll get both prompt_token_ids and token_ids along with the normal text output.

no more drift. no more mismatch. your agent RL now trains on exactly what it sampled.

read more in the blog 👇
👉 blog.vllm.ai/2025/10/22/agen…

#vLLM #AgentLightning #RL #LLMs #OpenAIAPI #ReinforcementLearning
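for the curious, a minimal sketch of what that request can look like. the "return_token_ids" flag and the prompt_token_ids / token_ids field names are from the post; the server URL, model name, and the exact placement of those fields in the response JSON are assumptions here:

```python
# Minimal sketch: asking a vLLM OpenAI-compatible server for token IDs.
# Assumptions: server at localhost:8000, model name "my-model" (placeholders).
import requests

payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "List three prime numbers."}],
    # vLLM extension from the post: return the exact sampled token IDs
    # alongside the normal text output.
    "return_token_ids": True,
}

resp = requests.post(
    "http://localhost:8000/v1/chat/completions", json=payload
).json()

print(resp["choices"][0]["message"]["content"])
# Per the post, the response also carries prompt_token_ids and token_ids.
# Feed those to your RL trainer directly, instead of retokenizing the text,
# so training happens on exactly the tokens the model sampled.
```

the point of the design: the trainer never has to reconstruct token_ids from a string, so tokenize(detokenize(token_ids)) ≠ token_ids can no longer silently push your updates off-policy.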
The *typeagent* project implements long-term memory for agents that is better than RAG: we extract "knowledge" using an LLM, which gives better precision/recall. Find the code and a presentation at github.com/microsoft/typeage…
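To make the contrast with plain RAG concrete, here is a purely hypothetical sketch of the idea, not typeagent's actual API (see the repo for that): an LLM turns raw text into structured facts, and queries are answered against those facts rather than raw text chunks.

```python
# Hypothetical sketch of LLM-extracted memory (NOT typeagent's actual API;
# see github.com/microsoft/typeage… for the real implementation).
from dataclasses import dataclass, field

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str

def extract_facts_with_llm(text: str) -> list[Fact]:
    # Placeholder for the LLM call; a real extractor would prompt a model
    # to emit structured (subject, predicate, object) facts from the text.
    return [Fact("user", "said", text)]

@dataclass
class Memory:
    facts: list[Fact] = field(default_factory=list)

    def add_from_text(self, text: str) -> None:
        # Knowledge extraction happens at ingestion time, once per text.
        self.facts.extend(extract_facts_with_llm(text))

    def query(self, term: str) -> list[Fact]:
        # Matching against extracted facts rather than raw chunks is the
        # claimed source of better precision/recall over plain RAG.
        term = term.lower()
        return [f for f in self.facts
                if term in f"{f.subject} {f.predicate} {f.obj}".lower()]

mem = Memory()
mem.add_from_text("Alice prefers morning meetings.")
print(mem.query("alice"))
```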