it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids?

RL researchers from Agent Lightning coined the term Retokenization Drift: a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most agents call LLMs via OpenAI-compatible APIs that only return strings, so when those strings get retokenized later, token splits may differ (HAV+ING vs H+AVING), tool-call JSON may be reformatted, or chat templates may vary. → unstable learning, off-policy updates, training chaos. 😬

(@karpathy has a great video explaining all details about tokenization 👉🏻 piped.video/watch?v=zduSFxRa… )

together with the Agent Lightning team at Microsoft Research, we’ve fixed it: vLLM’s OpenAI-compatible endpoints can now return token IDs directly. just add "return_token_ids": true to your /v1/chat/completions or /v1/completions request, and you’ll get both prompt_token_ids and token_ids along with the normal text output.

no more drift. no more mismatch. your agent RL now trains on exactly what it sampled.

read more in the blog 👇
👉 blog.vllm.ai/2025/10/22/agen…

#vLLM #AgentLightning #RL #LLMs #OpenAIAPI #ReinforcementLearning
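a minimal sketch of what that request looks like, assuming a locally running vLLM OpenAI-compatible server on localhost:8000 (the model name and the exact placement of the new fields in the response JSON are illustrative here; see the blog post for the authoritative schema):

import requests

# assumes a local vLLM server, e.g. started with: vllm serve <your-model>
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",  # illustrative model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "return_token_ids": True,  # the new flag
    },
)
data = resp.json()

# the usual text output is still there...
print(data["choices"][0]["message"]["content"])
# ...plus the exact token IDs the engine saw and sampled
# (field placement shown here is an assumption; check the blog for the schema)
print(data.get("prompt_token_ids"))
print(data["choices"][0].get("token_ids"))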

Oct 22, 2025 · 3:17 PM UTC

credit to @lllxf and @ultmaster3 from Agent Lightning ♥️
Replying to @vllm_project
Such a practical fix. Curious: could this also reduce hallucination risk in tool-use scenarios where JSON formatting inconsistencies trip up downstream systems?
feel free to explore and share your findings!
Replying to @vllm_project
Can we pass token IDs in as well?
yes, it's supported: you can pass token IDs in directly via the /v1/completions endpoint.
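a minimal sketch, assuming a local vLLM server and that the token IDs go in as the prompt field of /v1/completions (the OpenAI-style completions API accepts token arrays as well as strings; the IDs and model name below are illustrative):

import requests

# e.g. token IDs returned by an earlier request with "return_token_ids": true
prompt_token_ids = [151644, 872, 198]  # hypothetical values for this sketch

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",  # illustrative model name
        "prompt": prompt_token_ids,  # token IDs instead of a string
        "max_tokens": 32,
        "return_token_ids": True,  # get IDs back out as well
    },
)
print(resp.json()["choices"][0]["text"])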
2
Replying to @vllm_project
You can tokenize any AI model at Modelz.io Check it out!
The FreeForm Playground is live! We’re teaming up with @printablescom to challenge makers worldwide! Take our modular hardware and make it your own. Design, print, and share your FreeForm creation for a chance to win a Prusa MK4S Kit, a Cooler Master Custom PC Build, and more. Don’t wait! Submit your entry before the deadline on December 1, 2025. Your build, your rules. 🔗 linkto.cm/FB__FF_Playground
1
14
83
Replying to @vllm_project
This is huge for RL researchers. Retokenisation drift has been messing with training for a while. What your model generates isn't always what the trainer sees. Now with vLLM returning IDs directly, your RL updates match exactly what the model sampled. No more chaos
Replying to @vllm_project
all details about *bpe*, not about tokenization. bpe is just one algorithm and it was developed with zero motivation from language.
Replying to @vllm_project
Yes, took a while to identify the issue during one of my RL projects. Can easily be overlooked!
Replying to @vllm_project
Yes.
huh?

tokenizer.decode([389]) + tokenizer.decode([6376]) == tokenizer.decode([389, 6376])  # True ✅
tokenizer.decode([364]) + tokenizer.decode([6389]) == tokenizer.decode([364, 6389])  # False ❌

“1 + 1 + 1 = 3?”: Depends on the tokens, bro. 😂
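for anyone who wants to poke at this themselves, a runnable sketch of the same kind of round-trip check, assuming a Hugging Face tokenizer ("gpt2" is just an illustrative choice; the specific IDs and ✅/❌ results above depend entirely on which tokenizer produced them):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer

text = "HAVING fun with tokenizers"
ids = tok.encode(text)

# decode-then-re-encode round trip: not guaranteed to return the same IDs
roundtrip = tok.encode(tok.decode(ids))
print(ids == roundtrip)

# decoding tokens one at a time vs. all at once can also disagree,
# e.g. when a single token only covers part of a multi-byte character
piecewise = "".join(tok.decode([i]) for i in ids)
print(piecewise == tok.decode(ids))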
Replying to @vllm_project
Retokenization drift sounds like a subtle but important problem.
Replying to @vllm_project
whoa tokenization quirks? that's some crazy ai chaos builders gotta wrestle with daily
Replying to @vllm_project
Seems the “HAVING” tokenization issue will mess with prefix caching too?
Replying to @vllm_project
Can you give an example where tokenize(detokenize(token_ids)) ≠ token_ids?
Replying to @vllm_project
Same goes for “v1/responses”?
Replying to @vllm_project
We call it the banana problem.
Replying to @vllm_project
This explains so many weird edge cases in model training