it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids?

RL researchers from Agent Lightning coined the term Retokenization Drift: a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most agents call LLMs via OpenAI-compatible APIs that only return strings, so when those strings get retokenized later, token splits may differ (HAV+ING vs H+AVING), tool-call JSON may be reformatted, or chat templates may vary. → unstable learning, off-policy updates, training chaos. 😬

(@karpathy has a great video explaining all details about tokenization 👉🏻 piped.video/watch?v=zduSFxRa… )

together with the Agent Lightning team at Microsoft Research, we’ve fixed it: vLLM’s OpenAI-compatible endpoints can now return token IDs directly. just add "return_token_ids": true to your /v1/chat/completions or /v1/completions request, and you’ll get both prompt_token_ids and token_ids along with the normal text output.

no more drift. no more mismatch. your agent RL now trains on exactly what it sampled.

read more in the blog 👇
👉 blog.vllm.ai/2025/10/22/agen…

#vLLM #AgentLightning #RL #LLMs #OpenAIAPI #ReinforcementLearning
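a minimal sketch of what that request looks like, assuming a locally running vLLM OpenAI-compatible server on localhost:8000 (the model name and the exact placement of the new fields in the response JSON are illustrative here; see the blog post for the authoritative schema):

import requests

# assumes a local vLLM server, e.g. started with: vllm serve <your-model>
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",  # illustrative model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "return_token_ids": True,  # the new flag
    },
)
data = resp.json()

# the usual text output is still there...
print(data["choices"][0]["message"]["content"])
# ...plus the exact token IDs the engine saw and sampled
# (field placement shown here is an assumption; check the blog for the schema)
print(data.get("prompt_token_ids"))
print(data["choices"][0].get("token_ids"))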

Oct 22, 2025 · 3:17 PM UTC

credit to @lllxf and @ultmaster3 from Agent Lightning ♥️
Replying to @vllm_project
Such a practical fix. Curious: could this also reduce hallucination risk in tool-use scenarios where JSON formatting inconsistencies trip up downstream systems?
feel free to explore and share your findings!
Replying to @vllm_project
Can we pass token IDs in as well?
yes, it's supported: you can pass token IDs in directly via the /v1/completions endpoint.
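a minimal sketch, assuming a local vLLM server and that the token IDs go in as the prompt field of /v1/completions (the OpenAI-style completions API accepts token arrays as well as strings; the IDs and model name below are illustrative):

import requests

# e.g. token IDs returned by an earlier request with "return_token_ids": true
prompt_token_ids = [151644, 872, 198]  # hypothetical values for this sketch

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",  # illustrative model name
        "prompt": prompt_token_ids,  # token IDs instead of a string
        "max_tokens": 32,
        "return_token_ids": True,  # get IDs back out as well
    },
)
print(resp.json()["choices"][0]["text"])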
2
Replying to @vllm_project
You can tokenize any AI model at Modelz.io Check it out!
The FreeForm Playground is live! We’re teaming up with @printablescom to challenge makers worldwide! Take our modular hardware and make it your own. Design, print, and share your FreeForm creation for a chance to win a Prusa MK4S Kit, a Cooler Master Custom PC Build, and more. Don’t wait! Submit your entry before the deadline on December 1, 2025. Your build, your rules. 🔗 linkto.cm/FB__FF_Playground
1
14
83
Replying to @vllm_project
This is huge for RL researchers. Retokenisation drift has been messing with training for a while. What your model generates isn't always what the trainer sees. Now with vLLM returning IDs directly, your RL updates match exactly what the model sampled. No more chaos
Replying to @vllm_project
all details about *bpe*, not about tokenization. bpe is just one algorithm and it was developed with zero motivation from language.
Replying to @vllm_project
Yes, took a while to identify the issue during one of my RL projects. Can easily be overlooked!
Replying to @vllm_project
Yes.
huh?

tokenizer.decode([389]) + tokenizer.decode([6376]) == tokenizer.decode([389, 6376])  # True ✅
tokenizer.decode([364]) + tokenizer.decode([6389]) == tokenizer.decode([364, 6389])  # False ❌

“1 + 1 + 1 = 3?”: Depends on the tokens, bro. 😂
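for anyone who wants to poke at this themselves, a runnable sketch of the same kind of round-trip check, assuming a Hugging Face tokenizer ("gpt2" is just an illustrative choice; the specific IDs and ✅/❌ results above depend entirely on which tokenizer produced them):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer

text = "HAVING fun with tokenizers"
ids = tok.encode(text)

# decode-then-re-encode round trip: not guaranteed to return the same IDs
roundtrip = tok.encode(tok.decode(ids))
print(ids == roundtrip)

# decoding tokens one at a time vs. all at once can also disagree,
# e.g. when a single token only covers part of a multi-byte character
piecewise = "".join(tok.decode([i]) for i in ids)
print(piecewise == tok.decode(ids))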
Replying to @vllm_project
Retokenization drift sounds like a subtle but important problem.
Replying to @vllm_project
whoa tokenization quirks? that's some crazy ai chaos builders gotta wrestle with daily
Replying to @vllm_project
Seems the “HAVING” tokenization issue will mess with prefix caching too?
Replying to @vllm_project
Can you give an example where tokenize(detokenize(token_ids)) ≠ token_ids?
Replying to @vllm_project
Same goes for “v1/responses”?
Replying to @vllm_project
We call it the banana problem.
Replying to @vllm_project
This explains so many weird edge cases in model training