This paper introduces FlexiTokens, a language model that learns its own token boundaries and can shift them during finetuning.
Fixed subword tokenizers over-fragment text that looks different from their training data (new domains, languages, or scripts), so models waste compute on long runs of tiny pieces.
FlexiTokens works at the byte level: a lightweight transformer scores possible split points, then bytes are pooled into variable-length segments before the main transformer layers.
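Roughly, the boundary-predictor-plus-pooling step could look like the sketch below. This is a minimal illustration, not the authors' code: the `BoundaryPooler` class, the 0.5 threshold, mean pooling, and the hard thresholding (which in practice would need a differentiable relaxation) are all assumptions made for clarity.

```python
import torch
import torch.nn as nn


class BoundaryPooler(nn.Module):
    """Sketch of a byte-level boundary predictor plus segment pooling (illustrative)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_model)                 # one embedding per byte value
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.byte_encoder = nn.TransformerEncoder(layer, n_layers)   # the "lightweight transformer"
        self.boundary_head = nn.Linear(d_model, 1)                   # per-byte split score

    def forward(self, byte_ids: torch.Tensor, threshold: float = 0.5):
        # byte_ids: (seq_len,) long tensor of raw byte values for one sequence
        h = self.byte_encoder(self.byte_embed(byte_ids).unsqueeze(0)).squeeze(0)
        probs = torch.sigmoid(self.boundary_head(h)).squeeze(-1)     # boundary probability per byte
        # Hard split for illustration; a trainable version needs a differentiable relaxation.
        segment_ids = torch.cumsum((probs > threshold).long(), dim=0)
        # Mean-pool the bytes of each segment into a single vector.
        segments = torch.stack([h[segment_ids == s].mean(dim=0)
                                for s in segment_ids.unique()])
        return segments, probs  # segments feed the main LM layers; probs feed the boundary loss
```

Feeding `torch.tensor(list("some text".encode()))` through a module like this yields one pooled vector per predicted segment, so unfamiliar text simply ends up with more or fewer segments instead of exploding into subword fragments.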
Instead of forcing a fixed compression ratio, the authors add a hinge-style loss that penalizes a sequence only when it gets too many splits, leaving the model free to compress further in the other direction.
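The one-sided objective can be sketched in the same spirit. The function name, the `max_boundary_rate` cap, and the use of expected boundary counts are illustrative assumptions rather than the paper's exact formulation; the point is only that the penalty stays at zero while the sequence is under a splitting budget and grows when it over-fragments.

```python
import torch


def one_sided_boundary_penalty(boundary_probs: torch.Tensor,
                               max_boundary_rate: float = 0.25) -> torch.Tensor:
    """Hinge-style penalty on over-splitting (illustrative, not the paper's exact loss).

    boundary_probs: (seq_len,) per-byte boundary probabilities from the predictor.
    max_boundary_rate: assumed cap, e.g. at most roughly 1 split per 4 bytes on average.
    """
    expected_splits = boundary_probs.sum()                # expected number of boundaries
    budget = max_boundary_rate * boundary_probs.numel()   # allowed number of boundaries
    # Penalize only the excess: too few splits costs nothing, so the model can
    # compress more aggressively whenever the data allows it.
    return torch.relu(expected_splits - budget)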
During adaptation this loss lets the boundary predictor loosen or tighten its segmentation, so medical notes, heavily inflected Turkish verbs, or code each get chunk sizes that fit.
Across 6 languages and 7 tasks, the model cuts token counts by up to 2x while improving accuracy by about 10%.
A 1B-parameter version even beats a larger static-BPE setup while running faster, since shorter inputs mean less compute.
The same model handles unseen Urdu script without retraining a tokenizer, showing that the approach is largely language-agnostic.
----
Paper – arxiv.org/abs/2507.12720
Paper Title: "FLEXITOKENS: Flexible Tokenization for Evolving Language Models"