LENS — Interactive Token Visualizer

LENS isolates the contribution of individual tokens in a text-to-image model. For each token, it runs the diffusion model on the encoded prompt with that token's embedding kept and every other token's embedding replaced by padding. Comparing the resulting images shows which tokens drive which visual concepts in the generated image.
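The isolation step described above can be sketched as follows. This is a minimal illustration, not the LENS implementation: it assumes the encoded prompt is a list of per-token embedding vectors and that a single padding embedding vector is available. The names `isolate_token` and `per_token_inputs` are hypothetical, and plain Python lists stand in for the tensors a real pipeline would use.

```python
from typing import List, Sequence

Vector = List[float]


def isolate_token(embeddings: Sequence[Vector], pad: Vector, keep: int) -> List[Vector]:
    """Copy `embeddings`, keeping only position `keep` and replacing every
    other position with the padding embedding `pad` (hypothetical helper)."""
    if not 0 <= keep < len(embeddings):
        raise IndexError(f"token index {keep} out of range for {len(embeddings)} tokens")
    # Copy each vector so callers can mutate the result without aliasing the inputs.
    return [list(vec) if i == keep else list(pad) for i, vec in enumerate(embeddings)]


def per_token_inputs(embeddings: Sequence[Vector], pad: Vector) -> List[List[Vector]]:
    """Build one isolated embedding sequence per token; each would then be
    passed to the diffusion model in place of the full prompt encoding."""
    return [isolate_token(embeddings, pad, i) for i in range(len(embeddings))]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for the four tokens: pe, lic, an, </s>.
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    pad = [0.0, 0.0]
    for i, seq in enumerate(per_token_inputs(emb, pad)):
        print(i, seq)
```

Every isolated sequence keeps the original length, so the diffusion model sees the same number of conditioning positions as it does for the full prompt. Only the content of the non-selected positions changes.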

Running on flux-schnell with the T5 text encoder.

📄 Paper: Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models (arXiv 2504.01137)

Tokenization

4 tokens  |  model: flux-schnell  |  tokenizer: T5 (text encoder)

#0 pe [158]
#1 lic [2176]
#2 an [152]
#3 </s> [1]

Generation

Example: 'pelican' with Flux Schnell + T5

First generation loads model weights (~1 min). Subsequent runs are fast.