LENS — Interactive Token Visualizer

LENS isolates the contribution of each token in a text-to-image model: it runs the diffusion model once per token, keeping that token's embedding and replacing all others with padding. The resulting images reveal which tokens drive which visual concepts.
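The isolation step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the LENS implementation: the function names `isolate_token` and `per_token_inputs` are hypothetical, the embeddings are toy 2-dimensional vectors, and the padding vector is assumed to be all zeros (in a real pipeline it would presumably be whatever the text encoder produces for its padding token).

```python
from typing import List

Vector = List[float]


def isolate_token(embeddings: List[Vector], pad: Vector, keep: int) -> List[Vector]:
    """Return a copy of `embeddings` where every position except `keep`
    is replaced by the padding vector."""
    if not 0 <= keep < len(embeddings):
        raise IndexError(f"keep={keep} is outside the sequence")
    return [list(vec) if i == keep else list(pad) for i, vec in enumerate(embeddings)]


def per_token_inputs(embeddings: List[Vector], pad: Vector) -> List[List[Vector]]:
    """Build one isolated embedding sequence per token position."""
    return [isolate_token(embeddings, pad, k) for k in range(len(embeddings))]


# Toy stand-ins for encoder outputs (hypothetical values, not real T5 embeddings).
tokens = ["pe", "lic", "an", "</s>"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
pad = [0.0, 0.0]  # assumption: a zero vector plays the role of padding

variants = per_token_inputs(embeddings, pad)
for token, variant in zip(tokens, variants):
    # Each variant would be passed to the diffusion model in place of the full prompt.
    print(token, variant)
```

Each variant has the same length as the original sequence, so it can stand in for the full-prompt conditioning without changing shapes. Only the kept position carries information from the prompt.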

Running on flux-schnell with the T5 text encoder.

📄 Paper: Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models (arXiv 2504.01137)

Prompt

Tokenization

4 tokens  |  model: flux-schnell  |  tokenizer: T5 (text encoder)

#0 pe [158]  |  #1 lic [2176]  |  #2 an [152]  |  #3 </s> [1]

Example: 'pelican' with Flux Schnell + T5

The first generation loads the model weights (about 1 minute). Subsequent runs are fast.