SRT Showcase — watch a frozen model think, one token at a time
This is a live language model (Qwen-2.5-7B). As it writes an answer, a small read-only instrument reads its internal state and shows you how confident it is and how its “understanding” shifts word by word. Nothing here is pre-recorded.
New here? A 60-second primerWhat am I looking at? A real, full-size language model generating text. The right-hand panel and the Introspection tab below are computed live from the model’s own activations as it runs.
What is the “SRT” part? SRT (Semiotic-Reflexive Transformer) is a theory that a model’s understanding is a process that keeps folding back on itself as it reads. I trained a small read-only side-channel — the SRT adapter — that watches the frozen model’s hidden states and reports on that process. It does not change the model’s answer. Think microscope, not filter.
How do I read the screen?
- Tinted tokens — each word is shaded by how unsure the model was about it (bright = uncertain, dim = confident).
- The meter summarises the run: uncertainty, how much the internal meaning is moving (divergence), how self-referential it is (reflexivity), and whether it has “locked in” one interpretation (regime). Hover any row, or open the glossary at its foot.
- Verbalizations translate selected hidden states back into English — the adapter’s best attempt to say what the model was internally representing at that moment. Each carries a round-trip fidelity score so you can judge how much to trust that readout.
Honest caveat. These are observational readouts of internal state, not a lie detector or hallucination detector. Only entropy is a validated confidence signal; the rest is a window for interpretation.
Want the backstory? Project & method on GitHub · Stable adapter on Hugging Face.
Curated examples — what to watch for
Two modes. Completion (the default) continues whatever text you write — give it a prefix, not a question (e.g. “The capital of Australia is” or “She opened the letter, and the first line read:”), and it carries the thought forward. This is usually the clearest window into raw introspection. Chat wraps your text in the instruction template so the model replies as an assistant — better for questions and tasks. Use the Mode selector above to switch; each example below is tagged with the mode it expects.
Pick a prompt below, then read the signals as it generates:
- Confident recall (capital of Australia, Pride and Prejudice): low entropy at the fact; the verbalization names the fact itself.
- False premise (Wall of China from the Moon, walking on the Sun): watch the divergence/regime signals as the model works around an untrue claim.
- Misconception (10% of the brain): does it correct the myth?
- Reasoning pivot (train minutes, discount price): divergence spikes at the calculation, not the prose.
- Genuine uncertainty (rain Tuesday, language in 2035): elevated entropy — many valid continuations.
- Safety boundary (lock picking): a regime shift as it pivots to declining.
- Ambiguity ('The old man the boats'): the model commits to one parse.
- Open-ended / creative (Mars mystery opener): high entropy throughout.
- Completion prefixes (sky/blue, WWI causes, Python list-vs-tuple,
def fibonacci): a bare prefix the model finishes — entropy drops as it commits to a continuation, and the verbalizations track the unfolding thought.