This map visualises how GPT-2, a 124M-parameter language model, represents its vocabulary. Every word is stored as a vector of 768 numbers, and words used in similar contexts end up pointing in similar directions. This is what that structure looks like.
The raw data
The starting point is a single matrix of 50,257 × 768 floating-point numbers, the wte weight tensor from GPT-2's open weights. Here is what a single word looks like:
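A minimal sketch of extracting one such vector, assuming the Hugging Face `transformers` checkpoint of GPT-2 (the example word is an arbitrary choice):

```python
# Pull one word's 768-dimensional vector out of the wte embedding matrix.
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

wte = model.wte.weight  # shape: (50257, 768), the token-embedding matrix

# " cat" (with the leading space) is a single token in GPT-2's BPE vocabulary
token_id = tokenizer.encode(" cat")[0]
vector = wte[token_id]

print(vector.shape)         # torch.Size([768])
print(vector[:5].tolist())  # the first five of the 768 numbers
```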
We filter to ~30,000 alphabetic English words, compute cosine similarity between every pair in the full 768-dimensional space, and connect each word to its 20 nearest neighbours.
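A sketch of that graph construction; `vecs` and `words` are hypothetical names for the filtered embedding array and word list, and scikit-learn's brute-force cosine search stands in for whatever the pipeline actually used:

```python
# Connect each word to its 20 nearest neighbours by cosine similarity.
# Assumes: vecs is an (N, 768) NumPy array, words is a list of N strings.
import networkx as nx
from sklearn.neighbors import NearestNeighbors

nn = NearestNeighbors(n_neighbors=21, metric="cosine")  # 21 = self + 20 others
nn.fit(vecs)
distances, indices = nn.kneighbors(vecs)

graph = nx.Graph()
for i, word in enumerate(words):
    for dist, j in zip(distances[i][1:], indices[i][1:]):  # [0] is the word itself
        # cosine similarity = 1 - cosine distance
        graph.add_edge(word, words[j], weight=1.0 - dist)
```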
Projection
UMAP (n_neighbors=15, min_dist=0.1, cosine metric) compresses 768 dimensions into 2D and 3D. Nearby points are likely close in the original space, but the reverse isn't guaranteed: some truly similar words get separated by the projection. t-SNE (perplexity=30) offers a second projection to compare.
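Both projections with the stated parameters, using the umap-learn and scikit-learn packages and the same hypothetical `vecs` array as above:

```python
import umap
from sklearn.manifold import TSNE

# 2D and 3D UMAP layouts with the settings named above.
coords_2d = umap.UMAP(n_neighbors=15, min_dist=0.1,
                      metric="cosine").fit_transform(vecs)
coords_3d = umap.UMAP(n_neighbors=15, min_dist=0.1, metric="cosine",
                      n_components=3).fit_transform(vecs)

# The comparison projection: t-SNE at perplexity 30 (other settings default).
coords_tsne = TSNE(perplexity=30).fit_transform(vecs)
```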
Clusters
The Louvain algorithm detects communities in the high-dimensional neighbour graph, not in the 2D layout. This is why you sometimes see mixed colours in a region: the cluster structure reflects the true 768D geometry, while the map is an approximation.
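A sketch of that step, assuming NetworkX ≥ 2.8 (which ships a Louvain implementation) and the `graph` built above; the seed is an illustrative choice:

```python
# Louvain community detection on the cosine-weighted neighbour graph.
import networkx as nx

communities = nx.community.louvain_communities(graph, weight="weight", seed=42)
cluster_of = {word: c
              for c, members in enumerate(communities)
              for word in members}
print(len(communities), "clusters found")
```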
Interacting
Hover over a word to see its neighbours and edges. Click or search to lock the selection. Click a cluster label to highlight its members. Switch to Explore for a searchlight that reveals what's in each region. Toggle 3D to see the UMAP 3D projection. Press Escape or × to clear.
Built with D3.js and Three.js. Graph analysis via NetworkX. No human labelled any of the clusters or positions; everything emerges from the embedding matrix.