This map visualises how GPT-2, a 124M-parameter language model, represents its vocabulary. Every word is stored as a vector of 768 numbers, and words used in similar contexts end up pointing in similar directions. This is what that structure looks like.
The raw data
The starting point is a single matrix of 50,257 × 768 floating-point numbers, the wte weight tensor from GPT-2's open weights. Here is what a single word looks like:
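A minimal sketch of extracting one such vector, assuming the Hugging Face `transformers` checkpoint of GPT-2 (the example word is an arbitrary choice):

```python
# Pull one word's 768-dimensional vector out of the wte embedding matrix.
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

wte = model.wte.weight  # shape: (50257, 768), the token-embedding matrix

# " cat" (with the leading space) is a single token in GPT-2's BPE vocabulary
token_id = tokenizer.encode(" cat")[0]
vector = wte[token_id]

print(vector.shape)         # torch.Size([768])
print(vector[:5].tolist())  # the first five of the 768 numbers
```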
We filter to ~30,000 alphabetic English words, compute cosine similarity between every pair in the full 768-dimensional space, and connect each word to its 20 nearest neighbours.
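A sketch of that graph construction; `vecs` and `words` are hypothetical names for the filtered embedding array and word list, and scikit-learn's brute-force cosine search stands in for whatever the pipeline actually used:

```python
# Connect each word to its 20 nearest neighbours by cosine similarity.
# Assumes: vecs is an (N, 768) NumPy array, words is a list of N strings.
import networkx as nx
from sklearn.neighbors import NearestNeighbors

nn = NearestNeighbors(n_neighbors=21, metric="cosine")  # 21 = self + 20 others
nn.fit(vecs)
distances, indices = nn.kneighbors(vecs)

graph = nx.Graph()
for i, word in enumerate(words):
    for dist, j in zip(distances[i][1:], indices[i][1:]):  # [0] is the word itself
        # cosine similarity = 1 - cosine distance
        graph.add_edge(word, words[j], weight=1.0 - dist)
```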
Projection
UMAP (n_neighbors=15, min_dist=0.1, cosine metric) compresses 768 dimensions into 2D and 3D. Nearby points are likely close in the original space, but the reverse isn't guaranteed: some truly similar words get separated by the projection. t-SNE (perplexity=30) offers a second projection to compare.
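Both projections with the stated parameters, using the umap-learn and scikit-learn packages and the same hypothetical `vecs` array as above:

```python
import umap
from sklearn.manifold import TSNE

# 2D and 3D UMAP layouts with the settings named above.
coords_2d = umap.UMAP(n_neighbors=15, min_dist=0.1,
                      metric="cosine").fit_transform(vecs)
coords_3d = umap.UMAP(n_neighbors=15, min_dist=0.1, metric="cosine",
                      n_components=3).fit_transform(vecs)

# The comparison projection: t-SNE at perplexity 30 (other settings default).
coords_tsne = TSNE(perplexity=30).fit_transform(vecs)
```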
Clusters
The Louvain algorithm detects communities in the high-dimensional neighbour graph, not in the 2D layout. This is why you sometimes see mixed colours in a region: the cluster structure reflects the true 768D geometry, while the map is an approximation.
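A sketch of that step, assuming NetworkX ≥ 2.8 (which ships a Louvain implementation) and the `graph` built above; the seed is an illustrative choice:

```python
# Louvain community detection on the cosine-weighted neighbour graph.
import networkx as nx

communities = nx.community.louvain_communities(graph, weight="weight", seed=42)
cluster_of = {word: c
              for c, members in enumerate(communities)
              for word in members}
print(len(communities), "clusters found")
```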
Interacting
Hover over a word to see its neighbours and edges. Click or search to lock the selection. Click a cluster label to highlight its members. Switch to Explore for a searchlight that reveals what's in each region. Toggle 3D to see the UMAP 3D projection. Press Escape or × to clear.
Built with D3.js and Three.js. Graph analysis via NetworkX. No human labelled any of the clusters or positions; everything emerges from the embedding matrix.