§11.3

Nonlinear Maps: t-SNE and UMAP

PCA gives a map whose axes a manager can read: value to premium, convenience to indulgence. For some datasets (high-dimensional customer behaviour, image embeddings, document representations), no linear axis carries that kind of meaning. The variation isn't captured by a small number of straight lines through the cloud; it lives on a curved manifold that a linear projection such as PCA cannot unfold. t-SNE and UMAP are two widely used methods for that case, at the cost of producing maps whose axes, by construction, have no business meaning at all.

This is a short article. The visualizations are powerful enough that students see them everywhere; the caveats are powerful enough that a single article on how to read them safely is worth the space.

The Executive Question

When the structure in our data is too curved or too local for PCA to summarize, what tools let us see clusters of behaviour — and what is the price we pay for seeing them?

The price is that the picture stops being algebraic and becomes pictorial. PCA's axes are equations. t-SNE and UMAP's axes are decorative. The clusters they reveal are usually real, provided they persist across seeds and hyperparameter settings; the geometry between clusters is not.

What These Methods Do

Both t-SNE (t-distributed stochastic neighbour embedding) and UMAP (uniform manifold approximation and projection) are nonlinear dimensionality reduction methods. The intuition behind both is the same in three steps:

1. In the original high-dimensional space, compute, for each point, which other points are its near neighbours.
2. In a low-dimensional space (usually 2D for visualization), place the points so that the same neighbours stay close together.
3. Solve for the placement that best preserves these local neighbourhoods.

The two methods differ in technical details (t-SNE optimizes a probability-based loss; UMAP uses a graph construction with a different mathematical foundation), but for managerial purposes they produce maps with the same character: dense little neighbourhoods of similar points, sometimes well-separated, while the layout between those neighbourhoods depends largely on the random seed and the hyperparameters.
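The three steps can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not a recommended pipeline: the bundled digits dataset stands in for "high-dimensional data", the choice of 500 points, k = 10, and perplexity 30 is arbitrary, and `trustworthiness` is used here as one possible score for how well local neighbourhoods survive the move to 2D.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness
from sklearn.neighbors import NearestNeighbors

# Assumed toy data: 500 handwritten digits, 64 pixel features each.
X, _ = load_digits(return_X_y=True)
X = X[:500]

# Step 1: in the original 64-D space, find each point's near neighbours.
# (TSNE repeats a similar search internally; it is shown only to make step 1 concrete.)
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
_, neighbour_idx = nn.kneighbors(X)
neighbour_idx = neighbour_idx[:, 1:]  # drop column 0, normally the point itself

# Steps 2-3: solve for a 2-D placement that keeps those neighbours close.
tsne_map = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(X)

# A linear 2-D map of the same data, for comparison.
pca_map = PCA(n_components=2).fit_transform(X)

# Score in [0, 1]: higher means fewer "false neighbours" on the map.
tsne_score = trustworthiness(X, tsne_map, n_neighbors=k)
pca_score = trustworthiness(X, pca_map, n_neighbors=k)
print(f"neighbourhood preservation, t-SNE: {tsne_score:.3f}")
print(f"neighbourhood preservation, PCA:   {pca_score:.3f}")
```

On data like this the nonlinear map would be expected to score higher on neighbourhood preservation than the two-component PCA map, which is the trade described above: better local structure in exchange for axes that mean nothing.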

Reading the Maps Honestly

PCA: axes you can read. t-SNE / UMAP: neighbourhoods you can see.

Don’t read distances or angles in a t-SNE/UMAP map literally. They are good for spotting groups, poor for explaining them.

Figure 1. The same toy data viewed two ways. PCA (left) gives a layout whose axes carry meaning — direction along PC1 has a story. t-SNE / UMAP (right) gives tighter neighbourhoods that make groups visible, but the axes are decorative: the same point pattern could have been rotated, flipped, or rearranged with a different random seed.

A short, important checklist for reading a t-SNE or UMAP plot:

Do trust: which points end up close to which other points. The local neighbourhood structure is what the algorithm optimizes for.
Don't trust: the distance between two clusters. Two clusters that look close on the plot may be far in the original space; two that look far may be close.
Don't trust: the orientation of the axes. Up-down and left-right are meaningless; if you tilt your head, you have not changed the data.
Don't trust: cluster sizes on the plot. Both methods adapt to local density, so dense clusters tend to be expanded and diffuse ones compressed; the area a cluster occupies on the plot says little about its spread in the original space.
Don't trust: the plot across runs. With a different random seed or different hyperparameters (perplexity in t-SNE, n_neighbors in UMAP), the cluster shapes and positions can change while the cluster membership stays roughly the same.

The phrase to remember: neighbourhoods, not geometry.
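One way to check the "do trust / don't trust" split for yourself is to embed the same data twice with different seeds and compare what survives. The sketch below is an assumption-laden toy: the data come from `make_blobs`, t-SNE from scikit-learn stands in for both methods, and `neighbour_purity` and `centroid_distances` are hypothetical helper names introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

# Assumed toy data: four well-separated groups in 10 dimensions.
X, group = make_blobs(n_samples=300, centers=4, n_features=10,
                      cluster_std=1.0, random_state=0)

def embed(seed):
    # init="random" makes the dependence on the seed easy to see.
    return TSNE(n_components=2, perplexity=30, init="random",
                random_state=seed).fit_transform(X)

def neighbour_purity(embedding, labels, k=10):
    """Share of each point's k nearest map-neighbours that share its group."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embedding)
    _, idx = nn.kneighbors(embedding)
    same_group = labels[idx[:, 1:]] == labels[:, None]  # skip column 0 (self)
    return float(same_group.mean())

def centroid_distances(embedding, labels):
    """Pairwise distances between group centroids, measured on the map."""
    centroids = np.array([embedding[labels == g].mean(axis=0)
                          for g in np.unique(labels)])
    diff = centroids[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diff, axis=-1)

map_a, map_b = embed(seed=0), embed(seed=1)

# Neighbourhoods: expected to be stable across seeds.
purity_a = neighbour_purity(map_a, group)
purity_b = neighbour_purity(map_b, group)
print(f"neighbour purity, seed 0: {purity_a:.2f}; seed 1: {purity_b:.2f}")

# Geometry: distances between clusters on the map are not stable.
dist_a = centroid_distances(map_a, group)
dist_b = centroid_distances(map_b, group)
print("centroid distances, seed 0:\n", dist_a.round(1))
print("centroid distances, seed 1:\n", dist_b.round(1))
```

If the purity figures agree across seeds while the centroid distance tables do not, the plot is behaving as the checklist describes: who sits next to whom is informative, how far apart the groups are drawn is not.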

When to Reach for These Tools

A short list of situations where t-SNE or UMAP earn their place:

Exploratory visualization of high-dimensional embeddings. When the input is itself a learned representation (text embeddings, image embeddings, product embeddings; all discussed in Part V), a two-dimensional PCA projection often flattens the structure into overlapping, uninterpretable blobs. UMAP can reveal the cluster structure that the embedding has learned.
Validating that segmentation is plausible. If a K-means clustering produces a story the team likes, a UMAP plot coloured by segment is a quick way to see whether the segments correspond to visible regions of similarity in the data, and where the boundaries are fuzzy.
Diagnosing data quality issues. A surprisingly tight cluster on a UMAP that doesn't correspond to a known segment is often a data-engineering artefact (duplicate records, an unusual data source, a feature with a stuck value).
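The segmentation check in the list above can also be given a rough numeric form. The sketch below is illustrative only: the "customer" features are synthetic, t-SNE from scikit-learn is swapped in for UMAP (which lives in the separate umap-learn package), and the mixing score is a hypothetical diagnostic invented for this example, not a standard metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Assumed toy data: three groups in 8-D, two of which overlap heavily.
centres = [[0.0] * 8, [1.0] * 8, [12.0] * 8]
X, _ = make_blobs(n_samples=300, centers=centres, cluster_std=1.5,
                  random_state=1)
X_std = StandardScaler().fit_transform(X)

# The segmentation the team wants to sanity-check.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)

# A 2-D neighbourhood map of the same standardized features.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(X_std)

# For each point: what share of its 10 nearest map-neighbours
# belongs to a different segment?
k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(embedding)
_, idx = nn.kneighbors(embedding)
other_segment = segments[idx[:, 1:]] != segments[:, None]  # skip self
point_mixing = other_segment.mean(axis=1)

# Average per segment: near 0 means a visibly separate region,
# larger values mean a fuzzy boundary with a neighbouring segment.
mixing_by_segment = {int(s): float(point_mixing[segments == s].mean())
                     for s in np.unique(segments)}
for s, share in mixing_by_segment.items():
    print(f"segment {s}: {share:.2f} of map-neighbours are from other segments")
```

A scatter plot of `embedding` coloured by `segments` is the visual counterpart of these numbers. Either way the result is a plausibility check on the segmentation, not evidence for acting on it.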

A short list of situations where they should not be used:

Communicating positioning to an executive audience. A perceptual map (PCA) is better here because the axes have meaning.
As input features to downstream models. UMAP coordinates can be unstable across runs, which makes them bad inputs to systems that need reproducibility.
For policy decisions. "Customers in this cluster look like those" is a fine hypothesis generated by a UMAP plot. Decisions need to fall back on stable, definable feature sets.

Concept check

Three questions spanning Chapter 11 — clustering, PCA, and nonlinear maps.

1. A team runs K-means with k=4 and k=5 on the same standardized features. The five-cluster solution splits one of the four clusters in two; the other four are unchanged. The honest reading is:
2. On a biplot, "affordability" loads at (-0.85, 0.05) and "premium" at (0.85, 0.45). The first component is reasonably interpreted as:
3. A UMAP plot shows five tight clusters. Re-running with a different random seed produces five tight clusters with similar membership but visibly different positions and shapes. The right interpretation is: