Part IV · Chapter 11
Segmentation and Latent Structure
Unsupervised methods don't hand you answers — they hand you a lens, and a manager decides whether the structure is worth acting on.
This chapter covers what to do when a business question arrives without a target variable — which customers behave alike, which brands compete in the same mental space — where the algorithm's job shifts from confirming a pattern to proposing a lens. It pairs the two strands of unsupervised learning: clustering (K-means, hierarchical, DBSCAN, with elbow and silhouette diagnostics for choosing k) and dimensionality reduction (PCA, Factor Analysis, and perceptual maps), then pushes into the nonlinear maps t-SNE and UMAP that reveal neighborhoods at the cost of meaningless axes. A running discipline ties it together: a cluster becomes a segment only when a manager attaches a name, a different action, and a definition stable across reasonable choices. The capstone is a ZIP-level study of New York Lottery data, with demographics held out of the fit so they profile the segments rather than define them.
In this chapter
- 11.1 Clustering for Segmentation
  Introduces K-means, hierarchical, and DBSCAN clustering, standardization, and choosing k, and why a cluster only becomes a segment when a manager names and acts on it.
- 11.2 PCA, Factor Analysis, and Perceptual Maps
  Explains how PCA and Factor Analysis compress dozens of correlated survey attributes into readable axes, and how to read a biplot and perceptual map without overstating it.
- 11.3 Nonlinear Maps: t-SNE and UMAP
  Shows when nonlinear maps t-SNE and UMAP reveal cluster structure PCA misses, with a strict checklist for trusting neighborhoods but never distances, sizes, or axes.
- 11.4 Case Study: Lottery ZIP Psychographics
  A non-causal NY Lottery ZIP study where PCA and K-means recover four neighborhood lottery routines, profiled by demographics held out of the model fit.
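The chapter's central habit, fitting clusters on behavior and using held-out demographics only to profile them, can be previewed in a few lines. What follows is a minimal sketch, not the chapter's actual pipeline: the six rows of data, the column meanings, and the helper names (`standardize`, `kmeans`, `profile`) are all hypothetical, and the deterministic farthest-point start is a simplification chosen so the toy result is reproducible.

```python
from statistics import mean, pstdev

# Hypothetical toy data: one row per area, columns are behavioral
# features (weekly spend, monthly visits). Not real lottery data.
behavior = [(5, 1), (6, 2), (4, 1), (50, 12), (55, 10), (48, 11)]
# Hypothetical held-out demographic (median income, $000s).
# It is never passed to the clustering step.
income = [80, 75, 82, 40, 45, 38]

def standardize(rows):
    """Z-score each column so no feature dominates by scale alone."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero spread
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [rows[0]]
    while len(centers) < k:
        # Next center: the row farthest from all centers chosen so far.
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [mean(c) for c in zip(*members)]
    return labels

# Fit on standardized behavior only.
labels = kmeans(standardize(behavior), k=2)

# Profile each cluster with the held-out variable after the fit.
profile = {
    j: mean(v for v, lab in zip(income, labels) if lab == j)
    for j in sorted(set(labels))
}
print(labels)
print(profile)
```

Because income never enters the fit, any income gap between the two groups is a description of clusters found on behavior alone, not something the algorithm was steered toward. Whether either group deserves to be called a segment is still the manager's call.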