Part IV · Chapter 11

Segmentation and Latent Structure

Unsupervised methods don't hand you answers — they hand you a lens, and a manager decides whether the structure is worth acting on.

This chapter covers what to do when a business question arrives without a target variable — which customers behave alike, which brands compete in the same mental space — where the algorithm's job shifts from confirming a pattern to proposing a lens. It pairs the two strands of unsupervised learning: clustering (K-means, hierarchical, DBSCAN, with elbow and silhouette diagnostics for choosing k) and dimensionality reduction (PCA, Factor Analysis, and perceptual maps), then pushes into the nonlinear maps t-SNE and UMAP that reveal neighborhoods at the cost of meaningless axes. A running discipline ties it together: a cluster becomes a segment only when a manager attaches a name, a different action, and a definition stable across reasonable choices. The capstone is a ZIP-level study of New York Lottery data, with demographics held out of the fit so they profile the segments rather than define them.
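The clustering workflow described above can be sketched in a few lines. This is a minimal illustration, not the chapter's own code. It assumes scikit-learn and synthetic data; the names `choose_k_by_silhouette`, `behavior`, and `held_out_income` are hypothetical. It standardizes the behavioral features, scores K-means by silhouette across candidate values of k, and then profiles the chosen clusters with a variable that never entered the fit.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k_by_silhouette(X, k_values=range(2, 7), seed=0):
    """Standardize X, fit K-means for each k, and return (best_k, scores, labels)."""
    Z = StandardScaler().fit_transform(X)  # put features on a common scale
    scores, labels = {}, {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels[k] = km.fit_predict(Z)
        scores[k] = silhouette_score(Z, labels[k])  # higher means tighter, better separated
    best_k = max(scores, key=scores.get)
    return best_k, scores, labels[best_k]


# Synthetic stand-in for behavioral features: three groups, with the
# second feature on a much larger scale than the first.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
behavior = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])
behavior[:, 1] *= 1000.0

# A held-out variable (hypothetical): used only to describe clusters afterwards.
held_out_income = rng.normal(loc=60.0, scale=10.0, size=len(behavior))

best_k, scores, labels = choose_k_by_silhouette(behavior)
profile = {int(c): float(held_out_income[labels == c].mean()) for c in np.unique(labels)}
print("chosen k:", best_k)
print("held-out profile by cluster:", profile)
```

The silhouette score only says which k separates the data most cleanly. Whether those clusters deserve names and distinct actions remains a managerial judgment.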

Topics covered

K-means, hierarchical, and DBSCAN clustering · elbow plots and silhouette scores · feature standardization and distance metrics · PCA scores, loadings, and the biplot · Factor Analysis vs. PCA · perceptual maps and white-space positioning · t-SNE and UMAP nonlinear embeddings · neighborhoods-not-geometry interpretation · ecological inference and ZIP-level segmentation

In this chapter

  1. 11.1 Clustering for Segmentation: Introduces K-means, hierarchical, and DBSCAN clustering, standardization, and choosing k, and why a cluster only becomes a segment when a manager names and acts on it.
  2. 11.2 PCA, Factor Analysis, and Perceptual Maps: Explains how PCA and Factor Analysis compress dozens of correlated survey attributes into readable axes, and how to read a biplot and perceptual map without overstating it.
  3. 11.3 Nonlinear Maps: t-SNE and UMAP: Shows when the nonlinear maps t-SNE and UMAP reveal cluster structure PCA misses, with a strict checklist for trusting neighborhoods but never distances, sizes, or axes.
  4. 11.4 Case Study: Lottery ZIP Psychographics: A non-causal NY Lottery ZIP study where PCA and k-means recover four neighborhood lottery routines, profiled by demographics held out of the model fit.

Interactive studios