Part IV · Chapter 11

Segmentation and Latent Structure

Unsupervised methods don't hand you answers — they hand you a lens, and a manager decides whether the structure is worth acting on.

This chapter covers what to do when a business question arrives without a target variable: which customers behave alike, or which brands compete in the same mental space. Without a target, the algorithm's job shifts from confirming a pattern to proposing a lens. The chapter pairs the two strands of unsupervised learning: clustering (K-means, hierarchical, and DBSCAN, with elbow and silhouette diagnostics for choosing k) and dimensionality reduction (PCA and Factor Analysis, applied to perceptual maps). It then moves to the nonlinear embeddings t-SNE and UMAP, which reveal neighborhoods but whose axes carry no meaning. A running discipline ties it together: a cluster becomes a segment only when a manager attaches a name, a distinct action, and a definition that stays stable across reasonable modeling choices. The capstone is a ZIP-level study of New York Lottery data in which demographics are held out of the fit, so they profile the segments rather than define them.
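The elbow and silhouette diagnostics, together with the stability requirement, can be sketched in a few lines. This is a minimal sketch assuming scikit-learn and synthetic `make_blobs` data standing in for a customer-by-feature matrix; picking k by the highest silhouette and measuring stability as the adjusted Rand index (ARI) across random seeds are illustrative choices, not the chapter's prescribed procedure.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a customer-by-feature matrix (synthetic data).
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)  # common scale so no feature dominates distances

diagnostics = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    diagnostics[k] = {
        "inertia": km.inertia_,                         # y-axis of the elbow plot
        "silhouette": silhouette_score(X, km.labels_),  # higher means better separated
    }

# Illustrative rule: take the k with the highest silhouette score.
best_k = max(diagnostics, key=lambda k: diagnostics[k]["silhouette"])

# Stability check: refit with other seeds and compare partitions.
# ARI ignores label permutations, so relabeled but identical clusters score 1.0.
base = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
agreement = [
    adjusted_rand_score(
        base, KMeans(n_clusters=best_k, n_init=10, random_state=s).fit_predict(X)
    )
    for s in range(1, 6)
]

for k, d in diagnostics.items():
    print(f"k={k}  inertia={d['inertia']:.1f}  silhouette={d['silhouette']:.3f}")
print("chosen k:", best_k, " min ARI across seeds:", round(min(agreement), 3))
```

Seeds are only one "reasonable choice"; the same comparison can be repeated across nearby values of k or alternative feature sets, and a definition that survives all of them is a better candidate for a name and an action.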


Topics covered

K-means, hierarchical, and DBSCAN clustering
elbow plots and silhouette scores
feature standardization and distance metrics
PCA scores, loadings, and the biplot
Factor Analysis vs. PCA
perceptual maps and white-space positioning
t-SNE and UMAP nonlinear embeddings
neighborhoods-not-geometry interpretation
ecological inference and ZIP-level segmentation
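For the PCA topics listed above, here is a minimal sketch of scores and loadings, assuming scikit-learn and a small synthetic brand-by-attribute ratings matrix (the data are invented for illustration). Scores place each brand on the map; loadings, computed here as the component vectors scaled by the square root of each component's variance, are the attribute arrows of a biplot.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical ratings: 12 brands x 6 attributes, driven by 2 latent traits plus noise.
latent = rng.normal(size=(12, 2))
weights = rng.normal(size=(2, 6))
ratings = latent @ weights + 0.3 * rng.normal(size=(12, 6))

# Standardizing first makes this a PCA of the correlation matrix,
# so attributes rated on different scales carry equal weight.
Z = StandardScaler().fit_transform(ratings)
pca = PCA(n_components=2).fit(Z)

scores = pca.transform(Z)  # brand coordinates: the points of the biplot
# Eigenvectors scaled by sqrt(component variance): the arrows of the biplot,
# approximately the attribute-component correlations for standardized data.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

print("variance explained:", np.round(pca.explained_variance_ratio_, 3))
print("loadings (attributes x components):\n", np.round(loadings, 2))
```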

In this chapter

Interactive studios

Restaurants · Fast-Food Brand Perceptual Map
Reduce 48 BAV brand attributes into factor-map axes, inspect loadings, cluster fast-food brands, and test how much Brand Asset follows from the latent perception scores.

Public Finance · Lottery ZIP Psychographics: How Neighborhoods Play
Segment active NYC ZIP codes from NY Lottery behavior signals, then interpret the PCA/factor score space with borough, income, retailer availability, and product-mix profiles.

Public Health · NYC Metro ZIP Health Segments
Use health prevalence measures to build factor scores, cluster ZIP codes, and interpret the segments by correlating scores with income, age, college share, and deprivation.
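The workflow the Public Health studio describes (factor scores, then clusters, then profiling with variables that stayed out of the fit) might look roughly like this. It is a sketch under stated assumptions: the six prevalence measures and the `income` column are synthetic, and scikit-learn's `FactorAnalysis` with varimax rotation plus K-means with k = 3 stand in for whatever model the studio actually uses.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_zips = 200
# Hypothetical data: two latent health factors drive six prevalence measures.
factors = rng.normal(size=(n_zips, 2))
prevalence = factors @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n_zips, 6))
# Hypothetical held-out profile variable: never used in the fit, only to interpret it.
income = 60 - 8 * factors[:, 0] + rng.normal(scale=5, size=n_zips)

Z = StandardScaler().fit_transform(prevalence)
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
scores = fa.fit_transform(Z)  # ZIP-level factor scores
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

profile = pd.DataFrame(
    {"f1": scores[:, 0], "f2": scores[:, 1], "income": income, "segment": segments}
)
# Correlate each factor score with the held-out variable.
print(profile[["f1", "f2"]].corrwith(profile["income"]).round(2))
# Segment profiles: mean factor scores and mean held-out income per segment.
print(profile.groupby("segment")[["f1", "f2", "income"]].mean().round(2))
```

Because `income` never enters `FactorAnalysis` or `KMeans`, a strong correlation with one factor is evidence about what the segments mean rather than an artifact of how they were built. Since the rows are ZIP codes, that evidence describes neighborhoods, not individual residents, which is the ecological-inference caveat.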