Part I · Chapter 1

Reading Data as Business Evidence

Before you ask what model to run, ask what one row means, what shape the table is in, and what kind of column you are looking at.

This chapter focuses on the three reading errors that sit behind almost every analytical mistake a manager hears about — double-counted revenue, joins that explode, a 4.2-star average that misleads, a quarterly ranking that rewards a fading store. Working through a single week, then eight weeks, at Bean & Basket Coffee, it shows how the grain of a row, the shape of a table, and the measurement type of a column quietly set the ceiling on what any analysis can honestly claim. The takeaway is a posture rather than a formula: read the panel before you rank, keep the finest grain you can sustain, and write the one-page variable dictionary that catches type confusion at design time instead of slide-review time.

Topics covered

dataset grain and unit of observationmean-of-a-mean weighting errorduplicate explosion across mismatched join grainscross-section vs. time-series vs. panelgeo-spatial and network data shapessnapshot mistaken for trendstorage type vs. measurement typeordered categoricals and the top-2-box sharethe variable dictionary as cheap insurance

In this chapter

  1. 1.1What Is a Dataset?Defines a dataset's grain — what one row means — and shows how transaction- vs. store-week views answer different questions and break joins.
  2. 1.2Data StructuresDistinguishes cross-section, time-series, panel, geo-spatial, and network shapes, proving the panel is the only view that reveals where growth comes from.
  3. 1.3Variable Types and MeasurementSeparates storage type from measurement type across seven variable kinds, showing why averaging IDs, ZIPs, or 1-to-5 ratings misleads.