Part I · Chapter 2

Working With Business Tables

Spreadsheet operations, written down as reproducible code — and the conceptual errors that hide inside correct-looking queries.

This chapter translates the spreadsheet moves a manager already knows into reproducible code: a pivot table becomes GROUP BY, a VLOOKUP becomes a JOIN, and the same Bean & Basket question runs identically across SQL, dplyr, and pandas. From there it works through the operations where business reality leaks in through table arithmetic — joins that silently double revenue, wide-versus-long reshaping, and transformations like logs and rolling averages that turn raw columns into governed metrics. The recurring lesson is that the dangerous errors are conceptual rather than syntactic: an inner join dropping 5–15% of transactions, an average-of-ratios that weights every store equally, hard-deletes that manufacture a fake retention trend. It ends with the Data Language Studio, where students assemble a reusable data-and-metric brief to carry forward.

Topics covered

Excel → SQL → dplyr → pandas translationGROUP BY vs. WHERE vs. HAVINGleft, inner, anti, and full outer joinsduplicate-explosion from one-to-many joinsgrain matching before aggregationwide vs. long reshaping (pivot/melt)log transforms and rolling averagesmetric definitions as governance contractsaverage-of-ratios vs. ratio-of-totalssurvivorship bias from hard deletes

In this chapter

  1. 2.1SQL Like ExcelMaps nine everyday Excel actions to identical SQL, dplyr, and pandas code, then runs one revenue-by-city question four ways.
  2. 2.2Joining DataShows how left, inner, and anti-joins attach business context — and how a naive campaign join inflates revenue by nearly double.
  3. 2.3Reshaping DataContrasts wide and long shapes of the same nine numbers, arguing to store data long and pivot wide only for display.
  4. 2.4Transformations and Business MetricsCatalogs eight transformations and nine dashboard metrics, showing why every metric is a definition with hidden choices, not a raw column.
  5. 2.5Data Quality and Measurement BiasTriages dirty transactions — returns, typos, sync gaps — into fix, investigate, or keep, warning against silent imputation and outlier deletion.
  6. 2.6Data Language StudioA capstone studio building a reusable six-part data-and-metric brief that hands a trustworthy evidence layer forward to Part II.