Part I · Chapter 2

Working With Business Tables

Spreadsheet operations, written down as reproducible code — and the conceptual errors that hide inside correct-looking queries.

This chapter translates the spreadsheet moves a manager already knows into reproducible code: a pivot table becomes GROUP BY, a VLOOKUP becomes a JOIN, and the same Bean & Basket question runs identically across SQL, dplyr, and pandas. From there it works through the operations where business reality leaks in through table arithmetic — joins that silently double revenue, wide-versus-long reshaping, and transformations like logs and rolling averages that turn raw columns into governed metrics. The recurring lesson is that the dangerous errors are conceptual rather than syntactic: an inner join dropping 5–15% of transactions, an average-of-ratios that weights every store equally, hard-deletes that manufacture a fake retention trend. It ends with the Data Language Studio, where students assemble a reusable data-and-metric brief to carry forward.

Start reading

Topics covered

Excel → SQL → dplyr → pandas translationGROUP BY vs. WHERE vs. HAVINGleft, inner, anti, and full outer joinsduplicate-explosion from one-to-many joinsgrain matching before aggregationwide vs. long reshaping (pivot/melt)log transforms and rolling averagesmetric definitions as governance contractsaverage-of-ratios vs. ratio-of-totalssurvivorship bias from hard deletes

Topics covered

In this chapter