Part I · Chapter 2
Working With Business Tables
Spreadsheet operations, written down as reproducible code — and the conceptual errors that hide inside correct-looking queries.
This chapter translates the spreadsheet moves a manager already knows into reproducible code: a pivot table becomes GROUP BY, a VLOOKUP becomes a JOIN, and the same Bean & Basket question runs identically across SQL, dplyr, and pandas. From there it works through the operations where business reality leaks in through table arithmetic — joins that silently double revenue, wide-versus-long reshaping, and transformations like logs and rolling averages that turn raw columns into governed metrics. The recurring lesson is that the dangerous errors are conceptual rather than syntactic: an inner join dropping 5–15% of transactions, an average-of-ratios that weights every store equally, hard-deletes that manufacture a fake retention trend. It ends with the Data Language Studio, where students assemble a reusable data-and-metric brief to carry forward.
Topics covered
In this chapter
- 2.1SQL Like ExcelMaps nine everyday Excel actions to identical SQL, dplyr, and pandas code, then runs one revenue-by-city question four ways.
- 2.2Joining DataShows how left, inner, and anti-joins attach business context — and how a naive campaign join inflates revenue by nearly double.
- 2.3Reshaping DataContrasts wide and long shapes of the same nine numbers, arguing to store data long and pivot wide only for display.
- 2.4Transformations and Business MetricsCatalogs eight transformations and nine dashboard metrics, showing why every metric is a definition with hidden choices, not a raw column.
- 2.5Data Quality and Measurement BiasTriages dirty transactions — returns, typos, sync gaps — into fix, investigate, or keep, warning against silent imputation and outlier deletion.
- 2.6Data Language StudioA capstone studio building a reusable six-part data-and-metric brief that hands a trustworthy evidence layer forward to Part II.