Part III · Chapter 6

Regression and Identification

A regression number is only as trustworthy as the comparison it secretly makes.

This chapter focuses on what "holding something constant" actually means and when a regression earns the word causal. It begins with multiple regression as effect isolation, using the Frisch–Waugh–Lovell theorem to show that controlling for a variable is really a two-stage residualization, then climbs a regression ladder on roughly 88,000 store-months of Progresso scanner data as the price elasticity settles from a naive −3.21 to a defensible −2.23. From there it separates identification from estimation, introduces DAGs and the fork–chain–collider patterns, and closes on panel fixed effects, where demeaning absorbs every stable store difference you could never measure. The discipline it leaves behind: insist on the identification memo and the diagnostics before reading the number, because a precise estimate of an unidentified quantity is precisely wrong.
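The residualization claim can be checked numerically. The sketch below is a minimal illustration on simulated data, not the Progresso scanner data: the variables `x`, `z`, `y` and the coefficients −2.0 and 1.5 are assumptions chosen for the example. It compares the coefficient on `x` from a full regression with the slope obtained by first residualizing both `x` and `y` on the control `z`, and also shows the naive slope that omits `z`.

```python
import numpy as np

# Simulated data; every name and coefficient here is hypothetical.
rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)                        # control variable
x = 0.8 * z + rng.normal(size=n)              # regressor of interest, correlated with z
y = -2.0 * x + 1.5 * z + rng.normal(size=n)   # outcome; true effect of x is -2.0


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)
Z = np.column_stack([ones, z])

# Full regression: y on a constant, x, and z. Keep the coefficient on x.
beta_full = ols(np.column_stack([ones, x, z]), y)[1]

# Frisch-Waugh-Lovell: residualize x and y on (constant, z),
# then regress the y residuals on the x residuals.
x_res = x - Z @ ols(Z, x)
y_res = y - Z @ ols(Z, y)
beta_fwl = (x_res @ y_res) / (x_res @ x_res)

# Naive regression that omits z, for comparison.
beta_naive = ols(np.column_stack([ones, x]), y)[1]

print(f"full: {beta_full:.4f}  FWL: {beta_fwl:.4f}  naive: {beta_naive:.4f}")
```

The full and FWL slopes agree up to floating-point error, while the naive slope is pulled away from −2.0 because `z` is both correlated with `x` and a driver of `y`.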

Topics covered

Frisch–Waugh–Lovell residualization · omitted-variable bias · bad controls and collider bias · the regression ladder on scanner data · identification vs. estimation · directed acyclic graphs (fork, chain, collider) · the identification memo · panel fixed effects and the within transformation · two-way fixed effects (TWFE)

In this chapter

  1. 6.1 Regression as Effect Isolation: Shows that multiple regression's "holding constant" is Frisch–Waugh–Lovell residualization, then climbs a Progresso price-elasticity ladder from −3.21 to −2.23.
  2. 6.2 Identification: Separates identification from estimation, teaches DAGs and the fork–chain–collider patterns, and audits a milk-pricing quasi-experiment with balance and placebo checks.
  3. 6.3 Panel Data and Fixed Effects: Derives the demeaning transformation behind panel fixed effects, showing how within-store variation absorbs unmeasured stable confounders to flip a misleading price slope.
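As a rough preview of the demeaning idea in 6.3, the sketch below simulates a small store-by-month panel in which an unobserved, stable store effect raises both price and quantity. The panel dimensions, the variable names, and the true slope of −2.0 are assumptions made for illustration, not values from the chapter's data. It compares the pooled slope with the slope after subtracting each store's own mean from both variables.

```python
import numpy as np

# Simulated panel; all names, sizes, and coefficients are hypothetical.
rng = np.random.default_rng(1)
n_stores, n_months = 200, 24
store = np.repeat(np.arange(n_stores), n_months)   # store id for each row

alpha = rng.normal(size=n_stores)                  # unobserved stable store effect
log_price = 0.7 * alpha[store] + rng.normal(scale=0.5, size=store.size)
log_qty = (-2.0 * log_price                        # true within-store slope is -2.0
           + 3.0 * alpha[store]
           + rng.normal(scale=0.5, size=store.size))


def slope(x, y):
    """Simple-regression slope of y on x (with an intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


def demean_by(group, v):
    """Within transformation: subtract each group's mean from its rows."""
    group_means = np.bincount(group, weights=v) / np.bincount(group)
    return v - group_means[group]


pooled = slope(log_price, log_qty)
within = slope(demean_by(store, log_price), demean_by(store, log_qty))

print(f"pooled slope: {pooled:.3f}  within-store slope: {within:.3f}")
```

In this simulation the pooled slope comes out positive because stores with a high `alpha` both charge more and sell more, while the within-store slope lands near −2.0. Demeaning removes `alpha` without ever observing it, which is the same coefficient a regression with one dummy per store would give.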

Interactive studios