Part III · Chapter 6

Regression and Identification

A regression number is only as trustworthy as the comparison it secretly makes.

This chapter focuses on what "holding something constant" actually means and when a regression earns the word causal. It begins with multiple regression as effect isolation, using the Frisch–Waugh–Lovell theorem to show that controlling for a variable is really a two-stage residualization. It then climbs a regression ladder on roughly 88,000 store-months of Progresso scanner data, where the estimated price elasticity settles from a naive −3.21 to a defensible −2.23. From there it separates identification from estimation, introduces DAGs and the fork–chain–collider patterns, and closes on panel fixed effects, where demeaning absorbs every stable store difference you could never measure. The discipline it leaves behind: insist on the identification memo and the diagnostics before reading the number, because a precise estimate of an unidentified quantity is precisely wrong.
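To make the residualization claim concrete, here is a minimal sketch of the Frisch–Waugh–Lovell equivalence in plain Python. Everything in it is an assumption made for illustration: the data are synthetic (not the Progresso scanner data), the variable names `log_price`, `log_qty`, and `promo` are hypothetical, and the "true" elasticity of −2.0 is invented. The point is only that the coefficient on price from the full regression matches the one obtained by first stripping the control out of both price and quantity.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(y, columns):
    """OLS of y on an intercept plus the given columns; returns [intercept, slopes...]."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    return solve(xtx, xty)

def residualize(y, columns):
    """Return the part of y that the given columns (plus an intercept) cannot explain."""
    beta = ols(y, columns)
    return [
        y[i] - beta[0] - sum(b * c[i] for b, c in zip(beta[1:], columns))
        for i in range(len(y))
    ]

# Synthetic data (assumption): promo lowers price and raises quantity,
# so leaving it out makes the price coefficient look too negative.
rng = random.Random(0)
n = 500
promo = [rng.gauss(0, 1) for _ in range(n)]
log_price = [-0.5 * z + rng.gauss(0, 1) for z in promo]
log_qty = [1.0 - 2.0 * p + 1.5 * z + rng.gauss(0, 1) for p, z in zip(log_price, promo)]

# Naive regression: quantity on price alone.
beta_naive = ols(log_qty, [log_price])[1]

# Full regression: quantity on price and the control together.
beta_full = ols(log_qty, [log_price, promo])[1]

# FWL route: residualize both price and quantity on the control,
# then regress residual on residual.
price_resid = residualize(log_price, [promo])
qty_resid = residualize(log_qty, [promo])
beta_fwl = ols(qty_resid, [price_resid])[1]

print("naive:", beta_naive)
print("full: ", beta_full)
print("FWL:  ", beta_fwl)
```

The full and FWL coefficients agree up to floating-point error, while the naive one differs because it mixes the price effect with the omitted control. The hand-rolled solver is used only to keep the sketch dependency-free; in practice a regression library would do the same job.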

Topics covered

Frisch–Waugh–Lovell residualization · omitted-variable bias · bad controls and collider bias · the regression ladder on scanner data · identification vs. estimation · directed acyclic graphs (fork, chain, collider) · the identification memo · panel fixed effects and the within transformation · two-way fixed effects (TWFE)

In this chapter

Interactive studios

Airlines · The Southwest Effect
Does a low-cost carrier entering a route really pull fares down, and by how much, once you hold distance and demand fixed? A visual walk through the classic regression.

Airlines · Regression Exercise: Did Southwest Lower Airfares?
The hands-on companion: download the route data, run the regression yourself, and read the coefficients the way a manager would. Built for a live class session.