§6.2

Regression as Effect Isolation

The Southwest regression review showed you how to add a control and watch a coefficient move — that is standard practice. What it did not show you is what "holding other things constant" actually computes. It is not a real-world intervention; it is a specific mathematical operation called residualization, and understanding that operation changes how you read every regression coefficient you will ever see.

We will make that operation precise with the Frisch–Waugh–Lovell view of what multiple regression computes, then end with a data case that climbs a "regression ladder" on real scanner data — adding controls one at a time and watching the price coefficient settle as confounders are stripped out.

The Executive Question: What Else Was Changing?

A naive correlation says weeks with heavier email-coupon volume see higher revenue. Three things were probably also true of those weeks: they were holiday weeks, a competitor was on the air, and the recipients were the loyal customers most likely to buy anyway. Each one contaminates the coupon-revenue correlation in a predictable direction.

Confounder	Correlation with coupon volume	Effect on revenue	Direction of naive bias
Holiday season	Positive (more sends in peak weeks)	Positive (holiday demand)	Inflates apparent coupon lift
Competitor ad blitz	Positive (defensive couponing)	Negative (share losses)	Suppresses apparent lift
Loyal customer targeting	Positive (loyalists on the list)	Positive (they buy anyway)	Inflates apparent lift
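
The direction column in the table follows from the standard omitted-variable bias formula. As a sketch for the simplest case, assume the true model has one treatment $X_1$ and one omitted confounder $X_2$ (the symbol $\delta$ is introduced here for illustration and is not used elsewhere in this section):

$$
\widehat{\beta}_1^{\text{naive}} \;\longrightarrow\; \beta_1 + \beta_2\,\delta,
\qquad
\delta = \frac{\operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_1)}
$$

The bias term $\beta_2\,\delta$ takes the sign of (effect of the confounder on revenue) times (correlation of the confounder with coupon volume). Holiday season and loyal-customer targeting are positive on both counts, so they inflate the naive lift. The competitor ad blitz is positive on correlation and negative on effect, so it pushes the naive estimate down.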

The decision-relevant question is not "are coupons correlated with revenue?" but "what does the data look like when we hold those three factors constant?" Multiple regression is the tool that gives a precise answer.

Multiple Regression: Effect Isolation by Math

The standard linear model is

Multiple linear regression

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

The coefficient $\beta_1$ has a very specific managerial interpretation:

$\beta_1$ is the expected change in the outcome $Y$ for a one-unit increase in $X_1$, holding the other included variables fixed.

That last clause does not mean we held anything constant in the real world. It means the regression mathematically removed the part of $X_1$'s variation that the controls could explain, removed the part of $Y$'s variation that the controls could explain, and looked at what was left. That two-step "what was left" view is the Frisch–Waugh–Lovell theorem.

The Frisch–Waugh–Lovell View

FWL says a multiple-regression coefficient is identical to the result of a much simpler procedure: two residualizations followed by one simple regression.

1. Residualize the treatment. Regress $X_1$ on the controls. Keep the residual $\tilde{X}_1$ — the variation in the treatment that the controls cannot explain.
2. Residualize the outcome. Regress $Y$ on the controls. Keep the residual $\tilde{Y}$ — the variation in the outcome the controls cannot explain.
3. Slope of residuals on residuals. The simple slope of $\tilde{Y}$ on $\tilde{X}_1$ is exactly the multiple-regression coefficient $\beta_1$.

Regression as residualization (Frisch–Waugh)

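The regression coefficient on the treatment D, after adjusting for controls X, equals the slope from the simple regression of Y's residuals on D's residuals. (Here D plays the role of $X_1$ above, and X stands for the remaining regressors.)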

Figure 1. Frisch–Waugh–Lovell as a two-stage residualization. Regression isolates the unique covariance between the parts of Y and D that the controls cannot explain.
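
The equivalence is easy to check numerically. The sketch below uses synthetic data, and every variable name and coefficient in it is an illustrative assumption rather than anything from the case data. It fits the full multiple regression, then repeats the residualize-and-regress procedure and compares the two slopes.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5_000

# Synthetic data: two controls that drive both the treatment and the outcome.
controls = rng.normal(size=(n, 2))
x1 = controls @ np.array([0.8, -0.5]) + rng.normal(size=n)            # treatment
y = 2.0 * x1 + controls @ np.array([1.5, 1.0]) + rng.normal(size=n)   # outcome

ones = np.ones((n, 1))
W = np.hstack([ones, controls])   # intercept + controls

# Full multiple regression of y on [1, x1, controls]; index 1 is the x1 coefficient.
beta_full = ols(np.hstack([ones, x1[:, None], controls]), y)[1]

# Step 1: residualize the treatment on the controls.
x1_tilde = x1 - W @ ols(W, x1)
# Step 2: residualize the outcome on the controls.
y_tilde = y - W @ ols(W, y)
# Step 3: simple slope of residuals on residuals
# (no intercept needed: both residual vectors have mean zero).
beta_fwl = (x1_tilde @ y_tilde) / (x1_tilde @ x1_tilde)

print(f"full regression: {beta_full:.6f}")
print(f"FWL two-stage:   {beta_fwl:.6f}")
```

The two printed values should agree to floating-point precision, and both should land near the simulated coefficient of 2.0. The agreement is an algebraic identity, not a property of this particular simulated dataset.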

This is enormously clarifying. A regression coefficient is never about all of the variation in $X_1$. It is about the slice of $X_1$ that moves independently of whatever controls you included. The implications follow directly:

Controls steal variation, by design. Adding a control absorbs the part of $X_1$ that moves with that control. If your treatment varies mostly with one of your controls, very little independent variation is left, and the coefficient becomes noisy.
A "bad control" (a post-treatment variable) is a thief, not a friend. If $X_2$ is itself a consequence of the treatment, residualizing the treatment on $X_2$ removes part of the very causal pathway you are trying to measure.
The model has nothing to say about regions of $X_1$ with no data. The residualized scatter only covers the support of $\tilde{X}_1$ you actually observed. Predictions outside it are extrapolation, not estimation.
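
The first two implications can be seen in a small simulation. This is a minimal sketch on synthetic data, with hypothetical variable names: a treatment `t` affects the outcome both directly and through a mediator `m`, so `m` is a post-treatment variable. Controlling for it removes the mediated part of the effect and also absorbs about half of the treatment's variation.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 20_000

t = rng.normal(size=n)                         # treatment, unconfounded by construction
m = 1.0 * t + rng.normal(size=n)               # mediator: a consequence of the treatment
y = 0.5 * t + 1.0 * m + rng.normal(size=n)     # total effect of t is 0.5 + 1.0 * 1.0 = 1.5

ones = np.ones(n)

# Regression without the bad control recovers the total effect.
total_effect = ols(np.column_stack([ones, t]), y)[1]

# Adding the post-treatment variable leaves only the direct path.
with_bad_control = ols(np.column_stack([ones, t, m]), y)[1]

# How much treatment variation survives residualizing on the bad control?
W = np.column_stack([ones, m])
t_tilde = t - W @ ols(W, t)
share_left = t_tilde.var() / t.var()

print(f"without bad control: {total_effect:.3f}")
print(f"with bad control:    {with_bad_control:.3f}")
print(f"share of treatment variation left: {share_left:.3f}")
```

Under these assumptions the first coefficient should sit near 1.5 and the second near 0.5, with roughly half of the treatment's variance remaining after residualization. The regression with the bad control is not wrong as arithmetic; it answers a different question than the one being asked.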

Reading a Coefficient Like a Manager

A defensible report of a regression coefficient always names four things — the outcome, the lever, the scale, and the controls:

"Holding store and week fixed effects, competitor pricing, and seasonal dummies constant, a 1% increase in price is associated with a 2.23% decrease in unit volume."

Notice how much that sentence concedes. It does not claim causation in general — it claims a conditional comparison. The clause "holding X constant" is doing real work, and naming exactly which X's are held constant is what separates a serious estimate from a marketing slide.

Data Case: The Progresso Soup Regression Ladder

A "regression ladder" runs the same outcome–treatment regression with progressively more controls, and watches the coefficient evolve. It is the most useful single artifact in an observational pricing study, because the shifts are what tell you which confounders mattered.

For Progresso soup, we regress $\log(\text{volume})$ on $\log(\text{price})$ across roughly 88,000 store-month observations, building up the control set in five steps:

Raw correlation — bivariate log-log regression, no controls.
+ Seasonality — month dummies absorbing winter demand shocks.
+ Competitor price — log price of Campbell's, the nearest substitute.
+ Region — census-region dummies absorbing stable regional preferences.
+ Store fixed effects — one intercept per store, absorbing every stable store-level difference. (We will cover this design formally in the next article.)
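
The five rungs above can be sketched in code. The example below does not use the Progresso scanner data. It builds a small synthetic panel in which the true elasticity is set to -2.0 and in which seasonality, competitor price, region, and store-level differences all confound the raw relationship. All names, sizes, and coefficients are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_stores, n_months = 200, 36
TRUE_ELASTICITY = -2.0   # assumed value for the simulation

# Panel index: one row per store-month.
store = np.repeat(np.arange(n_stores), n_months)
month_of_year = np.tile(np.arange(n_months), n_stores) % 12
region = rng.integers(0, 4, size=n_stores)[store]
n = store.size

# Confounders: each one moves both price and volume.
region_effect = np.array([-0.4, -0.1, 0.2, 0.5])[region]
store_effect = rng.normal(0, 0.5, size=n_stores)[store]
season = np.cos(2 * np.pi * month_of_year / 12)      # peaks in month 0 ("winter")
cost = rng.normal(0, 0.1, size=n)                    # cost shock shared with the competitor

log_comp_price = 0.8 * cost + rng.normal(0, 0.05, size=n)
log_price = (0.8 * cost + 0.10 * region_effect + 0.10 * store_effect
             - 0.05 * season + rng.normal(0, 0.05, size=n))
log_volume = (TRUE_ELASTICITY * log_price + 0.8 * log_comp_price
              - 0.3 * region_effect - 0.3 * store_effect + 0.3 * season
              + rng.normal(0, 0.2, size=n))

def dummies(codes):
    """One-hot columns for integer codes, dropping the first level (the intercept covers it)."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)

def price_coef(*control_blocks):
    """Coefficient on log_price from OLS of log_volume on [1, log_price, controls]."""
    X = np.column_stack([np.ones(n), log_price, *control_blocks])
    coef, *_ = np.linalg.lstsq(X, log_volume, rcond=None)
    return coef[1]

season_d, region_d, store_d = dummies(month_of_year), dummies(region), dummies(store)

# Store dummies subsume region dummies, so the last rung swaps one for the other.
ladder = [
    ("raw correlation",     price_coef()),
    ("+ seasonality",       price_coef(season_d)),
    ("+ competitor price",  price_coef(season_d, log_comp_price)),
    ("+ region",            price_coef(season_d, log_comp_price, region_d)),
    ("+ store fixed effects", price_coef(season_d, log_comp_price, store_d)),
]

for label, coef in ladder:
    print(f"{label:<24}{coef:8.3f}")
```

In this simulated panel the raw coefficient should come out noticeably more negative than the last rung, which should land close to the assumed -2.0. On real data there is no known true value to compare against, which is why the pattern of shifts across rungs carries so much of the evidential weight.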

The elasticity estimate changes as the comparison gets cleaner

88,409 store-months across 2,042 stores. Coefficient is on log(Progresso price).

Figure 2. The Progresso price-elasticity ladder. The coefficient moves from a naive −3.21 to −2.23 as controls are added. The shifts (not the final number alone) are the teaching content.

Two patterns are worth naming:

The biggest single move comes from adding store fixed effects. Whatever stable differences across stores were driving the naive bias — neighborhood income, local competition density, store size — they were the largest source of confounding. Census-region dummies absorbed some, but not most, of that variation.
The estimate stabilizes. Between the region step and the store fixed-effects step the coefficient shifts only modestly, and given the noise in the last two specifications it is hard to argue for further controls. Stability across the last rungs is the visual signature of a credible identification, given the controls in hand.

The preferred estimate, $\widehat{\beta} \approx -2.23$, says that within a given store, a 10% price increase is associated with roughly a 22% volume decrease (reading the log-log coefficient as a linear approximation), after stripping out seasonal demand, competitor pricing, and stable store-level differences. That is the number you would use for pricing, and the regression ladder is what earns it the right to be used.
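
A short worked calculation shows how far the linear reading of the elasticity can be pushed. It assumes the log-log model holds exactly over the whole price change, which is a modelling assumption rather than something the data guarantee:

$$
\frac{\Delta Q}{Q} = \left(1 + \frac{\Delta P}{P}\right)^{\widehat{\beta}} - 1
$$

$$
\text{1\% increase: } (1.01)^{-2.23} - 1 \approx -2.2\%
\qquad
\text{10\% increase: } (1.10)^{-2.23} - 1 \approx -19.1\%
$$

For a 1% change the exact figure and the linear reading ($-2.23 \times 1\%$) are almost identical. For a 10% change the linear reading gives $-22.3\%$ while the exact figure is closer to $-19\%$, so the approximation overstates the volume loss as the price change grows.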