§6.1

Regression as Effect Isolation

When several business variables move at once, a simple two-variable correlation is rarely the answer to a strategic question. Multiple regression is the workhorse that lets managers ask a sharper question: how does the outcome respond to a single lever, holding other observable factors constant? This article explains what that "holding constant" actually does — it is residualization, not real-world control — and why understanding the mechanics changes how you read every regression coefficient you will ever see.

We will set up the regression model, walk through the Frisch–Waugh–Lovell view of what it computes, and end with a data case that climbs a "regression ladder" on real scanner data, adding controls one at a time and watching the price coefficient settle as confounders are stripped out.


The Executive Question: What Else Was Changing?

A naive correlation says weeks with heavier email-coupon volume see higher revenue. Three things were probably also true of those weeks: they were holiday weeks, a competitor was on the air, and the recipients were the loyal customers most likely to buy anyway. Each one contaminates the coupon-revenue correlation in a predictable direction.

| Confounder | Correlation with coupon volume | Effect on revenue | Direction of naive bias |
|---|---|---|---|
| Holiday season | Positive (more sends in peak weeks) | Positive (holiday demand) | Inflates apparent coupon lift |
| Competitor ad blitz | Positive (defensive couponing) | Negative (share losses) | Suppresses apparent lift |
| Loyal customer targeting | Positive (loyalists on the list) | Positive (they buy anyway) | Inflates apparent lift |

The decision-relevant question is not "are coupons correlated with revenue?" but "what does the data look like when we hold those three factors constant?" Multiple regression gives a precise answer to that question, provided the three factors are measured and included as controls.
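A small simulation makes the sign logic of the table concrete. It is a sketch on made-up data: the variable names (`holiday`, `competitor`, `loyal_share`, `coupons`, `revenue`), the helper `ols`, and every coefficient are hypothetical. Coupon volume rises with all three confounders, and the true coupon lift is fixed at 0.5 by assumption. Dropping only the holiday control should inflate the estimate, and dropping only the competitor control should suppress it, matching the table's last column.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # hypothetical observations; large n keeps sampling noise small

# Made-up confounders, one per row of the table.
holiday = rng.binomial(1, 0.2, n)       # peak-season indicator
competitor = rng.binomial(1, 0.3, n)    # competitor ad blitz indicator
loyal_share = rng.uniform(0.0, 1.0, n)  # share of loyalists on the send list

# Coupon volume rises with all three confounders (the table's second column).
coupons = 1.0 + 2.0 * holiday + 1.5 * competitor + 1.0 * loyal_share + rng.normal(0, 1, n)

TRUE_LIFT = 0.5  # assumed causal revenue effect of one unit of coupon volume
revenue = (10.0 + TRUE_LIFT * coupons
           + 4.0 * holiday       # holiday demand raises revenue
           - 3.0 * competitor    # the blitz costs share
           + 3.0 * loyal_share   # loyalists buy anyway
           + rng.normal(0, 1, n))

def ols(y, *regressors):
    """OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

naive = ols(revenue, coupons)[1]
adjusted = ols(revenue, coupons, holiday, competitor, loyal_share)[1]
omit_holiday = ols(revenue, coupons, competitor, loyal_share)[1]    # inflated
omit_competitor = ols(revenue, coupons, holiday, loyal_share)[1]    # suppressed

for label, b in [("naive", naive), ("all controls", adjusted),
                 ("omit holiday", omit_holiday), ("omit competitor", omit_competitor)]:
    print(f"{label:<16s}{b: .3f}")
```

With these particular magnitudes the inflating confounders outweigh the suppressing one, so the naive slope overstates the lift; different made-up numbers could tip the net bias the other way, which is exactly why the direction of each confounder matters.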


Multiple Regression: Effect Isolation by Math

The standard multiple linear regression model is

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$
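To make the model concrete, here is a minimal sketch that fits it by least squares with NumPy on simulated data (the coefficients 1, 2, and −1.5 are assumptions chosen for illustration). It checks that $\beta_1$ equals the change in the prediction when $X_1$ rises by one unit with $X_2$ pinned, and that dropping $X_2$ changes the slope on $X_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated data with assumed coefficients beta0=1, beta1=2, beta2=-1.5.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # a control that moves with X1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Least-squares fit of Y on [1, X1, X2].
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta

# "Holding X2 fixed": raise X1 by one unit with X2 pinned at the same value.
x2_fixed = 0.5
effect = (b0 + b1 * 1.0 + b2 * x2_fixed) - (b0 + b1 * 0.0 + b2 * x2_fixed)

# Without the control, the X1 slope also absorbs X2's influence: 2.0 - 1.5 * 0.6 = 1.1.
naive_b1 = np.polyfit(x1, y, 1)[0]

print(f"beta_hat = {np.round(beta, 3)}, one-unit effect = {effect:.3f}, naive slope = {naive_b1:.3f}")
```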

The coefficient $\beta_1$ has a very specific managerial interpretation:

$\beta_1$ is the expected change in the outcome $Y$ for a one-unit increase in $X_1$, holding the other included variables fixed.

That last clause does not mean we held anything constant in the real world. It means the regression mathematically removed the part of $X_1$'s variation that the controls could explain, removed the part of $Y$'s variation that the controls could explain, and looked at what was left. That two-step "what was left" view is the Frisch–Waugh–Lovell theorem.


The Frisch–Waugh–Lovell View

The Frisch–Waugh–Lovell (FWL) theorem says a multiple-regression coefficient can be reproduced exactly by a much simpler two-stage procedure:

  1. Residualize the treatment. Regress $X_1$ on the controls. Keep the residual $\tilde{X}_1$, the variation in the treatment that the controls cannot explain.
  2. Residualize the outcome. Regress $Y$ on the controls. Keep the residual $\tilde{Y}$, the variation in the outcome the controls cannot explain.
  3. Slope of residuals on residuals. The simple slope of $\tilde{Y}$ on $\tilde{X}_1$ is exactly the multiple-regression coefficient $\beta_1$.
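The three steps can be checked numerically. This sketch uses simulated data with two hypothetical controls `c1` and `c2` (plus an intercept) and hypothetical helpers `ols_coefs` and `residualize`; it compares the coefficient on `x1` from the full regression with the residual-on-residual slope.

```python
import numpy as np

def ols_coefs(y, X):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def residualize(v, controls):
    """What is left of v after regressing it on the controls."""
    return v - controls @ ols_coefs(v, controls)

rng = np.random.default_rng(2)
n = 5_000
c1 = rng.normal(size=n)
c2 = rng.normal(size=n)
x1 = 0.8 * c1 - 0.5 * c2 + rng.normal(size=n)      # treatment moves with the controls
y = 3.0 * x1 + 2.0 * c1 + 1.0 * c2 + rng.normal(size=n)

controls = np.column_stack([np.ones(n), c1, c2])     # intercept included

# Full multiple regression: coefficient on x1 (first column).
beta_full = ols_coefs(y, np.column_stack([x1, controls]))[0]

# FWL steps 1-3. Both residuals have mean zero because the controls include
# an intercept, so the slope through the origin is the simple-regression slope.
x1_tilde = residualize(x1, controls)
y_tilde = residualize(y, controls)
beta_fwl = (x1_tilde @ y_tilde) / (x1_tilde @ x1_tilde)

print(beta_full, beta_fwl)   # identical up to floating-point error
```

The two numbers agree to many decimal places, because FWL is an algebraic identity about least squares, not an approximation.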

[Diagram: Regression as residualization (Frisch–Waugh). Outcome Y and treatment D are each regressed on the controls X ("partial out X"). The regression coefficient on D after adjusting for X equals the slope from regressing the Y-residual (Y ⟂ X) on the D-residual (D ⟂ X).]

Figure 1. Frisch–Waugh–Lovell as a two-stage residualization. Regression isolates the unique covariance between the parts of Y and D that the controls cannot explain. In the figure, D is the treatment ($X_1$ in the text) and X is the set of controls.

This is enormously clarifying. A regression coefficient is never about all of the variation in $X_1$. It is about the slice of $X_1$ that moves independently of whatever controls you included. The implications follow directly:

  • Controls steal variation, by design. Adding a control absorbs the part of $X_1$ that moves with that control. If your treatment varies mostly with one of your controls, very little independent variation is left, and the coefficient becomes noisy.
  • A "bad control" (a post-treatment variable) is a thief, not a friend. If $X_2$ is itself a consequence of the treatment, residualizing the treatment on $X_2$ removes part of the very causal pathway you are trying to measure.
  • The model has nothing to say about regions of $X_1$ with no data. The residualized scatter only covers the support of $\tilde{X}_1$ you actually observed. Predictions outside it are extrapolation, not estimation.
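The first two implications can be seen in a short simulation. This is a sketch on made-up data: the names `x`, `z`, `d`, `m` and the helper `coef_and_se` are hypothetical, and the standard errors are the conventional homoskedastic ones. Part 1 raises the correlation between the treatment and a control and watches the standard error grow; part 2 adds a mediator `m` (a post-treatment variable) as a control and watches the estimate fall from the total effect to the direct effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

def coef_and_se(y, X, col):
    """OLS coefficient and conventional (homoskedastic) standard error for column `col`."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[col], np.sqrt(cov[col, col])

# 1) Controls steal variation: as corr(x, z) rises, less of x is left after
#    residualizing on z, and the standard error on x grows.
ses = {}
for rho in (0.0, 0.9, 0.99):
    z = rng.normal(size=n)
    x = rho * z + np.sqrt(1 - rho**2) * rng.normal(size=n)   # corr(x, z) = rho
    y = 1.0 * x + 1.0 * z + rng.normal(size=n)
    b, ses[rho] = coef_and_se(y, np.column_stack([np.ones(n), x, z]), 1)
    print(f"corr={rho:.2f}  coef={b:.3f}  se={ses[rho]:.4f}")

# 2) A bad control: m is caused by d and carries part of d's effect on y.
d = rng.normal(size=n)
m = 0.7 * d + rng.normal(size=n)              # post-treatment variable (mediator)
y = 0.5 * d + 1.0 * m + rng.normal(size=n)    # total effect of d: 0.5 + 0.7 * 1.0 = 1.2
ones = np.ones(n)
total, _ = coef_and_se(y, np.column_stack([ones, d]), 1)      # recovers about 1.2
direct, _ = coef_and_se(y, np.column_stack([ones, d, m]), 1)  # only about 0.5
print(f"without m: {total:.3f}   controlling for m: {direct:.3f}")
```

In part 1 the standard error scales roughly with $1/\sqrt{1-\rho^2}$, so at a correlation of 0.99 it is about seven times larger than with an independent control.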

Reading a Coefficient Like a Manager

A defensible report of a regression coefficient always names four things (the outcome, the lever, the scale, and the controls):

"Holding store fixed effects, month seasonality, and competitor pricing constant, a 1% increase in price is associated with a 2.23% decrease in unit volume."

Notice how much that sentence concedes. It does not claim causation in general — it claims a conditional comparison. The clause "holding X constant" is doing real work, and naming exactly which X's are held constant is what separates a serious estimate from a marketing slide.


Data Case: The Progresso Soup Regression Ladder

A "regression ladder" runs the same outcome–treatment regression with progressively more controls, and watches the coefficient evolve. It is the most useful single artifact in an observational pricing study, because the shifts are what tell you which confounders mattered.

For Progresso soup, we regress $\log(\text{volume})$ on $\log(\text{price})$ across roughly 88,000 store-month observations, building up the control set in five steps:

  1. Raw correlation — bivariate log-log regression, no controls.
  2. + Seasonality — month dummies absorbing winter demand shocks.
  3. + Competitor price — log price of Campbell's, the nearest substitute.
  4. + Region — census-region dummies absorbing stable regional preferences.
  5. + Store fixed effects — one intercept per store, absorbing every stable store-level difference. (We will cover this design formally in the next article.)
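Mechanically, a ladder is a loop over nested design matrices. The sketch below runs one on a simulated store-month panel, not on the Progresso data: the panel size, the variable names, and every parameter (including a true elasticity of −2.0) are assumptions. Prices are made to respond to the same store, regional, and seasonal forces that move demand, so the early rungs are confounded and only the last one recovers the assumed elasticity. Dummies are built by hand so the sketch needs nothing beyond NumPy.

```python
import numpy as np

rng = np.random.default_rng(4)
n_stores, n_months = 300, 24

# Simulated store-month panel (NOT the Progresso data); every parameter is made up.
store = np.repeat(np.arange(n_stores), n_months)
month = np.tile(np.arange(n_months) % 12, n_stores)      # month of year, 0 = January
region = (np.arange(n_stores) % 4)[store]                # each store sits in one region

store_effect = rng.normal(0.0, 0.5, n_stores)[store]     # stable store-level demand
region_effect = np.array([0.0, 0.2, -0.1, 0.3])[region]  # stable regional taste
season = np.where(np.isin(month, [0, 1, 10, 11]), 0.4, 0.0)  # winter soup demand
log_comp = rng.normal(0.0, 0.1, store.size)               # competitor log price

# Prices respond to the same forces that move demand; that is what confounds the raw slope.
log_price = (-0.1 * store_effect - 0.2 * region_effect - 0.2 * season
             + 0.5 * log_comp + rng.normal(0.0, 0.1, store.size))

TRUE_ELASTICITY = -2.0
log_vol = (5.0 + TRUE_ELASTICITY * log_price + store_effect + region_effect
           + season + 1.0 * log_comp + rng.normal(0.0, 0.2, store.size))

def dummies(codes):
    """One-hot columns for integer codes, dropping the first level as the baseline."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, 1:]).astype(float)

base = [np.ones(store.size), log_price]
ladder = [
    ("raw log-log", base),
    ("+ month seasonality", base + [dummies(month)]),
    ("+ competitor price", base + [dummies(month), log_comp]),
    ("+ region", base + [dummies(month), log_comp, dummies(region)]),
    # Store dummies subsume region dummies, so the last rung swaps one for the other.
    ("+ store fixed effects", base + [dummies(month), log_comp, dummies(store)]),
]

estimates = {}
for name, cols in ladder:
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, log_vol, rcond=None)
    estimates[name] = beta[1]                             # column 1 is log_price
    print(f"{name:<24s}{beta[1]: .3f}")
```

With thousands of stores, one dummy column per store gets expensive. By FWL, demeaning price and volume within each store gives the same price coefficient as the store dummies, which is the usual shortcut.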

The elasticity estimate changes as the comparison gets cleaner

88,409 store-months across 2,042 stores. Coefficient is on log(Progresso price).

| Specification | Coefficient on log(Progresso price) | R² |
|---|---|---|
| Raw log-log | −3.21 | 0.28 |
| + month seasonality | −2.46 | 0.35 |
| + competitor prices | −3.12 | 0.43 |
| + region controls | −2.66 | 0.57 |
| + store fixed effects | −2.23 | 0.90 |

Figure 2. The Progresso price-elasticity ladder. The coefficient moves from a naive −3.21 to −2.23 as confounders are added. The shifts (not the final number alone) are the teaching content.

Two patterns are worth naming:

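  • Store fixed effects do the most explanatory work. Adding them lifts R² from 0.57 to 0.90, by far the largest jump on the ladder, and moves the coefficient from −2.66 to −2.23. Whatever stable differences across stores the region dummies missed (neighborhood income, local competition density, store size) were still distorting the estimate; census-region dummies absorbed some, but not all, of that variation. The largest single shift in the coefficient itself came earlier, when month seasonality moved it from −3.21 to −2.46.
  • The estimate settles, but not completely. Each added control moves the coefficient less than the one before, so the ladder is converging. Stability across the last rungs is the visual signature of a credible identification, given the controls in hand. Here the final step still moves the estimate by 0.43, and judging whether that shift is within the noise of the last two specifications requires their standard errors, which the figure does not report.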
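The rung-to-rung changes are easier to compare as numbers than as bar positions. This sketch only does arithmetic on the coefficient and R² values reported in Figure 2 (nothing is re-estimated); the variable names are illustrative.

```python
# Coefficients and R-squared values as reported in Figure 2, in rung order.
ladder = [
    ("raw log-log", -3.21, 0.28),
    ("+ month seasonality", -2.46, 0.35),
    ("+ competitor prices", -3.12, 0.43),
    ("+ region controls", -2.66, 0.57),
    ("+ store fixed effects", -2.23, 0.90),
]

# Size of each rung-to-rung move in the coefficient, and the gain in R-squared.
steps = list(zip(ladder, ladder[1:]))
coef_shifts = [round(abs(b - a), 2) for (_, a, _), (_, b, _) in steps]
r2_gains = [round(r2b - r2a, 2) for (_, _, r2a), (_, _, r2b) in steps]

print(coef_shifts)  # [0.75, 0.66, 0.46, 0.43]
print(r2_gains)     # [0.07, 0.08, 0.14, 0.33]
```

The two lists answer different questions: the first tracks how much the price estimate moves at each step, the second how much additional variation in volume each block of controls explains.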

The preferred estimate, $\widehat{\beta} \approx -2.23$, says that within a given store, a 10% price increase is associated with roughly a 22% volume decrease (a first-order approximation; taken literally, the log-log model implies $1.10^{-2.23} - 1 \approx -19\%$), after stripping out seasonal demand, competitor pricing, and stable store-level differences. That is the number you would use for pricing, and the regression ladder is what earns it the right to be used.
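Turning an elasticity into a percentage change takes a couple of lines. This is a minimal sketch, assuming a constant-elasticity (log-log) demand curve; the helper name `pct_volume_change` is hypothetical. It compares the rule of thumb (elasticity times percentage price change) with the change the log-log model implies exactly.

```python
def pct_volume_change(elasticity, pct_price_change):
    """Implied % change in volume under a constant-elasticity (log-log) model."""
    return ((1 + pct_price_change / 100) ** elasticity - 1) * 100

beta_hat = -2.23                           # preferred estimate from the ladder
first_order = beta_hat * 10                # elasticity x % price change: -22.3
exact = pct_volume_change(beta_hat, 10)    # (1.10 ** -2.23 - 1) * 100: about -19.1
print(f"first-order: {first_order:.1f}%   log-log exact: {exact:.1f}%")
```

The gap between the two grows with the size of the price change, so for small moves (around 1%) the rule of thumb is fine, while for double-digit price changes the exact form is safer.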