§6.3
Panel Data and Fixed Effects
Across stores, customers, and markets, business units differ permanently in ways that are extremely hard to measure: neighborhood income, store layout, manager skill, local competitor density. If we pool data across units and run one regression, the differences across units pollute the relationship we wanted to estimate within units. The fix is structural: track each unit over time, and use that within-unit variation to identify the effect. The technique is called fixed effects, and it is the workhorse identification strategy of empirical business analysis.
This article builds the within-unit-vs-across-unit distinction visually first, then derives the demeaning transformation that makes fixed effects mechanically equivalent to a simple regression on within-unit deviations, and ends with a data case on store fixed effects in soup pricing.
The Executive Question: Are We Comparing Stores to Stores, or to Themselves?
A naive pooled regression of latte volume on price across 100 stores returns a positive slope: stores charging more sell more lattes. The recommendation looks like "raise prices."
The audit reveals the obvious story. Suburban stores charge $3.80 and sell 8,000 lattes a week; urban stores charge $3.00 and sell 5,000. The pooled comparison is between two different kinds of stores — and the across-store demographic gap dominates the within-store price response that any sane pricing decision would actually depend on.
The executive question is not "what is the correlation between price and volume in the dataset" but:
What happens to the same store's volume when that store changes its own price?
That is the question fixed effects answers, by construction.
Visualizing the Difference
Figure 1 shows the same conceptual picture three times. Three stores with different baseline levels (the intercepts) move modestly in response to a lever (the within-unit slopes). Looking across stores, the levels dominate the picture. Looking within each store, the slopes are similar and modest.
Figure 1: Fixed effects use the variation within each unit, not across units. Cross-store comparison mixes level differences (intercepts) with the lever's effect; store fixed effects subtract each store's own mean (dashed) and identify the slope from within-store wiggles only.
A pooled regression on this data fits one line through all the points and is dragged by the level gaps. A store-fixed-effects regression fits one slope to all the points after subtracting each store's mean from each store's points — and that slope is the within-store response.
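A minimal numerical sketch of this picture, using simulated data. The three stores, their price levels and baselines, and the −1.5 within-store slope are illustrative assumptions, not figures from the text. The sketch fits the pooled slope and the within-store slope to the same points:

```python
import numpy as np

# Simulated (hypothetical) panel: 3 stores x 52 weeks.
# Higher-priced stores also have higher baseline demand, but within each
# store volume FALLS when that store raises its price.
rng = np.random.default_rng(0)
n_weeks = 52
store_price_level = np.array([3.00, 3.40, 3.80])  # each store's average price ($)
store_baseline = np.array([5.0, 6.5, 8.0])        # baseline weekly volume (thousands)
true_slope = -1.5                                  # within-store response (thousands per $)

store = np.repeat(np.arange(3), n_weeks)           # store id for each row
price = store_price_level[store] + rng.normal(0, 0.15, store.size)
volume = (store_baseline[store]
          + true_slope * (price - store_price_level[store])
          + rng.normal(0, 0.1, store.size))


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


def within_slope(x, y, groups):
    """Slope after subtracting each group's own mean from x and from y."""
    counts = np.bincount(groups)
    x_dm = x - (np.bincount(groups, weights=x) / counts)[groups]
    y_dm = y - (np.bincount(groups, weights=y) / counts)[groups]
    return (x_dm @ y_dm) / (x_dm @ x_dm)


pooled = ols_slope(price, volume)
fe = within_slope(price, volume, store)
print(f"pooled slope: {pooled:+.2f}   store-FE slope: {fe:+.2f}")
```

The pooled slope comes out positive because it is dominated by the gaps between store averages; the within slope recovers the negative response built into the simulation.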
| Comparison | What the variation comes from | What's absorbed | What's not |
|---|---|---|---|
| Pooled OLS | Across all stores and weeks | Nothing | Stable store differences + common time shocks |
| Entity (store) fixed effects | Within a store, across weeks | All stable store-level differences | Common time shocks (e.g. holidays) |
| Two-way fixed effects (store + week) | Within a store, against the common-week baseline | Stable store differences + week-level shocks | Time-varying, store-specific shocks |
The two-way fixed effects (TWFE) specification — store and week — is the standard starting point for any panel pricing analysis.
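For a balanced panel, the two-way version can be sketched by subtracting both the store mean and the week mean and adding back the grand mean. The panel below is simulated, with price deliberately correlated with both the store effect and the week shock; all sizes and coefficients are assumptions for illustration. This shortcut reproduces the store-plus-week dummy regression only when every store is observed in every week; unbalanced panels need explicit dummies or iterative demeaning.

```python
import numpy as np

# Simulated (hypothetical) balanced panel: every store observed every week.
rng = np.random.default_rng(1)
n_stores, n_weeks = 200, 26
beta_true = -1.2

alpha = rng.normal(0, 2, n_stores)   # stable store effects (alpha_i)
lam = rng.normal(0, 1, n_weeks)      # common week shocks, e.g. holidays (lambda_t)

# Price is correlated with both the store effect and the week shock,
# which is exactly what biases a pooled regression.
price = (3.0 + 0.3 * alpha[:, None] + 0.2 * lam[None, :]
         + rng.normal(0, 0.2, (n_stores, n_weeks)))
y = (alpha[:, None] + lam[None, :] + beta_true * price
     + rng.normal(0, 0.3, (n_stores, n_weeks)))


def two_way_demean(m):
    """m_it - mean_i - mean_t + grand mean (rows = stores, columns = weeks).

    Equivalent to store + week dummies only for a BALANCED panel.
    """
    return (m - m.mean(axis=1, keepdims=True)
            - m.mean(axis=0, keepdims=True) + m.mean())


p_dd, y_dd = two_way_demean(price), two_way_demean(y)
beta_twfe = (p_dd * y_dd).sum() / (p_dd ** 2).sum()

p_c = price - price.mean()
beta_pooled = (p_c * (y - y.mean())).sum() / (p_c ** 2).sum()
print(f"pooled: {beta_pooled:+.2f}   TWFE: {beta_twfe:+.2f}   truth: {beta_true:+.2f}")
```

Both $\alpha_i$ and $\lambda_t$ are wiped out by the double demeaning, so only the idiosyncratic price variation identifies the slope.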
The Method: Demeaning
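Let units be indexed by $i$ and time periods by $t$. The TWFE model is
Two-way fixed effects (TWFE):
$$y_{it} = \alpha_i + \lambda_t + \beta D_{it} + X_{it}'\gamma + \varepsilon_{it}$$
where $\alpha_i$ is an intercept for each unit, $\lambda_t$ is an intercept for each time period, $D_{it}$ is the treatment, $X_{it}$ are time-varying controls, and $\varepsilon_{it}$ is the error term.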
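For thousands of units, estimating thousands of dummy intercepts by brute force is computationally expensive. The within transformation sidesteps the problem. Define each unit's mean over time,
$$\bar{y}_i = \frac{1}{T_i}\sum_{t} y_{it},$$
where $T_i$ is the number of periods in which unit $i$ is observed, and subtract it from every observation of that variable:
$$\ddot{y}_{it} = y_{it} - \bar{y}_i$$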
Apply that transformation to both sides of the model. Because $\alpha_i$ is constant over time within unit $i$, its mean is $\alpha_i$ itself, and demeaning makes it vanish:
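Demeaned regression:
$$\ddot{y}_{it} = (\lambda_t - \bar{\lambda}_i) + \beta\,\ddot{D}_{it} + \ddot{X}_{it}'\gamma + \ddot{\varepsilon}_{it}$$
where $\bar{\lambda}_i$ is the average period effect over the periods in which unit $i$ is observed. The unit effect $\alpha_i$ is gone; the period effects are not, so the regression still estimates them with period dummies (demeaned within unit).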
The coefficient $\beta$ in this demeaned regression is numerically identical to the fixed-effects coefficient from the original dummy-variable specification, a direct application of the Frisch–Waugh–Lovell theorem. Two consequences are worth memorizing:
- Every time-invariant unit characteristic — observed or unobserved — has a demeaned value of exactly zero. Square footage, ZIP-code income, store age, manager identity: all of it is absorbed. You do not have to measure stable confounders. You only have to assume they are stable.
- All of the identifying variation comes from changes within a unit. If a store never changes its price across the panel, that store contributes nothing to the price coefficient — its within-unit price variation is zero.
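Both consequences, and the equivalence between demeaning and the dummy-variable regression, can be checked numerically. The sketch below simulates a small unbalanced panel (all values hypothetical), fits the price coefficient once with one dummy per store and once on within-store deviations, then drops a store whose price never changes:

```python
import numpy as np

# Simulated (hypothetical) unbalanced panel of 6 stores; store 5 never
# changes its price.
rng = np.random.default_rng(2)
store = np.repeat(np.arange(6), [8, 12, 5, 10, 7, 9])
n_stores = store.max() + 1
alpha = rng.normal(0, 3, n_stores)                  # stable, unmeasured store effects
price = 2.5 + 0.4 * alpha[store] + rng.normal(0, 0.2, store.size)
price[store == 5] = 2.75                            # zero within-store price variation
y = alpha[store] - 1.8 * price + rng.normal(0, 0.1, store.size)


def demean(v, g):
    """Subtract each group's mean from v (the within transformation)."""
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]


# (1) Dummy-variable regression: price plus one intercept per store.
dummies = (store[:, None] == np.arange(n_stores)).astype(float)
X = np.column_stack([price, dummies])
beta_lsdv = np.linalg.lstsq(X, y, rcond=None)[0][0]

# (2) Within estimator: regress demeaned y on demeaned price.
p_dm, y_dm = demean(price, store), demean(y, store)
beta_within = (p_dm @ y_dm) / (p_dm @ p_dm)

# (3) Drop the constant-price store and re-estimate.
keep = store != 5
p_k, y_k = demean(price[keep], store[keep]), demean(y[keep], store[keep])
beta_no_store5 = (p_k @ y_k) / (p_k @ p_k)

print(f"dummies: {beta_lsdv:.6f}  within: {beta_within:.6f}  "
      f"without store 5: {beta_no_store5:.6f}")
```

The first two estimates agree to floating-point precision, and removing store 5 leaves the within estimate unchanged, because that store's demeaned price is zero in every row and so adds nothing to either the numerator or the denominator.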
Centering, Live
The pooled-vs-centered comparison in Figure 2 lets you switch between the two views. The pooled scatter shows the misleading positive slope. The centered scatter slides each store's points to the origin and reveals the true within-store relationship.
Figure 2: How store fixed effects isolate within-store price sensitivity. Pooled view: raw comparisons are confounded by neighborhood demographics, creating a false positive slope.
What to notice: Suburban Store A operates in a high-income area with high baseline demand ($V_A = 8.16\text{k}$) and a higher average price ($P_A = \$2.30$). Urban Store B operates in a lower-income area with lower demand ($V_B = 5.86\text{k}$) and a lower average price ($P_B = \$1.50$). If we naively pool them, the regression compares across stores, and its slope matches the line through the two store averages, $(8.16 - 5.86)/(2.30 - 1.50) \approx +2.88$ thousand units per dollar: a false positive slope suggesting that raising prices increases demand.
The centered view is what the fixed-effects regression "sees." That visual lens — pull every unit's mean to the origin and look at the residual variation — is the most useful mental model for reading any fixed-effects coefficient.
Concept check
Three questions spanning what regression isolates, what identification adds, and what fixed effects buy.
1. The Frisch–Waugh–Lovell view of multiple regression says that the coefficient on a given regressor is the slope of…
2. Which of the following best captures the difference between identification and estimation?
3. A pooled regression of volume on price returns a positive slope. The same regression with store fixed effects returns a negative slope. Which interpretation is most defensible?
Data Case: Store Fixed Effects in the Progresso Panel
We climb the same regression ladder as Chapter 6.1 — bivariate, then with seasonality, then with competitor price, then with regional dummies, then with store fixed effects. The final rung is the within-store estimate that the previous article promised.
The elasticity estimate changes as the comparison gets cleaner
88,409 store-months across 2,042 stores. Coefficient is on log(Progresso price).
The step from Model 4 (regional dummies) to Model 5 (store fixed effects) is the smallest movement on the ladder but the most meaningful from an identification standpoint. Regional dummies absorb broad geographic differences. Store fixed effects absorb everything stable at the store level: neighborhood income, local competition, shelf placement, manager habits, parking. The coefficient moves only modestly because regional dummies had already captured much of the cross-store variation, but the reason we now trust the estimate is structural: no stable store-level confounder remains that could bias it. What store fixed effects cannot remove are time-varying, store-specific shocks.
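The ordering on the ladder can be reproduced with simulated data in which each store's stable demand level has a regional component and a smaller local component, and log price rises with that level. Regional dummies (equivalently, demeaning within region) remove only the regional part; store fixed effects remove both. Every number here is a hypothetical assumption, not the Progresso data, and the sketch omits the seasonality and competitor-price controls of the real ladder:

```python
import numpy as np

# Simulated (hypothetical) panel: 300 stores in 6 regions, 24 months each.
rng = np.random.default_rng(3)
n_stores, n_months, n_regions = 300, 24, 6
region_of_store = rng.integers(0, n_regions, n_stores)
region_effect = np.linspace(-1.5, 1.5, n_regions)        # broad geographic differences
store_level = (region_effect[region_of_store]
               + rng.normal(0, 0.2, n_stores))           # plus a local, store-specific part

store = np.repeat(np.arange(n_stores), n_months)
region = region_of_store[store]
# Stores with stronger stable demand also charge more (the confounder).
log_p = 0.3 * store_level[store] + rng.normal(0, 0.1, store.size)
log_q = store_level[store] - 2.0 * log_p + rng.normal(0, 0.1, store.size)


def demean(v, g):
    """Subtract each group's mean from v."""
    return v - (np.bincount(g, weights=v) / np.bincount(g))[g]


def slope(x, y):
    """OLS slope of y on x when both are already centered."""
    return (x @ y) / (x @ x)


b_pooled = slope(log_p - log_p.mean(), log_q - log_q.mean())
b_region = slope(demean(log_p, region), demean(log_q, region))  # regional dummies
b_store = slope(demean(log_p, store), demean(log_q, store))     # store fixed effects
print(f"pooled {b_pooled:+.2f}   region dummies {b_region:+.2f}   store FE {b_store:+.2f}")
```

In this setup regional dummies pull the estimate most of the way toward the truth, but only store fixed effects remove the local confounding that remains within each region.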
Reading the headline: a 1% increase in Progresso price within a given store, in a given month, is associated with roughly a 2.23% decrease in unit volume, after controlling for that store's stable characteristics, competitor pricing, and seasonal demand.
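Assuming both volume and price enter in logs (the caption confirms log price; log volume is what the percentage reading implies), the coefficient is an elasticity and the headline follows from one line of arithmetic, valid for small price changes:

$$\hat{\beta} = \frac{\partial \ln Q}{\partial \ln P} \approx \frac{\%\Delta Q}{\%\Delta P} \quad\Rightarrow\quad \%\Delta Q \approx -2.23 \times (+1\%) = -2.23\%$$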
A pricing recommendation built on the naive −3.21 would have overstated price sensitivity by a factor of nearly one and a half (3.21/2.23 ≈ 1.44), with predictable consequences: over-discounting to chase volume that does not materialize, margin erosion, and far less lift than forecast. The fixed-effects estimate is the safer basis for pricing, not because the number is smaller but because the comparison is fair.
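To see how the gap plays out for a larger move, the sketch below assumes a constant-elasticity (log-log) demand curve, $Q_1/Q_0 = (P_1/P_0)^{\beta}$, over the whole price change, and compares the volume change each estimate forecasts for a hypothetical 10% price cut:

```python
def predicted_volume_change(elasticity, price_change):
    """Percent volume change implied by constant-elasticity demand.

    Assumes Q1 / Q0 = (P1 / P0) ** elasticity, i.e. the log-log model holds
    over the whole price move (a simplifying assumption).
    """
    return ((1 + price_change) ** elasticity - 1) * 100


for label, elasticity in [("naive pooled", -3.21), ("store FE", -2.23)]:
    change = predicted_volume_change(elasticity, -0.10)  # hypothetical 10% price cut
    print(f"{label:>12}: {change:+.1f}% volume")
```

Under these assumptions the naive elasticity forecasts roughly a 40% volume gain, against roughly 26% under the fixed-effects estimate; a discount sized to the first forecast would underdeliver.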