§6.3

Panel Data and Fixed Effects

Across stores, customers, and markets, business units differ permanently in ways that are extremely hard to measure: neighborhood income, store layout, manager skill, local competitor density. If we pool data across units and run one regression, the differences across units pollute the relationship we wanted to estimate within units. The fix is structural: track each unit over time, and use that within-unit variation to identify the effect. The technique is called fixed effects, and it is the workhorse identification strategy of empirical business analysis.

This article builds the within-unit-vs-across-unit distinction visually first, then derives the demeaning transformation that makes fixed effects mechanically equivalent to a simple regression on within-unit deviations, and ends with a data case on store fixed effects in soup pricing.


The Executive Question: Are We Comparing Stores to Stores, or to Themselves?

A naive pooled regression of latte volume on price across 100 stores returns a positive slope: stores charging more sell more lattes. The recommendation looks like "raise prices."

The audit reveals the obvious story. Suburban stores charge $3.80 and sell 8,000 lattes a week; urban stores charge $3.00 and sell 5,000. The pooled comparison is between two different kinds of stores — and the across-store demographic gap dominates the within-store price response that any sane pricing decision would actually depend on.

The executive question is not "what is the correlation between price and volume in the dataset" but:

What happens to the same store's volume when that store changes its own price?

That is the question fixed effects answers, by construction.


Visualizing the Difference

Figure 1 shows the same pattern three times, once per store. Three stores with different baseline levels (the intercepts) each move modestly in response to a lever (the within-unit slopes). Looking across stores, the levels dominate the picture. Looking within each store, the slopes are similar and modest.

Fixed effects use the variation within each unit, not across

[Figure 1: outcome (vertical axis) against time (horizontal axis) for Store A, Store B, and Store C, with each store's own mean drawn as a dashed line.]

Figure 1. Cross-unit comparison mixes stable level differences (intercepts) with the lever's effect. Store fixed effects subtract each store's own mean (dashed) and identify the slope from within-store wiggles only.

A pooled regression on this data fits one line through all the points and is dragged by the level gaps. A store-fixed-effects regression fits one slope to all the points after subtracting each store's mean from each store's points — and that slope is the within-store response.
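The contrast between the two fits can be reproduced on a toy panel. The sketch below is only an illustration: the three stores, their baseline levels, the price moves, and the assumed within-store slope of −2 are all made-up numbers, chosen so that pricier stores also have higher baseline volume.

```python
# Hypothetical three-store panel; every number here is invented for illustration.

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit that includes an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean(values):
    """Subtract the list's own mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


TRUE_SLOPE = -2.0                          # assumed within-store response
PRICE_MOVES = [-0.2, -0.1, 0.0, 0.1, 0.2]  # each store's moves around its average price
# store -> (baseline level, average price); pricier stores have higher levels
STORES = {"A": (20.0, 2.0), "B": (30.0, 3.0), "C": (40.0, 4.0)}

pooled_x, pooled_y, within_x, within_y = [], [], [], []
for level, avg_price in STORES.values():
    prices = [avg_price + move for move in PRICE_MOVES]
    volumes = [level + TRUE_SLOPE * p for p in prices]
    pooled_x += prices                 # raw points: levels and slopes mixed together
    pooled_y += volumes
    within_x += demean(prices)         # each store's mean subtracted from its own points
    within_y += demean(volumes)

pooled_slope = ols_slope(pooled_x, pooled_y)
within_slope = ols_slope(within_x, within_y)
print(f"pooled slope:       {pooled_slope:+.2f}")
print(f"within-store slope: {within_slope:+.2f}")
```

With these invented numbers the pooled line slopes upward (roughly +7.7) even though every store responds to its own price with a slope of −2, which is what the demeaned fit recovers.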

| Comparison | What the variation comes from | What's absorbed | What's not |
| --- | --- | --- | --- |
| Pooled OLS | Across all stores and weeks | Nothing | Stable store differences + common time shocks |
| Entity (store) fixed effects | Within a store, across weeks | All stable store-level differences | Common time shocks (e.g. holidays) |
| Two-way fixed effects (store + week) | Within a store, against the common-week baseline | Stable store differences + week-level shocks | Time-varying, store-specific shocks |

The two-way fixed effects (TWFE) specification — store and week — is the standard starting point for any panel pricing analysis.
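To see what the week effects add, here is a minimal sketch on an invented balanced panel in which a common "holiday" shock raises both price and demand in every store. It assumes a balanced panel, so the two-way within transformation can be written as subtracting the store mean and the week mean and adding back the grand mean. The panel dimensions, the shock, and the assumed slope of −2 are hypothetical.

```python
# Hypothetical balanced panel: 4 stores x 5 weeks, no noise.
N_STORES, N_WEEKS = 4, 5
TRUE_SLOPE = -2.0
HOLIDAY = [0.0, 0.0, 1.0, 0.0, 1.0]   # common week shock hitting every store


def unit_demean(z):
    """Subtract each store's (row's) own mean."""
    return [[v - sum(row) / len(row) for v in row] for row in z]


def two_way_demean(z):
    """Balanced-panel two-way transform: z - store mean - week mean + grand mean."""
    n, t = len(z), len(z[0])
    row_mean = [sum(row) / t for row in z]
    col_mean = [sum(z[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row_mean) / n
    return [[z[i][j] - row_mean[i] - col_mean[j] + grand for j in range(t)]
            for i in range(n)]


def slope(x, y):
    """No-intercept slope; valid here because both inputs are already demeaned."""
    fx = [v for row in x for v in row]
    fy = [v for row in y for v in row]
    return sum(a * b for a, b in zip(fx, fy)) / sum(a * a for a in fx)


# Price = store base + holiday markup + a store-specific wiggle.
price = [[2.0 + 0.5 * i + 0.3 * HOLIDAY[t] + 0.1 * (((i * t) % 3) - 1)
          for t in range(N_WEEKS)] for i in range(N_STORES)]
# Volume = store level + true price response + holiday demand bump.
volume = [[10.0 + 4.0 * i + TRUE_SLOPE * price[i][t] + 1.0 * HOLIDAY[t]
           for t in range(N_WEEKS)] for i in range(N_STORES)]

entity_fe_slope = slope(unit_demean(price), unit_demean(volume))
twfe_slope = slope(two_way_demean(price), two_way_demean(volume))
print(f"store FE only:   {entity_fe_slope:+.2f}")
print(f"store + week FE: {twfe_slope:+.2f}")
```

In this constructed example the store-only estimate is pulled above the assumed −2 (it even turns positive) because holiday weeks carry both higher prices and higher demand, while the two-way version recovers −2. Unbalanced panels need the dummy-variable form or an iterative routine rather than this closed-form shortcut.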


The Method: Demeaning

Let units be indexed by $i \in \{1, \dots, N\}$ and time periods by $t \in \{1, \dots, T\}$. The TWFE model is

Two-way fixed effects (TWFE)

$$Y_{it} = \beta_1 X_{it} + \beta_2 W_{it} + \alpha_i + \gamma_t + u_{it}$$

where $\alpha_i$ is an intercept for each unit, $\gamma_t$ is an intercept for each time period, $X_{it}$ is the treatment, and $W_{it}$ are time-varying controls.

For thousands of units, estimating thousands of dummy intercepts is infeasible by brute force. The within transformation sidesteps the problem. Define each unit's mean over time:

$$\bar{Z}_i \;=\; \frac{1}{T}\sum_{t=1}^{T} Z_{it}$$

and subtract it from every observation of that variable:

$$\tilde{Z}_{it} \;=\; Z_{it} - \bar{Z}_i$$

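Apply that transformation to both sides of the model. Because $\alpha_i$ is constant over time within unit $i$, its mean over time is itself, and demeaning makes it vanish. The period effect $\gamma_t$ does not vanish under unit demeaning. It survives as $\tilde{\gamma}_t = \gamma_t - \bar{\gamma}$, which is still one intercept per period, and is handled by keeping the period dummies (or, in a balanced panel, by also subtracting period means and adding back the grand mean):

Demeaned regression

$$\tilde{Y}_{it} \;=\; \beta_1 \tilde{X}_{it} + \beta_2 \tilde{W}_{it} + \tilde{\gamma}_t + \tilde{u}_{it}$$

The coefficient $\beta_1$ in this demeaned regression is exactly the fixed-effects coefficient from the original specification. Two consequences are worth memorizing: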

  1. Every time-invariant unit characteristic — observed or unobserved — has a demeaned value of exactly zero. Square footage, ZIP-code income, store age, manager identity: all of it is absorbed. You do not have to measure stable confounders. You only have to assume they are stable.
  2. All of the identifying variation comes from changes within a unit. If a store never changes its price across the panel, that store contributes nothing to the price coefficient — its within-unit price variation is zero.
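Both consequences can be checked mechanically. The sketch below uses a tiny invented panel (store names, prices, volumes, and square footage are all hypothetical) in which Store C never changes its price. It shows that a stable trait demeans to zero and that dropping Store C leaves the within-store price slope unchanged.

```python
def demean(values):
    """Subtract the list's own mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical panel: weekly prices, weekly volumes, and a stable trait per store.
PANEL = {
    "A": {"price": [2.0, 2.2, 2.4], "volume": [10.0, 9.5, 9.1], "sqft": [1800.0] * 3},
    "B": {"price": [3.0, 3.1, 3.3], "volume": [20.0, 19.9, 19.2], "sqft": [2600.0] * 3},
    "C": {"price": [2.5, 2.5, 2.5], "volume": [15.0, 16.0, 14.0], "sqft": [2100.0] * 3},
}


def within_slope(panel):
    """Slope of demeaned volume on demeaned price, pooling all stores."""
    sxy = sxx = 0.0
    for store in panel.values():
        p = demean(store["price"])
        v = demean(store["volume"])
        sxy += sum(a * b for a, b in zip(p, v))
        sxx += sum(a * a for a in p)
    return sxy / sxx


# Consequence 1: a time-invariant trait has no within-store variation left.
demeaned_sqft = [d for s in PANEL.values() for d in demean(s["sqft"])]

# Consequence 2: a store with a constant price adds nothing to the estimate.
slope_all = within_slope(PANEL)
slope_without_c = within_slope({k: v for k, v in PANEL.items() if k != "C"})

print("demeaned sqft:", demeaned_sqft)
print(f"slope with Store C:    {slope_all:.4f}")
print(f"slope without Store C: {slope_without_c:.4f}")
```

The practical reading is that the effective sample for a fixed-effects price coefficient is the set of stores that actually moved their price, which can be much smaller than the number of stores in the panel.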

Centering, Live

Figure 2 contrasts the pooled and centered views of the same data. The pooled scatter shows the misleading positive slope. The centered scatter shifts each store's points so that the store's mean sits at the origin, revealing the true within-store relationship.

How Store Fixed Effects Isolate Within-Store Price Sensitivity

[Figure 2, pooled view: "Pooled raw comparisons (confounded by neighborhood demographics, creating a false positive slope)". Horizontal axis: Latte Price ($). Vertical axis: Latte Sales (Weekly Volume). Series: Store A (Suburban) and Store B (Urban), weeks W1 to W5, with each store's average marked and a pooled OLS line of slope +2.88.]

What to notice: Suburban Store A operates in a high-income area with high baseline demand ($V_A = 8.16\text{k}$) and higher average pricing ($P_A = \$2.30$). Urban Store B operates in a lower-income area with lower demand ($V_B = 5.86\text{k}$) and lower average prices ($P_B = \$1.50$). If we naively pool them, the regression compares across stores and produces a spurious positive slope (+2.88), which wrongly suggests that raising prices increases demand.

Figure 2. Pooled and centered views of the same panel data. In the pooled view the across-store demographic gap fits a positive slope; in the centered view, with each store's mean subtracted, the true within-store negative slope emerges.
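As a rough check on where the pooled slope comes from, the line through the two store averages reported for Figure 2 can be computed directly. This uses only the four averages quoted in the text. The weekly points themselves are not listed here, so this is the between-store slope, not a refit of the pooled regression:

$$\frac{V_A - V_B}{P_A - P_B} \;=\; \frac{8.16 - 5.86}{2.30 - 1.50} \;=\; \frac{2.30}{0.80} \;=\; 2.875 \;\approx\; 2.88 \text{ thousand lattes per week per dollar}$$

That the two-average slope matches the reported pooled slope to rounding suggests the pooled line is driven almost entirely by the gap between stores, with the within-store price moves contributing very little.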

The centered view is what the fixed-effects regression "sees." That visual lens — pull every unit's mean to the origin and look at the residual variation — is the most useful mental model for reading any fixed-effects coefficient.

Concept check

Three questions spanning what regression isolates, what identification adds, and what fixed effects buy.

  1. The Frisch–Waugh–Lovell view of multiple regression says that the coefficient on $X_1$ is the slope of…
  2. Which of the following best captures the difference between identification and estimation?
  3. A pooled regression of volume on price returns a positive slope. The same regression with store fixed effects returns a negative slope. Which interpretation is most defensible?

Data Case: Store Fixed Effects in the Progresso Panel

We climb the same regression ladder as Chapter 6.1 — bivariate, then with seasonality, then with competitor price, then with regional dummies, then with store fixed effects. The final rung is the within-store estimate that the previous article promised.

The elasticity estimate changes as the comparison gets cleaner

88,409 store-months across 2,042 stores. Coefficient is on log(Progresso price).

| Specification | Elasticity-style coefficient | $R^2$ |
| --- | --- | --- |
| Raw log-log | −3.21 | 0.28 |
| + month seasonality | −2.46 | 0.35 |
| + competitor prices | −3.12 | 0.43 |
| + region controls | −2.66 | 0.57 |
| + store fixed effects | −2.23 | 0.90 |

Figure 3. The Progresso regression ladder, ending in store fixed effects. The store-FE specification explains a much larger share of variance because it absorbs the stable across-store differences. The coefficient on price settles near −2.23.

The step from Model 4 (regional dummies) to Model 5 (store fixed effects) is the smallest movement on the ladder (0.43, from −2.66 to −2.23) but the most meaningful from an identification standpoint. Regional dummies absorb broad geographic differences. Store fixed effects absorb everything stable at the store level: neighborhood income, local competition, shelf placement, manager habits, parking. The coefficient moves modestly because regional dummies already captured a fair amount of the across-store variation, but the reason we trust the estimate now is structural: no stable store-level confounder remains to bias it.

Reading the headline: a 1% increase in Progresso price within a given store, in a given month, is associated with roughly a 2.23% decrease in unit volume, after controlling for that store's stable characteristics, competitor pricing, and seasonal demand.
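The "1% for 2.23%" reading is a small-change approximation. A minimal sketch, assuming the model is a constant-elasticity log-log specification (log volume on log price, as the "raw log-log" label in Figure 3 suggests), shows how the implied volume change departs from the linear rule as the price move grows. The function name and the price changes tried are illustrative choices, not part of the original analysis.

```python
ELASTICITY = -2.23   # store-FE coefficient on log(price), taken from Figure 3


def volume_change(price_change, elasticity=ELASTICITY):
    """Proportional volume change implied by a log-log model.

    price_change is a proportion, e.g. 0.01 for a 1% price increase.
    """
    return (1.0 + price_change) ** elasticity - 1.0


for pct in (0.01, 0.05, 0.10):
    exact = volume_change(pct)          # constant-elasticity formula
    linear = ELASTICITY * pct           # the "multiply by the coefficient" rule
    print(f"price +{pct:.0%}: exact {exact:+.1%}, linear rule {linear:+.1%}")
```

For a 1% increase the two readings nearly coincide. For a 10% increase the linear rule implies a 22.3% drop while the constant-elasticity formula implies a somewhat smaller one, so larger price moves are better read through the formula than the rule of thumb.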

A pricing recommendation built on the naive −3.21 would have overstated price sensitivity by a factor of about 1.4 (3.21 / 2.23 ≈ 1.44), with predictable consequences: over-discounting to chase imagined volume, margin erosion, and a smaller lift than forecast. The fixed-effects estimate is the safer basis for pricing not because the number is smaller but because the comparison is fair.
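To put the over-discounting risk in numbers, the sketch below compares the volume lift each estimate would forecast for a hypothetical 10% price cut. The 10% discount is an arbitrary illustration, and the calculation again assumes a constant-elasticity log-log model.

```python
NAIVE = -3.21           # raw log-log coefficient from Figure 3
FIXED_EFFECTS = -2.23   # store fixed-effects coefficient from Figure 3
DISCOUNT = 0.10         # hypothetical price cut, chosen for illustration


def volume_lift(discount, elasticity):
    """Proportional volume gain implied by a log-log model for a price cut."""
    return (1.0 - discount) ** elasticity - 1.0


naive_lift = volume_lift(DISCOUNT, NAIVE)
fe_lift = volume_lift(DISCOUNT, FIXED_EFFECTS)
overstatement = NAIVE / FIXED_EFFECTS   # ratio of the two elasticities

print(f"forecast lift, naive estimate:         {naive_lift:+.1%}")
print(f"forecast lift, fixed-effects estimate: {fe_lift:+.1%}")
print(f"elasticity overstatement factor:       {overstatement:.2f}")
```

Under these assumptions the naive estimate forecasts a lift of roughly 40% from the discount, while the fixed-effects estimate forecasts roughly 26%, so a promotion justified by the naive number would be budgeted against volume that the within-store evidence does not support.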