§7.1

Difference-in-Differences

When a firm rolls out a new feature, a price change, or a regional policy, it rarely has the luxury of randomly assigning treatment. Rollouts are regional, gradual, and tangled up with macro trends. Difference-in-differences (DiD) is the workhorse design for these settings. It bypasses the two naive comparisons every executive deck reaches for first (before vs. after in the treated group, and treated vs. untreated after the rollout). In their place it uses a comparison that nets out both of their confounders at once: the common time trend and the stable baseline gap between groups.

This article works through the DiD logic visually and algebraically, derives the regression specification that recovers the DiD estimate as an interaction coefficient, and ends with the identifying assumption — parallel trends — that everything else hinges on.


The Executive Question: Did We Grow Faster Than the Tide?

A regional team launches a new mobile checkout feature in the West Coast stores. Weekly transactions in West Coast stores rise from 100,000 to 130,000 — a thirty-thousand-transaction gain. The team prepares to recommend a national rollout.

Before approving, the executive question:

How much of the 30,000-transaction increase was actually caused by the feature, and how much would have happened anyway?

Over the same period, untreated East Coast stores rose from 90,000 to 100,000. Ten thousand of the West Coast's gain was the tide rising for everyone. The remaining twenty thousand is the part the feature can plausibly claim.

That subtraction-of-subtractions is the DiD estimator. Table 1 shows where it sits relative to the two naive comparisons.

Table 1. Why naive comparisons overstate the effect of a regional rollout. The DiD estimate (third row) nets out both the common tide and the stable baseline gap between regions.
| Comparison | Calculation | Implied effect | What it confounds with the effect |
|---|---|---|---|
| Naive before/after (West only) | West Post − West Pre = 130k − 100k | +30k | Common seasonal/macro tide affecting both regions |
| Naive cross-section (Post only) | West Post − East Post = 130k − 100k | +30k | Stable baseline difference between the two regions |
| Difference-in-differences | (130k − 100k) − (100k − 90k) | +20k | Neither (under parallel trends) |
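The three rows of Table 1 can be reproduced directly from the four cell means. This is a minimal sketch that hard-codes the example's weekly transaction counts. The `comparisons` helper and the dictionary layout are illustrative choices and do not come from any library.

```python
# Mean weekly transactions in each cell of the 2x2 (numbers from the example above).
cells = {
    ("West", "pre"): 100_000,   # treated region, before the rollout
    ("West", "post"): 130_000,  # treated region, after the rollout
    ("East", "pre"): 90_000,    # control region, before the rollout
    ("East", "post"): 100_000,  # control region, after the rollout
}


def comparisons(cells: dict) -> dict:
    """Return the two naive estimates and the DiD estimate from four cell means."""
    t_pre, t_post = cells[("West", "pre")], cells[("West", "post")]
    c_pre, c_post = cells[("East", "pre")], cells[("East", "post")]
    return {
        # Before/after in the treated group: mixes in the common time trend.
        "naive_before_after": t_post - t_pre,
        # Treated vs. control after rollout: mixes in the baseline gap.
        "naive_cross_section": t_post - c_post,
        # Difference of the two changes: nets out both, under parallel trends.
        "did": (t_post - t_pre) - (c_post - c_pre),
    }


if __name__ == "__main__":
    for name, value in comparisons(cells).items():
        print(f"{name:>20}: {value:+,}")
```

Both naive comparisons happen to land on +30k here because the baseline gap and the common trend are both 10k. In other data they would generally disagree with each other as well as with the DiD estimate.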

The 2×2 Picture

The cleanest way to understand DiD is to memorize the 2×2.

Difference-in-differences as a 2×2 comparison

|  | Pre | Post | Change (Post − Pre) |
|---|---|---|---|
| Control | Y₀₀ | Y₀₁ | ΔControl = Y₀₁ − Y₀₀ |
| Treated | Y₁₀ | Y₁₁ | ΔTreated = Y₁₁ − Y₁₀ |

DiD = ΔTreated − ΔControl
Figure 1. The four cells of a difference-in-differences design. The DiD estimate is the difference between the two row-wise differences: how much the treated group changed minus how much the control group changed.

Each row-wise difference (post minus pre) removes that group's own baseline level, so the stable gap between the two groups drops out. Subtracting the control group's change from the treated group's change then removes the common time shock, which appears in both changes. What remains is exactly the part of the post-treatment change in the treated group that the control group did not experience.
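The cancellation argument can be checked numerically. This sketch assumes a purely additive model for the cell means: a level for each group, a period shock shared by both groups, and an effect that appears only in the treated-post cell. It then draws arbitrary levels and shocks. The helper names are hypothetical.

```python
import random


def did_from_cells(y):
    """Difference of the two row-wise (post minus pre) changes; y[group][period]."""
    return (y["treated"]["post"] - y["treated"]["pre"]) - (
        y["control"]["post"] - y["control"]["pre"]
    )


def additive_cells(alpha, lam, tau):
    """Cell means under an additive model (an illustrative assumption).

    alpha: baseline level per group; lam: shock per period shared by both groups;
    tau: effect that is added only to the treated-post cell.
    """
    return {
        g: {
            t: alpha[g] + lam[t] + (tau if (g, t) == ("treated", "post") else 0.0)
            for t in ("pre", "post")
        }
        for g in ("control", "treated")
    }


rng = random.Random(0)
tau = 20_000.0
estimates = []
for _ in range(5):
    # Arbitrary group levels and period shocks; DiD should ignore all of them.
    alpha = {"control": rng.uniform(0, 1e5), "treated": rng.uniform(0, 1e5)}
    lam = {"pre": rng.uniform(-1e4, 1e4), "post": rng.uniform(-1e4, 1e4)}
    estimates.append(did_from_cells(additive_cells(alpha, lam, tau)))

print([round(e, 6) for e in estimates])
```

Every estimate equals the 20,000 effect up to floating-point rounding, whatever levels and shocks are drawn. The additive model is itself a form of the parallel-trends assumption, because both groups share the same period shocks.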


Visualizing DiD: Three Stories, One Picture

The pedagogical power of DiD is best seen by toggling between the three comparison frames in Figure 2. The same four numbers support three very different decisions depending on which comparison you make.

Figure 2: Difference-in-Differences vs. Naive Causal Comparisons (difference-in-differences view shown). Treated stores (West) rise from 100k in the pre-treatment period (months 1–3) to 130k in the post-treatment period (months 4–6). Control stores (East) rise from 90k to 100k, a general shock of +10k. Applying the same +10k trend to the treated stores gives a counterfactual of 110k. The gap between the actual 130k and that counterfactual is the causal DiD lift of +20k.

The math:
$$\text{DiD Effect} = (130\text{k} - 100\text{k}) - (100\text{k} - 90\text{k}) = 30\text{k} - 10\text{k} = 20\text{k}$$
The untreated East Coast control stores capture a general seasonal trend of +10k transactions. Subtracting that trend from the +30k transactions of observed growth in the treated West Coast stores isolates the impact of the mobile checkout feature rollout, provided the two regions would otherwise have followed parallel trends.

Figure 2. Difference-in-differences with parallel trends. The figure has three views. The first two are the naive comparisons, which overstate the effect. The third constructs the counterfactual path the treated group would have followed without treatment: the dashed line obtained by projecting the control group's trend forward from the treated group's pre-treatment level. The gap between that counterfactual and the observed post-treatment value is the DiD estimate.

The dashed projected line is the heart of the design: it is the counterfactual for the treated group, built from the control group's trend. DiD's identifying assumption is that this projection is credible.
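The dashed counterfactual is a single addition: start from the treated group's pre-treatment level and add the control group's pre-to-post change. This sketch uses the example's numbers. The function name is illustrative.

```python
def counterfactual_treated_post(treated_pre: float, control_pre: float, control_post: float) -> float:
    """Treated group's pre-treatment level plus the control group's change."""
    return treated_pre + (control_post - control_pre)


treated_pre, treated_post = 100_000, 130_000
control_pre, control_post = 90_000, 100_000

cf = counterfactual_treated_post(treated_pre, control_pre, control_post)
lift = treated_post - cf  # observed minus counterfactual
print(cf, lift)  # 110000 20000
```

Observed minus counterfactual is algebraically the same quantity as the difference of the two changes, (130k − 100k) − (100k − 90k). The "projection" view and the "difference of differences" view are two readings of one estimator.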


DiD as a Regression

The 2×2 view is intuitive, but most production estimates of DiD use a regression, because regressions extend gracefully to many regions, many time periods, and additional controls.

The two-way regression is

Difference-in-differences regression

$$Y_{it} = \beta_0 + \beta_1 \,\text{Treated}_i + \beta_2 \,\text{Post}_t + \beta_3 \,(\text{Treated}_i \times \text{Post}_t) + \varepsilon_{it}$$

where

  • $\text{Treated}_i = 1$ if unit $i$ is in the treated group (0 otherwise),
  • $\text{Post}_t = 1$ if period $t$ is after the treatment date (0 otherwise),
  • $\varepsilon_{it}$ is the error term,
  • $\beta_3$, the coefficient on the interaction, is the DiD estimate.

To see why $\beta_3$ is the DiD, write out the expected outcome in each of the four cells and take the difference of differences:

| Cell | Expected outcome |
|---|---|
| Control · Pre | $\beta_0$ |
| Control · Post | $\beta_0 + \beta_2$ |
| Treated · Pre | $\beta_0 + \beta_1$ |
| Treated · Post | $\beta_0 + \beta_1 + \beta_2 + \beta_3$ |

The change for the control group is $\beta_2$. The change for the treated group is $\beta_2 + \beta_3$. The difference of differences is

$$\text{DiD} = (\beta_2 + \beta_3) - \beta_2 = \beta_3$$

Here $\beta_1$ absorbs the stable baseline gap between the treated and control groups, and $\beta_2$ absorbs the common post-minus-pre shift. The interaction coefficient $\beta_3$ is what remains: the part of the outcome that appears only in the treated-post cell.
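The equivalence can be checked on simulated data. This sketch assumes a hypothetical set of 50 store-weeks per cell, with true cell means matching the running example plus Gaussian noise. It fits the regression by ordinary least squares with `numpy.linalg.lstsq` and compares $\hat\beta_3$ with the difference of differences of the sample cell means.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: n store-weeks per cell, ordered
# (control, pre), (control, post), (treated, pre), (treated, post).
n = 50
treated = np.repeat([0, 0, 1, 1], n)
post = np.repeat([0, 1, 0, 1], n)

# True means 90k, 100k, 100k, 130k, i.e. b0=90k, b1=10k, b2=10k, b3=20k, plus noise.
y = (
    90_000
    + 10_000 * treated
    + 10_000 * post
    + 20_000 * treated * post
    + rng.normal(0, 5_000, size=4 * n)
)

# Design matrix for Y = b0 + b1*Treated + b2*Post + b3*(Treated x Post) + e.
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)


def cell_mean(t, p):
    """Sample mean of y in the cell with Treated == t and Post == p."""
    return y[(treated == t) & (post == p)].mean()


did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"beta3 from OLS:    {beta[3]:,.2f}")
print(f"DiD of cell means: {did_means:,.2f}")
```

The model has one parameter per cell, so OLS reproduces the four sample cell means exactly. As a result, $\hat\beta_3$ equals the sample difference of differences even with noisy data. With a data frame, statsmodels' formula interface (`smf.ols("y ~ treated * post", data=df).fit()`) fits the same model and labels the interaction `treated:post`.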


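Parallel Trends: The Identifying Assumption

DiD recovers the treatment effect only if the treated and control groups would have followed parallel trends in the absence of treatment. The two groups' levels are allowed to differ; their untreated trends are not. The treated group's untreated post-treatment path is never observed, so the assumption cannot be tested directly. Parallel pre-treatment slopes are the evidence that makes it credible.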

The visual test is to plot both groups' pre-treatment trajectories on the same chart. If they look parallel for several periods before the treatment date, the assumption becomes more credible. If the treated group was already accelerating relative to the control group before the treatment hit, the design is in trouble: that pre-existing momentum will be mistaken for a treatment effect.

The most useful pre-treatment plot is the event study: align all units to event time (treatment date = 0), plot the average treated–control gap in each pre and post period, and check whether the pre-period gaps are flat. We will see event-study plots throughout the next chapters; for now, the principle is just: if your pre-trends are not parallel, you do not have a DiD design — you have a story.
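The event-study check can be sketched with group means. This sketch assumes units are already aligned to event time and uses period −1 as the reference, a common but not universal convention. The two series and the `event_study_gaps` helper are hypothetical.

```python
import numpy as np


def event_study_gaps(event_time, treated_mean, control_mean, ref=-1):
    """Treated-minus-control gap per event period, minus the gap at period `ref`.

    event_time: periods relative to treatment (0 = first treated period),
        assumed already aligned across units.
    treated_mean, control_mean: average outcome of each group in those periods.
    Normalizing to `ref` turns "flat pre-period gaps" into "pre-period gaps near 0".
    """
    event_time = np.asarray(event_time)
    gap = np.asarray(treated_mean, dtype=float) - np.asarray(control_mean, dtype=float)
    return gap - gap[event_time == ref][0]


event_time = np.arange(-4, 4)                # periods -4 .. 3
control = 90_000 + 2_000 * (event_time + 4)  # hypothetical control path

# Parallel pre-trends: a constant 10k gap, then a 20k lift from period 0 on.
treated_ok = control + 10_000 + np.where(event_time >= 0, 20_000, 0)
# Violation: the treated group was already pulling away by 3k per period.
treated_bad = treated_ok + 3_000 * (event_time + 4)

gaps_ok = event_study_gaps(event_time, treated_ok, control)
gaps_bad = event_study_gaps(event_time, treated_bad, control)
pre = event_time < 0
print("parallel  pre:", gaps_ok[pre], "post:", gaps_ok[~pre])
print("diverging pre:", gaps_bad[pre], "post:", gaps_bad[~pre])
```

In the diverging series the pre-period gaps climb steadily toward the reference period, and the post-period gaps overstate the 20k lift. That is the pre-existing momentum the text warns about. Event studies on real data usually estimate these gaps by regression with standard errors, so "flat" means statistically indistinguishable from zero rather than exactly zero. This sketch shows point estimates only.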


When DiD Is Not Enough: Looking Ahead

DiD assumes you have a group of control units whose averaged trajectory is a credible counterfactual. Two things stretch that assumption in real settings:

  • There is only one treated unit. A state passes a unique policy; a firm pilots in a single city. The classical DiD does not naturally pick a "best" comparison — and ad-hoc matching is fragile. Synthetic control (Chapter 7.2) addresses this by building a custom weighted counterfactual.
  • Treatment effects vary across units. The same rollout helps low-income customers a lot and loyal customers not at all. The average effect understates the targeting opportunity. Heterogeneous treatment effects (Chapter 7.3) tackle this directly.

DiD remains the right starting point for most multi-unit rollouts. The next two articles handle the cases where it isn't enough.