§7.1
Difference-in-Differences
When a firm rolls out a new feature, a price change, or a regional policy, it rarely has the luxury of randomly assigning treatment. Rollouts are regional, gradual, and tangled up with macro trends. Difference-in-differences (DiD) is the workhorse design for these settings. It bypasses the two naive comparisons every executive deck reaches for first (before vs. after in the treated group, and treated vs. untreated after the rollout) and replaces them with a comparison that nets out both the common time trend and the stable baseline gap between the groups.
This article works through the DiD logic visually and algebraically, derives the regression specification that recovers the DiD estimate as an interaction coefficient, and ends with the identifying assumption — parallel trends — that everything else hinges on.
The Executive Question: Did We Grow Faster Than the Tide?
A regional team launches a new mobile checkout feature in the West Coast stores. Weekly transactions in West Coast stores rise from 100,000 to 130,000 — a thirty-thousand-transaction gain. The team prepares to recommend a national rollout.
Before approving, the executive asks:
How much of the 30,000-transaction increase was actually caused by the feature, and how much would have happened anyway?
Over the same period, untreated East Coast stores rose from 90,000 to 100,000. Ten thousand of the West Coast's gain was the tide rising for everyone. The remaining twenty thousand is the part the feature can plausibly claim.
That subtraction-of-subtractions is the DiD estimator. The table below shows where it sits relative to the two naive comparisons.
| Comparison | Calculation | Implied effect | What it confounds with the effect |
|---|---|---|---|
| Naive before/after (West only) | West Post − West Pre = 130k − 100k | +30k | Common seasonal/macro tide affecting both regions |
| Naive cross-section (Post only) | West Post − East Post = 130k − 100k | +30k | Stable baseline difference between the two regions |
| Difference-in-differences | (130k − 100k) − (100k − 90k) | +20k | Neither (under parallel trends) |
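The three rows of the table can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the dictionary layout and variable names are assumptions, and the figures are the weekly transaction counts from the example above.

```python
# Weekly transactions from the running example, keyed by (group, period).
# The dictionary layout and names are illustrative assumptions.
cells = {
    ("west", "pre"): 100_000,   # treated, before rollout
    ("west", "post"): 130_000,  # treated, after rollout
    ("east", "pre"): 90_000,    # control, before rollout
    ("east", "post"): 100_000,  # control, after rollout
}

# Naive before/after: change in the treated group only.
before_after = cells[("west", "post")] - cells[("west", "pre")]

# Naive cross-section: treated vs. control after the rollout only.
cross_section = cells[("west", "post")] - cells[("east", "post")]

# Difference-in-differences: treated change minus control change.
control_change = cells[("east", "post")] - cells[("east", "pre")]
did = before_after - control_change

print(f"before/after:  {before_after:+,}")
print(f"cross-section: {cross_section:+,}")
print(f"DiD:           {did:+,}")
```

The two naive figures agree here only because of how the example's numbers fall: the before/after figure includes the 10,000 common trend, while the cross-section figure includes the 10,000 baseline gap between West and East.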
The 2×2 Picture
The cleanest way to understand DiD is to memorize the 2×2.
Figure 1: Difference-in-differences as a 2×2 comparison.
Each row-wise difference (post minus pre within one group) cancels that group's stable baseline level, so the baseline gap between the two groups drops out. Subtracting the control group's change from the treated group's change then cancels the time shock common to both groups. What remains is exactly the part of the post-treatment change in the treated group that the control group did not experience.
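One way to see the two cancellations is to assume, purely for illustration, that each cell mean is the sum of a group-specific level $\alpha_g$, a period-specific shock $\lambda_t$, and an effect $\tau$ that appears only in the treated group after the rollout. These symbols are an assumed notation for this sketch, not part of the exposition above.

$$
\begin{aligned}
\bar{Y}_{g,t} &= \alpha_g + \lambda_t + \tau \cdot \mathbf{1}[g = \text{treated},\ t = \text{post}] \\
\bar{Y}_{\text{treated},\text{post}} - \bar{Y}_{\text{treated},\text{pre}} &= (\lambda_{\text{post}} - \lambda_{\text{pre}}) + \tau \\
\bar{Y}_{\text{control},\text{post}} - \bar{Y}_{\text{control},\text{pre}} &= \lambda_{\text{post}} - \lambda_{\text{pre}} \\
\text{DiD} &= \big[(\lambda_{\text{post}} - \lambda_{\text{pre}}) + \tau\big] - (\lambda_{\text{post}} - \lambda_{\text{pre}}) = \tau
\end{aligned}
$$

Each within-group difference removes $\alpha_g$, and differencing the two changes removes $\lambda_{\text{post}} - \lambda_{\text{pre}}$, provided the time shock really is common to both groups.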
Visualizing DiD: Three Stories, One Picture
The pedagogical power of DiD is clearest when the three comparison frames in Figure 2 are viewed side by side. The same four numbers support three very different decisions depending on which comparison you make.
Figure 2: Difference-in-differences vs. naive causal comparisons. The difference-in-differences frame subtracts the control group's trend to isolate the feature's lift of +20k weekly transactions.
The math:
$$\text{DiD Effect} = (130\text{k} - 100\text{k}) - (100\text{k} - 90\text{k}) = 30\text{k} - 10\text{k} = 20\text{k}$$
Subtracting the common trend of +10k (captured by the untreated East Coast control stores) from the total observed growth of +30k in the treated West Coast stores isolates the impact of the mobile checkout feature, provided the parallel-trends assumption holds.
The dashed projected line is the heart of the design: it is the counterfactual for the treated group, built from the control group's trend. DiD's identifying assumption is that this projection is credible.
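With the numbers from the example, the projected counterfactual can be written out explicitly. The notation $\hat{Y}^{\text{cf}}$ for the projected value is an assumption of this sketch.

$$
\begin{aligned}
\hat{Y}^{\text{cf}}_{\text{West},\text{post}} &= Y_{\text{West},\text{pre}} + (Y_{\text{East},\text{post}} - Y_{\text{East},\text{pre}}) = 100\text{k} + 10\text{k} = 110\text{k} \\
\text{DiD Effect} &= Y_{\text{West},\text{post}} - \hat{Y}^{\text{cf}}_{\text{West},\text{post}} = 130\text{k} - 110\text{k} = 20\text{k}
\end{aligned}
$$

Read this way, the estimate is the vertical distance between the observed treated outcome and the projected line at the post-treatment date.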
DiD as a Regression
The 2×2 view is intuitive, but most production estimates of DiD use a regression, because regressions extend gracefully to many regions, many time periods, and additional controls.
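The two-way regression is
$$Y_{it} = \beta_0 + \beta_1\,\text{Treat}_i + \beta_2\,\text{Post}_t + \beta_3\,(\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}$$
where
- $\text{Treat}_i = 1$ if unit $i$ is in the treated group (and 0 otherwise),
- $\text{Post}_t = 1$ if period $t$ is after the treatment date (and 0 otherwise),
- $\beta_3$ is the coefficient on the interaction, and the DiD estimate.
To see why $\beta_3$ is the DiD, write out the expected outcome in each of the four cells and take the difference of differences:
| Cell | Expected outcome |
|---|---|
| Control · Pre | $\beta_0$ |
| Control · Post | $\beta_0 + \beta_2$ |
| Treated · Pre | $\beta_0 + \beta_1$ |
| Treated · Post | $\beta_0 + \beta_1 + \beta_2 + \beta_3$ |
The change for the control group is $\beta_2$. The change for the treated group is $\beta_2 + \beta_3$. The difference of differences is
$$(\beta_2 + \beta_3) - \beta_2 = \beta_3$$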
The main effects absorb the row difference (treated minus control) and the column difference (post minus pre), so the interaction coefficient is left with only the part that lives in the treated-post cell alone.
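The equivalence between the interaction coefficient and the 2×2 calculation can be checked numerically. The sketch below fits the regression by ordinary least squares on a small hypothetical store-level dataset: the cell means match the running example (in thousands of weekly transactions), while the per-store deviations, the variable names, and the hand-rolled solver are all assumptions made to keep the example dependency-free. In practice a statistics library would be used and would also supply standard errors, which this sketch omits.

```python
# Hypothetical store-level data (thousands of weekly transactions).
# Cell means match the running example; the per-store deviations are
# invented for illustration and average to zero within each cell.
cell_means = {(0, 0): 90.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 130.0}
deviations = [-2.0, 0.0, 2.0]

rows, y = [], []
for (treat, post), mean in cell_means.items():
    for dev in deviations:
        # Columns: intercept, treat, post, treat x post.
        rows.append([1.0, float(treat), float(post), float(treat * post)])
        y.append(mean + dev)


def solve_normal_equations(X, y):
    """OLS via (X'X) b = X'y, solved with Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


intercept, b_treat, b_post, b_interaction = solve_normal_equations(rows, y)
print(f"intercept:   {intercept:.1f}")
print(f"treat:       {b_treat:.1f}")
print(f"post:        {b_post:.1f}")
print(f"interaction: {b_interaction:.1f}")
```

Because the model has one parameter per cell, the fitted coefficients reproduce the cell means exactly, and the interaction coefficient equals the 20 (thousand) obtained from the subtraction of subtractions.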
The Identifying Assumption: Parallel Trends
DiD is valid only if the treated and control groups would have followed parallel trends in the absence of treatment. The two pre-treatment levels are allowed to differ; the two pre-treatment slopes are not.
The visual test is to plot both groups' pre-treatment trajectories on the same chart. If they look parallel for several periods before the treatment date, the assumption is more credible, although pre-treatment data cannot prove what would have happened afterward. If the treated group was already accelerating relative to the control group before the treatment hit, the design is in trouble: that pre-existing momentum will be mistaken for treatment effect.
The most useful pre-treatment plot is the event study: align all units to event time (treatment date = 0), plot the average treated–control gap in each pre and post period, and check whether the pre-period gaps are flat. We will see event-study plots throughout the next chapters; for now, the principle is just: if your pre-trends are not parallel, you do not have a DiD design — you have a story.
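The numbers behind an event-study plot can be computed directly. The sketch below is a minimal illustration under stated assumptions: the weekly averages are invented, event time 0 is taken to be the first treated week, the gaps are normalized to the last pre-treatment week, and the tolerance is an arbitrary illustrative threshold rather than a formal statistical test.

```python
# Hypothetical weekly averages (thousands of transactions) by event time.
# Event time 0 is the first treated week. All values are invented.
treated = {-4: 97.0, -3: 98.0, -2: 99.0, -1: 100.0, 0: 128.0, 1: 130.0, 2: 132.0}
control = {-4: 87.0, -3: 88.0, -2: 89.0, -1: 90.0, 0: 98.0, 1: 100.0, 2: 102.0}

# Treated-minus-control gap in every period.
gap = {t: treated[t] - control[t] for t in sorted(treated)}

# Normalize to the last pre-treatment period, so that pre-period values
# near zero indicate parallel pre-trends.
reference = gap[-1]
event_study = {t: g - reference for t, g in gap.items()}

pre_gaps = [v for t, v in event_study.items() if t < 0]
post_gaps = [v for t, v in event_study.items() if t >= 0]

# Illustrative tolerance, not a formal statistical test.
TOLERANCE = 1.0
pre_trends_flat = max(abs(v) for v in pre_gaps) <= TOLERANCE

for t, v in event_study.items():
    print(f"event time {t:+d}: gap relative to t=-1 = {v:+.1f}")
print("pre-period gaps flat:", pre_trends_flat)
```

In this invented series both groups grow by one thousand per week before the rollout, so the normalized pre-period gaps are all zero and the post-period gaps sit at 20. With real data the pre-period gaps are noisy, and judging whether they are flat requires uncertainty estimates that this sketch does not compute.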
When DiD Is Not Enough: Looking Ahead
DiD assumes you have a group of control units whose averaged trajectory is a credible counterfactual. Two things stretch that assumption in real settings:
- There is only one treated unit. A state passes a unique policy; a firm pilots in a single city. The classical DiD does not naturally pick a "best" comparison — and ad-hoc matching is fragile. Synthetic control (Chapter 7.2) addresses this by building a custom weighted counterfactual.
- Treatment effects vary across units. The same rollout helps low-income customers a lot and loyal customers not at all. The average effect understates the targeting opportunity. Heterogeneous treatment effects (Chapter 7.3) tackle this directly.
DiD remains the right starting point for most multi-unit rollouts. The next two articles handle the cases where it isn't enough.