§7.2

Synthetic Control

The most strategically important decisions a firm makes are often executed in a single market: a new store format in one city, an algorithmic rollout in one country, a regulatory change in one state. Standard difference-in-differences breaks down here. There is no natural pool of treated units to average; picking one comparison unit is subjective and easy to manipulate. Synthetic control was designed for exactly this case: it builds the counterfactual by combining several donor units into a weighted twin that tracks the treated unit before the intervention, and uses that twin to estimate what would have happened afterwards.

This chapter develops the synthetic control idea visually, derives the optimization that fits the donor weights, names the assumption that makes the estimate causal, and walks through a worked case on state-level housing data.

The Executive Question: Can We Build a Credible Twin?

The previous chapter left this problem unresolved: with only one treated unit, difference-in-differences has no natural control group to average, and the naive fix — averaging every other state or market into one crude comparison — was shown to collapse under its own weight. Synthetic control is the principled way to build that comparison: instead of one average or one hand-picked twin, it constructs a weighted blend of donors chosen to match the treated unit's own pre-treatment path.

A team launches a new store format in Denver and reports that the Denver store's transactions are now 15% above their pre-launch average. The next question, the right question, is:

How much of the 15% gain would have happened in Denver anyway, given national coffee trends, regional population growth, and local economic conditions?

The obvious moves are all bad. Use the rest of the country as a control: the national average is dominated by markets that look nothing like Denver. Pick one comparison city: the choice is subjective and can be manipulated. Pick Portland and the gap looks huge; pick Chicago and it looks tiny. Pick a "natural twin" by hand and the audience will ask, reasonably, why that one.

Synthetic control turns the choice of comparison into an optimization. Instead of one city, it builds a weighted combination of several untreated cities (synthetic Denver might be a specific blend of Portland, Austin, Seattle, and Salt Lake City), with the weights chosen so that synthetic Denver tracks the real Denver as closely as possible before the new format launched. After the launch, the gap between actual Denver and synthetic Denver is the design's estimate of the effect.

The Method, in Pictures

The idea fits on one diagram.

Synthetic control: a weighted combination of donor units

Figure 1. Synthetic control as a three-step pipeline. A pool of untreated donor units feeds an optimizer that picks non-negative weights summing to one. The weighted combination of donors becomes a synthetic counterfactual that should track the treated unit's pre-treatment path.

A few features of the diagram are intentional and worth naming:

The weights are non-negative and sum to one. The synthetic counterfactual is a convex combination of donor units. It can never be more extreme than the most extreme donor, which protects against the algorithm finding a spurious match by extrapolating wildly.
Most donor weights are zero. Synthetic control is naturally sparse. Out of dozens of candidate donors, the algorithm typically picks four to six with meaningful weights. The rest get nothing.
The matching happens in the pre-treatment period only. No information from the post-treatment period enters the weight selection. That separation keeps the post-treatment gap from being an artifact of the fitting itself, and it is a precondition for reading the gap as a causal effect.
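The "never more extreme than the most extreme donor" property can be checked in a few lines. The sketch below is illustrative only: the donor outcomes, the weights, and the helper name `synthetic_value` are hypothetical and do not come from the chapter's data.

```python
def synthetic_value(weights, donor_values):
    """Weighted blend of donor outcomes for a single period."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * y for w, y in zip(weights, donor_values))


# Hypothetical donor outcomes for one period (illustrative numbers only).
donors = [96.0, 104.0, 110.0, 101.0]
weights = [0.5, 0.0, 0.2, 0.3]  # sparse: the second donor gets zero weight

blend = synthetic_value(weights, donors)

# A convex combination always lies inside the range spanned by the donors.
assert min(donors) <= blend <= max(donors)
print(round(blend, 2))
```

Weights such as 5 and -2 are rejected by the same checks, which is how the constraint rules out extrapolation.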

The Method, in Math

Let the treated unit be indexed $j = 1$ and let $j = 2, \dots, J+1$ index the donor pool. Observe outcomes $Y_{jt}$ for periods $t = 1, \dots, T$, where $T_0$ is the last pre-treatment period, so the treatment is in effect for $t > T_0$. Let $W = (w_2, \dots, w_{J+1})$ be the weight vector, subject to the constraints

Weight constraints

$$w_j \;\ge\; 0 \quad \text{and} \quad \sum_{j=2}^{J+1} w_j \;=\; 1$$

The optimizer picks $W$ to minimize the pre-treatment fit error between the treated unit and the weighted donor combination:

Synthetic control objective

$$\min_{W} \; \sqrt{\frac{1}{T_0} \sum_{t=1}^{T_0} \left(\, Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt} \,\right)^2}$$
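One way to solve this constrained minimization is sketched below in pure Python. It is a sketch under stated assumptions, not the solver behind the chapter's figures, which the text does not name. The Frank-Wolfe iteration is a swapped-in technique chosen because every step stays non-negative and sums to one by construction. The sketch minimizes the squared error, which has the same minimizer as the RMSE objective because the square root is monotone. The function names and the donor data are hypothetical.

```python
def rmse(treated_pre, donors_pre, w):
    """Root mean squared pre-period gap between treated unit and weighted donors."""
    n_periods = len(treated_pre)
    total = 0.0
    for t in range(n_periods):
        synth = sum(wj * series[t] for wj, series in zip(w, donors_pre))
        total += (treated_pre[t] - synth) ** 2
    return (total / n_periods) ** 0.5


def fit_weights(treated_pre, donors_pre, max_iter=10_000, tol=1e-12):
    """Fit donor weights on the simplex with Frank-Wolfe and exact line search.

    treated_pre: T0 pre-treatment outcomes of the treated unit.
    donors_pre:  one list of T0 pre-treatment outcomes per donor.
    """
    n_donors, n_periods = len(donors_pre), len(treated_pre)
    w = [1.0 / n_donors] * n_donors  # feasible start: equal weights
    for _ in range(max_iter):
        fitted = [sum(w[j] * donors_pre[j][t] for j in range(n_donors))
                  for t in range(n_periods)]
        resid = [treated_pre[t] - fitted[t] for t in range(n_periods)]
        # score[j] is proportional to minus the gradient for donor j.
        score = [sum(resid[t] * donors_pre[j][t] for t in range(n_periods))
                 for j in range(n_donors)]
        k = max(range(n_donors), key=lambda j: score[j])
        # Shifting weight toward donor k moves the fit along donor_k - fitted.
        direction = [donors_pre[k][t] - fitted[t] for t in range(n_periods)]
        gain = sum(r * d for r, d in zip(resid, direction))
        denom = sum(d * d for d in direction)
        if gain <= tol or denom == 0.0:
            break  # no vertex direction improves the fit: w is optimal
        gamma = min(1.0, gain / denom)  # exact line search, kept inside the simplex
        w = [(1.0 - gamma) * wj for wj in w]
        w[k] += gamma
    return w


# Hypothetical pre-period data: three donors, six periods.
donors_pre = [
    [10.0, 12.0, 11.0, 13.0, 12.0, 14.0],
    [20.0, 19.0, 21.0, 20.0, 22.0, 21.0],
    [5.0, 7.0, 9.0, 8.0, 10.0, 12.0],
]
true_w = [0.5, 0.3, 0.2]  # used only to build an illustrative treated series
treated_pre = [sum(wj * s[t] for wj, s in zip(true_w, donors_pre))
               for t in range(6)]

w_hat = fit_weights(treated_pre, donors_pre)
print([round(wj, 3) for wj in w_hat], rmse(treated_pre, donors_pre, w_hat))
```

Frank-Wolfe can converge slowly when several donors share the weight, so on real panels a quadratic-programming solver is a reasonable substitute. The constraints and the pre-period-only data are the parts that matter.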

Once the optimal weights $W^*$ are in hand, the synthetic counterfactual for any post-treatment period is

$$\widehat{Y}^{0}_{1t} \;=\; \sum_{j=2}^{J+1} w_j^* \, Y_{jt}$$

and the estimated treatment effect at time $t > T_0$ is the gap

Synthetic control effect

$$\widehat{\tau}_{1t} \;=\; Y_{1t} - \widehat{Y}^{0}_{1t}$$
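Given fitted weights, the counterfactual and the gap are each one line of arithmetic. The sketch below assumes the weights have already been fitted on the pre-period. The function names, the two-donor panel, and the weights are all hypothetical.

```python
def synthetic_path(weights, donor_series):
    """Weighted donor blend for every period: sum over j of w_j * Y_jt."""
    n_periods = len(donor_series[0])
    return [sum(w * series[t] for w, series in zip(weights, donor_series))
            for t in range(n_periods)]


def effect_path(treated, weights, donor_series, t0):
    """Gap between actual and synthetic outcomes for periods t > T0.

    t0 is the number of pre-treatment periods, so Python index t0
    is the first post-treatment period.
    """
    synth = synthetic_path(weights, donor_series)
    return [treated[t] - synth[t] for t in range(t0, len(treated))]


# Hypothetical panel: two donors, four pre-periods, two post-periods.
donor_series = [
    [100.0, 102.0, 104.0, 106.0, 108.0, 110.0],
    [90.0, 92.0, 95.0, 97.0, 100.0, 102.0],
]
weights = [0.5, 0.5]  # assumed to be already fitted on the pre-period
treated = [95.0, 97.0, 99.5, 101.5, 110.0, 113.0]

gaps = effect_path(treated, weights, donor_series, t0=4)
print(gaps)  # [6.0, 7.0]
```

In this toy panel the synthetic path matches the treated unit exactly in the four pre-periods, so the post-period gaps are read directly as the estimated effect.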

Two design choices in the objective are worth highlighting. The constraints prevent extrapolation: synthetic Denver cannot be 5 × Portland minus 2 × Boise, even if that combination would have fit pre-trends perfectly. And the optimization runs only on pre-treatment data: nothing about the post period is allowed to influence the weights.

Reading a Synthetic Control Plot

A credible synthetic control study presents three figures, in this order:

The path comparison. Actual treated unit and synthetic counterfactual on the same chart over the full window. The visual check is whether the two paths overlap in the pre-period; that is the only evidence that the weights produced a reasonable twin.
The donor weights. A bar chart showing which donor units received non-zero weights. The audience needs to be able to ask, "do these donors make business sense?"
The gap. The path difference (or percent gap) between actual and synthetic over time. The pre-period should hover near zero; the post-period should show the effect.

Any synthetic control deck that skips one of these three is hiding something the other two would reveal.

Data Case: Colorado Housing

To see the method on real data, we apply it to Colorado's cannabis legalization, dated here to the start of legal recreational sales in January 2014. The unit is the state-month; the outcome is the Zillow Home Value Index (ZHVI); the donor pool is the set of US states that did not change cannabis policy in this window. The pre-treatment period is twelve years (2002–2013); the post-treatment period is the six years following.

Step 1 — the pre-treatment fit

Colorado separates from its synthetic comparison after 2014

Pre-period fit uses 216 months before 2014-01-31.

Figure 2. Actual Colorado ZHVI (blue) vs. the optimized synthetic counterfactual (orange). The two paths overlap tightly from 2002 through 2013, demonstrating that the donor mix successfully reproduced Colorado's pre-treatment housing dynamics. After January 2014, the two paths separate.

The tightness of the pre-2014 overlay is the design's foundation. If those lines diverged before the treatment, we would have no basis for trusting that they would have continued to track after it.

Step 2 — the donor weights

The synthetic Colorado is mostly Kansas, Massachusetts, Utah, and Michigan

Weights are constrained to be nonnegative and sum to one.

Kansas: 41.9%

Massachusetts: 35.3%

Utah: 12.5%

Michigan: 10.3%

Figure 3. Donor weights selected to build the synthetic Colorado. A handful of states absorb almost all the weight; most candidate donors are assigned zero. Audit this list for business plausibility before reading the gap.

The synthetic Colorado is built primarily from four states. Each of them contributed because the optimizer found that, in combination, they tracked Colorado's pre-2014 dynamics. The audit question is whether each one is a defensible part of that counterfactual — and whether any one of them might have been affected by Colorado's policy itself (a spillover threat we cannot rule out without further work).
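A mechanical check is a useful first step in that audit. The sketch below takes the four weights reported in Figure 3, confirms they are non-negative and sum to 100% (so every other donor gets zero), and lists them with a running total. The helper name `audit_weights` and the rounding tolerance are assumptions for illustration.

```python
# Donor weights as reported in Figure 3, in percent.
reported = {"Kansas": 41.9, "Massachusetts": 35.3, "Utah": 12.5, "Michigan": 10.3}


def audit_weights(weights_pct, tol=0.1):
    """Return (donor, weight, cumulative share) rows, largest weight first.

    tol absorbs rounding in published weights (an assumed tolerance).
    """
    if any(w < 0 for w in weights_pct.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights_pct.values())
    if abs(total - 100.0) > tol:
        raise ValueError(f"weights sum to {total:.1f}%, expected 100%")
    rows, running = [], 0.0
    ordered = sorted(weights_pct.items(), key=lambda kv: kv[1], reverse=True)
    for name, w in ordered:
        running += w
        rows.append((name, w, round(running, 1)))
    return rows


for name, w, cumulative in audit_weights(reported):
    print(f"{name:<14}{w:>6.1f}%{cumulative:>8.1f}%")
```

The code only verifies the arithmetic. Whether Kansas or Massachusetts belongs in a Colorado counterfactual, and whether any of them was exposed to spillovers, remains a judgment call.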

Step 3 — the post-treatment gap

The post-2014 housing gap stays positive

Average post-period gap: 20.2%.

Figure 4. Percent gap between actual Colorado ZHVI and the synthetic counterfactual. The pre-treatment gap hovers around zero; the post-treatment gap opens after January 2014 and averages roughly 20% over the following six years.

The estimated effect — about a 20% lift in housing values over six years — is the design's headline. The defense of that number is the three figures together, not the number alone. A 20% gap is credible because (a) the pre-period fit is tight, (b) the donor weights are plausible and transparent, and (c) the post-period gap opens cleanly at the treatment date rather than gradually before it.
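The headline number and the pre-period check behind point (a) can be computed from the same two series. The sketch below is a minimal illustration with hypothetical values, not the Colorado data. It assumes the percent gap is measured relative to the synthetic path, which the figure caption does not state explicitly.

```python
def percent_gap(actual, synthetic):
    """Period-by-period gap as a percent of the synthetic value (assumed base)."""
    return [100.0 * (a - s) / s for a, s in zip(actual, synthetic)]


def summarize_gap(actual, synthetic, t0):
    """Mean absolute pre-period gap and mean post-period gap, in percent.

    t0 is the number of pre-treatment periods.
    """
    gaps = percent_gap(actual, synthetic)
    pre, post = gaps[:t0], gaps[t0:]
    return {
        "pre_mean_abs_gap": sum(abs(g) for g in pre) / len(pre),
        "post_mean_gap": sum(post) / len(post),
    }


# Hypothetical series: three pre-periods, two post-periods.
actual = [200.0, 202.0, 204.0, 220.0, 240.0]
synthetic = [200.0, 200.0, 204.0, 200.0, 200.0]

summary = summarize_gap(actual, synthetic, t0=3)
print(summary)
```

A small pre-period figure next to a large post-period figure is the numerical counterpart of a gap plot that hovers near zero and then opens at the treatment date.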

What the design cannot defend against is an unobserved post-2014 Colorado-specific shock that happened to coincide with legalization — a tech boom, a migration wave, a policy bundle adopted alongside cannabis. The identifying assumption is exactly that no such confounding shock occurred, and the only honest answer is to list candidates and let the reader judge their plausibility.