§5.1

From Metrics to Decisions


Part III — separating what we saw from what we caused.

A business metric becomes decision-ready only when it is tied to an action. Dashboards overflow with descriptive summaries — revenue up four percent, churn up two points, private-label share at a new high. These numbers are valuable for monitoring the health of the firm, but they are fundamentally passive. They tell a manager what happened. They are silent on why it happened, and they cannot answer the central question of leadership: what should we do next?

Part III is about closing that gap. Every method in the next five chapters (experiments, regression, fixed effects, difference-in-differences, synthetic control, elasticity) is a different way of building a credible comparison between what happened after we acted and what would have happened otherwise. This opening section sets the frame: it introduces the lightweight tool that should sit upstream of every causal analysis, the Decision Question Card, and lays out the three case families we will return to across Part III.


The Executive Question: What Action Does This Metric Support?

The most common failure in business analytics is launching a modeling project without first defining the lever. A team can spend months building a churn model only to discover that the marketing organization has no retention offer to deploy against the predictions. The model was accurate. The decision was missing.

A useful test: read the question out loud. Does it name a specific action a specific person could take? Compare:

  • Metric-focused: "How are pastry sales performing in retail?"
  • Decision-ready: "If we run a morning push offering a one-dollar pastry discount to coffee-only weekday app users, will incremental gross margin per user, measured against a randomized holdout, exceed our fifteen-cent threshold over the next two weeks?"

The first question gestures at a topic. The second one names the lever, the segment, the outcome, the horizon, the comparison, and the threshold that would justify acting. The second one is what we build analytics around.


The Decision Question Card

Before writing a line of SQL or fitting a single model, fill out a six-line card. It is deliberately short — short enough that a manager and an analyst can agree on it in a fifteen-minute meeting.

  1. Action (D or T). The specific intervention under managerial control: a price change, a coupon, a feature rollout, a policy shift.
  2. Outcome (Y). The business metric that should move, including any downstream financial outcome (margin, profit, lifetime value) you do not want to sacrifice.
  3. Unit of analysis (i). The level at which the action is applied and measured: a customer, a store-week, a region-month.
  4. Timing and horizon (t). When the intervention starts, when it ends, and the window over which the outcome is measured.
  5. Counterfactual comparison. The credible stand-in for what would have happened to the same units in the absence of the action. This is almost always the hardest line of the card.
  6. Decision threshold (θ). The minimum effect that would justify acting, after accounting for cost, risk, and operational overhead.
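
The six lines above map naturally onto a small record type. The following is a minimal sketch in Python, not a prescribed implementation: the class name, field names, and both helper methods are illustrative assumptions, and the card is filled in with the pastry-push question from the previous section. The 0.22 passed at the end is a hypothetical effect estimate, not a result from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DecisionQuestionCard:
    """Six-line card. All names here are illustrative assumptions."""

    action: str          # D or T: the lever under managerial control
    outcome: str         # Y: the metric that should move
    unit: str            # i: level at which the action is applied and measured
    horizon: str         # t: start, end, and measurement window
    counterfactual: str  # stand-in for what happens without the action
    threshold: float     # theta: minimum effect that justifies acting

    def missing_lines(self) -> list[str]:
        """Return the names of text lines that were left blank."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]

    def supports_action(self, estimated_effect: float) -> bool:
        """True when an estimated effect clears the decision threshold."""
        return estimated_effect > self.threshold


# The decision-ready pastry question, written as a card.
pastry_card = DecisionQuestionCard(
    action="Morning push with a one-dollar pastry discount",
    outcome="Gross margin per user",
    unit="Coffee-only weekday app user",
    horizon="Next two weeks",
    counterfactual="Randomized holdout receiving no push",
    threshold=0.15,  # dollars per user
)

print(pastry_card.missing_lines())        # an empty list means every line is filled
print(pastry_card.supports_action(0.22))  # 0.22 is a hypothetical estimate
```

A card whose counterfactual line is blank would show up in `missing_lines()`, which is one cheap way to stop an analysis from starting before the hardest line has been agreed.
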

Table 1. Four descriptive metrics translated into action-ready Decision Question Cards. The same underlying data can support several different cards depending on how the action, unit, and counterfactual are framed.

| Descriptive metric | Action | Primary outcome | Unit | Counterfactual comparison |
|---|---|---|---|---|
| App conversion rate | Breakfast coupon push | Gross margin per user | App user | Randomized holdout users receiving no push |
| Retail category volume | Price promotion to a target price | Category volume and profit | Store-week | Matched store-weeks at baseline pricing |
| Loyalty share | Staggered loyalty program rollout | Sustained customer spend | Store-month | Not-yet-rolled-out comparable stores |
| Real-estate price trend | State-level policy shift | Zillow Home Value Index | State-month | Weighted synthetic twin state |

The most common error when filling in these cards is not the choice of method but the choice of unit. If your data warehouse stores transactions but your decision lives at the store-week, analyzing raw transactions silently treats every receipt as an independent observation, even though receipts from the same store-week share the same prices, promotions, and local shocks. Standard errors shrink artificially and the analysis becomes overconfident. The unit of analysis is a decision, not a default.
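
To make the overconfidence concrete, here is a small simulation sketch using only the Python standard library. Everything in it is an assumption chosen for illustration: the number of store-weeks, the receipts per store-week, and the sizes of the shared shock and the receipt-level noise. It compares the standard error of the mean basket value when receipts are treated as independent with the standard error when the store-week is the unit.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the sketch is repeatable

N_STORE_WEEKS = 40             # hypothetical number of store-weeks
RECEIPTS_PER_STORE_WEEK = 200  # hypothetical receipts in each store-week

# Simulate basket values: every receipt in a store-week shares one shock
# (local weather, a competitor's promotion) plus its own receipt-level noise.
receipts = []  # list of (store_week_id, basket_value)
for store_week in range(N_STORE_WEEKS):
    shared_shock = random.gauss(0.0, 2.0)
    for _ in range(RECEIPTS_PER_STORE_WEEK):
        receipts.append((store_week, 20.0 + shared_shock + random.gauss(0.0, 5.0)))

# Naive view: treat all 8,000 receipts as independent observations.
values = [value for _, value in receipts]
naive_se = statistics.stdev(values) / sqrt(len(values))

# Decision-level view: collapse to one mean per store-week, then treat the
# 40 store-week means as the observations.
by_store_week = {}
for store_week, value in receipts:
    by_store_week.setdefault(store_week, []).append(value)
store_week_means = [statistics.fmean(v) for v in by_store_week.values()]
unit_se = statistics.stdev(store_week_means) / sqrt(len(store_week_means))

print(f"naive SE (receipts as units):   {naive_se:.3f}")
print(f"SE with store-week as the unit: {unit_se:.3f}")
```

Under these assumed parameters the store-week standard error should come out several times larger than the naive one, because the shared shock does not average away within a store-week no matter how many receipts it contains.
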


Why the Counterfactual Is the Whole Game

When a dashboard shows that sales rose after a campaign launched, it is tempting to credit the campaign. That naive conclusion ignores the central equation of decision-making:

Causal lift

$$\text{Causal Lift} = \text{Outcome}(\text{Action}) - \text{Outcome}(\text{No action})$$

The first term we observe. The second term — what the same units would have done in the absence of the action — is never directly observable. Causal analysis is, in the end, the disciplined construction of a credible stand-in for that missing second term.
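
The same idea can be written per unit in potential-outcomes notation. This notation is an assumption layered on the card's symbols: $Y_i(1)$ and $Y_i(0)$ denote unit $i$'s outcome with and without the action, and $D_i \in \{0, 1\}$ records whether the action was applied. The last line is a standard decomposition showing why a raw treated-versus-untreated comparison is not, in general, the causal lift.

```latex
\begin{align*}
\tau_i &= Y_i(1) - Y_i(0)
  && \text{unit-level causal lift} \\
Y_i &= D_i\,Y_i(1) + (1 - D_i)\,Y_i(0)
  && \text{the one outcome we actually observe} \\
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{naive comparison}}
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{lift among treated units}}
   + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

The selection-bias term is the gap between what treated units would have done without the action and what untreated units actually did. Every counterfactual strategy is an argument that this term is zero, or a way of estimating and removing it.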

Part III walks through the four standard ways to construct it:

  • Randomization. Random assignment forces the two groups to be statistically identical in expectation. The control group is the counterfactual.
  • Regression control. Adjust for observable confounders so that the comparison holds them constant.
  • Difference-in-differences. Compare the change in treated units over time with the change in control units, differencing away stable group-level differences and common time shocks.
  • Synthetic control. Construct a weighted combination of untreated units that tracks the treated unit's pre-intervention trajectory, then use that combination as the counterfactual after the intervention.

Each method earns its credibility from assumptions, and the further a design moves from randomization, the more weight those assumptions must carry. The Decision Question Card forces you to name those assumptions up front.
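
Two of the four constructions reduce to simple arithmetic on group means. The sketch below, with hypothetical function names and made-up numbers, shows a difference in means against a randomized holdout and a two-period difference-in-differences. Regression control and synthetic control need more machinery and are left out here.

```python
from statistics import fmean


def difference_in_means(treated, control):
    """Randomization: the control group's mean stands in for the counterfactual."""
    return fmean(treated) - fmean(control)


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """DiD: change in the treated group minus change in the control group."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Hypothetical gross margin per user (dollars) in a randomized push test.
push_users = [0.90, 1.10, 1.30, 0.70]
holdout_users = [0.80, 0.70, 0.90, 0.80]
lift = difference_in_means(push_users, holdout_users)
print(f"randomized lift: {lift:.2f} dollars per user")

# Hypothetical weekly unit sales for stores with and without a new program.
treated_pre, treated_post = [100, 110, 90], [120, 130, 110]
control_pre, control_post = [80, 90, 70], [90, 100, 80]

before_after = fmean(treated_post) - fmean(treated_pre)
did = difference_in_differences(treated_pre, treated_post, control_pre, control_post)
print(before_after)  # 20.0: the naive before-after change in treated stores
print(did)           # 10.0: what remains after removing the control stores' change
```

In the made-up store data, half of the before-after change also appears in stores that never received the program, so only the remaining half is attributed to it. That attribution holds only if the two groups would have moved in parallel without the program, which is exactly the kind of assumption the card should state.
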


Looking Ahead: The Part III Case Spine

Part III returns repeatedly to three case families so that students develop depth rather than collecting one-off examples. The cases below appear inside the relevant method chapters as clearly labelled data case sections, not as the conceptual material itself.

  1. Milk field quasi-experiment. Scanner data from roughly 1,700 supermarkets. Whole milk is priced the same as low-fat alternatives in some stores and slightly above them in others. We use this case for randomization diagnostics, placebo checks, and heterogeneous treatment effects by ZIP-code income.
  2. Zillow Colorado housing study. State-month Zillow Home Value Index series surrounding Colorado's January 2014 cannabis legalization. A natural setting for difference-in-differences and synthetic control when only one unit is treated.
  3. Progresso soup scanner panel. Weekly transactions across about 2,000 grocery stores. The workhorse for omitted-variable bias, panel regressions with store and week fixed effects, own- and cross-price elasticities, and optimal pricing.

Headline results:

  • Milk: 8.2 pp higher whole-milk share when milk fat levels are equally priced.
  • Zillow: 20.2% average post-2014 Colorado gap versus the synthetic comparison.
  • Soup: -2.23 store fixed-effect own-price elasticity for Progresso volume.

Chapter 9, Counterfactuals

  • 9.1 Milk + soup + Zillow: decision-question card
  • 9.2 Zillow: counterfactual sketch

Chapter 10, Experiments and Bias

  • 10.1 Bean & Basket + milk: experiment readout
  • 10.2 Soup: bias triage

Chapter 11, Regression and Identification

  • 11.1 Soup: regression ladder
  • 11.2 Milk: identification memo
  • 11.3 Soup: fixed-effect interpretation

Chapter 12, Field Designs

  • 12.1 Zillow: before-after comparison
  • 12.2 Zillow: synthetic-control chart
  • 12.3 Milk: segment-effect plot

Chapter 13, Pricing Levers

  • 13.1 Soup: elasticity coefficient plot
  • 13.2 Soup: cross-price heatmap
  • 13.3 Soup: optimal-pricing widget
  • 13.4 All three: executive decision brief

Figure 1. The Part III case spine. Each empirical case is paired with the concepts it will help develop; the headline numbers shown here are the results we will reconstruct in their home chapters.


A Note on the Artefact Family

The Decision Question Card introduced here is the parent of an artefact family that recurs across the book. The same six-line discipline reappears as the Predictive Task Contract (§9.2) for supervised models, the Model Card (§10.5) for deployed predictions, the AI Workflow Card (§16.4) for LLM systems, and the Decision Memo (§17.2) — the one-page synthesis that ties every form of evidence into a recommendation an executive can sign. Each card extends the discipline of the one above. The full family map is in §0.4.