§2.4

Transformations and Business Metrics

The CFO at Bean & Basket asks for "customer lifetime value" by region. Three analysts return three different numbers. None of them is wrong. The first used average order value times an annualized purchase frequency over the last twelve months; the second discounted future revenue by a 10% cost of capital and stopped at five years; the third counted only loyalty members. The CFO has not, in any meaningful sense, asked a single question. He has asked three questions wrapped in one phrase, and each analyst answered a different one. This is the chapter where we acknowledge that almost every business metric is a definition, not a raw column. The work of writing the definition down — what is a "customer," what is "active," what counts as "revenue" — is the work that determines what the dashboard actually says.

The executive question: which business metric are we actually measuring?

Transformations and metrics

There are two operations that turn raw transaction columns into the numbers managers actually look at. The first is the transformation: a mechanical recipe that converts a raw value into a more interpretable one. Logs flatten skew, ratios normalize for scale, lags carry a previous period's behavior into the current row, rolling averages smooth daily noise. The second is the metric: a business definition built out of transformations and aggregations. Churn rate, conversion rate, average order value, gross margin, market share — none of these is a column anyone collects. They are all derived, and the derivation is full of decisions.

Both operations look bureaucratic. Both quietly shape every report downstream. The first time a team agrees, in writing, on what "active customer" means, the next quarter's churn number stops swinging between meetings. The first time someone writes log(revenue) instead of revenue in a regression, the elasticity coefficient becomes interpretable in percentage terms. These are not statistical gestures. They are managerial judgments. Figure 1 is the catalog of transformations every Bean & Basket analyst should be able to apply on demand; Figure 2 is the catalog of metrics every Bean & Basket dashboard should report.

Figure 1. Eight transformations that turn raw columns into managerially useful ones. Each is a small recipe — none requires advanced math — and each encodes a specific business judgment about what comparison is meaningful.

Transformation	Business purpose	Bean & Basket example
Binning	Group noisy continuous values into managerial categories	age → 18–24 / 25–34 / 35–49 / 50+
Log	Make multiplicative changes additive; tame skew	log10(revenue) so a 10× store reads as 1 unit larger, not 10× larger
Standardize (z)	Put variables on a common scale to compare	ad spend (std) vs store size (std), both centered, both unit-variance
Ratio	Normalize by scale	revenue per square foot — comparable across store sizes
Rate	Turn counts into probabilities	churn rate = churned / customers_at_risk
Flag (boolean)	Encode a business condition for filtering or grouping	is_loyalty_member = (loyalty_tier ≠ 'None')
Lag	Bring previous-period behavior into the current row	last_month_revenue, last_visit_days_ago
Rolling average	Smooth weekly/daily noise into a trend	7-day moving average of daily sales
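
Several of the recipes in Figure 1 fit in a few lines each. The sketch below is one possible rendering in plain Python, not a prescribed implementation: the function names, the "under 18" band, and the choice of population standard deviation are assumptions made for illustration.

```python
from __future__ import annotations

import statistics


def age_bin(age: int) -> str:
    """Binning: map an age to the bands shown in Figure 1."""
    if age < 18:
        return "under 18"  # assumed extra band; Figure 1 starts at 18
    if age <= 24:
        return "18-24"
    if age <= 34:
        return "25-34"
    if age <= 49:
        return "35-49"
    return "50+"


def standardize(values: list[float]) -> list[float]:
    """Standardize (z): center on the mean, scale by the population std dev."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population, not sample, std dev
    return [(v - mu) / sigma for v in values]


def is_loyalty_member(loyalty_tier: str) -> bool:
    """Flag: True for any tier other than the literal string 'None'."""
    return loyalty_tier != "None"


def lag(values: list[float], periods: int = 1) -> list[float | None]:
    """Lag: each position sees the value from `periods` positions earlier."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    padding = [None] * min(periods, len(values))
    return padding + values[: max(len(values) - periods, 0)]


# Invented monthly revenue figures, used only to show the shape of the output.
monthly_revenue = [100.0, 120.0, 90.0]
last_month_revenue = lag(monthly_revenue)  # [None, 100.0, 120.0]
```

The first month has no previous period, so its lagged value is missing rather than zero. Whether to drop that row or fill it in is one more judgment to record.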

Two transformations worth a closer look

Figure 1 names eight transformations, but two deserve a closer look because they appear in almost every chapter that follows. The log transform is the one tool that converts a multiplicative business statement into an additive statistical one. When a manager says "this region is twice as big as that one," the natural logs of the two revenues differ by ln(2) ≈ 0.69, whatever the absolute size of either region. That is why log-log regressions appear in pricing (Chapter 8): the coefficient is approximately the percentage change in quantity for a one-percent change in price, so elasticity reads directly off the model. The rolling average is the small workhorse of executive dashboards. Daily sales are too noisy to interpret; the seven-day rolling mean is the same series with the weekday cycle smoothed out, and the trend becomes visible.
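
Both claims can be checked with a few lines of plain Python. This is a minimal sketch: the revenue figures and the daily sales pattern are invented, and the trailing-window convention (no value until seven days are available) is an assumption, not the only reasonable choice.

```python
import math


def log_gap(revenue_a: float, revenue_b: float) -> float:
    """Difference in natural logs; it depends only on the ratio a / b."""
    return math.log(revenue_a) - math.log(revenue_b)


def rolling_mean(values, window=7):
    """Trailing mean over `window` observations; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# A region twice the size of another differs by about 0.69 in log terms,
# whether the pair is 2M vs 1M or 200 vs 100.
big_pair = log_gap(2_000_000, 1_000_000)
small_pair = log_gap(200, 100)

# Invented daily sales: flat weekdays, a weekend bump, repeated for two weeks.
daily_sales = [100, 100, 100, 100, 100, 150, 150] * 2
smoothed = rolling_mean(daily_sales, window=7)
```

Because every seven-day window in this toy series contains exactly one full weekly cycle, each smoothed value equals 800 / 7. The weekend bump disappears and only the (flat) trend remains.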

The transformations in Figure 1 are inputs to the metrics in Figure 2. A churn rate is the boolean churn flag aggregated over a customer-month panel. Revenue per customer is revenue (a sum) divided by active customers (a count of customers passing some flag). Lifetime value is AOV times frequency times retention, three numbers each of which is a transformation of raw orders.
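
The composition described above can be sketched directly. Everything in this example is hypothetical: the orders, the panel rows, the rule that "active" means "appears in the order sample," and the units chosen for purchase frequency (orders per year) and expected retention (years). The text does not fix those units, so treat them as assumptions.

```python
# Invented orders; customer ids and amounts are illustrative only.
orders = [
    {"customer": "c1", "amount": 20.0},
    {"customer": "c1", "amount": 30.0},
    {"customer": "c2", "amount": 50.0},
    {"customer": "c3", "amount": 40.0},
    {"customer": "c3", "amount": 10.0},
]

# Invented customer-month panel rows carrying the boolean churn flag.
panel = [
    {"customer": "c1", "month": "2024-01", "churned": False},
    {"customer": "c2", "month": "2024-01", "churned": False},
    {"customer": "c3", "month": "2024-01", "churned": True},
    {"customer": "c4", "month": "2024-01", "churned": False},
]


def churn_rate_from_flags(rows) -> float:
    """Churn rate: the boolean flag aggregated over the panel."""
    return sum(1 for r in rows if r["churned"]) / len(rows)


def average_order_value(order_rows) -> float:
    """AOV: revenue divided by number of orders."""
    return sum(o["amount"] for o in order_rows) / len(order_rows)


def revenue_per_customer(order_rows, active_ids) -> float:
    """Revenue (a sum) divided by active customers (a count)."""
    return sum(o["amount"] for o in order_rows) / len(active_ids)


def simple_clv(aov: float, purchase_frequency: float, expected_retention: float) -> float:
    """Simple lifetime value: AOV x frequency x retention."""
    return aov * purchase_frequency * expected_retention


# Assumed rule for this sketch: a customer is active if they appear in `orders`.
active_ids = {o["customer"] for o in orders}

churn = churn_rate_from_flags(panel)              # 1 of 4 rows flagged
aov = average_order_value(orders)                 # 150 / 5
arpu = revenue_per_customer(orders, active_ids)   # 150 / 3
clv = simple_clv(aov, purchase_frequency=4, expected_retention=2.5)
```

Each function is trivial. The decisions sit in the inputs: which rows belong in the panel, which customers count as active, and how frequency and retention were estimated.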

Figure 2. Nine business metrics every executive dashboard quietly depends on. The 'watch out for' column is the part most worth memorizing — every metric here has a hidden definitional choice, and disagreements between dashboards almost always come from a different choice on the same line.

Metric	Formula	What it measures	Watch out for
Revenue	SUM(price × quantity)	Total money received over a period	Returns and refunds should be netted before reporting.
Average order value (AOV)	revenue / orders	Typical purchase size	Skewed by a few large orders — report median alongside mean.
Gross margin	(revenue − cost_of_goods) / revenue	Share of each revenue dollar that survives COGS	Cost allocation choices change the answer; document them.
Conversion rate	conversions / qualifying_visitors	Share of qualifying visitors who made a purchase	Definition of 'qualifying visitor' is everything. Bots are not customers.
Churn rate (monthly)	churned_in_month / customers_at_start_of_month	Share of the base that left this month	'Churn' needs a behavioral definition — 'no purchase for 60 days' is a choice.
Repeat purchase rate	customers_with_≥2_orders / total_customers	Share of customers who returned at least once	The window matters: 30 days, 90 days, or lifetime tell different stories.
Revenue per customer (ARPU)	revenue / active_customers	Average value of an active customer in the period	Defining 'active' as 'any purchase' inflates the denominator with one-time buyers.
Customer lifetime value (simple)	AOV × purchase_frequency × expected_retention	Forecast of total revenue from a customer over their tenure	All three inputs are estimates with their own error — report a range, not a point.
Market share	brand_sales / category_sales	Share of category demand captured	Boundary of 'category' is editorial — be explicit about it.

Figure 2 names the metrics that should appear in almost every Bean & Basket review. Look at the right-hand column. Every entry there is a place where two analysts can produce two different numbers from the same raw data, not because anyone made a mistake, but because the metric's definition contains a degree of freedom and the two analysts picked it differently. "Churn" needs a behavioral definition. "Active customer" needs a window. "Conversion rate" needs a notion of qualifying visitor. "Customer lifetime value" needs three estimates, each with its own error bars, multiplied together. The single biggest source of disagreement in real analytics meetings is not the data but the metric definition, and the disagreement is usually silent because everyone assumes everyone else's number was computed the same way.
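
A small illustration of that degree of freedom, using invented data: the same list of "days since last purchase" produces two different churn rates depending on whether inactivity is defined as more than 60 days or more than 90 days. The function name and both thresholds are assumptions for the sketch; only the 60-day example comes from Figure 2.

```python
def churn_rate(days_since_last_purchase, inactivity_days):
    """Share of customers whose last purchase is older than `inactivity_days`."""
    churned = sum(1 for d in days_since_last_purchase if d > inactivity_days)
    return churned / len(days_since_last_purchase)


# Invented recency values for eight customers (days since last purchase).
recency = [10, 45, 70, 75, 100, 200, 30, 65]

analyst_a = churn_rate(recency, inactivity_days=60)  # 5 of 8 customers
analyst_b = churn_rate(recency, inactivity_days=90)  # 2 of 8 customers
```

Neither analyst made an arithmetic error. One reports 62.5% and the other 25%, and nothing on the slide reveals which threshold was used.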

The defense is mechanical: every metric used in a recurring report should have a one-paragraph definition in a shared document, and the formula in the dashboard should point back to that paragraph. "Active customer: any individual with a completed transaction in the last 90 days, regardless of loyalty membership." That sentence is the difference between three analysts producing three CLV numbers and three analysts producing one.
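
One way to make the dashboard formula "point back" to the paragraph is to encode the sentence as a single shared function. The sketch below assumes a hypothetical transaction record with `customer`, `date`, `status`, and `loyalty_tier` fields; none of those field names come from the text. It also assumes that "the last 90 days" includes the cutoff day itself.

```python
from datetime import date, timedelta

# Mirrors the written definition: completed transaction, last 90 days,
# loyalty membership ignored.
ACTIVE_WINDOW_DAYS = 90


def active_customers(transactions, as_of):
    """Return the set of customer ids that satisfy the written definition."""
    cutoff = as_of - timedelta(days=ACTIVE_WINDOW_DAYS)
    return {
        t["customer"]
        for t in transactions
        # Assumption: the cutoff day counts as inside the window.
        if t["status"] == "completed" and cutoff <= t["date"] <= as_of
    }


# Invented transactions covering each clause of the definition.
transactions = [
    {"customer": "c1", "date": date(2024, 6, 1), "status": "completed", "loyalty_tier": "Gold"},
    {"customer": "c2", "date": date(2024, 3, 15), "status": "completed", "loyalty_tier": "Gold"},
    {"customer": "c3", "date": date(2024, 6, 20), "status": "cancelled", "loyalty_tier": "None"},
    {"customer": "c4", "date": date(2024, 5, 10), "status": "completed", "loyalty_tier": "None"},
]

active = active_customers(transactions, as_of=date(2024, 6, 30))
```

Here c2 falls outside the window, c3 never completed a transaction, and c4 counts despite having no loyalty tier. If every report imports this one function, changing the definition means changing one place.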

A metric is a contract

There is a more general failure mode in this chapter worth flagging once. A metric is a contract. When a CMO sees "conversion rate: 4.2%" on a slide, she is reading a number that depends on a chain of definitional decisions — what is a visitor, what is a conversion, what is the window, how are bots excluded, are returning visitors deduplicated. Each link in the chain was made by someone, at some point, often without writing it down. The next time the metric is recomputed — by a different analyst, on a different data warehouse, with a slightly different filter — the answer will be different, and the disagreement will be invisible. Treating metric definitions as governance artifacts, not as engineering details, is the only way a multi-year dashboard remains comparable to itself. This is exactly the role of the metric card in the artifact family from Part 0: a one-page, versioned definition that travels with the number wherever it is reported, so a CMO and a data engineer reading the same word mean the same thing.

What to ship

Before asking "what is our conversion rate?", ask "what does conversion rate mean here?" Transformations and metrics are the layer where raw data becomes business language, and the translation is full of editorial choices. The cheapest insurance an analytics team can buy is a metrics dictionary (one paragraph per metric, naming the formula, the window, the unit, and the exclusions), versioned alongside the dashboard that uses it. When a metric moves between reports, the paragraph moves with it. When two reports disagree, the paragraphs are the first place to look. The work of writing those paragraphs is the work of governance; the work of running the SQL is the work of engineering. Confuse them and the numbers will quietly diverge across the organization for years before anyone notices.
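
A metrics dictionary does not need special tooling. As a rough sketch, each entry could be a small immutable record with the four fields named above plus a version. The class name, field names, unit, exclusion wording, and version string below are all hypothetical; only the "active customer" definition itself is taken from this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a published definition is replaced, not edited
class MetricDefinition:
    name: str
    formula: str
    window: str
    unit: str
    exclusions: tuple[str, ...]
    version: str  # hypothetical versioning scheme

    def paragraph(self) -> str:
        """Render the one-paragraph definition that travels with the number."""
        excluded = "; ".join(self.exclusions) if self.exclusions else "none"
        return (
            f"{self.name} (v{self.version}): {self.formula}. "
            f"Window: {self.window}. Unit: {self.unit}. Exclusions: {excluded}."
        )


METRICS_DICTIONARY = {
    "active_customer": MetricDefinition(
        name="Active customer",
        formula="any individual with a completed transaction, "
                "regardless of loyalty membership",
        window="last 90 days",
        unit="count of individuals",
        exclusions=("transactions that were not completed",),
        version="1",
    ),
}

card = METRICS_DICTIONARY["active_customer"]
text = card.paragraph()
```

Keeping this file in the same repository as the dashboard code means a change to the definition shows up in version history next to the change in the numbers.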