§0.1

Data, Storage, Use, and the Decision Loop

Data begins before it is a table. It begins as ordinary life: a customer searches for a product, walks into a store, taps a card, opens an app, ignores an offer, leaves a review, calls support, uploads a receipt, asks a chatbot a question, or cancels a subscription. Modern businesses are covered in these traces. Some are clean rows. Some are messy sentences. Some are images, audio, locations, documents, or model outputs. All of them are partial records of something a person, machine, or organization did.

This opening chapter draws the whole map before Part I zooms in. It moves through four questions in turn: where business data comes from, how it is stored, how it is used, and how those pieces connect into a single data-to-decision loop. The order matters. A manager should meet the modern business data system first, and the methods later, as tools that serve specific decisions inside that system.


Where Data Comes From

The executive question: what business activity generated this data?

The first managerial skill is not choosing a model. It is asking what the data is a trace of. A sales row is a trace of a completed transaction, not of all customer demand. A click is a trace of attention inside one interface, not of preference in general. A support ticket is a trace of a problem serious enough to report, not of every problem customers experienced. A prompt log is a trace of an AI workflow being used, not proof that the workflow helped.

This matters because every data source carries the bias of the process that created it. If the process changes, the data changes even when the underlying business does not. If the app redesign makes the return button harder to find, return requests may fall while dissatisfaction rises. If a chatbot deflects simple questions, the support tickets that remain will look more severe. If a store starts scanning loyalty IDs more consistently, "repeat customer" metrics may jump without any real change in loyalty.
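
The loyalty-ID example can be made concrete with a little arithmetic. The sketch below uses hypothetical numbers: the 40% true repeat share and both scan rates are assumptions chosen for illustration, not figures from any real store. It also assumes the dashboard can recognize a repeat purchase only when a loyalty ID was scanned.

```python
# Hypothetical illustration: a "repeat customer" metric that depends on
# how often loyalty IDs are scanned, not only on real loyalty.

def measured_repeat_share(true_repeat_share: float, scan_rate: float) -> float:
    """Share of all transactions recorded as repeat purchases.

    Assumes a repeat purchase is recognized only when the loyalty ID is
    scanned, and that scanning is equally likely for every repeat customer.
    """
    return true_repeat_share * scan_rate


TRUE_REPEAT_SHARE = 0.40  # assumed: underlying behavior does not change

before = measured_repeat_share(TRUE_REPEAT_SHARE, scan_rate=0.50)
after = measured_repeat_share(TRUE_REPEAT_SHARE, scan_rate=0.80)

print(f"Measured repeat share before: {before:.0%}")
print(f"Measured repeat share after:  {after:.0%}")
```

Under these assumptions the recorded metric moves from 20% to 32% while true loyalty stays at 40%. The only thing that changed was the process that creates the record.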

Where business data comes from

Source family | Example records | Business use
--- | --- | ---
Customer behavior | Purchases, clicks, searches, visits, returns, ratings, reviews | Demand, loyalty, churn, product-market fit
Business operations | Inventory, invoices, CRM records, shipments, staffing, contracts | Margin, service quality, capacity, working capital
Digital systems | App events, web logs, ad auctions, recommendation impressions | Funnels, personalization, attribution, experimentation
Physical world | Sensors, location, cameras, store traffic, delivery scans | Utilization, loss prevention, routing, field execution
Human language | Support tickets, chats, call transcripts, emails, documents | Customer voice, compliance, knowledge retrieval, workflow routing
AI workflows | Prompts, responses, citations, tool calls, evals, human review | Automation quality, risk controls, continuous improvement

Data is usually a trace of work that already happened. The trace can be useful, but it is never the whole reality.

Figure 1. Business data is generated by many kinds of activity. The manager reads each source by asking what work created the record and what decision it can support.

Figure 1 gives the broad map. Six source families cover most data a modern manager will encounter: customer behavior, business operations, digital systems, the physical world, human language, and AI workflows. The source family matters because it tells you what kind of claim the data can support.

A day in the life of data

Consider one Bean & Basket customer on a Tuesday morning.

At 7:48 a.m., Maya searches the mobile app for "oat latte." That creates a search event. At 7:49, the app shows her a seasonal drink recommendation. That creates an impression. She taps it, adds a pastry, applies a loyalty reward, and checks out. That creates cart, transaction, payment, promotion, and loyalty records. Her order is prepared at Store 104, where the point-of-sale system updates inventory and the kitchen display records fulfillment time. Her phone location confirms she picked up the order. At 9:10, she rates the experience four stars and writes, "Great drink, long wait." That creates a rating and review text. Later, the operations team uses that review in a text dashboard, the marketing team uses the transaction in a churn model, the product team uses the search event to tune recommendations, and the regional manager sees the wait-time issue in a KPI dashboard.

One morning produced many records. None of them is "the customer" in full. Each is a trace from a specific system, with a specific purpose and a specific blind spot.

Table 1. One customer morning becomes several business records. The managerial question changes with the source.
Trace | Likely record | Question it can support | What it misses
--- | --- | --- | ---
Search for oat latte | app_search_events | What are customers trying to find? | Needs not expressed through search
Recommendation shown | recommendation_impressions | Which offers receive attention? | What would have happened without the recommendation
Completed purchase | transactions | What did customers buy, where, and when? | Customers who considered but did not buy
Long wait | fulfillment_time | Where is the operating process slow? | Subjective tolerance for waiting
Four-star review | reviews | What language do customers use to describe the experience? | Silent customers who never leave reviews
AI summary used by support | ai_workflow_logs | Is the AI workflow accurate, grounded, and useful? | Errors not caught by human review

The lesson is practical: do not call these records "customer data" as if they were interchangeable. The search event, transaction, review, wait-time record, and AI log are different slices of the customer's interaction with the firm. They become powerful only when the manager knows which slice is being used.

The source changes the claim

The same managerial topic can look different depending on where the data came from.

Take customer satisfaction. A firm might measure it through star ratings, review text, support tickets, refund requests, call transcripts, social posts, survey responses, churn, or repeat purchase. These are not redundant measures of the same thing. They capture different moments in the customer journey.

  • Star ratings are easy to monitor but shallow.
  • Review text explains reasons but overrepresents people willing to write.
  • Support tickets reveal operational problems but only after the customer escalates.
  • Refunds and returns capture costly dissatisfaction but miss silent disappointment.
  • Churn is a final outcome, often too late for diagnosis.
  • AI summaries can scale interpretation but must be evaluated against source evidence.

The managerial question is not "which data source is best?" The question is: which source is closest to the decision we need to make, and what bias does its generation process introduce?

Three generation traps

Trap 1: activity bias. Data overrepresents people who act inside the measured system. App data overrepresents app users. Reviews overrepresent people motivated to write. Loyalty data overrepresents identified customers. The unmeasured population may behave differently.

Trap 2: workflow bias. Data changes when the business process changes. A new refund policy, a redesigned app, a chatbot handoff rule, or a new sales script can change recorded behavior without changing underlying demand or satisfaction.

Trap 3: AI feedback bias. AI workflows create new records and change the behavior that future models learn from. If an AI support assistant routes some complaints away from human agents, the remaining ticket data no longer represents the full complaint mix. If a recommender shows the same products repeatedly, future purchase data reflects exposure as much as preference.

These traps are not reasons to avoid data. They are reasons to read data as a product of its generating process.

A trace only becomes usable once it lands somewhere. That makes the next question structural: where does the data live, and what was that system built to do?


How Data Is Stored

The word "database" hides too much. The system that records a customer's payment is not built for the same job as the system that scans five years of transactions for a pricing analysis. The place that stores raw app logs is not the same as the place that supports semantic search over policy documents. A manager does not need to administer these systems, but does need to understand their roles. Otherwise every data conversation becomes vague: "Can we get the data?" Which data? From which system? For what decision? At what latency? With what quality contract?

The executive question: what job is this data system doing?

Modern firms usually store data in several layers. Each layer optimizes for a different job.

An operational database records the next event correctly: a payment, an order, a login, a shipment, a service case. It is built for reliability, identity, permissions, and fast small updates. An analytical database scans many past events: a year of transactions, a panel of stores, a customer cohort, a product assortment, a marketing funnel. It is built for aggregation, history, and comparison. The distinction is not technical trivia. It determines whether the system is meant to run the business or analyze the business.

The storage stack is a division of labor

Source systems → Ingestion → Storage → Transform → Metrics, models, AI → Decision

System | Primary job | Common examples | Managerial question
--- | --- | --- | ---
Operational SQL | Run the application | Orders, accounts, payments, POS, CRM | Can the business record the next transaction correctly?
NoSQL and search | Serve flexible app data | Documents, sessions, profiles, product catalogs, keyword search | Can the app retrieve the right object quickly?
Lake and files | Keep raw and semi-raw assets | Logs, parquet files, PDFs, images, audio, vendor drops | Can the firm preserve data before every use is known?
Warehouse or lakehouse | Answer analytical questions | Snowflake, BigQuery, Databricks-style lakehouses | Can managers scan history across customers, products, and time?
Local analytics | Let one analyst work quickly | DuckDB, notebooks, local parquet, reproducible extracts | Can a small team investigate without waiting on production systems?
Vector and graph stores | Find meaning and relationships | Embeddings, semantic search, RAG indexes, product/customer graphs | Can the workflow retrieve related ideas, documents, or entities?

The practical distinction is transactional versus analytical: one system records the next event; another scans many past events to support a decision.

Figure 2. The modern storage stack is a division of labor. Each system class stores a different kind of evidence for a different kind of decision.

Figure 2 is the practical map. Operational SQL, NoSQL, lakes, warehouses, local analytical engines, vector databases, graph stores, and search indexes are not competing names for the same thing. They are specialized pieces of a workflow that moves from source activity to decision.

Transactional versus analytical

The most important distinction is transactional versus analytical.

A transactional system answers: can we record and retrieve one business event correctly right now? The point-of-sale system must know the price, charge the customer, update inventory, and create a receipt. The CRM must record a sales interaction. The app database must know which user is logged in. Mistakes here interrupt the business.

An analytical system answers: what pattern emerges across many business events? The warehouse computes weekly revenue by region, demand by product, churn by cohort, margin by promotion, and service quality by store. It is not trying to record the next transaction. It is trying to make history comparable.
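
The two jobs can be sketched side by side. The example below uses Python's built-in sqlite3 module with an in-memory database. The table name, the columns, the order amounts, and "Store 212" are hypothetical. Running both jobs against one small database is a simplification for illustration: in practice the analytical query would usually run on a warehouse copy, not on the live checkout system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, store TEXT, "
    "week TEXT, amount REAL)"
)

# Transactional job: record one event correctly, right now.
with conn:  # commits on success, rolls back on error
    conn.execute(
        "INSERT INTO orders (store, week, amount) VALUES (?, ?, ?)",
        ("Store 104", "2024-W01", 7.50),
    )

# A few more hypothetical orders so there is some history to scan.
history = [
    ("Store 104", "2024-W01", 4.25),
    ("Store 104", "2024-W02", 6.00),
    ("Store 212", "2024-W01", 5.00),
    ("Store 212", "2024-W02", 9.50),
]
with conn:
    conn.executemany(
        "INSERT INTO orders (store, week, amount) VALUES (?, ?, ?)", history
    )

# Transactional read: retrieve one event by its key.
one_order = conn.execute(
    "SELECT store, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# Analytical job: scan many past events and make them comparable.
weekly_revenue = conn.execute(
    "SELECT store, week, SUM(amount) FROM orders "
    "GROUP BY store, week ORDER BY store, week"
).fetchall()

print(one_order)
for row in weekly_revenue:
    print(row)
```

The first query touches one row and must be right immediately. The second touches every row and produces a comparison across stores and weeks.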

Table 2. Transactional and analytical systems answer different questions. Confusing them creates slow tools, fragile reporting, and mistrusted numbers.
Dimension | Transactional system | Analytical system
--- | --- | ---
Primary job | Record the next event correctly | Compare many past events
Typical questions | Did this order, payment, or login succeed? | Which customers, stores, products, or periods are changing?
Data shape | Current records, normalized entities, app state | History, panels, aggregates, derived metrics
Latency | Immediate or near-immediate | Batch, near-real-time, or streaming depending on the use case
Failure mode | The business cannot operate | The organization makes decisions from stale or inconsistent evidence

Managers feel this distinction in ordinary meetings. When the CFO asks for margin by promotion over the past six quarters, the answer should not come from the live checkout database. When customer support needs the current status of an order, the answer should not wait for the nightly warehouse refresh. Each system can be excellent and still be wrong for the job.

The major storage roles

SQL operational databases store structured app and business records: customers, orders, payments, products, subscriptions, tickets. They usually enforce relationships and consistency. If one customer has many orders, SQL is good at keeping that relationship explicit.

NoSQL systems serve flexible or high-scale application data: product catalogs, session state, user profiles, event payloads, documents, and other records whose structure changes often. They are often useful when the application needs fast reads and writes over flexible objects.

Data lakes and object storage keep raw or semi-raw assets: logs, vendor files, parquet tables, documents, images, audio, and historical extracts. The lake is useful when the firm wants to preserve data before every analytical use is known.

Warehouses and lakehouses make history analyzable. Systems such as Snowflake, BigQuery, and Databricks-style lakehouses are used to scan large historical datasets, join source systems, define metrics, and support dashboards, notebooks, and model training.

DuckDB-style local analytics gives analysts a fast, lightweight way to work with serious data on a laptop or in a reproducible script. This is useful for teaching, prototyping, case packs, and focused investigation before work becomes shared infrastructure.

Search, vector, and graph systems support retrieval and relationships. Keyword search finds exact or near-exact terms. Vector databases store embeddings so workflows can retrieve semantically related documents, products, customers, or images. Graph stores represent relationships such as referrals, product co-purchases, supply chains, account networks, and organizational structures.
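
The difference between keyword search and vector retrieval can be shown with a toy example. This is a minimal sketch, not a real retrieval system: the three document titles and their three-dimensional "embeddings" are hand-made assumptions. Real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math

# Hypothetical documents with hand-made 3-dimensional "embeddings".
documents = {
    "refund policy for damaged items": [0.9, 0.1, 0.0],
    "store opening hours": [0.0, 0.2, 0.9],
    "how to return a broken product": [0.8, 0.3, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_search(term, docs):
    """Return documents that contain the exact word."""
    return [text for text in docs if term in text.split()]


def vector_search(query_vector, docs, top_k=2):
    """Return the top_k documents whose embeddings are closest to the query."""
    ranked = sorted(
        docs,
        key=lambda text: cosine_similarity(query_vector, docs[text]),
        reverse=True,
    )
    return ranked[:top_k]


keyword_hits = keyword_search("refund", documents)

# Assumed embedding for a query such as "money back for a faulty order".
query_vector = [0.85, 0.2, 0.05]
semantic_hits = vector_search(query_vector, documents)

print("Keyword hits: ", keyword_hits)
print("Semantic hits:", semantic_hits)
```

The keyword search finds only the document that contains the word "refund". The vector search also surfaces the return instructions, which share meaning with the query but not vocabulary.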

The point is not to memorize product names. The point is to ask which system is doing which job.

Batch, streaming, and freshness

Data also differs by freshness.

Some workflows are fine with a nightly refresh. A weekly executive KPI dashboard, a monthly pricing review, or a quarterly market expansion analysis does not need every transaction within seconds. Other workflows need near-real-time data: fraud detection, stockout alerts, delivery routing, ad bidding, anomaly detection, or a customer-facing recommendation shown during a session.

Freshness has a cost. Real-time systems are harder to build, harder to monitor, and easier to over-trust. A manager should ask: what decision becomes better if this is refreshed sooner? If the action is weekly, minute-level freshness may only create noise.

Table 3. Data freshness should match the decision cadence. Faster is valuable only when someone can act faster.
Cadence | Example workflow | Managerial test
--- | --- | ---
Daily or weekly batch | Executive KPI dashboard, store performance review | Will anyone change an action more than once per day or week?
Near-real-time | Inventory alert, fraud flag, support escalation | Does a faster signal prevent loss or improve service immediately?
Streaming or session-time | Ad bidding, next-best recommendation, live routing | Is the decision made during the customer or operational interaction?

How storage affects methods

The rest of the book repeatedly depends on storage choices.

  • Dashboards need stable metric tables, not ad hoc extracts.
  • Causal analysis needs historical data at the right grain, not only summary reports.
  • Prediction needs labels, features, and timestamps aligned in a feature table.
  • Recommenders need exposure logs as well as purchase logs, or they confuse preference with what the system happened to show.
  • Retrieval-augmented generation needs a document store, a search or vector index, source metadata, and a way to evaluate retrieval quality.
  • Governance needs lineage: where the data came from, who owns it, how fresh it is, and what changed since last time.
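
The prediction bullet above is the easiest one to get wrong in practice, so a small sketch may help. It builds one feature-table row per customer, using only events before a cutoff date for features and only events after it for the label. The transaction log, the cutoff, the 28-day horizon, and the feature names are all hypothetical choices made for illustration.

```python
from datetime import date, timedelta

# Hypothetical transaction log: (customer_id, purchase_date).
transactions = [
    ("c1", date(2024, 1, 5)),
    ("c1", date(2024, 1, 20)),
    ("c1", date(2024, 2, 10)),
    ("c2", date(2024, 1, 8)),
    ("c3", date(2024, 1, 25)),
    ("c3", date(2024, 2, 3)),
]

CUTOFF = date(2024, 2, 1)      # the moment the prediction would be made
HORIZON = timedelta(days=28)   # assumed label window after the cutoff


def build_feature_row(customer_id):
    """One row per customer: features from the past, label from the future."""
    dates = [d for cid, d in transactions if cid == customer_id]
    past = [d for d in dates if d < CUTOFF]
    future = [d for d in dates if CUTOFF <= d < CUTOFF + HORIZON]
    return {
        "customer_id": customer_id,
        "as_of": CUTOFF,
        # Features: only information available before the cutoff.
        "purchases_before_cutoff": len(past),
        "days_since_last_purchase": (CUTOFF - max(past)).days if past else None,
        # Label: what happened during the horizon after the cutoff.
        "purchased_in_horizon": int(bool(future)),
    }


feature_table = [build_feature_row(cid) for cid in ("c1", "c2", "c3")]
for row in feature_table:
    print(row)
```

Keeping the cutoff explicit is the design choice that matters. If a purchase made after the cutoff leaked into the features, the model would look accurate in testing and disappoint in use.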

Storage tells us what is possible to ask. The next question is what the firm actually does with the answer.


How Data Is Used

Data creates value only when it changes a workflow. A dashboard no one reviews is a report, not a management system. A prediction score that triggers no action is a number, not intelligence. A recommendation system that cannot learn from exposure and feedback is guesswork with software around it. A language model that summarizes documents without evaluation is convenience without control. The practical question is always the same: what job is the data doing?

The executive question: what decision workflow will this evidence improve?

Modern data use falls into a small number of recurring workflow families. The names change by industry, but the managerial logic is stable.

Some workflows monitor the business: revenue, margin, churn, conversion, service time, stockouts, quality, risk. Some diagnose where a metric moved: by customer segment, geography, product, cohort, channel, store, or time period. Some learn causally: did a price change, campaign, policy, or process change cause an outcome? Some predict: which customers will churn, which orders are risky, how much demand should we expect? Some rank and recommend: what should be shown first, who should be contacted first, which action should be suggested next? Some read unstructured work: tickets, calls, documents, images, contracts, resumes, invoices, policy manuals. Some optimize a constrained action: staffing, inventory, routing, pricing, media spend, assortment, or scheduling.

Use-case router: business question to evidence workflow

Business question | Workflow | Evidence asset | Book home
--- | --- | --- | ---
What is happening? | Monitoring and KPI dashboards | Metric card, alert, scorecard | Parts I-II
Where and for whom? | Segmentation, cohorts, drilldowns | Segment profile, cohort view, diagnostic dashboard | Parts II-IV
Did our action cause it? | Experiments and causal designs | Identification memo, lift chart, effect estimate | Part III
What is likely next? | Prediction, forecasting, risk scoring | Predictive task contract, model card | Part IV
What should we show first? | Ranking and recommendation | Ranked list, threshold rule, monitoring view | Part IV
What does the text, document, or image say? | Extraction, search, RAG, AI-assisted workflows | AI workflow card, eval dashboard, review queue | Part V

The same source data can support several workflows. The manager's first job is to route the question before choosing the method.

Figure 3. The use-case router. Start with the business question, then choose the workflow, evidence asset, and book home that fits it.

Figure 3 is one of the book's central habits. Before debating tools, route the question.

Monitoring: what is happening?

Monitoring is the most familiar use of data. The firm defines metrics, refreshes them on a cadence, and watches for movement. KPI dashboards, scorecards, alerts, and operating reviews all live here.

Monitoring is useful when:

  • the metric is clearly defined;
  • the owner knows what action they can take;
  • the refresh cadence matches the action cadence;
  • the dashboard separates normal variation from signals that require attention;
  • the view supports drilldown when a metric moves.

Monitoring fails when the dashboard becomes a collection of charts without a decision path. A useful dashboard answers three questions in sequence: what moved, where did it move, and what should we inspect or do next?
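
One simple way to separate normal variation from a signal is a control limit built from recent history. The sketch below is one possible rule, not a recommended standard: the wait-time values are hypothetical, and the three-standard-deviation threshold is an assumption that a real team would tune to its own tolerance for false alarms.

```python
from statistics import mean, stdev

# Hypothetical weekly average wait times (minutes) for one store.
baseline_weeks = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0]
new_weeks = {"2024-W09": 4.2, "2024-W10": 5.6}

center = mean(baseline_weeks)
spread = stdev(baseline_weeks)      # sample standard deviation
upper_limit = center + 3 * spread   # assumed rule: flag beyond 3 deviations
lower_limit = center - 3 * spread


def classify(value):
    """Label a new observation as ordinary noise or a signal to inspect."""
    if value > upper_limit or value < lower_limit:
        return "signal: inspect"
    return "normal variation"


status = {week: classify(value) for week, value in new_weeks.items()}

print(f"Expected range: {lower_limit:.2f} to {upper_limit:.2f} minutes")
for week, label in status.items():
    print(week, new_weeks[week], label)
```

With these numbers, a week at 4.2 minutes stays inside the expected range and a week at 5.6 minutes does not. The rule gives the metric owner a reason to drill down instead of reacting to every wiggle.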

Diagnosis: where, for whom, and why might it be happening?

Diagnosis starts after monitoring notices movement. If weekly margin falls, the manager needs to know whether the issue is a product, store, region, customer segment, campaign, channel, or supply problem. This is where segmentation, cohorts, funnels, small multiples, maps, and drilldowns matter.

Diagnosis does not prove causality. It narrows the search. It tells the team where to investigate and what comparison might be useful.
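
A drilldown can be as simple as splitting the total change into the contribution of each segment. The sketch below assumes hypothetical weekly margin figures for three regions. The region names and dollar amounts are invented for illustration.

```python
# Hypothetical weekly margin (dollars) by region for two consecutive weeks.
margin_last_week = {"North": 12000, "South": 9000, "West": 11000}
margin_this_week = {"North": 11900, "South": 6500, "West": 11100}

total_change = sum(margin_this_week.values()) - sum(margin_last_week.values())

# Contribution of each region to the total change.
contributions = {
    region: margin_this_week[region] - margin_last_week[region]
    for region in margin_last_week
}

# Largest declines first: this is where the team should look next.
ranked = sorted(contributions.items(), key=lambda item: item[1])

print(f"Total change: {total_change}")
for region, change in ranked:
    print(f"{region}: {change:+d}")
```

Here the whole decline traces to one region. That narrows the search to the South, but it does not yet say whether the cause was a promotion, a supply problem, or something else.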

Table 4. Monitoring and diagnosis are different workflow stages. A dashboard should support both without confusing them.
Stage | Question | Default evidence | Common failure
--- | --- | --- | ---
Monitor | What changed? | KPI trend, alert, scorecard | No action owner or threshold
Diagnose | Where did it change? | Segment drilldown, cohort, small multiples | Treating a pattern as proof of cause
Decide | What should we do? | Decision brief, experiment, model, memo | Jumping from dashboard to action without comparison

Strategic decisions: where should the firm place its bets?

Strategy uses data differently from daily operations. The evidence is often less fresh, more aggregated, and more uncertain. Market expansion, pricing architecture, product portfolio, customer segment focus, channel strategy, capacity investment, and acquisition screening all require a blend of historical data, external context, assumptions, and judgment.

The practical discipline is to separate three things:

  1. Facts from the current business. What do our customers, products, stores, channels, and margins show?
  2. Assumptions about the future. What must be true for the strategy to work?
  3. Tests and signals. What data would tell us early that the strategy is working or failing?

This is why the book returns to decision memos. Strategic decisions need evidence, but they also need an explicit threshold for acting and a plan for learning after action.

Causal learning: did the action work?

Many business questions are causal: did the email cause incremental sales, did the discount lift profit, did the new onboarding flow reduce churn, did the policy change reduce risk? Historical data alone often makes these questions look easier than they are.

The key issue is the counterfactual: what would have happened without the action? Experiments, A/B tests, difference-in-differences, synthetic control, regression with credible identification, and other designs are ways of constructing a comparison that earns the word "caused."

Causal learning is not always required. If the question is only "which stores are currently above target?", a dashboard is enough. If the question is "should we roll out this promotion nationally?", a dashboard is not enough.
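
The counterfactual logic can be illustrated with the arithmetic behind difference-in-differences. The numbers below are hypothetical, and the estimate is only credible under the assumption that promotion and comparison stores would have followed the same trend without the promotion.

```python
# Hypothetical average weekly sales per store (dollars).
sales = {
    ("promo_stores", "before"): 10000,
    ("promo_stores", "after"): 11500,
    ("comparison_stores", "before"): 9000,
    ("comparison_stores", "after"): 9800,
}

# Naive reading: everything that changed in promo stores is "the effect".
naive_change = sales[("promo_stores", "after")] - sales[("promo_stores", "before")]

# Background change: what happened in stores that did not run the promotion.
background_change = (
    sales[("comparison_stores", "after")] - sales[("comparison_stores", "before")]
)

# Difference-in-differences: the promo-store change net of the background change.
did_estimate = naive_change - background_change

print(f"Naive before/after change: {naive_change}")
print(f"Change in comparison stores: {background_change}")
print(f"Difference-in-differences estimate: {did_estimate}")
```

The before-and-after view credits the promotion with 1,500 dollars per store. Once the 800-dollar rise in comparison stores is removed, the estimate falls to 700 dollars, which may change the rollout decision.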

Prediction, ranking, and recommendation

Prediction asks what is likely to happen next. Churn models, demand forecasts, fraud scores, lead scores, risk models, and delivery-time predictions all live here. The model does not need to know what caused the outcome to be useful, but it does need a clear action attached to the score.

Ranking and recommendation go one step further. They order choices: which product to show, which customer to contact, which ticket to escalate, which loan to review, which store to visit, which document to retrieve. Ranking systems need extra care because they shape the future data they observe. If a product is never shown, the system cannot learn whether customers would have liked it.
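
A short sketch shows why exposure logs matter for ranking. The product names and counts are hypothetical. The point is only that raw purchase counts and purchases per impression can rank the same products differently.

```python
# Hypothetical one-week logs for two products.
impressions = {"seasonal latte": 10000, "oat flat white": 500}
purchases = {"seasonal latte": 400, "oat flat white": 40}

# Ranking by raw purchases rewards whatever the system showed most often.
top_by_purchases = max(purchases, key=purchases.get)

# Ranking by purchases per impression adjusts for exposure.
purchase_rate = {
    product: purchases[product] / impressions[product] for product in purchases
}
top_by_rate = max(purchase_rate, key=purchase_rate.get)

print("Top by purchases:", top_by_purchases)
print("Top by purchase rate:", top_by_rate)
for product, rate in purchase_rate.items():
    print(f"{product}: {rate:.1%} of impressions led to a purchase")
```

The heavily promoted product wins on volume, while the rarely shown product converts at twice the rate. Without the impression log, the second fact is invisible.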

Generative AI and unstructured workflows

Many modern workflows use data that does not look like rows and columns: support tickets, product reviews, policy documents, sales calls, PDFs, images, screenshots, contracts, invoices, code, slides, and emails. AI workflows help classify, extract, summarize, retrieve, draft, route, and monitor this work.

The mature version is not "ask the model." It is a designed workflow:

  1. collect the source material;
  2. retrieve or select relevant context;
  3. ask the model for a bounded task;
  4. require structured output when the result must feed another system;
  5. evaluate accuracy, grounding, bias, privacy, and refusal behavior;
  6. route uncertain or high-risk cases to human review;
  7. monitor the workflow after deployment.
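
Steps 3 through 6 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: fake_model is a hypothetical stand-in that returns canned JSON instead of calling a real language model, and the topic list and the 0.8 confidence threshold are invented for the example.

```python
import json

ALLOWED_TOPICS = {"wait_time", "drink_quality", "billing", "other"}
CONFIDENCE_THRESHOLD = 0.8  # assumed; a real threshold comes from evaluation


def fake_model(review_text: str) -> str:
    """Hypothetical stand-in for a model call. Returns canned JSON text."""
    if "wait" in review_text.lower():
        return json.dumps(
            {"topic": "wait_time", "confidence": 0.93, "evidence": "long wait"}
        )
    return json.dumps({"topic": "unknown", "confidence": 0.41, "evidence": ""})


def process_review(review_text: str) -> dict:
    """Bounded task, structured output, checks, then routing."""
    raw = fake_model(review_text)

    # Structured output: the result must parse before another system uses it.
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return {"route": "human_review", "reason": "output was not valid JSON"}

    if result.get("topic") not in ALLOWED_TOPICS:
        return {"route": "human_review", "reason": "topic outside allowed set"}

    # Grounding check: the quoted evidence must appear in the source text.
    evidence = result.get("evidence", "")
    if not evidence or evidence.lower() not in review_text.lower():
        return {"route": "human_review", "reason": "evidence not found in source"}

    if result.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        return {"route": "human_review", "reason": "low confidence"}

    return {"route": "auto", "topic": result["topic"]}


auto_case = process_review("Great drink, long wait.")
review_case = process_review("It was fine I guess.")

print(auto_case)
print(review_case)
```

The first review is classified automatically because the output is valid, grounded in the source text, and confident. The second is sent to a person. Monitoring would then track how often each route is taken and how often reviewers disagree with the model.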

AI is powerful because it makes language, documents, and images operational. It is risky for the same reason: it can make weak evidence look fluent.

Optimization: what should we do under constraints?

Optimization turns predictions, rules, and business constraints into an action plan. How many employees should be scheduled? Which stores should receive inventory? How should delivery routes be assigned? Which media channels should receive budget? Which price should be offered under margin and fairness constraints?

Optimization is often where analytics becomes real. It also exposes hidden objectives. Are we optimizing revenue, margin, customer satisfaction, utilization, fairness, retention, or risk? If the objective is wrong, the optimized answer is wrong with confidence.
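
A toy pricing problem shows how the objective drives the answer. Everything here is assumed for illustration: the unit cost, the price options, the demand at each price, and the minimum-volume constraint.

```python
# Hypothetical price options with assumed weekly demand at each price.
UNIT_COST = 2.00
MIN_UNITS = 500  # assumed constraint: keep enough volume to protect traffic

options = [
    {"price": 3.00, "units": 1000},
    {"price": 4.00, "units": 800},
    {"price": 5.00, "units": 550},
    {"price": 6.00, "units": 300},
]


def revenue(option):
    return option["price"] * option["units"]


def margin(option):
    return (option["price"] - UNIT_COST) * option["units"]


# Apply the constraint first, then optimize each objective over what remains.
feasible = [option for option in options if option["units"] >= MIN_UNITS]

best_for_revenue = max(feasible, key=revenue)
best_for_margin = max(feasible, key=margin)

print("Feasible prices:", [option["price"] for option in feasible])
print("Best price for revenue:", best_for_revenue["price"])
print("Best price for margin: ", best_for_margin["price"])
```

The same data and the same constraint produce two different "optimal" prices. Neither is wrong as arithmetic. The choice between them is a managerial decision about the objective.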

Monitoring, diagnosis, causal proof, prediction, and AI workflows are not separate islands. They are stages of one loop, the loop that connects every part of this book.


The Data-to-Decision Loop

The modern data system is a loop. Human and machine activity creates records. Records move into storage. Storage feeds transformations, metrics, charts, models, search indexes, and AI workflows. Those evidence assets inform decisions. Decisions change customer experience, operations, pricing, policy, product, staffing, or automation. Those changes create new data. The loop starts again.

The executive question: where are we in the data-to-decision loop?

Most analytics mistakes come from losing track of the loop. A team starts with a model but has not named the action. A dashboard monitors a metric but has no threshold. A causal analysis estimates an effect but does not connect to a decision cadence. An AI workflow answers questions but has no evaluation or escalation path. Each failure is a broken link between data and action.

The data-to-decision loop

  1. Human or machine activity. A customer acts, a process runs, a model responds.
  2. Source record. A transaction, log, ticket, document, image, or prompt trace.
  3. Storage and transformation. Operational DB, lake, warehouse, feature table, vector index.
  4. Evidence asset. Metric, chart, causal estimate, prediction, retrieval result.
  5. Decision and action. A manager changes a price, offer, process, policy, or workflow.
  6. Feedback and monitoring. The action creates new data and the loop starts again.

The loop is circular, not linear. Every decision changes the business, and that changed business generates the next round of data.

Figure 4. The data-to-decision loop. Every decision changes the business, and the changed business generates the next round of evidence.

Figure 4 is the front-door operating model for the book. Part I begins in the source record. Part II turns records into visual evidence. Part III asks whether an action caused an outcome. Part IV predicts and ranks future cases. Part V brings text, documents, images, embeddings, and language models into the loop. Part VI asks how the organization runs the loop repeatedly without losing ownership, quality, or governance.

Data-driven versus data-decorated

A decision is data-driven only when three things are named:

  1. The action. A specific lever someone can pull.
  2. The comparison. What would happen if the firm did not act or acted differently.
  3. The threshold. The signal, effect size, ROI, quality level, or risk standard required to act.

Anything missing one of these may still be useful description. It may be a good dashboard, a good analysis, or a promising model. But it is not yet a decision.
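
The three-part test can be written down as a simple checklist. The sketch below is one hypothetical way to record it. The class name, field names, and example text are assumptions for illustration, not a format the book prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionRecord:
    """Hypothetical record of a proposed decision and its three required links."""

    evidence: str
    action: Optional[str] = None
    comparison: Optional[str] = None
    threshold: Optional[str] = None

    def missing_links(self):
        """Names of the required elements that have not been stated."""
        required = ("action", "comparison", "threshold")
        return [name for name in required if not getattr(self, name)]

    def is_data_driven(self):
        return not self.missing_links()


wait_time_decision = DecisionRecord(
    evidence="Store 104 wait times rose in the morning peak",
    action="Add one barista at Store 104 during the morning peak",
    comparison="Wait times at similar stores without the extra barista",
    threshold="Roll out if average wait falls by at least one minute",
)

churn_score_only = DecisionRecord(
    evidence="A churn score ranks customers every Monday",
)

print(wait_time_decision.is_data_driven())
print(churn_score_only.missing_links())
```

The second record is still useful description, but the checklist makes the gap visible: nobody has named what the score triggers, what it is compared against, or what level justifies acting.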

Table 5. Data-decorated work often has impressive evidence but a missing decision link.
Failure pattern | What it looks like | Missing link
--- | --- | ---
Dashboard without action | Weekly KPI review shows a metric falling | No owner, threshold, drill path, or response playbook
Causal claim without counterfactual | Customers who received an email spent more | What those same customers would have spent without the email
Prediction without workflow | A churn score ranks customers every Monday | Which offer, threshold, queue, and follow-up action the score triggers
AI without evaluation | A chatbot answers from company documents | Grounding tests, refusal rules, escalation, monitoring, and ownership

The decision ladder

The book's evidence languages climb a ladder of business questions. The lower rungs do not disappear when the upper rungs arrive. A prediction model still depends on a clean target. A causal design still depends on a well-defined unit, timing, and outcome. A language model workflow still needs source data, storage, evaluation, monitoring, and human review.

The decision ladder

Part | Evidence language | Business question
--- | --- | ---
I | Description | What happened?
II | Visual comparison | Where & for whom?
III | Causal designs | What caused it?
III | Regression / elasticity | How much does X matter?
IV | Prediction | What is likely next?
V | AI workflows | What does the text/image say?
VI | System view | How do we operate this?

Figure 5. The decision ladder. Each rung asks a different business question and requires a different evidence language.

The ladder is a routing device:

  • What happened? Read the data and define the metric.
  • Where and for whom? Visualize, segment, compare, and diagnose.
  • What caused it? Construct a credible counterfactual.
  • How much does the lever matter? Estimate effects, elasticity, and heterogeneity.
  • What is likely next? Predict, rank, and evaluate.
  • What does the text, image, or document say? Use unstructured data and AI workflows.
  • How do we operate this? Monitor, govern, communicate, and learn.

Six evidence languages

Six evidence languages, one per Part

Decision question | Evidence language | Part | Studio
--- | --- | --- | ---
What happened? | Description, metrics | I | Data Language Studio (§4.1)
What should the eye see first? | Visual evidence | II | Visual Decision Brief (§8.2)
What caused it? | Causal designs | III | Pricing & Promotion (§13.4)
What is likely next? | Prediction & segmentation | IV | Customer Intelligence (§17.4)
What does the text or image say? | AI workflows | V | Customer Voice Intelligence (§22.2)
How do we operate this? | System view | VI | Final Integrative Case (§25.1)

Each Part teaches one evidence language and ends with a Studio that ships its capstone artifact.

Figure 6. The evidence languages. Each Part of the book adds one way of turning data into decision-relevant evidence.

The important point is not the numbering. It is the discipline of choosing the evidence language that fits the decision. A dashboard should not be asked to prove causality. A predictive model should not be treated as an intervention. A language model should not be trusted because it is fluent. A causal estimate should not ship if no one knows what action it changes.

The artifacts that survive the work

The book does not end each method with "and now you know the method." It ends with an artifact that a firm can reuse, audit, and refresh.

The artifact family: five one-page documents that survive the work

Artifact | Question it answers | Section
--- | --- | ---
Decision Question Card | What action, on what unit, with what counterfactual? | §9.1
Predictive Task Contract | What target, for what unit, on what horizon, with what features? | §14.2
Model Card | What does this model do, where does it fail, who owns it? | §15.5
AI Workflow Card | What does this workflow do, what governs it, who responds? | §22.1
Decision Memo | What is the recommendation, what evidence supports it, what next? | §24.1

Each artifact extends the discipline of the one above. The card you write at §9.1 grows into the memo you sign at §24.1.

Figure 7. The artifact family. These one-page artifacts turn analysis into reusable decision infrastructure.

The artifact family matters because modern analytics is not a sequence of one-off clever analyses. It is infrastructure for repeated decisions. A metric card can be reused in a dashboard. A predictive task contract can be reused by a modeling team. A model card can be used by risk, legal, product, and operations. An AI workflow card can be audited when the workflow changes. A decision memo can show what evidence led to action and how the firm will learn afterward.

The cases

One through-line company, Bean & Basket Coffee, appears throughout the book. Standalone cases add real empirical grounding where a specific method needs a richer dataset.

The case portfolio

Bean & Basket Coffee: The continuous through-line

A multi-store specialty coffee chain with reviews, tickets, transactions, panel data, campaigns, products, stores, and an internal knowledge base. Appears in every Part.

Standalone case studies

Case | Part | Methods
--- | --- | ---
Progresso Soup | Pt II, Pt III | Visual evidence, fixed effects, elasticity
Milk Field Data | Pt III | Quasi-experiment, heterogeneous effects
Zillow Colorado | Pt III | Difference-in-differences, synthetic control
BAV Fast Food | Pt IV | PCA, perceptual maps
Airbnb (illustrative) | Pt IV | Numeric prediction, residuals
Yelp Reviews | Pt V | Sentiment, topics, GPT measurement
Goose Island Twitter | Pt V | Emotion vs. sentiment
Earnings Calls | Pt V | Evasiveness measurement
Job Postings | Pt V | Construct measurement

Standalone cases are appended outside chapter prose. They give the methods a second testing ground beyond the Bean & Basket through-line.

Figure 8. The case portfolio. Bean & Basket provides continuity; standalone cases give specific methods a second testing ground.

The purpose of the cases is not to decorate chapters. It is to make the loop concrete. Every case asks: what activity generated the data, where is it stored, what evidence language fits the decision, what artifact should survive, and how would the organization monitor what happens next?

Where Part I begins

Part I now begins after the full system is visible. We zoom in from the operating loop to the basic object inside it: a dataset. The first question becomes almost physical: what does one row mean? That question sounds small, but it controls everything that follows. Grain, structure, variable type, joins, reshaping, metrics, and data quality are the mechanics that make the larger loop trustworthy.