§0.1

Data, Storage, Use, and the Decision Loop

Data begins before it is a table. It begins as ordinary life: a customer searches for a product, walks into a store, taps a card, opens an app, ignores an offer, leaves a review, calls support, uploads a receipt, asks a chatbot a question, or cancels a subscription. Modern businesses are covered in these traces. Some are clean rows. Some are messy sentences. Some are images, audio, locations, documents, or model outputs. All of them are partial records of something a person, machine, or organization did.

This opening chapter draws the whole map before Part I zooms in. It moves through four questions in turn — where business data comes from, how it is stored, how it is used, and how those pieces connect into a single data-to-decision loop. The order matters. A manager should meet the modern business data system first, and the methods later, as tools that serve specific decisions inside that system.

Where Data Comes From

The executive question: what business activity generated this data?

The first managerial skill is not choosing a model. It is asking what the data is a trace of. A sales row is a trace of a completed transaction, not of all customer demand. A click is a trace of attention inside one interface, not of preference in general. A support ticket is a trace of a problem serious enough to report, not of every problem customers experienced. A prompt log is a trace of an AI workflow being used, not proof that the workflow helped.

This matters because every data source carries the bias of the process that created it. If the process changes, the data changes even when the underlying business does not. If an app redesign makes the return button harder to find, return requests may fall while dissatisfaction rises. If a chatbot deflects simple questions, the support tickets that remain will look more severe. If a store starts scanning loyalty IDs more consistently, "repeat customer" metrics may jump without any real change in loyalty.
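The loyalty-ID example can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the transaction counts and scan rates are invented, and the function name `measured_repeat_rate` is hypothetical. It assumes a repeat purchase is visible to the dashboard only when the loyalty ID was scanned.

```python
# Hypothetical numbers: 1,000 weekly transactions, 400 of them made by
# customers who have bought before. Loyalty itself never changes.
TRANSACTIONS = 1_000
TRUE_REPEAT_TRANSACTIONS = 400


def measured_repeat_rate(scan_rate: float) -> float:
    """Share of all transactions the dashboard can label as 'repeat'.

    Assumption: a repeat purchase is only visible when the ID was scanned.
    """
    identified_repeats = TRUE_REPEAT_TRANSACTIONS * scan_rate
    return identified_repeats / TRANSACTIONS


before = measured_repeat_rate(scan_rate=0.50)  # cashiers scan half the time
after = measured_repeat_rate(scan_rate=0.90)   # stricter scanning routine

print(f"Measured repeat rate before: {before:.0%}")
print(f"Measured repeat rate after:  {after:.0%}")
print(f"True repeat rate throughout: {TRUE_REPEAT_TRANSACTIONS / TRANSACTIONS:.0%}")
```

Under these assumptions the measured rate moves from 20% to 36% while the true rate stays at 40%. The entire jump comes from the recording process, not from customers.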

Where business data comes from

Source family	Examples	Business use
Customer behavior	Purchases, clicks, searches, visits, returns, ratings, reviews	Demand, loyalty, churn, product-market fit
Business operations	Inventory, invoices, CRM records, shipments, staffing, contracts	Margin, service quality, capacity, working capital
Digital systems	App events, web logs, ad auctions, recommendation impressions	Funnels, personalization, attribution, experimentation
Physical world	Sensors, location, cameras, store traffic, delivery scans	Utilization, loss prevention, routing, field execution
Human language	Support tickets, chats, call transcripts, emails, documents	Customer voice, compliance, knowledge retrieval, workflow routing
AI workflows	Prompts, responses, citations, tool calls, evals, human review	Automation quality, risk controls, continuous improvement

Data is usually a trace of work that already happened. The trace can be useful, but it is never the whole reality.

Figure 1. Business data is generated by many kinds of activity. The manager reads each source by asking what work created the record and what decision it can support.

Figure 1 gives the broad map. Six source families cover most data a modern manager will encounter: customer behavior, business operations, digital systems, the physical world, human language, and AI workflows. The source family matters because it tells you what kind of claim the data can support.

A day in the life of data

Consider one Bean & Basket customer on a Tuesday morning.

At 7:48 a.m., Maya searches the mobile app for "oat latte." That creates a search event. At 7:49, the app shows her a seasonal drink recommendation. That creates an impression. She taps it, adds a pastry, applies a loyalty reward, and checks out. That creates cart, transaction, payment, promotion, and loyalty records. Her order is prepared at Store 104, where the point-of-sale system updates inventory and the kitchen display records fulfillment time. Her phone location confirms she picked up the order. At 9:10, she rates the experience four stars and writes, "Great drink, long wait." That creates a rating and review text. Later, the operations team uses that review in a text dashboard, the marketing team uses the transaction in a churn model, the product team uses the search event to tune recommendations, and the regional manager sees the wait-time issue in a KPI dashboard.

One morning produced many records. None of them is "the customer" in full. Each is a trace from a specific system, with a specific purpose and a specific blind spot.

Table 1. One customer morning becomes several business records. The managerial question changes with the source.

Trace	Likely record	Question it can support	What it misses
Search for oat latte	app_search_events	What are customers trying to find?	Needs not expressed through search
Recommendation shown	recommendation_impressions	Which offers receive attention?	What would have happened without the recommendation
Completed purchase	transactions	What did customers buy, where, and when?	Customers who considered but did not buy
Long wait	fulfillment_time	Where is the operating process slow?	Subjective tolerance for waiting
Four-star review	reviews	What language do customers use to describe the experience?	Silent customers who never leave reviews
AI summary used by support	ai_workflow_logs	Is the AI workflow accurate, grounded, and useful?	Errors not caught by human review

The lesson is practical: do not call these records "customer data" as if they were interchangeable. The search event, transaction, review, wait-time record, and AI log are different slices of the customer's interaction with the firm. They become powerful only when the manager knows which slice is being used.
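One way to keep the slices distinct is to store each trace together with its blind spot. The sketch below copies three rows from Table 1 into a small data structure. The class name `Trace`, its field names, and the helper `blind_spot` are hypothetical choices made for illustration, not a schema the book prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One record left behind by a business activity."""
    activity: str   # what happened
    record: str     # where it was likely recorded
    supports: str   # question the record can help answer
    misses: str     # blind spot of the record


# A subset of the rows in Table 1.
MORNING = [
    Trace("Search for oat latte", "app_search_events",
          "What are customers trying to find?",
          "Needs not expressed through search"),
    Trace("Completed purchase", "transactions",
          "What did customers buy, where, and when?",
          "Customers who considered but did not buy"),
    Trace("Four-star review", "reviews",
          "What language do customers use to describe the experience?",
          "Silent customers who never leave reviews"),
]


def blind_spot(record: str) -> str:
    """Look up what a given record cannot tell us."""
    for trace in MORNING:
        if trace.record == record:
            return trace.misses
    raise KeyError(f"No trace registered for record {record!r}")


print(blind_spot("transactions"))
```

The design choice worth noticing is that the blind spot travels with the record, so anyone who picks up the `transactions` table is reminded what it cannot answer.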

The source changes the claim

The same managerial topic can look different depending on where the data came from.

Take customer satisfaction. A firm might measure it through star ratings, review text, support tickets, refund requests, call transcripts, social posts, survey responses, churn, or repeat purchase. These are not redundant measures of the same thing. They capture different moments in the customer journey.

Star ratings are easy to monitor but shallow.
Review text explains reasons but overrepresents people willing to write.
Support tickets reveal operational problems but only after the customer escalates.
Refunds and returns capture costly dissatisfaction but miss silent disappointment.
Churn is a final outcome, often too late for diagnosis.
AI summaries can scale interpretation but must be evaluated against source evidence.

The managerial question is not "which data source is best?" The question is: which source is closest to the decision we need to make, and what bias does its generation process introduce?

Three generation traps

Trap 1: activity bias. Data overrepresents people who act inside the measured system. App data overrepresents app users. Reviews overrepresent people motivated to write. Loyalty data overrepresents identified customers. The unmeasured population may behave differently.

Trap 2: workflow bias. Data changes when the business process changes. A new refund policy, a redesigned app, a chatbot handoff rule, or a new sales script can change recorded behavior without changing underlying demand or satisfaction.

Trap 3: AI feedback bias. AI workflows create new records and change the behavior that future models learn from. If an AI support assistant routes some complaints away from human agents, the remaining ticket data no longer represents the full complaint mix. If a recommender shows the same products repeatedly, future purchase data reflects exposure as much as preference.

These traps are not reasons to avoid data. They are reasons to read data as a product of its generating process.
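Activity bias is easy to see in a toy population. The sketch below uses ten invented customers and assumes, purely for illustration, that the unhappiest and happiest customers are the ones who write reviews. Nothing here is estimated from real data.

```python
from statistics import mean

# Hypothetical customers: (satisfaction on a 1-5 scale, wrote_review).
# Assumption: customers at the extremes write; the middle stays silent.
customers = [
    (1, True), (2, False), (3, False), (3, False), (3, False),
    (4, False), (4, False), (4, True), (5, True), (5, True),
]

all_scores = [score for score, _ in customers]
review_scores = [score for score, wrote in customers if wrote]

print(f"Average satisfaction, all customers:  {mean(all_scores):.2f}")
print(f"Average satisfaction, reviewers only: {mean(review_scores):.2f}")
print(f"Share of customers observed:          {len(review_scores) / len(customers):.0%}")
```

With these numbers the review data covers 40% of customers and reports an average of 3.75, while the full population averages 3.40. The direction and size of the gap depend entirely on who chooses to act inside the measured system.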

A trace only becomes usable once it lands somewhere. That makes the next question structural: where does the data live, and what was that system built to do?

How Data Is Stored

The word "database" hides too much. The system that records a customer's payment is not built for the same job as the system that scans five years of transactions for a pricing analysis. The place that stores raw app logs is not the same as the place that supports semantic search over policy documents. A manager does not need to administer these systems, but does need to understand their roles. Otherwise every data conversation becomes vague: "Can we get the data?" Which data? From which system? For what decision? At what latency? With what quality contract?

The executive question: what job is this data system doing?

Modern firms usually store data in several layers. Each layer optimizes for a different job.

An operational database records the next event correctly: a payment, an order, a login, a shipment, a service case. It is built for reliability, identity, permissions, and fast small updates. An analytical database scans many past events: a year of transactions, a panel of stores, a customer cohort, a product assortment, a marketing funnel. It is built for aggregation, history, and comparison. The distinction is not technical trivia. It determines whether the system is meant to run the business or analyze the business.

The storage stack is a division of labor

Source systems → Ingestion → Storage → Transform → Metrics, models, AI → Decision

System	Primary job	Common examples	Managerial question
Operational SQL	Run the application	Orders, accounts, payments, POS, CRM	Can the business record the next transaction correctly?
NoSQL and search	Serve flexible app data	Documents, sessions, profiles, product catalogs, keyword search	Can the app retrieve the right object quickly?
Lake and files	Keep raw and semi-raw assets	Logs, parquet files, PDFs, images, audio, vendor drops	Can the firm preserve data before every use is known?
Warehouse or lakehouse	Answer analytical questions	Snowflake, BigQuery, Databricks-style lakehouses	Can managers scan history across customers, products, and time?
Local analytics	Let one analyst work quickly	DuckDB, notebooks, local parquet, reproducible extracts	Can a small team investigate without waiting on production systems?
Vector and graph stores	Find meaning and relationships	Embeddings, semantic search, RAG indexes, product/customer graphs	Can the workflow retrieve related ideas, documents, or entities?

The practical distinction is transactional versus analytical: one system records the next event; another scans many past events to support a decision.

Figure 2. The modern storage stack is a division of labor. Each system class stores a different kind of evidence for a different kind of decision.

Figure 2 is the practical map. Operational SQL, NoSQL, lakes, warehouses, local analytical engines, vector databases, graph stores, and search indexes are not competing names for the same thing. They are specialized pieces of a workflow that moves from source activity to decision.

Transactional versus analytical

The most important distinction is transactional versus analytical.

A transactional system answers: can we record and retrieve one business event correctly right now? The point-of-sale system must know the price, charge the customer, update inventory, and create a receipt. The CRM must record a sales interaction. The app database must know which user is logged in. Mistakes here interrupt the business.

An analytical system answers: what pattern emerges across many business events? The warehouse computes weekly revenue by region, demand by product, churn by cohort, margin by promotion, and service quality by store. It is not trying to record the next transaction. It is trying to make history comparable.

Table 2. Transactional and analytical systems answer different questions. Confusing them creates slow tools, fragile reporting, and mistrusted numbers.

Dimension	Transactional system	Analytical system
Primary job	Record the next event correctly	Compare many past events
Typical questions	Did this order, payment, or login succeed?	Which customers, stores, products, or periods are changing?
Data shape	Current records, normalized entities, app state	History, panels, aggregates, derived metrics
Latency	Immediate or near-immediate	Batch, near-real-time, or streaming depending on the use case
Failure mode	The business cannot operate	The organization makes decisions from stale or inconsistent evidence

Managers feel this distinction in ordinary meetings. When the CFO asks for margin by promotion over the past six quarters, the answer should not come from the live checkout database. When customer support needs the current status of an order, the answer should not wait for the nightly warehouse refresh. Each system can be excellent and still be wrong for the job.
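The two kinds of question look different even in a few lines of SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, store names, and amounts are invented, and a single database stands in for both roles only to keep the example small. In practice the two workloads would usually run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " store TEXT NOT NULL,"
    " week TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)

# Hypothetical history that is already on file.
history = [
    ("Store 104", "2024-W01", 6.50),
    ("Store 104", "2024-W01", 4.25),
    ("Store 104", "2024-W02", 7.00),
    ("Store 210", "2024-W01", 5.75),
]
conn.executemany(
    "INSERT INTO orders (store, week, amount) VALUES (?, ?, ?)", history
)

# Transactional question: record ONE new event correctly, right now.
with conn:  # commits on success, rolls back on error
    cur = conn.execute(
        "INSERT INTO orders (store, week, amount) VALUES (?, ?, ?)",
        ("Store 210", "2024-W02", 8.25),
    )
new_order_id = cur.lastrowid
print("Recorded order", new_order_id)

# Analytical question: what pattern emerges across MANY past events?
weekly_revenue = conn.execute(
    "SELECT store, week, SUM(amount) FROM orders"
    " GROUP BY store, week ORDER BY store, week"
).fetchall()
for store, week, revenue in weekly_revenue:
    print(store, week, round(revenue, 2))
```

The insert touches one row and must succeed or fail as a unit. The aggregate reads every row and cares about comparability across stores and weeks, which is the job a warehouse is built for.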

The major storage roles

SQL operational databases store structured app and business records: customers, orders, payments, products, subscriptions, tickets. They usually enforce relationships and consistency. If one customer has many orders, SQL is good at keeping that relationship explicit.

NoSQL systems serve flexible or high-scale application data: product catalogs, session state, user profiles, event payloads, documents, and other records whose structure changes often. They are often useful when the application needs fast reads and writes over flexible objects.

Data lakes and object storage keep raw or semi-raw assets: logs, vendor files, parquet tables, documents, images, audio, and historical extracts. The lake is useful when the firm wants to preserve data before every analytical use is known.

Warehouses and lakehouses make history analyzable. Systems such as Snowflake, BigQuery, and Databricks-style lakehouses are used to scan large historical datasets, join source systems, define metrics, and support dashboards, notebooks, and model training.

DuckDB-style local analytics gives analysts a fast, lightweight way to work with serious data on a laptop or in a reproducible script. This is useful for teaching, prototyping, case packs, and focused investigation before work becomes shared infrastructure.

Search, vector, and graph systems support retrieval and relationships. Keyword search finds exact or near-exact terms. Vector databases store embeddings so workflows can retrieve semantically related documents, products, customers, or images. Graph stores represent relationships such as referrals, product co-purchases, supply chains, account networks, and organizational structures.

The point is not to memorize product names. The point is to ask which system is doing which job.

Batch, streaming, and freshness

Data also differs by freshness.

Some workflows are fine with a nightly refresh. A weekly executive KPI dashboard, a monthly pricing review, or a quarterly market expansion analysis does not need every transaction within seconds. Other workflows need near-real-time data: fraud detection, stockout alerts, delivery routing, ad bidding, anomaly detection, or a customer-facing recommendation shown during a session.

Freshness has a cost. Real-time systems are harder to build, harder to monitor, and easier to over-trust. A manager should ask: what decision becomes better if this is refreshed sooner? If the action is weekly, minute-level freshness may only create noise.

Table 3. Data freshness should match the decision cadence. Faster is valuable only when someone can act faster.

Cadence	Example workflow	Managerial test
Daily or weekly batch	Executive KPI dashboard, store performance review	Will anyone change an action more than once per day or week?
Near-real-time	Inventory alert, fraud flag, support escalation	Does a faster signal prevent loss or improve service immediately?
Streaming or session-time	Ad bidding, next-best recommendation, live routing	Is the decision made during the customer or operational interaction?
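The managerial test in Table 3 can be written as a simple comparison between how often the data refreshes and how often someone can act on it. The sketch below is one possible encoding, not a standard: the function name `freshness_check` and the tenfold cut-off are assumptions chosen only to make the idea concrete.

```python
from datetime import timedelta


def freshness_check(refresh_every: timedelta, act_every: timedelta) -> str:
    """Compare the refresh cadence with the action cadence."""
    if refresh_every > act_every:
        return "too stale: decisions are made on outdated evidence"
    # Illustrative cut-off (an assumption): refreshing more than ten times
    # per possible action is treated as more freshness than the decision uses.
    if refresh_every * 10 < act_every:
        return "faster than needed: extra freshness may only add noise"
    return "matched: refresh cadence fits the action cadence"


# Hypothetical workflows.
weekly_review = freshness_check(timedelta(minutes=1), timedelta(weeks=1))
fraud_flag = freshness_check(timedelta(days=1), timedelta(minutes=5))
store_review = freshness_check(timedelta(days=1), timedelta(weeks=1))

print("Weekly KPI review, minute-level data:", weekly_review)
print("Fraud flag, nightly data:            ", fraud_flag)
print("Weekly store review, daily data:     ", store_review)
```

The point of the sketch is the direction of the comparison. Freshness is judged against the action cadence, never in isolation.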

How storage affects methods

The rest of the book repeatedly depends on storage choices.

Dashboards need stable metric tables, not ad hoc extracts.
Causal analysis needs historical data at the right grain, not only summary reports.
Prediction needs labels, features, and timestamps aligned in a feature table.
Recommenders need exposure logs as well as purchase logs, or they confuse preference with what the system happened to show.
Retrieval-augmented generation needs a document store, a search or vector index, source metadata, and a way to evaluate retrieval quality.
Governance needs lineage: where the data came from, who owns it, how fresh it is, and what changed since last time.
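The recommender point in the list above can be shown with two invented products. The sketch assumes a week of hypothetical impression and purchase counts. The product names and numbers are made up, and purchases per impression is used here as one simple way to adjust for exposure.

```python
# Hypothetical logs for two products over the same week.
impressions = {"seasonal_latte": 1_000, "oat_flat_white": 100}
purchases = {"seasonal_latte": 50, "oat_flat_white": 20}

# Purchase log only: the heavily shown product looks like the favorite.
by_purchases = sorted(purchases, key=purchases.get, reverse=True)

# Purchase log joined to the exposure log: purchases per time shown.
conversion = {p: purchases[p] / impressions[p] for p in purchases}
by_conversion = sorted(conversion, key=conversion.get, reverse=True)

print("Ranked by purchases: ", by_purchases)
print("Ranked by conversion:", by_conversion)
```

The two rankings disagree. Without the exposure log, the system would conclude that customers prefer the product it happened to show ten times as often.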

Storage tells us what is possible to ask. The next question is what the firm actually does with the answer.

How Data Is Used

Data creates value only when it changes a workflow. A dashboard no one reviews is a report, not a management system. A prediction score that triggers no action is a number, not intelligence. A recommendation system that cannot learn from exposure and feedback is guesswork with software around it. A language model that summarizes documents without evaluation is convenience without control. The practical question is always the same: what job is the data doing?

The executive question: what decision workflow will this evidence improve?

Modern data use falls into a small number of recurring workflow families. The names change by industry, but the managerial logic is stable.

Some workflows monitor the business: revenue, margin, churn, conversion, service time, stockouts, quality, risk. Some diagnose where a metric moved: by customer segment, geography, product, cohort, channel, store, or time period. Some learn causally: did a price change, campaign, policy, or process change cause an outcome? Some predict: which customers will churn, which orders are risky, how much demand should we expect? Some rank and recommend: what should be shown first, who should be contacted first, which action should be suggested next? Some read unstructured work: tickets, calls, documents, images, contracts, resumes, invoices, policy manuals. Some optimize a constrained action: staffing, inventory, routing, pricing, media spend, assortment, or scheduling.

Use-case router: business question to evidence workflow

Business question	Workflow	Evidence asset	Book home
What is happening?	Monitoring and KPI dashboards	Metric card, alert, scorecard	Parts I-II
Where and for whom?	Segmentation, cohorts, drilldowns	Segment profile, cohort view, diagnostic dashboard	Parts II-IV
Did our action cause it?	Experiments and causal designs	Identification memo, lift chart, effect estimate	Part III
What is likely next?	Prediction, forecasting, risk scoring	Predictive task contract, model card	Part IV
What should we show first?	Ranking and recommendation	Ranked list, threshold rule, monitoring view	Part IV
What does the text, document, or image say?	Extraction, search, RAG, AI-assisted workflows	AI workflow card, eval dashboard, review queue	Part V

The same source data can support several workflows. The manager's first job is to route the question before choosing the method.

Figure 3. The use-case router. Start with the business question, then choose the workflow, evidence asset, and book home that fits it.

Figure 3 is one of the book's central habits. Before debating tools, route the question.
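Routing can be practiced literally. The sketch below writes Figure 3 as a lookup table whose entries are copied from the figure. The names `ROUTER` and `route` are hypothetical, and requiring an exact question match is a simplification for illustration.

```python
# Figure 3 as a lookup: question -> (workflow, evidence asset, book home).
ROUTER = {
    "What is happening?": (
        "Monitoring and KPI dashboards",
        "Metric card, alert, scorecard",
        "Parts I-II",
    ),
    "Where and for whom?": (
        "Segmentation, cohorts, drilldowns",
        "Segment profile, cohort view, diagnostic dashboard",
        "Parts II-IV",
    ),
    "Did our action cause it?": (
        "Experiments and causal designs",
        "Identification memo, lift chart, effect estimate",
        "Part III",
    ),
    "What is likely next?": (
        "Prediction, forecasting, risk scoring",
        "Predictive task contract, model card",
        "Part IV",
    ),
    "What should we show first?": (
        "Ranking and recommendation",
        "Ranked list, threshold rule, monitoring view",
        "Part IV",
    ),
    "What does the text, document, or image say?": (
        "Extraction, search, RAG, AI-assisted workflows",
        "AI workflow card, eval dashboard, review queue",
        "Part V",
    ),
}


def route(question: str) -> tuple[str, str, str]:
    """Return (workflow, evidence asset, book home) for a routed question."""
    if question not in ROUTER:
        raise ValueError(
            f"Unrouted question: {question!r}. Restate it as one of: "
            + "; ".join(ROUTER)
        )
    return ROUTER[question]


workflow, evidence, home = route("Did our action cause it?")
print(workflow, "|", evidence, "|", home)
```

A question that does not match any key raises an error on purpose. A request such as "which tool should we buy?" has to be restated as a business question before a method is chosen.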

Monitoring: what is happening?

Monitoring is the most familiar use of data. The firm defines metrics, refreshes them on a cadence, and watches for movement. KPI dashboards, scorecards, alerts, and operating reviews all live here.

Monitoring is useful when:

the metric is clearly defined;
the owner knows what action they can take;
the refresh cadence matches the action cadence;
the dashboard separates normal variation from signals that require attention;
the view supports drilldown when a metric moves.

Monitoring fails when the dashboard becomes a collection of charts without a decision path. A useful dashboard answers three questions in sequence: what moved, where did it move, and what should we inspect or do next?
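Separating normal variation from a signal usually starts with a simple rule. The sketch below flags a value that sits far from its own history. The wait times are invented, the function name `needs_attention` is hypothetical, and the three-standard-deviation cut-off is a common convention rather than a rule the dashboard must follow.

```python
from statistics import mean, stdev

# Hypothetical weekly average wait times (minutes) for one store.
history = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0]


def needs_attention(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Flag the latest value if it is more than k sample standard
    deviations away from the historical mean."""
    center = mean(history)
    spread = stdev(history)
    return abs(latest - center) > k * spread


print("Wait time 4.3 min flagged:", needs_attention(history, 4.3))
print("Wait time 5.6 min flagged:", needs_attention(history, 5.6))
```

A week at 4.3 minutes falls inside the normal range of this history and is not flagged, while 5.6 minutes is. The flag answers "what moved"; the drilldown and the response still have to be designed.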

Diagnosis: where, for whom, and why might it be happening?

Diagnosis starts after monitoring notices movement. If weekly margin falls, the manager needs to know whether the issue is a product, store, region, customer segment, campaign, channel, or supply problem. This is where segmentation, cohorts, funnels, small multiples, maps, and drilldowns matter.

Diagnosis does not prove causality. It narrows the search. It tells the team where to investigate and what comparison might be useful.

Table 4. Monitoring and diagnosis are different workflow stages. A dashboard should support both without confusing them.

Stage	Question	Default evidence	Common failure
Monitor	What changed?	KPI trend, alert, scorecard	No action owner or threshold
Diagnose	Where did it change?	Segment drilldown, cohort, small multiples	Treating a pattern as proof of cause
Decide	What should we do?	Decision brief, experiment, model, memo	Jumping from dashboard to action without comparison

Strategic decisions: where should the firm place its bets?

Strategy uses data differently from daily operations. The evidence is often less fresh, more aggregated, and more uncertain. Market expansion, pricing architecture, product portfolio, customer segment focus, channel strategy, capacity investment, and acquisition screening all require a blend of historical data, external context, assumptions, and judgment.

The practical discipline is to separate three things:

Facts from the current business. What do our customers, products, stores, channels, and margins show?
Assumptions about the future. What must be true for the strategy to work?
Tests and signals. What data would tell us early that the strategy is working or failing?

This is why the book returns to decision memos. Strategic decisions need evidence, but they also need an explicit threshold for acting and a plan for learning after action.

Causal learning: did the action work?

Many business questions are causal: did the email cause incremental sales, did the discount lift profit, did the new onboarding flow reduce churn, did the policy change reduce risk? Historical data alone often makes these questions look easier than they are.

The key issue is the counterfactual: what would have happened without the action? Experiments, A/B tests, difference-in-differences, synthetic control, regression with credible identification, and other designs are ways of constructing a comparison that earns the word "caused."

Causal learning is not always required. If the question is only "which stores are currently above target?", a dashboard is enough. If the question is "should we roll out this promotion nationally?", a dashboard is not enough.
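A worked difference-in-differences calculation shows what the counterfactual adds. The numbers below are invented. The estimate is only credible under the usual assumption that the promotion stores and the comparison stores would have moved in parallel without the promotion.

```python
# Hypothetical average weekly sales per store (in dollars).
promo_before, promo_after = 10_000, 11_500      # stores that ran the promotion
control_before, control_after = 10_200, 10_900  # comparable stores that did not

# Before/after change in promotion stores: what a dashboard would show.
naive_lift = promo_after - promo_before

# Change in comparison stores: the stand-in for "what would have happened".
background_change = control_after - control_before

# Difference-in-differences: the change beyond the background movement.
did_estimate = naive_lift - background_change

print(f"Before/after change in promotion stores: {naive_lift}")
print(f"Change in comparison stores:              {background_change}")
print(f"Difference-in-differences estimate:       {did_estimate}")
```

The dashboard view credits the promotion with 1,500 dollars per store per week. After subtracting the 700-dollar movement seen in stores without the promotion, the estimate falls to 800.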

Prediction, ranking, and recommendation

Prediction asks what is likely to happen next. Churn models, demand forecasts, fraud scores, lead scores, risk models, and delivery-time predictions all live here. The model does not need to know what caused the outcome to be useful, but it does need a clear action attached to the score.

Ranking and recommendation go one step further. They order choices: which product to show, which customer to contact, which ticket to escalate, which loan to review, which store to visit, which document to retrieve. Ranking systems need extra care because they shape the future data they observe. If a product is never shown, the system cannot learn whether customers would have liked it.

Generative AI and unstructured workflows

Many modern workflows use data that does not look like rows and columns: support tickets, product reviews, policy documents, sales calls, PDFs, images, screenshots, contracts, invoices, code, slides, and emails. AI workflows help classify, extract, summarize, retrieve, draft, route, and monitor this work.

The mature version is not "ask the model." It is a designed workflow:

collect the source material;
retrieve or select relevant context;
ask the model for a bounded task;
require structured output when the result must feed another system;
evaluate accuracy, grounding, bias, privacy, and refusal behavior;
route uncertain or high-risk cases to human review;
monitor the workflow after deployment.

AI is powerful because it makes language, documents, and images operational. It is risky for the same reason: it can make weak evidence look fluent.
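The designed workflow described in this section can be sketched in miniature. Everything below is hypothetical: `fake_model` stands in for a real language model call, the three required fields are invented, and the grounding check (the quoted evidence must appear in the source text) is a deliberately simple proxy for a real evaluation.

```python
import json

REQUIRED_FIELDS = {"topic", "severity", "quote"}


def fake_model(ticket_text: str) -> str:
    """Stand-in for a model call (hypothetical). Always returns the same JSON."""
    return json.dumps(
        {"topic": "wait time", "severity": "low", "quote": "long wait"}
    )


def process_ticket(ticket_text: str, model=fake_model) -> dict:
    """Bounded task -> structured output -> grounding check -> routing."""
    raw = model(ticket_text)
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return {"route": "human_review", "reason": "output was not valid JSON"}
    if not isinstance(result, dict) or not REQUIRED_FIELDS.issubset(result):
        return {"route": "human_review", "reason": "missing required fields"}
    # Grounding check: the quoted evidence must appear in the source text.
    if str(result["quote"]).lower() not in ticket_text.lower():
        return {"route": "human_review", "reason": "quote not found in source"}
    # High-risk cases go to a person even when the output looks valid.
    if result["severity"] == "high":
        return {"route": "human_review", "reason": "high-risk case"}
    return {"route": "automated", "result": result}


print(process_ticket("Great drink, long wait."))
print(process_ticket("Coffee was cold."))
```

The second ticket is routed to a person because the model's quote does not appear in the source. Fluent output is never accepted on its own; it has to pass a check against the evidence.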

Optimization: what should we do under constraints?

Optimization turns predictions, rules, and business constraints into an action plan. How many employees should be scheduled? Which stores should receive inventory? How should delivery routes be assigned? Which media channels should receive budget? Which price should be offered under margin and fairness constraints?

Optimization is often where analytics becomes real. It also exposes hidden objectives. Are we optimizing revenue, margin, customer satisfaction, utilization, fairness, retention, or risk? If the objective is wrong, the optimized answer is wrong with confidence.
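A small pricing example shows how the objective drives the answer. The prices, demand figures, and unit cost below are invented, and the search is a plain comparison over four candidate prices rather than a real optimization model.

```python
# Hypothetical price points with expected weekly units sold at each price.
UNIT_COST = 2.00
demand = {3.00: 1_000, 4.00: 800, 5.00: 550, 6.00: 300}


def revenue(price: float) -> float:
    """Weekly revenue at a given price."""
    return price * demand[price]


def margin(price: float) -> float:
    """Weekly contribution margin at a given price."""
    return (price - UNIT_COST) * demand[price]


best_for_revenue = max(demand, key=revenue)
best_for_margin = max(demand, key=margin)

print(f"Price that maximizes revenue: {best_for_revenue:.2f}")
print(f"Price that maximizes margin:  {best_for_margin:.2f}")
```

The same data recommends a price of 4.00 when the objective is revenue and 5.00 when the objective is margin. Neither answer is wrong as arithmetic. The choice of objective is a managerial decision that has to be made before the optimizer runs.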

Monitoring, diagnosis, causal proof, prediction, and AI workflows are not separate islands. They are stages of one loop — the loop that connects every part of this book.

The Data-to-Decision Loop

The modern data system is a loop. Human and machine activity creates records. Records move into storage. Storage feeds transformations, metrics, charts, models, search indexes, and AI workflows. Those evidence assets inform decisions. Decisions change customer experience, operations, pricing, policy, product, staffing, or automation. Those changes create new data. The loop starts again.

The executive question: where are we in the data-to-decision loop?

Most analytics mistakes come from losing track of the loop. A team starts with a model but has not named the action. A dashboard monitors a metric but has no threshold. A causal analysis estimates an effect but does not connect to a decision cadence. An AI workflow answers questions but has no evaluation or escalation path. Each failure is a broken link between data and action.

The data-to-decision loop

Step	Stage	What happens
1	Human or machine activity	A customer acts, a process runs, a model responds
2	Source record	A transaction, log, ticket, document, image, or prompt trace
3	Storage and transformation	Operational DB, lake, warehouse, feature table, vector index
4	Evidence asset	Metric, chart, causal estimate, prediction, retrieval result
5	Decision and action	A manager changes a price, offer, process, policy, or workflow
6	Feedback and monitoring	The action creates new data and the loop starts again

The loop is circular, not linear. Every decision changes the business, and that changed business generates the next round of data.

Figure 4. The data-to-decision loop. Every decision changes the business, and the changed business generates the next round of evidence.

Figure 4 is the front-door operating model for the book. Part I begins in the source record. Part II turns records into visual evidence. Part III asks whether an action caused an outcome. Part IV predicts and ranks future cases. Part V brings text, documents, images, embeddings, and language models into the loop. Part VI asks how the organization runs the loop repeatedly without losing ownership, quality, or governance.

Data-driven versus data-decorated

A decision is data-driven only when three things are named:

The action. A specific lever someone can pull.
The comparison. What would happen if the firm did not act or acted differently.
The threshold. The signal, effect size, ROI, quality level, or risk standard required to act.

Anything missing one of these may still be useful description. It may be a good dashboard, a good analysis, or a promising model. But it is not yet a decision.
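The three-part test can be turned into a checklist that refuses to call something a decision until every link is named. The sketch below is illustrative: the class `DecisionSpec`, its methods, and the example wording are hypothetical, not a template the book defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionSpec:
    """The three links that make a decision data-driven."""
    action: Optional[str] = None      # a specific lever someone can pull
    comparison: Optional[str] = None  # what happens without the action
    threshold: Optional[str] = None   # the standard required to act

    def missing_links(self) -> list[str]:
        """Names of the links that have not been stated yet."""
        return [
            name for name in ("action", "comparison", "threshold")
            if not getattr(self, name)
        ]

    def is_data_driven(self) -> bool:
        return not self.missing_links()


# A churn score with nothing attached to it.
churn_score = DecisionSpec()

# The same score once it is wired to a decision (hypothetical wording).
retention_offer = DecisionSpec(
    action="Send a retention offer to customers scoring above the cut-off",
    comparison="Holdout group of similar customers who receive no offer",
    threshold="Roll out only if retained margin exceeds the offer cost",
)

print("Churn score alone:", churn_score.missing_links())
print("Retention offer:  ", retention_offer.is_data_driven())
```

The score alone is missing all three links. It may still be a good model, but it becomes a decision only when the action, the comparison, and the threshold are written down.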

Table 5. Data-decorated work often has impressive evidence but a missing decision link.

Failure pattern	What it looks like	Missing link
Dashboard without action	Weekly KPI review shows a metric falling	No owner, threshold, drill path, or response playbook
Causal claim without counterfactual	Customers who received an email spent more	What those same customers would have spent without the email
Prediction without workflow	A churn score ranks customers every Monday	Which offer, threshold, queue, and follow-up action the score triggers
AI without evaluation	A chatbot answers from company documents	Grounding tests, refusal rules, escalation, monitoring, and ownership

The decision ladder

The book's evidence languages climb a ladder of business questions. The lower rungs do not disappear when the upper rungs arrive. A prediction model still depends on a clean target. A causal design still depends on a well-defined unit, timing, and outcome. A language model workflow still needs source data, storage, evaluation, monitoring, and human review.

Figure 5. The decision ladder. Each rung asks a different business question and requires a different evidence language.

The ladder is a routing device:

What happened? Read the data and define the metric.
Where and for whom? Visualize, segment, compare, and diagnose.
What caused it? Construct a credible counterfactual.
How much does the lever matter? Estimate effects, elasticity, and heterogeneity.
What is likely next? Predict, rank, and evaluate.
What does the text, image, or document say? Use unstructured data and AI workflows.
How do we operate this? Monitor, govern, communicate, and learn.

Six evidence languages

Six evidence languages, one per Part

Decision question	Evidence language	Part	Studio
What happened?	Description, metrics	I	Data Language Studio (§4.1)
What should the eye see first?	Visual evidence	II	Visual Decision Brief (§8.2)
What caused it?	Causal designs	III	Pricing & Promotion (§13.4)
What is likely next?	Prediction & segmentation	IV	Customer Intelligence (§17.4)
What does the text or image say?	AI workflows	V	Customer Voice Intelligence (§22.2)
How do we operate this?	System view	VI	Final Integrative Case (§25.1)

Each Part teaches one evidence language and ends with a Studio that ships its capstone artifact.

Figure 6. The evidence languages. Each Part of the book adds one way of turning data into decision-relevant evidence.

The important point is not the numbering. It is the discipline of choosing the evidence language that fits the decision. A dashboard should not be asked to prove causality. A predictive model should not be treated as an intervention. A language model should not be trusted because it is fluent. A causal estimate should not ship if no one knows what action it changes.

The artifacts that survive the work

The book does not end each method with "and now you know the method." It ends with an artifact that a firm can reuse, audit, and refresh.

The artifact family: five one-page documents that survive the work

Decision Question Card

What action, on what unit, with what counterfactual?

§9.1

↓

Predictive Task Contract

What target, for what unit, on what horizon, with what features?

§14.2

↓

Model Card

What does this model do, where does it fail, who owns it?

§15.5

↓

AI Workflow Card

What does this workflow do, what governs it, who responds?

§22.1

↓

Decision Memo

What is the recommendation, what evidence supports it, what next?

§24.1

Each artifact extends the discipline of the one above. The card you write at §9.1 grows into the memo you sign at §24.1.

Figure 7. The artifact family. These one-page artifacts turn analysis into reusable decision infrastructure.

The artifact family matters because modern analytics is not a sequence of one-off clever analyses. It is infrastructure for repeated decisions. A metric card can be reused in a dashboard. A predictive task contract can be reused by a modeling team. A model card can be used by risk, legal, product, and operations. An AI workflow card can be audited when the workflow changes. A decision memo can show what evidence led to action and how the firm will learn afterward.

The cases

One through-line company, Bean & Basket Coffee, appears throughout the book. Standalone cases add real empirical grounding where a specific method needs a richer dataset.

The case portfolio

Bean & Basket Coffee: The continuous through-line

A multi-store specialty coffee chain with reviews, tickets, transactions, panel data, campaigns, products, stores, and an internal knowledge base. Appears in every Part.

Standalone case studies

Case	Part(s)	Methods
Progresso Soup	Pt II, Pt III	Visual evidence, fixed effects, elasticity
Milk Field Data	Pt III	Quasi-experiment, heterogeneous effects
Zillow Colorado	Pt III	Difference-in-differences, synthetic control
BAV Fast Food	Pt IV	PCA, perceptual maps
Airbnb (illustrative)	Pt IV	Numeric prediction, residuals
Yelp Reviews	Pt V	Sentiment, topics, GPT measurement
Goose Island Twitter	Pt V	Emotion vs. sentiment
Earnings Calls	Pt V	Evasiveness measurement
Job Postings	Pt V	Construct measurement

Standalone cases are appended outside chapter prose. They give the methods a second testing ground beyond the Bean & Basket through-line.

Figure 8. The case portfolio. Bean & Basket provides continuity; standalone cases give specific methods a second testing ground.

The purpose of the cases is not to decorate chapters. It is to make the loop concrete. Every case asks: what activity generated the data, where is it stored, what evidence language fits the decision, what artifact should survive, and how would the organization monitor what happens next?

Where Part I begins

Part I begins now that the full system is visible. We zoom in from the operating loop to the basic object inside it: a dataset. The first question becomes almost physical: what does one row mean? That question sounds small, but it controls everything that follows. Grain, structure, variable type, joins, reshaping, metrics, and data quality are the mechanics that make the larger loop trustworthy.