§0.0

Foreword: How to Read This Book

A modern business does not lack data. It produces data almost every time work happens: a customer searches, buys, returns, complains, subscribes, cancels, scans a badge, opens an app, talks to support, receives a recommendation, or asks an AI assistant to summarize a document. Noticing that the data exists is not the hard part. The hard part is knowing what kind of business trace it is, where it lives, what decision it can support, and what would make that decision defensible.

This book grew out of an observation that has bothered me for years. Managers are taught more analytics every decade. Decisions do not always get better. The methods get fancier; dashboards multiply; AI announcements pile up. Yet firms still run campaigns that only look like they worked, ship prediction models with no action attached, mistake polished dashboards for evidence, and automate workflows without knowing how failure will be detected.

The book's wager is that the missing skill is not another method. It is a practical operating discipline that connects the business question to the data trace, the data trace to the evidence, and the evidence to a decision someone can own.


What this book is

This is a textbook for MBA-level managers and analysts who want to be fluent in modern data work: not necessarily to build every model from scratch, but to commission, evaluate, deploy, and govern the work intelligently.

The reader should leave able to:

  • See where business data is generated in ordinary human and organizational activity.
  • Distinguish the systems that store data for transactions, analytics, search, AI, and governance.
  • Route a business question to the right evidence language: dashboard, diagnostic view, causal design, prediction model, recommendation system, or AI workflow.
  • Read the standard visuals and artifacts that make each method interpretable.
  • Recognize the standard failures of each workflow before they become executive mistakes.
  • Turn analysis into durable artifacts: metric cards, task contracts, model cards, AI workflow cards, dashboards, and decision memos.

It is not a textbook for an aspiring data scientist who needs to derive every algorithm. The mathematics appears only when it clarifies business meaning. Code and empirical exercises live in the case packs; the chapter prose stays focused on managerial intuition and decision use.


The new opening

The first few chapters now begin with the broad picture:

  1. Where Data Comes From. Data as a trace of human behavior, business operations, digital systems, physical environments, language, documents, and AI workflows.
  2. How Data Is Stored. A practical map of operational SQL, NoSQL, data lakes, warehouses, DuckDB-style local analytics, vector databases, and graph/search systems.
  3. How Data Is Used. Monitoring, diagnosis, strategic decisions, experiments, prediction, recommendations, optimization, and generative AI workflows.
  4. The Data-to-Decision Loop. The operating model that connects source activity to storage, transformation, evidence, action, feedback, and governance.

Only after that map is visible does Part I zoom into rows, columns, grain, joins, transformations, and metric definitions. The order matters. Students should not experience the book as "statistics, then machine learning, then AI." They should experience it as the modern business data system first, and the methods as tools that serve specific decisions inside that system.


How to read it

Three honest options:

Cover to cover. The arc is deliberate. Part 0 gives the system map. Part I teaches data language. Part II teaches visual evidence. Part III teaches causal decisions. Part IV teaches prediction and segmentation. Part V teaches unstructured data, embeddings, and AI workflows. Part VI turns the pieces into an operating system.

Question-first. If you arrive with a business question, route it first. "What happened?" starts in Parts I-II. "Did we cause it?" starts in Part III. "What will happen next?" starts in Part IV. "What are customers saying?" starts in Part V. "How do we run this repeatedly?" starts in Part VI.

Method reference. Each article can still be read on its own. Use the table of contents as an index when you need to look up a method, but keep asking what decision the method is supposed to improve.


A note on cases

The book leans on one continuous fictional company, Bean & Basket Coffee, and a set of standalone data cases.

Bean & Basket is useful because the same firm can produce the data a modern manager actually sees: transactions, customers, stores, products, campaigns, experiments, app events, reviews, support tickets, policy documents, images, AI prompts, and model outputs. The continuity lets methods accumulate. A customer who is a row in a churn model in Part IV can also be a paragraph in a complaint cluster in Part V and part of a monitoring loop in Part VI.

The standalone cases give the methods a second testing ground when a real dataset teaches the lesson better than the house brand. Part II steps onto a market-concentration and ad-spend case and the Progresso soup-scanner panel, which then carries the pricing and causal chapters. Part III's causal work leans on Progresso alongside the Milk and Zillow Colorado quasi-experiments. Part IV brings in the RentHop listings and New York Lottery ZIP psychographics cases. Part V works with real text corpora (Trump-tweet authorship, Goose Island acquisition sentiment, and Amazon review topics) and with the BAV brand-survey data, which powers the Fast-Food Perceptual Map studio.

The book is deliberate about where it steps off the through-line. The causal-inference chapters in Part III leave Bean & Basket almost entirely, because clean identification needs field data a single coffee chain cannot supply; the text says so when it happens. Everywhere else, the continuity is the point.


A note on the AI chapters

The AI chapters are written around patterns that should outlast any one model release: embeddings, retrieval, structured outputs, evals, human review, tool use, monitoring, and governance. The details of model selection will change. The managerial discipline should not.

The right question is not "which model is best?" in the abstract. The right question is: what workflow is the model part of, what evidence does it produce, how will we evaluate it, and who acts when it fails?


What you will have at the end

By the time you finish Part VI, you should hold five things you did not have at the start:

  1. A map of the modern business data system, from source activity to storage to evidence to action.
  2. A vocabulary for naming any business question's evidence language.
  3. A toolkit of standard methods and the visuals that interpret them.
  4. A portfolio of artifacts that turn loose analysis into something the firm can audit, reproduce, refresh, and govern.
  5. A stance: methods serve decisions, evidence comes before equations, and monitoring closes the loop.

The first two are common to data literacy. The last three are what this book is actually for.


The next four short articles build the new front door: where data comes from, how it is stored, how it is used, and how the loop connects to the rest of the book.

Vishal Singh
Stern School of Business, NYU