Part V

Unstructured Data, Embeddings, and Generative AI

Turning prose and pixels into governed evidence

This part takes on the data that never made it into a warehouse — reviews, tickets, transcripts, invoices, and images — and turns it into evidence a manager can act on. The arc climbs in four steps: Chapter 13 earns trust with transparent word-counting NLP and then catalogs where it breaks; Chapter 14 moves from words to meaning, using embeddings for structure and a language model to score the constructs a manager cares about; Chapter 15 makes embeddings concrete as retrieval, vision, and document extraction; and Chapter 16 reframes the model as a programmable, schema-bound, human-gated component inside a real governance layer. The throughline is a refusal to treat any of it as magic: name the document, choose the representation, state the construct, and inspect what the method discarded.

4 chapters · 19 articles

What you’ll learn

Build a transparent text pipeline (tokens, TF-IDF, classifiers, topic models) and recognize the failure modes — sarcasm, negation, polysemy — that force a move to embeddings
Use embeddings for semantic search, clustering, brand maps, and drift detection, and direct a language model to score named constructs like intent-to-return or evasiveness instead of a sentiment proxy
Stand up retrieval-augmented generation, vision, and document-AI workflows where the boundary work — chunking, confidence thresholds, citations, bias audits — decides whether anything ships
Convert a language model into a governed workflow component with schema-enforced JSON output, tool use, a human-approval gate, and an evaluation-and-risk rubric
Integrate classification, construct measurement, embedding clusters, and RAG into one monitored customer-voice loop a sponsor can approve

Chapters in this part

Chapter 13: Text as Business Data

Turning reviews, tickets, and transcripts into evidence a model can act on — and knowing exactly where word counts stop working.

Studios: GDELT Media Agenda Lab, CFPB Crisis Monitor

Chapter 14: Applied Text, Embeddings, and Measured Constructs

From counting words, to placing meaning in coordinates, to measuring the constructs a manager actually cares about.

Chapter 15: Retrieval, Vision, and Multimodal Workflows

One shared embedding space, four production patterns: ground the text, see the image, read the document, reason across all of them.

Chapter 16: LLMs, Workflows, and Governance

An LLM is a language interface for workflows — value lives in the wiring, the gates, and the governance, not the model.

Interactive studios

Hands-on studios paired with this part’s chapters.

Global Media: GDELT Media Agenda Lab

Search global news and television coverage as an agenda-setting lab: compare attention, tone, source geography, station airtime, and evidence cards from live GDELT APIs.

Consumer Finance: CFPB Crisis Monitor

Use public consumer complaints as a crisis early-warning system: pin incident spikes, inspect consented narratives, and separate product mix shifts from real operational improvement.