Part V · Chapter 13

Text as Business Data

Turning reviews, tickets, and transcripts into evidence a model can act on — and knowing exactly where word counts stop working.

This chapter turns prose (reviews, tickets, transcripts, and social posts) into evidence a model can act on, using the classical NLP stack: tokens, document-term matrices, TF-IDF weighting, supervised classifiers for routing and sentiment, and LDA topic models surfaced as weekly text dashboards. Working through the Bean & Basket coffee case, it shows where word counts earn their keep as a transparent baseline and where they quietly break. It closes with a gallery of failure modes (sarcasm, negation, polysemy, idiom, and mixed and context-dependent sentiment) that motivates the move to embeddings in the next chapter. The recurring discipline: name the document, choose the representation, state the construct, then inspect what the method threw away.
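As a rough illustration of that discipline, the sketch below builds a tiny document-term matrix and TF-IDF weights using only the Python standard library, then inspects what the representation discarded. The three reviews are invented for illustration and are not taken from the Bean & Basket case; the whitespace tokenizer and the unsmoothed `log(N / df)` weighting are simplifying assumptions, and library implementations typically use smoothed variants.

```python
import math
from collections import Counter

# Hypothetical reviews, invented for this sketch (not from the chapter's case data).
docs = [
    "the espresso was good",
    "the espresso was not good",
    "delivery was late",
]

def tokenize(text):
    # Simplest possible tokenizer: lowercase, split on whitespace.
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]
vocab = sorted({term for doc in tokenized for term in doc})

# Document-term matrix: one row per document, one column per vocabulary term.
dtm = []
for doc in tokenized:
    counts = Counter(doc)
    dtm.append([counts[term] for term in vocab])

def idf(term):
    # Unsmoothed inverse document frequency: log(N / df).
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(len(tokenized) / df)

# TF-IDF: raw term count scaled by how rare the term is across documents.
tfidf = [[count * idf(term) for count, term in zip(row, vocab)] for row in dtm]

# Inspect what the representation threw away: word order is gone, so the
# positive and negative reviews differ in exactly one column.
differing = [term for term, a, b in zip(vocab, dtm[0], dtm[1]) if a != b]
print(vocab)
print(differing)
```

In this toy corpus the first two reviews are separated only by the `not` column, so a stop-word list that drops "not" would make them indistinguishable. The word "was" appears in every document, so its weight under this unsmoothed formula is zero.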


Topics covered

the document-term matrix
TF-IDF weighting
tokenization and n-grams
stop-word and negation handling
aspect-based sentiment
ticket routing and confusion matrices
LDA topic models
topic-trend dashboards
polysemy and context shift

In this chapter

Interactive studios

Global Media
GDELT Media Agenda Lab
Search global news and television coverage as an agenda-setting lab: compare attention, tone, source geography, station airtime, and evidence cards from live GDELT APIs.

Consumer Finance
CFPB Crisis Monitor
Use public consumer complaints as a crisis early-warning system: pin incident spikes, inspect consented narratives, and separate product mix shifts from real operational improvement.