Part V · Chapter 13

Text as Business Data

Turning reviews, tickets, and transcripts into evidence a model can act on — and knowing exactly where word counts stop working.

This chapter focuses on turning prose — reviews, tickets, transcripts, and social posts — into evidence a model can act on, using the classical NLP stack: tokens, document-term matrices, TF-IDF weighting, supervised classifiers for routing and sentiment, and LDA topic models surfaced as weekly text dashboards. Working the Bean & Basket coffee case, it shows where word counts earn their keep as a transparent baseline and where they quietly break. It closes with a gallery of failure modes — sarcasm, negation, polysemy, idiom, mixed and context-dependent sentiment — that motivates the move to embeddings in the next chapter. The recurring discipline: name the document, choose the representation, state the construct, then inspect what the method threw away.
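As a rough preview of that stack, the sketch below builds bag-of-words counts and one common TF-IDF variant (raw term count times log(N / document frequency)) using only the Python standard library. The three reviews are invented for illustration and are not taken from the Bean & Basket case; the tokenizer and the weighting formula are assumptions chosen for brevity, and the chapter's own pipeline may differ. It also shows one way word counts quietly break: two reviews with opposite meanings produce identical bags.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters or apostrophes (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Count how often each token occurs, discarding word order."""
    return Counter(tokenize(text))


def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = count * log(N / df)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in bag.items()}
        for bag in bags
    ]


# Hypothetical reviews, invented for this sketch.
reviews = [
    "The coffee was good, not bad at all",
    "The coffee was bad, not good at all",
    "The delivery was late",
]

weights = tf_idf(reviews)

# Opposite meanings, identical word counts: order is what the method threw away.
same_bag = bag_of_words(reviews[0]) == bag_of_words(reviews[1])
print("Reviews 0 and 1 share a bag of words:", same_bag)

# A word in every document gets weight 0; a word in one document is lifted.
print("weight of 'the' in review 0:", weights[0]["the"])
print("weight of 'late' in review 2:", round(weights[2]["late"], 3))
```

Under this weighting, a term that appears in every document contributes nothing, while a term confined to one document gets the largest lift. The identical bags for the first two reviews are a small instance of the negation problem that the failure-mode gallery covers.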

Topics covered

the document-term matrix · TF-IDF weighting · tokenization and n-grams · stop-word and negation handling · aspect-based sentiment · ticket routing and confusion matrices · LDA topic models · topic-trend dashboards · polysemy and context shift

In this chapter

  1. 13.1 From Structured to Unstructured Data: Reframes unstructured text not as unusable but as data needing a representation layer, mapping six families of business text to their questions.
  2. 13.2 Text as Data: Installs the core vocabulary (document, corpus, token, vocabulary, n-gram, metadata) and shows how the document boundary reshapes the whole pipeline.
  3. 13.3 Preprocessing, Bag-of-Words, and TF-IDF: Walks through honest preprocessing choices, the bag-of-words matrix, and TF-IDF weighting that lifts distinctive words above common ones.
  4. 13.4 Text Classification and Sentiment: Covers supervised routing and sentiment, then aspect-based sentiment heatmaps that reveal which part of the experience is under stress, and where.
  5. 13.5 Topic Models and Text Dashboards: Explains LDA topic discovery, why humans name the topics, and the trend-over-time dashboard that drives an operating cadence.
  6. 13.6 Limits of Classical NLP: Catalogues where bag-of-words fails (sarcasm, negation, polysemy, idiom, mixed and context-shifted sentiment), bridging to embeddings.

Interactive studios