Part V · Chapter 14
Applied Text, Embeddings, and Measured Constructs
From counting words, to placing meaning in coordinates, to measuring the constructs a manager actually cares about.
This chapter moves from counting words to measuring meaning. Two real corpora open it: @realdonaldtrump tweets, where a transparent Naive Bayes model fingerprints Android-versus-iPhone source from tone, hashtags, mentions, and timing; and Goose Island acquisition chatter, where a lexicon shows that an event spike is mostly news links and anti-corporate vocabulary rather than collapsing sentiment. From there, embeddings turn documents into vectors in a learned coordinate system, powering semantic search, clustering, brand maps, and drift detection. The payoff is GPT-as-measurement, where a language model directly scores the named constructs a manager actually cares about (intent to return, evasiveness, a sense of betrayal) rather than going through a sentiment proxy.
Topics covered
In this chapter
- 14.1 Case Study: Trump Tweet Source Classification
  A transparent Naive Bayes classifier fingerprints Android-versus-iPhone tweet source from tone, links, and timing, while keeping authorship caveats visible.
- 14.2 Case Study: Goose Island Acquisition Sentiment
  A lexicon-based read of the Anheuser-Busch acquisition shows the negative spike was mostly news links and anti-corporate vocabulary, not collapsing product sentiment.
- 14.3 Embeddings and Semantic Search
  Embeddings place documents in a learned meaning space, powering semantic search, clustering, brand-map triangulation, anomaly detection, and drift monitoring.
- 14.4 GPT-as-Measurement: From Surface Features to Constructs
  A language model measures named constructs like intent to return or executive evasiveness directly, replacing surface proxies at a fraction of annotation cost.
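As a small preview of the idea behind semantic search, the sketch below ranks documents by cosine similarity to a query vector. It is a minimal illustration, not the chapter's own method: the three-dimensional vectors, the document names, and the `cosine` and `search` helpers are all hypothetical stand-ins. In practice the vectors would come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to query_vec."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical toy "embeddings"; real ones would be produced by a model.
DOC_VECS = {
    "beer_review": [0.9, 0.1, 0.0],
    "acquisition_news": [0.1, 0.9, 0.2],
    "tweet_politics": [0.0, 0.2, 0.9],
}

# A made-up query vector that sits close to the acquisition-news document.
query = [0.2, 0.8, 0.1]
results = search(query, DOC_VECS)
print(results)
```

Here the query ranks `acquisition_news` first because the two vectors point in nearly the same direction. The ranking depends on the angle between vectors, not on shared words, which is what separates this approach from counting.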