§14.1

Case Study: Trump Tweet Source Classification

This case starts with a famous Washington Post observation: when the Trump account wished teams good luck, the tweet often came from an iPhone; when it attacked rivals, it often came from Android. That makes it a strong teaching case for classical NLP: the unit of analysis (a single tweet) is small, the label is concrete, and the interpretation is tempting enough to be dangerous.

The file contains 5,653 tweets from January 1, 2016 through October 17, 2017. Each row has a timestamp, tweet text, and a source label: Android or iPhone. The context document frames the exercise as feature extraction plus classification: one source is treated as Trump, the other as a surrogate using the same handle.

The point of the case is not to turn NLP into political gossip. The point is to show how language, metadata, and a transparent baseline model can build an authorship-style fingerprint while still keeping the caveats visible.


The Executive Question

Can tweet text and simple metadata distinguish Android-labelled tweets from iPhone-labelled tweets, and what does that distinction actually mean?

The careful version: the model can classify source labels. It cannot prove who physically typed a given tweet, and any reading of its output has to account for the fact that campaign operations changed over time.


Start With the Source Regime

The full corpus continues into 2017, but the main classifier uses the campaign window: January 1, 2016 through November 8, 2016. That restriction matters because the device mix changes after the election and eventually becomes an all-iPhone stream. A classifier trained on the full corpus could quietly learn the time period (late tweets are iPhone tweets) rather than language cues, which makes it less useful as an authorship lesson.

Device labels define two different communication streams

The full file continues into 2017; the classifier is trained on the 2016 campaign window to avoid the later all-iPhone source regime.

Full corpus: 5,653 tweets (January 1, 2016 to October 17, 2017)
Campaign window: 3,572 tweets (the classifier uses this pre-election slice)
Android share: 47% (in the January-November 2016 campaign window)

Figure 1. The tweet corpus by source label. The campaign-window model avoids the later all-iPhone regime so the exercise stays focused on language and posting cues rather than post-election operations.
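The windowing step can be sketched in a few lines. This is a minimal illustration, not the case's actual pipeline: the rows are hypothetical placeholders, the `(timestamp, text, source)` layout and the timestamp format are assumptions, and timestamps are treated as naive local times.

```python
from datetime import datetime

# Hypothetical rows standing in for the real file: (timestamp, text, source).
rows = [
    ("2016-03-01 09:15", "placeholder text a", "Android"),
    ("2016-07-06 04:36", "placeholder text b", "Android"),
    ("2016-10-20 18:00", "placeholder text c", "iPhone"),
    ("2016-11-08 23:59", "placeholder text d", "iPhone"),
    ("2017-02-11 12:30", "placeholder text e", "iPhone"),
]

WINDOW_START = datetime(2016, 1, 1)
WINDOW_END = datetime(2016, 11, 9)  # exclusive bound, so all of November 8 is kept


def in_campaign_window(timestamp: str) -> bool:
    """True when the tweet falls in January 1 through November 8, 2016."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return WINDOW_START <= ts < WINDOW_END


# Filter first, then compute the device share on the filtered slice only.
campaign = [row for row in rows if in_campaign_window(row[0])]
android_share = sum(row[2] == "Android" for row in campaign) / len(campaign)
print(len(campaign), f"{android_share:.0%}")
```

Computing the share after filtering is the point: the same statistic on the full corpus would blend the campaign mix with the later all-iPhone period.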

The first lesson is metadata-first. The timestamp and device label are not afterthoughts; they determine what the text task is allowed to claim.

Table 1. Tweet-source classification task contract. The task is source-label prediction, not direct authorship attribution.
| Decision | Case choice | Analytical implication |
| --- | --- | --- |
| Document | One tweet from the @realdonaldtrump handle | The model classifies source at the tweet level, not at the day, topic, or account level. |
| Label | Android versus iPhone source label | The label is a device/source proxy. It is not direct observation of who typed every word. |
| Features | Tokens, bigrams, URLs, hashtags, mentions, punctuation, and timing cues | The analysis shows how ordinary text features combine with metadata to create a fingerprint. |
| Evaluation | 75/25 stratified held-out split, seed 11; 894 held-out tweets | A held-out score checks whether the fingerprint generalizes beyond the examples we read. |
| Interpretation | Classification evidence, not authorship proof | A source classifier can support an audit story, but it cannot settle intent or identity by itself. |
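The evaluation row describes a stratified 75/25 split. A minimal sketch of what "stratified" means follows, using only the standard library. The function name `stratified_split`, the toy labels, and the per-class rounding rule are assumptions for illustration; seed 11 here seeds Python's own generator, so it would not reproduce the split used in the case.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=11):
    """Return (train_idx, test_idx), holding out test_fraction of each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)
        # Rounding up within each class is an assumption, not the case's documented rule.
        n_test = math.ceil(test_fraction * len(idx))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with roughly the campaign-window mix; not the real data.
labels = ["Android"] * 47 + ["iPhone"] * 53
train_idx, test_idx = stratified_split(labels)
test_labels = [labels[i] for i in test_idx]
print(len(train_idx), len(test_idx), test_labels.count("Android"))
```

Sampling within each label keeps the held-out Android share close to the training share, so the majority-class baseline means the same thing on both sides of the split.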

What Features Carry the Fingerprint?

The strongest differences are not exotic. Android-labelled tweets use more combative cue words and more direct mentions. iPhone-labelled tweets carry more campaign-broadcast mechanics: URLs, hashtags, event language, thanks, and rally logistics. That is exactly what a manager should expect if two communication workflows share one public account.

The fingerprint is strongest when text and posting routine are read together

Words and punctuation that make the Android-labelled tweets read more personally combative.

| Cue (per tweet) | Android | iPhone |
| --- | --- | --- |
| Attack-word hits | 0.73 | 0.30 |
| Exclamation marks | 0.78 | 0.75 |
| Campaign/thanks hits | 0.21 | 0.56 |

Android-label examples
[2016-07-06 04:36] Crooked Hillary Clinton is unfit to serve as President of the U.S. Her temperament is weak and her opponents are strong. BAD JUDGEMENT!
[2016-08-14 16:50] Crooked Hillary Clinton is being protected by the media. She is not a talented person or politician. The dishonest media refuses to expose!
[2016-05-26 13:18] The Inspector General's report on Crooked Hillary Clinton is a disaster. Such bad judgement and temperament cannot be allowed in the W.H.

iPhone-label examples
[2016-01-28 17:32] It is my great honor to support our Veterans with you! You can join me now. Thank you! #Trump4Vetshttps://t.co/UVn3kUd2DV
[2016-02-23 03:51] Join me live- now in Las Vegas Nevada! We will MAKE AMERICA SAFE & GREAT AGAIN! #VoteTrumpNV #NevadaCaucus https://t.co/IW9s9noxDT
[2016-07-01 17:07] Thank you for your support! We will MAKE AMERICA SAFE AND GREAT AGAIN! #ImWithYou #AmericaFirst https://t.co/ravfFT5UBE

Figure 2. Source fingerprint explorer. The signal is not one magic word; it is a bundle of tone, distribution, and timing cues.

This is also where preprocessing matters. A naive bag-of-words model treats #draintheswamp, join live, and crooked as interchangeable features of the same kind. They are not: some represent campaign distribution, some represent rhetoric, and some represent the political target mix. The analyst has to decide whether those are legitimate source cues or leakage from the campaign calendar.
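One way to keep that decision visible is to separate distribution mechanics from rhetoric at extraction time. The sketch below is an illustration under assumptions: the regular expressions, the function name `extract_features`, and the feature names are hypothetical, and the case's own cleaning rules are not specified in the text.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[a-z']+")


def extract_features(text: str) -> dict:
    """Count distribution cues separately from word and bigram tokens."""
    urls = URL_RE.findall(text)
    no_urls = URL_RE.sub(" ", text)  # remove URLs first so fused hashtags survive
    hashtags = HASHTAG_RE.findall(no_urls)
    mentions = MENTION_RE.findall(no_urls)
    prose = MENTION_RE.sub(" ", HASHTAG_RE.sub(" ", no_urls)).lower()
    words = WORD_RE.findall(prose)
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return {
        "n_urls": len(urls),
        "n_hashtags": len(hashtags),
        "n_mentions": len(mentions),
        "n_exclamations": text.count("!"),
        "tokens": words + bigrams,  # rhetoric cues only; no hashtags or links
    }


tweet = (
    "Thank you for your support! We will MAKE AMERICA SAFE AND GREAT AGAIN! "
    "#ImWithYou #AmericaFirst https://t.co/ravfFT5UBE"
)
features = extract_features(tweet)
print(features["n_urls"], features["n_hashtags"], features["n_exclamations"])
```

With counts and tokens held apart, the analyst can fit one model on rhetoric alone and another with the distribution counts added, then compare how much of the fingerprint each group carries.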


A Transparent Baseline Is Enough to Be Useful

A simple Naive Bayes model trained on unigrams and bigrams reaches 79% held-out accuracy, versus a 53% majority-class baseline. That is strong enough to show a real source signal and modest enough to keep the interpretation honest.
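To show why this baseline counts as transparent, here is a minimal multinomial Naive Bayes over unigrams and bigrams, written from scratch. It is a sketch under assumptions, not the case's implementation: the class name `NaiveBayes`, the whitespace tokenizer, add-one smoothing, and the four toy training texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text):
    """Lowercased unigrams plus adjacent-word bigrams (whitespace tokenizer)."""
    words = text.lower().split()
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


class NaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing; 1.0 is an assumed default

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(ngrams(text))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def log_scores(self, text):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.term_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n_label / n_docs)  # class prior
            for term in ngrams(text):
                if term in self.vocab:  # terms never seen in training are skipped
                    count = self.term_counts[label][term]
                    score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training texts, invented for illustration only.
texts = [
    "crooked media is dishonest",
    "weak and dishonest media",
    "join me live tonight",
    "thank you for your support",
]
labels = ["Android", "Android", "iPhone", "iPhone"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("dishonest media"), model.predict("thank you join me"))
```

Every prediction is a sum of per-term log ratios plus a prior, so the terms that pushed a tweet toward one label can be read off directly. That inspectability is what makes the baseline useful for an audit story.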

A transparent baseline can recover the source label, but it is not an authorship oracle

Held-out accuracy: 79% (894 held-out tweets)
Majority baseline: 53% (always predict the larger class)
Vocabulary: 1,573 (unigrams and bigrams after basic cleaning)

| Actual | Predicted Android | Predicted iPhone |
| --- | --- | --- |
| Android | 349 | 71 |
| iPhone | 117 | 357 |

Reader-facing cue terms

Counts are shown as term hits per 100 campaign-window tweets in the selected source.

| Term | Hits per 100 tweets |
| --- | --- |
| crooked | 9.6 |
| media | 4.5 |
| lyin | 1.6 |
| weak | 1.5 |
| bad judgement | 1.1 |
| dishonest | 2.0 |
| disaster | 1.3 |
| rubio cruz | 0.6 |
| poor | 0.5 |
| fake | 0.2 |

Figure 3. Held-out tweet-source classifier. The model is useful because it beats the baseline clearly; it is limited because the off-diagonal cells remain meaningful.
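The headline numbers can be rechecked from the four confusion-matrix cells alone. The short sketch below uses only the counts reported in Figure 3; the variable names are illustrative.

```python
# Rows are actual labels, columns are predicted labels (counts from Figure 3).
confusion = {
    "Android": {"Android": 349, "iPhone": 71},
    "iPhone": {"Android": 117, "iPhone": 357},
}
labels = list(confusion)

total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[label][label] for label in labels) / total

# Majority baseline: always predict the larger actual class.
majority_baseline = max(sum(confusion[label].values()) for label in labels) / total

recall = {
    label: confusion[label][label] / sum(confusion[label].values())
    for label in labels
}
precision = {
    label: confusion[label][label] / sum(confusion[actual][label] for actual in labels)
    for label in labels
}

print(total, f"{accuracy:.3f}", f"{majority_baseline:.3f}")
for label in labels:
    print(label, f"recall={recall[label]:.3f}", f"precision={precision[label]:.3f}")
```

The cells sum to 894 and reproduce the reported 79% accuracy and 53% baseline. They also show an asymmetry the headline hides: about 83% of Android-labelled tweets are recovered, against about 75% of iPhone-labelled tweets, because 117 iPhone tweets are predicted as Android.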

The cue list is more important than the algorithm name. Android-labelled tweets are more likely to carry terms such as crooked, lyin, media, and weak; iPhone-labelled tweets are more likely to carry campaign hashtags, links, join live, thank support, and event language. That contrast is the story. The classifier merely tests whether the contrast is stable enough to predict unseen tweets.
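A cue table of this kind can be produced with a simple rate calculation. The sketch below is an assumption about the method rather than a description of it: it counts a "hit" as a tweet containing the term at least once, the function name `hits_per_100` is hypothetical, and the four tweets are shortened excerpts chosen for illustration.

```python
import re
from collections import defaultdict


def hits_per_100(tweets, terms):
    """tweets: iterable of (text, source). Returns {source: {term: rate per 100 tweets}}."""
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for text, source in tweets:
        totals[source] += 1
        # Normalize to space-separated lowercase words so terms match on word boundaries.
        normalized = " " + " ".join(re.findall(r"[a-z']+", text.lower())) + " "
        for term in terms:
            if f" {term} " in normalized:  # assumption: one hit per tweet at most
                hits[source][term] += 1
    return {
        source: {term: 100 * hits[source][term] / totals[source] for term in terms}
        for source in totals
    }


# Shortened excerpts for illustration; not a representative sample.
tweets = [
    ("Crooked Hillary Clinton is unfit to serve. BAD JUDGEMENT!", "Android"),
    ("The dishonest media refuses to expose!", "Android"),
    ("Thank you for your support!", "iPhone"),
    ("Join me live- now in Las Vegas Nevada!", "iPhone"),
]
rates = hits_per_100(tweets, ["crooked", "bad judgement", "thank you"])
print(rates["Android"]["crooked"], rates["iPhone"]["thank you"])
```

Reporting rates within each source, rather than raw counts, keeps the comparison fair when the two streams contribute different numbers of tweets.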


What the Case Teaches

The responsible interpretation has three layers:

  1. A source fingerprint exists. Text and posting features distinguish Android-labelled tweets from iPhone-labelled tweets in the campaign window.
  2. The fingerprint is operational. The iPhone stream looks more like a campaign broadcast channel; the Android stream looks more like direct commentary and attack language.
  3. The label is a proxy. Device source is not a sworn authorship record. It reflects tools, staff workflows, time periods, and campaign communication routines.