§14.1

Case Study: Trump Tweet Source Classification

This case starts with a famous Washington Post observation: when the Trump account wished teams good luck, the tweet often came from an iPhone; when it attacked rivals, it often came from Android. That is a perfect teaching case for classical NLP because the business object is small, the label is concrete, and the interpretation is tempting enough to be dangerous.

The file contains 5,653 tweets from January 1, 2016 through October 17, 2017. Each row has a timestamp, tweet text, and a source label: Android or iPhone. The context document frames the exercise as feature extraction plus classification: one source is treated as Trump, the other as a surrogate using the same handle.

The point of the case is not to turn NLP into political gossip. The point is to show how language, metadata, and a transparent baseline model can build an authorship-style fingerprint while still keeping the caveats visible.

The Executive Question

Can tweet text and simple metadata distinguish Android-labelled tweets from iPhone-labelled tweets, and what does that distinction actually mean?

The careful version: the model can classify source labels. It cannot prove who physically typed a given tweet, and it should not ignore that campaign operations changed over time.

Start With the Source Regime

The full corpus continues into 2017, but the main classifier uses the campaign window: January 1, 2016 through November 8, 2016. That restriction matters because the device mix changes after the election and eventually becomes an all-iPhone stream. A classifier trained on the full corpus could quietly learn the time period instead of the writing, which would make it less useful as an authorship lesson.

Device labels define two different communication streams

The full file continues into 2017; the classifier is trained on the 2016 campaign window to avoid the later all-iPhone source regime.

Measure	Value	Note
Full corpus	5,653 tweets	January 1, 2016 to October 17, 2017
Campaign window	3,572 tweets	The classifier uses this pre-election slice.
Android share	47%	In the January-November 2016 campaign window.

Figure 1. The tweet corpus by source label. The campaign-window model avoids the later all-iPhone regime so the exercise stays focused on language and posting cues rather than post-election operations.

The first lesson is to start with the metadata. The timestamp and device label are not afterthoughts; they determine what the text task is allowed to claim.
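The window restriction can be sketched in a few lines. This is a minimal illustration, not the case's actual loading code: the row layout, the timestamp format (taken from the example tweets shown later), and the function names are assumptions, and the third row is a made-up placeholder standing in for a post-election tweet.

```python
from datetime import datetime

# Assumed row layout: (timestamp, text, source). The real file's column names are not given.
ROWS = [
    ("2016-02-23 03:51", "Join me live- now in Las Vegas Nevada!", "iPhone"),
    ("2016-07-06 04:36", "Crooked Hillary Clinton is unfit to serve as President of the U.S.", "Android"),
    ("2017-03-01 12:00", "placeholder post-election tweet (not from the corpus)", "iPhone"),
]

WINDOW_START = datetime(2016, 1, 1)
WINDOW_END = datetime(2016, 11, 9)  # exclusive bound, so all of November 8 is kept


def in_campaign_window(timestamp: str) -> bool:
    """True when the timestamp falls inside January 1 to November 8, 2016."""
    moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return WINDOW_START <= moment < WINDOW_END


def campaign_slice(rows):
    """Keep only rows whose timestamp is inside the campaign window."""
    return [row for row in rows if in_campaign_window(row[0])]


def source_share(rows, source):
    """Fraction of rows carrying the given source label."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[2] == source) / len(rows)


window = campaign_slice(ROWS)
print(len(window), source_share(window, "Android"))
```

Checking the source share inside and outside the window is the quickest way to see a regime change before any text modelling starts. Time zones are ignored here; a real pipeline would need to settle which clock the timestamps use.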

Table 1. Tweet-source classification task contract. The task is source-label prediction, not direct authorship attribution.

Decision	Case choice	Analytical implication
Document	One tweet from the @realdonaldtrump handle	The model classifies source at the tweet level, not at the day, topic, or account level.
Label	Android versus iPhone source label	The label is a device/source proxy. It is not direct observation of who typed every word.
Features	Tokens, bigrams, URLs, hashtags, mentions, punctuation, and timing cues	The analysis shows how ordinary text features combine with metadata to create a fingerprint.
Evaluation	75/25 stratified held-out split, seed 11; 894 held-out tweets	A held-out score checks whether the fingerprint generalizes beyond the examples we read.
Interpretation	Classification evidence, not authorship proof	A source classifier can support an audit story, but it cannot settle intent or identity by itself.
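The evaluation row can be illustrated with a small stratified splitter. This is a sketch under assumptions: the case does not say which tool produced its split, so the function below is hypothetical, and a library splitter given the same seed would pick different rows. The rows are synthetic stand-ins that mirror the 47% Android share.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.25, seed=11):
    """Split rows into (train, test), sampling test_fraction within each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, test = [], []
    for label in sorted(by_label):  # sorted so the result does not depend on input order of labels
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic stand-in: 47 Android-labelled rows and 53 iPhone-labelled rows.
rows = [("tweet %d" % i, "Android") for i in range(47)]
rows += [("tweet %d" % i, "iPhone") for i in range(47, 100)]

train, test = stratified_split(rows, label_of=lambda row: row[1])
print(len(train), len(test))
```

Stratifying keeps the Android share in the held-out set close to the share in the training set, so the majority baseline means the same thing on both sides of the split.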

What Features Carry the Fingerprint?

The strongest differences are not exotic. Android-labelled tweets use more combative cue words and more direct mentions. iPhone-labelled tweets carry more campaign-broadcast mechanics: URLs, hashtags, event language, thanks, and rally logistics. That is exactly what a manager should expect if two communication workflows share one public account.
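A rough sketch of how such per-tweet cues might be counted follows. The two word lists are hypothetical, since the case does not publish its exact attack or campaign vocabularies, and the regular expressions are a simple approximation of URLs, hashtags, and mentions.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[a-z']+")

# Hypothetical cue lists, chosen from words that appear in the case text.
ATTACK_WORDS = {"crooked", "lyin", "weak", "dishonest", "disaster"}
CAMPAIGN_WORDS = {"thank", "join", "support", "live"}


def surface_features(text):
    """Count simple tone and distribution cues in one tweet."""
    without_urls = URL_RE.sub(" ", text)
    words = WORD_RE.findall(without_urls.lower())
    return {
        "urls": len(URL_RE.findall(text)),
        "hashtags": len(HASHTAG_RE.findall(without_urls)),
        "mentions": len(MENTION_RE.findall(without_urls)),
        "exclamations": text.count("!"),
        "attack_hits": sum(word in ATTACK_WORDS for word in words),
        "campaign_hits": sum(word in CAMPAIGN_WORDS for word in words),
    }


def mean_features(texts):
    """Average each cue over a list of tweets (hits per tweet)."""
    totals = {}
    for text in texts:
        for name, value in surface_features(text).items():
            totals[name] = totals.get(name, 0) + value
    return {name: total / len(texts) for name, total in totals.items()}


android_example = ("Crooked Hillary Clinton is being protected by the media. She is not a "
                   "talented person or politician. The dishonest media refuses to expose!")
iphone_example = ("Thank you for your support! We will MAKE AMERICA SAFE AND GREAT AGAIN! "
                  "#ImWithYou #AmericaFirst https://t.co/ravfFT5UBE")
print(surface_features(android_example))
print(surface_features(iphone_example))
```

Running `mean_features` separately on the Android-labelled and iPhone-labelled tweets would give per-tweet rates of the kind the case reports, though the exact numbers depend on the word lists chosen.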

The fingerprint is strongest when text and posting routine are read together

Words and punctuation that make the Android-labelled tweets read more personally combative.

Cue (hits per tweet)	Android	iPhone
Attack-word hits	0.73	0.30
Exclamation marks	0.78	0.75
Campaign/thanks hits	0.21	0.56

Android-label examples

2016-07-06 04:36 Crooked Hillary Clinton is unfit to serve as President of the U.S. Her temperament is weak and her opponents are strong. BAD JUDGEMENT!

2016-08-14 16:50 Crooked Hillary Clinton is being protected by the media. She is not a talented person or politician. The dishonest media refuses to expose!

2016-05-26 13:18 The Inspector General's report on Crooked Hillary Clinton is a disaster. Such bad judgement and temperament cannot be allowed in the W.H.

iPhone-label examples

2016-01-28 17:32 It is my great honor to support our Veterans with you! You can join me now. Thank you! #Trump4Vets https://t.co/UVn3kUd2DV

2016-02-23 03:51 Join me live- now in Las Vegas Nevada! We will MAKE AMERICA SAFE & GREAT AGAIN! #VoteTrumpNV #NevadaCaucus https://t.co/IW9s9noxDT

2016-07-01 17:07 Thank you for your support! We will MAKE AMERICA SAFE AND GREAT AGAIN! #ImWithYou #AmericaFirst https://t.co/ravfFT5UBE

Figure 2. Source fingerprint explorer. The signal is not one magic word; it is a bundle of tone, distribution, and timing cues.

This is also where preprocessing matters. A naive bag-of-words model treats #draintheswamp, join live, and crooked as if they were the same kind of token. They are not, and they are not neutral: some represent campaign distribution, some represent rhetoric, and some represent the political target mix. The analyst has to decide whether those are legitimate source cues or leakage from the campaign calendar.
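One way to make that decision explicit is to put it in the tokenizer as a switch. The sketch below is an assumption about the cleaning, not the case's actual pipeline: cue terms such as join live and thank support suggest that stopwords were dropped before bigrams were formed, so a tiny hypothetical stopword list is included, and `keep_hashtags` lets the analyst rerun the model with and without calendar-bound hashtags.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"#\w+|@\w+|[a-z]+(?:'[a-z]+)?")

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"me", "you", "for", "your", "we", "will", "in", "now", "the", "is", "a"}


def tokenize(text, keep_hashtags=True):
    """Lowercase, replace URLs with a placeholder word, drop stopwords."""
    cleaned = URL_RE.sub(" urltoken ", text.lower())
    tokens = [tok for tok in TOKEN_RE.findall(cleaned) if tok not in STOPWORDS]
    if not keep_hashtags:
        # Treat hashtags as possible campaign-calendar leakage and remove them.
        tokens = [tok for tok in tokens if not tok.startswith("#")]
    return tokens


def unigrams_and_bigrams(tokens):
    """Return the tokens followed by every pair of adjacent tokens."""
    return tokens + [first + " " + second for first, second in zip(tokens, tokens[1:])]


tweet = ("Join me live- now in Las Vegas Nevada! We will MAKE AMERICA SAFE & GREAT AGAIN! "
         "#VoteTrumpNV #NevadaCaucus https://t.co/IW9s9noxDT")
print(unigrams_and_bigrams(tokenize(tweet)))
print(unigrams_and_bigrams(tokenize(tweet, keep_hashtags=False)))
```

If held-out accuracy falls sharply when hashtags are removed, the fingerprint leans on campaign mechanics; if it holds up, the signal sits more in tone and word choice.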

A Transparent Baseline Is Enough to Be Useful

A simple Naive Bayes model trained on unigrams and bigrams reaches 79% held-out accuracy, versus a 53% majority-class baseline. That is strong enough to show a real source signal and modest enough to keep the interpretation honest.
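For readers who want to see what "simple" means here, the following is a minimal multinomial Naive Bayes with add-one smoothing, next to a majority-class baseline. It is a teaching sketch on a handful of toy feature lists, not the model behind the 79% figure; the class name, the smoothing value, and the toy documents are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over lists of string features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for features, label in zip(docs, labels):
            self.term_counts[label].update(features)
        self.vocab = {term for counts in self.term_counts.values() for term in counts}
        return self

    def log_scores(self, features):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for term in features:
                if term in self.vocab:  # terms never seen in training are skipped
                    score += math.log((counts[term] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


def majority_baseline(labels):
    """The label a know-nothing classifier would always predict."""
    return Counter(labels).most_common(1)[0][0]


# Toy training data: unigram and bigram features for five imaginary tweets.
train_docs = [
    ["crooked", "media", "crooked media"],
    ["dishonest", "media", "dishonest media"],
    ["thank", "support", "thank support", "#imwithyou"],
    ["join", "live", "join live", "#votetrumpnv"],
    ["thank", "live", "thank live"],
]
train_labels = ["Android", "Android", "iPhone", "iPhone", "iPhone"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["crooked", "media"]), model.predict(["thank", "support"]))
print(majority_baseline(train_labels))
```

Because every prediction is a sum of per-term log probabilities, the terms that push a tweet toward one label can be read directly from the counts, which is what makes the baseline transparent.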

A transparent baseline can recover the source label, but it is not an authorship oracle

Measure	Value	Note
Held-out accuracy	79%	894 held-out tweets
Majority baseline	53%	Always predict the larger class.
Vocabulary	1,573	Unigrams and bigrams after basic cleaning.

Actual	Predicted Android	Predicted iPhone
Android	349	71
iPhone	117	357

The Android-row count of 71 is implied by the 894 held-out total (894 - 349 - 117 - 357).

Reader-facing cue terms

Counts are shown as term hits per 100 campaign-window tweets in the selected source.

Term	Hits per 100 tweets
crooked	9.6
media	4.5
lyin	1.6
weak	1.5
bad judgement	1.1
dishonest	2.0
disaster	1.3
rubio cruz	0.6
poor	0.5
fake	0.2

Figure 3. Held-out tweet-source classifier. The model is useful because it beats the baseline clearly; it is limited because the off-diagonal cells remain meaningful.
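The headline numbers can be checked against the confusion-matrix cells with a few lines of arithmetic. One cell here is an assumption: the count of Android-labelled tweets predicted as iPhone is derived from the 894 held-out total rather than read from the figure.

```python
HELD_OUT = 894

android_as_android = 349
iphone_as_android = 117
iphone_as_iphone = 357
# Assumption: derived from the held-out total, not read from the figure.
android_as_iphone = HELD_OUT - android_as_android - iphone_as_android - iphone_as_iphone

actual_android = android_as_android + android_as_iphone
actual_iphone = iphone_as_android + iphone_as_iphone

accuracy = (android_as_android + iphone_as_iphone) / HELD_OUT
majority_baseline = max(actual_android, actual_iphone) / HELD_OUT

# Recall: of the tweets that truly carry a label, how many did the model recover?
android_recall = android_as_android / actual_android
iphone_recall = iphone_as_iphone / actual_iphone
# Precision: of the tweets the model called Android, how many really were?
android_precision = android_as_android / (android_as_android + iphone_as_android)

for name, value in [
    ("accuracy", accuracy),
    ("majority baseline", majority_baseline),
    ("Android recall", android_recall),
    ("iPhone recall", iphone_recall),
    ("Android precision", android_precision),
]:
    print(f"{name}: {value:.3f}")
```

Under that assumption the cells reproduce the reported 79% accuracy and 53% baseline. They also show that the errors are not symmetric: about 83% of Android-labelled tweets are recovered, against about 75% of iPhone-labelled tweets, so roughly a quarter of the tweets the model calls Android actually carry the iPhone label.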

The cue list is more important than the algorithm name. Android-labelled tweets are more likely to carry terms such as crooked, lyin, media, and weak; iPhone-labelled tweets are more likely to carry campaign hashtags, links, join live, thank support, and event language. That contrast is the story. The classifier merely tests whether the contrast is stable enough to predict unseen tweets.
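A cue list of this kind takes very little code to build. The sketch below assumes that a "hit" means one occurrence of the term, with a two-word term counted as two adjacent tokens; the case does not define the counting rule, and the token lists are toy data rather than corpus tweets.

```python
def count_term(term, tokens):
    """Occurrences of a one-word or multi-word term in a token list."""
    parts = term.split()
    width = len(parts)
    return sum(tokens[i:i + width] == parts for i in range(len(tokens) - width + 1))


def hits_per_100(term, tweets):
    """Term occurrences per 100 tweets, given tweets as token lists."""
    if not tweets:
        return 0.0
    return 100 * sum(count_term(term, tokens) for tokens in tweets) / len(tweets)


# Toy token lists standing in for tokenized tweets from each source.
android_tweets = [
    ["crooked", "hillary", "bad", "judgement"],
    ["dishonest", "media"],
    ["bad", "judgement", "crooked"],
    ["media"],
]
iphone_tweets = [
    ["thank", "support"],
    ["join", "live"],
]

for term in ["crooked", "bad judgement", "thank support"]:
    print(term, hits_per_100(term, android_tweets), hits_per_100(term, iphone_tweets))
```

Reading the two rates side by side for each term is the contrast the classifier is asked to test; the model adds a held-out check, not a new kind of evidence.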

What the Case Teaches

The responsible interpretation has three layers:

A source fingerprint exists. Text and posting features distinguish Android-labelled tweets from iPhone-labelled tweets in the campaign window.
The fingerprint is operational. The iPhone stream looks more like a campaign broadcast channel; the Android stream looks more like direct commentary and attack language.
The label is a proxy. Device source is not a sworn authorship record. It reflects tools, staff workflows, time periods, and campaign communication routines.