§10.6

Case Study: RentHop Hot Listings

RentHop is a clean predictive-modeling case because the business action is concrete. A marketplace has thousands of apartment listings and limited attention to allocate. The decision is not whether a model can explain the New York rental market. The decision is which listings should be shown first, which landlords should be offered premium placement, and where the product team should watch demand clusters.

The original RentHop exercise asks students to upload a CSV, parse messy amenities, cluster latitude and longitude into neighborhood-like groups, compare logistic regression, a decision tree, and a random forest, then rank the top listings. This case study turns that prompt into a worked article inside the predictive-models section.

The important lesson is not that a random forest is magic. It is that feature engineering plus held-out ranking turns raw marketplace rows into an operating queue.


The Executive Question

Which apartment listings should RentHop move to the front of the experience, and what evidence says those listings are more likely to be Hot?

That wording matters. "Predict Hot apartments" sounds like a modeling task. "Move listings to the front" is the business task. The model earns its keep only if its scores can support that ranking decision.

Case evidence

A listing-level score for marketplace attention

The unit is the apartment listing. The target is whether the listing was marked Hot. The action is a ranked queue for featuring, premium placement, or landlord coaching.

Listings: 48.7K (9 original columns in the CSV)

Base rate: 30.9% (15K labelled Hot listings)

Feature work: 18 + 25 (location segments plus parsed amenity flags)

Held-out lift: 2.4x (72.8% Hot rate in the top score decile)

Figure 1. RentHop case evidence. The source CSV has 48,695 listings, 15,041 of them labelled Hot. The top held-out score decile is 2.4x as Hot as the average listing.
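The two headline numbers in Figure 1 can be reproduced from the counts it quotes. The sketch below is plain arithmetic on those figures; it assumes the held-out base rate matches the full-data base rate, which is what a stratified split is meant to guarantee.

```python
# Counts and rates quoted in Figure 1.
n_listings = 48_695
n_hot = 15_041
top_decile_hot_rate = 0.728  # held-out Hot rate in the top score decile

base_rate = n_hot / n_listings          # share of all listings labelled Hot
lift = top_decile_hot_rate / base_rate  # top decile relative to the average listing

print(f"base rate: {base_rate:.1%}")
print(f"top-decile lift: {lift:.1f}x")
```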

The Task Contract Comes First

Before touching algorithms, write the predictive task contract. In this case the target is already labelled, but the unit, features, and action still need to be explicit.

Table 1. The RentHop predictive task contract. The row is a listing, the target is a Hot label, and the score is useful because RentHop can rank listings by it.
| Decision | RentHop choice | Why it matters |
| --- | --- | --- |
| Unit | One apartment listing | The action is listing-level: feature, rank, price-coach, or hold back. |
| Target | Hot Apartments = Hot | The label is a proxy for renter demand, not a causal measure of what made demand happen. |
| Features | Rent, bedrooms, bathrooms, latitude/longitude clusters, parsed amenities | Everything used here is available at listing time in the CSV. |
| Evaluation | 70/30 stratified random split, seed 42 | The held-out slice grades the ranking before any top-listing queue is trusted. |
| Action | Sort listings by predicted Hot probability | RentHop needs a priority queue more than a yes/no verdict. |
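The evaluation row of the contract can be sketched in a few lines. This is a minimal standard-library illustration of a 70/30 stratified split with seed 42, not the case's actual pipeline; the function name `stratified_split` and the toy labels are assumptions made for the example. In practice a library splitter such as scikit-learn's `train_test_split` with its `stratify` argument would usually do this job.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.30, seed=42):
    """Return (train_idx, test_idx), splitting each class at the same rate."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)  # random order within the class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels near the case's base rate: 309 Hot out of 1,000 (not real data).
labels = [1] * 309 + [0] * 691
train_idx, test_idx = stratified_split(labels)
test_hot_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(test_hot_rate, 3))
```

Because each class is split separately, the held-out slice keeps roughly the same Hot rate as the full data, which is what makes the later lift figures comparable to the 30.9% base rate.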

Two feature-engineering moves carry the case:

  1. Coordinates become segments. Raw latitude and longitude are too granular for a manager to reason about and too continuous for a simple categorical story. K-means turns them into 18 neighborhood-like segments.
  2. Amenities become indicators. The features column is text. Parsing the common amenities into yes/no flags lets the model learn that "no fee," "hardwood floors," and laundry-related signals carry demand information.

These are not decorations before the model. They are the model's business vocabulary.
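The coordinate step can be illustrated with a bare-bones K-means. The sketch below is plain Lloyd's algorithm in standard-library Python on hypothetical coordinates; the case uses 18 segments on the real listings, while this toy uses 3 clusters on made-up points. It also treats latitude and longitude as flat Euclidean coordinates, which is a simplification.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two (lat, lon) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist2(p, centers[c])) for p in points]

def kmeans(points, k, n_iter=50, seed=42):
    """Plain Lloyd's algorithm; returns (centers, segment label per point)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen listings
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # centers have stopped moving
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical coordinates: three loose blobs, not real RentHop listings.
rng = random.Random(0)
blobs = [(40.62, -74.02), (40.76, -73.92), (40.85, -73.93)]
points = [(lat + rng.gauss(0, 0.01), lon + rng.gauss(0, 0.01))
          for lat, lon in blobs for _ in range(30)]

centers, segments = kmeans(points, k=3)
print(sorted(set(segments)))
```

The segment label, not the raw coordinate pair, is what becomes a categorical feature and a row in a segment table.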


Location Does Most of the Storytelling

The segment map shows why location clustering is more than a technical preprocessing step. It creates a market map a product team can reason about: high-rate value zones, central expensive zones, and small segments that may deserve manual review before becoming rules.

Location segments expose the market structure

Points are a stratified sample of listings; larger labeled markers are K-means segment centers colored by observed Hot rate.

Segment center labels on the map: S17 70% Hot, S8 57% Hot, S3 51% Hot, S1 50% Hot, S4 49% Hot, S16 41% Hot, S6 39% Hot, S5 36% Hot.

Longitude mapped left-right; latitude mapped bottom-top. Segment labels are approximate, derived from cluster centroids.

Legend: Hot listing, Not Hot listing, Segment center.

Highest-rate segments are value-heavy

Small segments can be real leads but need monitoring before becoming rules.

| Segment | Approximate area | Listings | Median rent | Hot rate |
| --- | --- | --- | --- | --- |
| Segment 17 | Far Rockaway / airport edge | 23 | $1,640 | 69.6% |
| Segment 8 | Southwest Brooklyn | 455 | $1,900 | 57.1% |
| Segment 3 | Upper Manhattan / Bronx | 697 | $1,725 | 50.9% |
| Segment 1 | Astoria / northwest Queens | 721 | $2,150 | 49.5% |
| Segment 4 | Central Queens | 670 | $1,900 | 49.3% |
| Segment 16 | Prospect-Lefferts / Crown Heights | 972 | $2,400 | 40.5% |
| Segment 6 | Central Brooklyn | 837 | $2,400 | 39.3% |
| Segment 5 | Upper Manhattan | 2,213 | $2,175 | 35.8% |

Figure 2. Listing map and location segments. Segment labels are approximate names from centroids, not official neighborhood boundaries; the point is to convert raw coordinates into interpretable market structure.

The highest-rate segments are not simply the priciest parts of Manhattan. The top queue leans toward lower-rent Brooklyn, Queens, and upper-Manhattan/Bronx-adjacent segments where a listing can look like strong value. That does not mean those areas are "better" markets. It means the Hot label in this data rewards a price-location-amenity balance.
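One way to make "small segments need monitoring" concrete is to put an interval around each segment's Hot rate. The sketch below uses the Wilson score interval, which is a choice made for this illustration rather than something the case specifies. The Hot counts (16 of 23 for Segment 17, 260 of 455 for Segment 8) are back-calculated from the rounded rates in Figure 2 and are therefore assumptions.

```python
import math

def wilson_interval(hot, n, z=1.96):
    """Approximate 95% Wilson score interval for a Hot rate of hot/n."""
    p = hot / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# (hot count, listings); counts inferred from rounded Figure 2 rates.
segments = {"Segment 17": (16, 23), "Segment 8": (260, 455)}

for name, (hot, n) in segments.items():
    lo, hi = wilson_interval(hot, n)
    print(f"{name}: {hot / n:.1%} Hot, interval {lo:.1%} to {hi:.1%} (n={n})")
```

The interval for the 23-listing segment is several times wider than the one for the 455-listing segment, which is the quantitative version of treating Segment 17 as a lead rather than a rule.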


Amenities and Price Turn Messy Rows Into Signals

Amenities are a good feature-engineering lesson because the raw field looks like prose but behaves like a feature catalog once parsed. The strongest positive amenity association is no fee: listings with that flag are materially more likely to be labelled Hot than the average listing.
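A minimal sketch of the parsing step, under stated assumptions: the raw field is treated as a comma- or semicolon-separated string, the vocabulary is a handful of amenities named in this case, and the `has_` flag names and toy rows are invented for the example. The second function computes the same kind of percentage-point difference the amenity chart reports.

```python
# Hypothetical vocabulary: a few of the amenities named in this case.
AMENITIES = ["No Fee", "Hardwood Floors", "Laundry In Building", "Dishwasher", "Doorman"]

def flag_name(amenity):
    """Map 'No Fee' to 'has_no_fee'."""
    return "has_" + amenity.lower().replace(" ", "_")

def parse_amenities(raw, vocabulary=AMENITIES):
    """Turn a free-text amenity string into 0/1 flags, one per vocabulary entry."""
    tokens = {t.strip().lower() for t in raw.replace(";", ",").split(",")}
    return {flag_name(a): int(a.lower() in tokens) for a in vocabulary}

def pp_difference(rows, flag):
    """Hot rate among flagged listings minus the overall Hot rate, in percentage points."""
    overall = sum(hot for _, hot in rows) / len(rows)
    flagged = [hot for flags, hot in rows if flags[flag]]
    if not flagged:
        return None  # no listing carries this flag
    return 100 * (sum(flagged) / len(flagged) - overall)

# Toy rows: (raw amenity text, 1 if labelled Hot). Not real listings.
raw_rows = [
    ("No Fee, Hardwood Floors", 1),
    ("no fee , Dishwasher", 0),
    ("Doorman, Dishwasher", 0),
    ("Hardwood Floors", 0),
    ("", 0),
    ("No Fee; Laundry in Building", 1),
]
rows = [(parse_amenities(text), hot) for text, hot in raw_rows]
print(rows[0][0])
print(pp_difference(rows, "has_no_fee"))
```

Normalizing case and whitespace before matching is the part that matters most: "no fee " and "No Fee" should land on the same flag.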

Amenities become model-ready signals

The bars show percentage-point difference from the overall Hot rate, not a causal effect of adding the amenity.

| Amenity | Difference from 30.9% overall Hot rate |
| --- | --- |
| No Fee | +9.2 pp |
| Hardwood Floors | +7.0 pp |
| Laundry In Building | +5.7 pp |
| Prewar | -18.7 pp |
| Outdoor Space | +9.0 pp |
| Dishwasher | +4.4 pp |
| Dining Room | +7.8 pp |
| Dogs Allowed | -3.6 pp |
| High Speed Internet | +7.8 pp |
| Cats Allowed | -3.1 pp |
| Doorman | -3.3 pp |
| Laundry In Unit | +3.9 pp |

The hottest queue is not the luxury tail

Demand classification favors value in the observed labels; expensive listings are numerous, but not the strongest Hot segment.

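| Monthly rent band | Hot rate | Listings |
| --- | --- | --- |
| <$2k | 58% | 4.8K |
| $2k-2.5k | 39% | 7.5K |
| $2.5k-3k | 32% | 10K |
| $3k-3.5k | 28% | 8K |
| $3.5k-4.5k | 23% | 8.9K |
| $4.5k+ | 20% | 9.5K |
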
Figure 3. Amenity and price signals. Amenity bars show the percentage-point difference from the overall Hot rate; price bands show observed Hot rates by monthly rent band.

The price pattern is equally important. Hot does not mean expensive. The lower-rent bands carry the stronger Hot rates (58% under $2,000, against 20% at $4,500 and above), especially when paired with favorable locations. That is exactly why a listing score should be multivariate: price alone misses the segment context, and segment alone misses the rent/value position.


Model Comparison: A Narrow Win for the Forest

The random forest is the best held-out model in this run, with AUC 0.793. But the logistic regression baseline is close at AUC 0.788. That is a useful result. It says the engineered features are doing much of the work, and the more flexible model adds incremental lift rather than rescuing a weak setup.
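Both ranking metrics used in the comparison can be computed directly from held-out labels and scores. The sketch below is a small standard-library version on toy scores for two hypothetical models; it is meant to show what AUC and average precision measure, not to reproduce the 0.788 and 0.793 figures.

```python
def roc_auc(y_true, scores):
    """Chance that a random Hot listing outscores a random non-Hot one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k where a Hot listing appears."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)

# Toy held-out labels and scores from two hypothetical models.
y = [1, 0, 1, 1, 0, 0, 1, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.3, 0.8, 0.7, 0.4, 0.6, 0.5, 0.1]

for name, s in [("model A", scores_a), ("model B", scores_b)]:
    print(name, round(roc_auc(y, s), 4), round(average_precision(y, s), 4))
```

The pairwise AUC loop is quadratic in the number of listings, so it suits a demonstration; a rank-based formula or a library routine would be the practical choice at 14K held-out rows.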

The forest wins, but the baseline is close

Feature engineering carries much of the lift; algorithm choice adds a narrower gain.

Held-out ranking metrics

| Model | AUC | AP |
| --- | --- | --- |
| Logistic regression | 0.788 | 0.602 |
| Decision tree | 0.760 | 0.557 |
| Random forest | 0.793 | 0.610 |

ROC shows ranking quality, not the business threshold

The operating question is still which slice of listings RentHop should feature.

Axes: false positive rate (horizontal) and true positive rate (vertical), each from 0.00 to 1.00.

The score creates an operating queue

On the held-out set, the top decile is more than twice as Hot as the average listing.

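| Score decile | Hot rate |
| --- | --- |
| Top 10 | 73% |
| 20s | 57% |
| 30s | 49% |
| 40s | 39% |
| 50s | 29% |
| 60s | 24% |
| 70s | 17% |
| 80s | 12% |
| 90s | 6% |
| 100s | 2% |

Score deciles, highest probability at left. The chart also marks the base rate and a cumulative-capture line.
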
Figure 4. Held-out model comparison. AUC and average precision grade ranking quality; the score-decile chart translates ranking into the operating language of which listings to feature first.

For a marketplace, the score-decile chart is usually easier to act on than the ROC curve. The top decile of random-forest scores has a Hot rate of 72.8%, compared with 30.9% overall. That means the model is useful as a ranking system even though no threshold is morally special.
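A score-bin table of this kind takes only a sort and a few sums. The sketch below builds one from labels and scores on toy data (four bins of five listings rather than deciles), then applies the same cumulative-capture arithmetic to the rounded decile rates reported in Figure 4. The function name and toy values are assumptions for illustration.

```python
def score_bin_table(y_true, scores, n_bins=10):
    """Hot rate and cumulative capture per score bin, highest scores first."""
    ranked = [y for _, y in sorted(zip(scores, y_true), key=lambda t: -t[0])]
    n, total_hot = len(ranked), sum(ranked)
    rows, captured = [], 0
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        captured += sum(chunk)
        rows.append({
            "bin": b + 1,
            "hot_rate": sum(chunk) / len(chunk),
            "cumulative_capture": captured / total_hot,
        })
    return rows

# Toy held-out set: 20 listings, already scored (not real data).
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 1, 1, 0,  1, 0, 1, 0, 0,  0, 1, 0, 0, 0,  0, 0, 0, 0, 0]
table = score_bin_table(labels, scores, n_bins=4)
for row in table:
    print(row)

# Same capture arithmetic on the rounded decile rates reported in Figure 4,
# assuming equal-sized deciles.
reported = [0.73, 0.57, 0.49, 0.39, 0.29, 0.24, 0.17, 0.12, 0.06, 0.02]
capture_top3 = sum(reported[:3]) / sum(reported)
print(f"top three deciles capture about {capture_top3:.0%} of Hot listings")
```

Read this way, the chart answers an operating question directly: how much of the Hot inventory is covered if RentHop only has room to feature the top slice.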


What the Model Leaned On

The winning model leans first on price and value features, then amenities, then location and unit mix. That ranking is inspection, not causation. It says what sorted listings in the historical labels. It does not prove that lowering rent, adding an amenity, or moving a unit to another segment would create a Hot listing.

What the winning model leaned on

Importance is an inspection tool: it tells us what sorted listings, not what would happen if RentHop changed a feature.

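| Feature | Importance |
| --- | --- |
| Monthly rent (log) | 27.6% |
| Price per room | 21.4% |
| No Fee | 7.7% |
| Longitude | 5.8% |
| Hardwood Floors | 5.3% |
| Latitude | 5.2% |
| Bedrooms | 5.1% |
| Laundry In Building | 3.8% |
| Bathrooms | 2.9% |
| Dishwasher | 2.4% |
| Doorman | 1.8% |
| Prewar | 1.3% |
| Dogs Allowed | 1.3% |
| Cats Allowed | 1.0% |

| Feature group | Importance |
| --- | --- |
| Price | 48.9% |
| Amenities | 28.9% |
| Location | 14.2% |
| Unit mix | 7.9% |
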
Figure 5. Random-forest feature importance. Importance ranks what the model used to sort listings; it should feed model inspection and product hypotheses, not causal claims.

The managerial use is disciplined: if price/value and location dominate, RentHop should make sure those features are stable, refreshed, and monitored. If a single amenity unexpectedly dominated, that would be a reason to audit the parsing logic and label definition before shipping.
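The audit described above can be turned into a small check. The sketch below rolls the per-feature importances shown in Figure 5 into groups and flags any single amenity whose share exceeds a cutoff. The group assignments and the 15% audit cutoff are assumptions made for this example, and because only the listed features are included, the amenity and location totals come out lower than the figure's group totals.

```python
# Per-feature importances (percent) as listed in Figure 5.
importance = {
    "Monthly rent (log)": 27.6, "Price per room": 21.4, "No Fee": 7.7,
    "Longitude": 5.8, "Hardwood Floors": 5.3, "Latitude": 5.2, "Bedrooms": 5.1,
    "Laundry In Building": 3.8, "Bathrooms": 2.9, "Dishwasher": 2.4,
    "Doorman": 1.8, "Prewar": 1.3, "Dogs Allowed": 1.3, "Cats Allowed": 1.0,
}

# Assumed grouping; anything not named here is treated as an amenity flag.
GROUPS = {
    "Monthly rent (log)": "Price", "Price per room": "Price",
    "Longitude": "Location", "Latitude": "Location",
    "Bedrooms": "Unit mix", "Bathrooms": "Unit mix",
}

def group_totals(importance, groups, default="Amenities"):
    """Sum importances within each feature group."""
    totals = {}
    for feature, value in importance.items():
        group = groups.get(feature, default)
        totals[group] = totals.get(group, 0.0) + value
    return totals

def dominant_amenities(importance, groups, cutoff=15.0):
    """Amenity flags whose individual importance exceeds the audit cutoff."""
    return [f for f, v in importance.items() if f not in groups and v > cutoff]

totals = group_totals(importance, GROUPS)
ranking = sorted(totals, key=totals.get, reverse=True)
print(ranking)
print(dominant_amenities(importance, GROUPS))
```

An empty list from `dominant_amenities` is the reassuring result here; a non-empty one would be the cue to inspect the parsing logic and the label definition.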


The Deployment Artifact Is the Queue

The model becomes useful when it produces a queue. In the held-out test set, the top 50 predicted listings have an actual Hot rate of 80% and a median rent of $1,500. This is the concrete product output: "here are the listings the platform should consider featuring first."

Top 50 held-out prospects

This is the deployable artifact: a ranked list with probabilities, not a model score in isolation.

80%

actually Hot

$1,500

median rent

0.821

mean score

The top 50 are not luxury trophy listings. They are mostly lower-rent, one- and two-bedroom listings in high-rate value segments.

The queue concentrates in a few segments

Segment concentration is useful for operations and risky for over-generalization.

| Segment | Listings in top 50 |
| --- | --- |
| Segment 8: Southwest Brooklyn | 21 |
| Segment 3: Upper Manhattan / Bronx | 18 |
| Segment 4: Central Queens | 5 |
| Segment 6: Central Brooklyn | 3 |
| Segment 1: Astoria / northwest Queens | 2 |
| Segment 16: Prospect-Lefferts / Crown Heights | 1 |

| Rank | Listing | Segment | Rent | Beds | Baths | Score | Actual |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 122 Gatling Pl | Segment 8, Southwest Brooklyn | $1,500 | 2 | 1 | 0.833 | Hot |
| 2 | 463 78th Street | Segment 8, Southwest Brooklyn | $1,650 | 2 | 1 | 0.832 | Hot |
| 3 | 835 Bay Ridge Avenue | Segment 8, Southwest Brooklyn | $1,750 | 2 | 1 | 0.830 | Hot |
| 4 | 4712 4th Avenue | Segment 8, Southwest Brooklyn | $1,700 | 2 | 1 | 0.830 | Hot |
| 5 | 358 47th St | Segment 8, Southwest Brooklyn | $1,800 | 2 | 1 | 0.829 | Not |
| 6 | 409 Westervelt Avenue | Segment 8, Southwest Brooklyn | $1,750 | 2 | 1 | 0.828 | Hot |
| 7 | 6718 14th Avenue | Segment 8, Southwest Brooklyn | $1,550 | 2 | 1 | 0.828 | Hot |
| 8 | 6718 14th Ave #3-R Dyker Heights, Brooklyn, NY 11219 | Segment 8, Southwest Brooklyn | $1,600 | 2 | 1 | 0.826 | Hot |
| 9 | 75-32 67th Rd, | Segment 4, Central Queens | $1,400 | 1 | 1 | 0.825 | Hot |
| 10 | 521 82nd street | Segment 8, Southwest Brooklyn | $1,425 | 1 | 1 | 0.825 | Hot |
| 11 | 644 73rd Street | Segment 8, Southwest Brooklyn | $1,435 | 1 | 1 | 0.825 | Not |
| 12 | 68-12 Clyde St | Segment 4, Central Queens | $1,200 | 1 | 1 | 0.824 | Hot |

Figure 6. Held-out top-50 action queue. The queue shows how probabilities become a marketplace operation: rank listings, inspect segment concentration, and review the highest-probability candidates.
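Turning scores into a queue is a sort, a cut, and a summary. The sketch below uses only the 12 rows visible in Figure 6, so its summary will not match the 80% Hot rate and $1,500 median reported for the full top 50; the function names are assumptions for illustration.

```python
from collections import Counter
from statistics import median

# The 12 rows visible in Figure 6: (segment, rent, score, 1 if actually Hot).
scored = [
    ("Segment 8", 1500, 0.833, 1), ("Segment 8", 1650, 0.832, 1),
    ("Segment 8", 1750, 0.830, 1), ("Segment 8", 1700, 0.830, 1),
    ("Segment 8", 1800, 0.829, 0), ("Segment 8", 1750, 0.828, 1),
    ("Segment 8", 1550, 0.828, 1), ("Segment 8", 1600, 0.826, 1),
    ("Segment 4", 1400, 0.825, 1), ("Segment 8", 1425, 0.825, 1),
    ("Segment 8", 1435, 0.825, 0), ("Segment 4", 1200, 0.824, 1),
]

def build_queue(listings, k):
    """Top-k listings by predicted Hot probability, highest first."""
    return sorted(listings, key=lambda row: row[2], reverse=True)[:k]

def summarize(queue):
    """Hot rate, median rent, mean score, and segment concentration of a queue."""
    return {
        "hot_rate": sum(row[3] for row in queue) / len(queue),
        "median_rent": median(row[1] for row in queue),
        "mean_score": sum(row[2] for row in queue) / len(queue),
        "segments": Counter(row[0] for row in queue),
    }

summary = summarize(build_queue(scored, k=12))
print(summary)
```

The segment counter is the piece operations teams tend to look at first, since a queue dominated by one segment is efficient to work but easy to over-generalize from.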

This queue should not be fully automated on day one. A reasonable deployment path is:

  1. Use the score to create a daily candidate list for editorial or marketplace operations review.
  2. Track whether featured model-ranked listings get faster renter engagement than comparable non-featured listings.
  3. Add a threshold-profit or capacity curve once RentHop knows the value of a true positive placement and the cost of a false positive.
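The third step can be sketched once placement economics are known. The example below builds a cumulative profit curve over score deciles from the rounded Hot rates in Figure 4. The value of a true positive (60) and the cost of a false positive (30) are purely hypothetical placeholders, as are the 100 listings per decile and the capacity limit; RentHop's real values would have to come from the tracking in step 2.

```python
# Rounded held-out Hot rates by score decile, from Figure 4.
DECILE_HOT_RATES = [0.73, 0.57, 0.49, 0.39, 0.29, 0.24, 0.17, 0.12, 0.06, 0.02]

def profit_curve(hot_rates, value_tp, cost_fp, listings_per_bin=100):
    """Cumulative expected profit from featuring the top 1, 2, ... score bins."""
    curve, running = [], 0.0
    for rate in hot_rates:
        running += listings_per_bin * (rate * value_tp - (1 - rate) * cost_fp)
        curve.append(running)
    return curve

def best_depth(curve, max_bins=None):
    """Number of bins to feature (0 = none) that maximizes profit within capacity."""
    limit = len(curve) if max_bins is None else min(max_bins, len(curve))
    options = [(0, 0.0)] + [(i + 1, curve[i]) for i in range(limit)]
    return max(options, key=lambda option: option[1])

# Hypothetical economics: a true positive is worth 60, a false positive costs 30.
curve = profit_curve(DECILE_HOT_RATES, value_tp=60, cost_fp=30)
print(best_depth(curve))              # unconstrained depth
print(best_depth(curve, max_bins=2))  # review capacity covers only two deciles
```

Under these placeholder values the break-even Hot rate is 30 / (60 + 30), about 33%, so the curve peaks where decile rates fall below that level. A different value-to-cost ratio moves the cutoff, which is why the threshold should follow from measured economics rather than from the model.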

What This Case Teaches

The RentHop case connects five ideas from Part IV:

  1. Task design. The model predicts a listing label so RentHop can rank listings.
  2. Feature engineering. Coordinates and text become business-readable features.
  3. Generalization. A 70/30 held-out split grades the ranking before the queue is trusted.
  4. Model comparison. The random forest wins narrowly, which keeps the baseline honest.
  5. Deployment framing. The score must turn into a queue, a threshold, and a monitoring loop.