§10.6
Case Study: RentHop Hot Listings
RentHop is a clean predictive-modeling case because the business action is concrete. A marketplace has thousands of apartment listings and limited attention to allocate. The decision is not whether a model can explain the New York rental market. The decision is which listings should be shown first, which landlords should be offered premium placement, and where the product team should watch demand clusters.
The original RentHop exercise asks students to upload a CSV, parse messy amenities, cluster latitude and longitude into neighborhood-like groups, compare logistic regression, a decision tree, and a random forest, then rank the top listings. This case study turns that prompt into a worked article inside the predictive-models section.
The important lesson is not that a random forest is magic. It is that feature engineering plus held-out ranking turns raw marketplace rows into an operating queue.
The Executive Question
Which apartment listings should RentHop move to the front of the experience, and what evidence says those listings are more likely to be Hot?
That wording matters. "Predict Hot apartments" sounds like a modeling task. "Move listings to the front" is the business task. The model earns its keep only if its scores can support that ranking decision.
Case evidence
A listing-level score for marketplace attention
The unit is the apartment listing. The target is whether the listing was marked Hot. The action is a ranked queue for featuring, premium placement, or landlord coaching.
| Metric | Value | Detail |
|---|---|---|
| Listings | 48.7K | 9 original columns in the CSV |
| Base rate | 30.9% | 15K labeled Hot listings |
| Feature work | 18 + 25 | location segments plus parsed amenity flags |
| Held-out lift | 2.4x | 72.8% Hot rate in the top score decile |
The Task Contract Comes First
Before touching algorithms, write the predictive task contract. In this case the target is already labeled, but the unit, features, and action still need to be explicit.
| Decision | RentHop choice | Why it matters |
|---|---|---|
| Unit | One apartment listing | The action is listing-level: feature, rank, price-coach, or hold back. |
| Target | Hot Apartments = Hot | The label is a proxy for renter demand, not a causal measure of what made demand happen. |
| Features | Rent, bedrooms, bathrooms, latitude/longitude clusters, parsed amenities | Everything used here is available at listing time in the CSV. |
| Evaluation | 70/30 stratified random split, seed 42 | The held-out slice grades the ranking before any top-listing queue is trusted. |
| Action | Sort listings by predicted Hot probability | RentHop needs a priority queue more than a yes/no verdict. |
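The evaluation row of the contract can be sketched in a few lines. This is a minimal illustration, assuming a plain list of 0/1 labels; `stratified_split` is a hypothetical helper written for this example, not the code used in the original exercise. It shuffles each class separately so the held-out 30% keeps roughly the same Hot share as the full data.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.30, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within the class only
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels with a 30% Hot rate (1 = Hot, 0 = not Hot); not the RentHop data.
labels = [1] * 300 + [0] * 700
train_idx, test_idx = stratified_split(labels)
test_hot_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(test_hot_rate, 3))
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same held-out slice.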
Two feature-engineering moves carry the case:
- Coordinates become segments. Raw latitude and longitude are too granular for a manager to reason about and too continuous for a simple categorical story. K-means turns them into 18 neighborhood-like segments.
- Amenities become indicators. The `features` column is text. Parsing the common amenities into yes/no flags lets the model learn that "no fee," "hardwood floors," and laundry-related signals carry demand information.
These are not decorations before the model. They are the model's business vocabulary.
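Both moves can be sketched with the standard library alone. This is an illustration under stated assumptions: the amenity vocabulary, the toy coordinates, and the helper names (`amenity_flags`, `nearest`, `kmeans_segments`) are hypothetical, and the clustering is a bare-bones Lloyd's iteration on raw latitude/longitude. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice, with k = 18 as in the case.

```python
import random

def amenity_flags(raw_features, vocabulary):
    """Turn a free-text amenity field into 0/1 indicators by substring match."""
    text = raw_features.lower()
    return {name: int(name in text) for name in vocabulary}

def nearest(point, centers):
    """Index of the closest center by squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda j: (point[0] - centers[j][0]) ** 2
                             + (point[1] - centers[j][1]) ** 2)

def kmeans_segments(points, k, iters=25, seed=42):
    """Plain Lloyd's algorithm: assign points, then move centers to the mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a segment empties out
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return [nearest(p, centers) for p in points], centers

# Hypothetical vocabulary and toy coordinates, for illustration only.
vocab = ["no fee", "hardwood floors", "laundry"]
flags = amenity_flags("No Fee, Hardwood Floors, Laundry in Building", vocab)
coords = [(40.60, -74.00), (40.60, -73.99), (40.61, -74.00),
          (40.70, -73.90), (40.70, -73.89), (40.71, -73.90)]
segment_ids, centers = kmeans_segments(coords, k=2)
print(flags, segment_ids)
```

Substring matching is deliberately crude; a real parser would need to handle spelling variants such as "no-fee" or "washer/dryer".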
Location Does Most of the Storytelling
The segment map shows why location clustering is more than a technical preprocessing step. It creates a market map a product team can reason about: high-rate value zones, central expensive zones, and small segments that may deserve manual review before becoming rules.
Location segments expose the market structure
Points are a stratified sample of listings; larger labeled markers are K-means segment centers colored by observed Hot rate.
Highest-rate segments are value-heavy
Small segments can be real leads but need monitoring before becoming rules.
| Segment | Area | Listings | Median rent |
|---|---|---|---|
| 17 | Far Rockaway / airport edge | 23 | $1,640 |
| 8 | Southwest Brooklyn | 455 | $1,900 |
| 3 | Upper Manhattan / Bronx | 697 | $1,725 |
| 1 | Astoria / northwest Queens | 721 | $2,150 |
| 4 | Central Queens | 670 | $1,900 |
| 16 | Prospect-Lefferts / Crown Heights | 972 | $2,400 |
| 6 | Central Brooklyn | 837 | $2,400 |
| 5 | Upper Manhattan | 2,213 | $2,175 |
The highest-rate segments are not simply the priciest parts of Manhattan. The top queue leans toward lower-rent Brooklyn, Queens, and upper-Manhattan/Bronx-adjacent segments where a listing can look like strong value. That does not mean those areas are "better" markets. It means the Hot label in this data rewards a price-location-amenity balance.
Amenities and Price Turn Messy Rows Into Signals
Amenities are a good feature-engineering lesson because the raw field looks like prose but behaves like a feature catalog once parsed. The strongest amenity association is no fee: listings with that flag are materially more likely to be labeled Hot than the average listing.
Amenities become model-ready signals
The bars show percentage-point difference from the overall Hot rate, not a causal effect of adding the amenity.
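The percentage-point difference behind each bar can be written out directly. A minimal sketch, assuming parallel lists of 0/1 amenity flags and 0/1 Hot labels; `hot_rate_lift_pp` is a hypothetical helper and the toy values are invented for illustration.

```python
def hot_rate_lift_pp(flags, hot):
    """Hot rate among flagged listings minus the overall Hot rate, in points."""
    overall = sum(hot) / len(hot)
    flagged = [h for f, h in zip(flags, hot) if f]
    if not flagged:
        return None  # the amenity never appears, so no rate can be computed
    return 100 * (sum(flagged) / len(flagged) - overall)

# Toy data: 4 of 10 listings carry the flag, 4 of 10 are Hot.
no_fee = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
hot    = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
lift = hot_rate_lift_pp(no_fee, hot)
print(round(lift, 1))  # 75% flagged Hot rate vs 40% overall
```

The number is descriptive: it compares observed groups and says nothing about what would happen if a listing added the amenity.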
The hottest queue is not the luxury tail
Demand classification favors value in the observed labels; expensive listings are numerous, but not the strongest Hot segment.
The price pattern is equally important. Hot does not mean expensive. The value bands below the luxury tail carry stronger Hot rates, especially when paired with favorable locations. That is exactly why a listing score should be multivariate: price alone misses the segment context, and segment alone misses the rent/value position.
Model Comparison: A Narrow Win for the Forest
The random forest is the best held-out model in this run, with AUC 0.793. But the logistic regression baseline is close at AUC 0.788. That is a useful result. It says the engineered features are doing much of the work, and the more flexible model adds incremental lift rather than rescuing a weak setup.
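AUC has a direct ranking interpretation: it is the probability that a randomly chosen Hot listing receives a higher score than a randomly chosen non-Hot listing. The sketch below computes it by brute-force pair counting on invented toy scores. It is meant to make the definition concrete, not to reproduce the 0.793 and 0.788 figures, and a library routine would be preferable on the full held-out set.

```python
def auc(scores, labels):
    """Share of (Hot, non-Hot) pairs where the Hot listing scores higher.

    Ties count as half a win. Runs in O(n_pos * n_neg), fine for small samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy held-out scores (1 = Hot, 0 = not Hot); not the RentHop results.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(scores, labels)
print(round(toy_auc, 3))
```

Read this way, a gap of 0.005 between two models means the forest orders about half a percentage point more Hot/non-Hot pairs correctly than the baseline.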
The forest wins, but the baseline is close
Feature engineering carries much of the lift; algorithm choice adds a narrower gain.
ROC shows ranking quality, not the business threshold
The operating question is still which slice of listings RentHop should feature.
The score creates an operating queue
On the held-out set, the top decile is more than twice as Hot as the average listing.
For a marketplace, the score-decile chart is usually easier to act on than the ROC curve. The top decile of random-forest scores has a Hot rate of 72.8%, compared with 30.9% overall. That means the model is useful as a ranking system even though no single threshold is inherently correct.
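The lift figure is simple arithmetic: the top-decile Hot rate divided by the base rate. The sketch below first checks the reported numbers, then shows how the same quantity could be computed from held-out scores. `top_decile_lift` is a hypothetical helper and the toy scores and labels are invented.

```python
def top_decile_lift(scores, labels):
    """Return (top-decile Hot rate, base rate, lift) for held-out scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate, base_rate, top_rate / base_rate

# Reported figures from the case: 72.8% in the top decile vs 30.9% overall.
reported_lift = 0.728 / 0.309
print(round(reported_lift, 1))

# Toy example: 20 listings scored 1.00 down to 0.05, 6 of them Hot.
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
top_rate, base_rate, lift = top_decile_lift(scores, labels)
print(top_rate, base_rate, round(lift, 2))
```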
What the Model Leaned On
The winning model leans first on price and value features, then location, then amenities and unit mix. That ranking is inspection, not causation. It says what sorted listings in the historical labels. It does not prove that lowering rent, adding an amenity, or moving a unit to another segment would create a Hot listing.
What the winning model leaned on
Importance is an inspection tool: it tells us what sorted listings, not what would happen if RentHop changed a feature.
The managerial use is disciplined: if price/value and location dominate, RentHop should make sure those features are stable, refreshed, and monitored. If a single amenity unexpectedly dominated, that would be a reason to audit the parsing logic and label definition before shipping.
The Deployment Artifact Is the Queue
The model becomes useful when it produces a queue. In the held-out test set, the top 50 predicted listings have an actual Hot rate of 80% and a median rent of $1,500. This is the concrete product output: "here are the listings the platform should consider featuring first."
Top 50 held-out prospects
This is the deployable artifact: a ranked list with probabilities, not a model score in isolation.
| Actually Hot | Median rent | Mean score |
|---|---|---|
| 80% | $1,500 | 0.821 |
The top 50 are not luxury trophy listings. They are mostly lower-rent, one- and two-bedroom listings in high-rate value segments.
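Building the queue and its summary numbers is a sort followed by three aggregates. A minimal sketch, assuming each held-out listing is a dict with `score`, `rent`, and `hot` keys (a hypothetical schema); the five rows are the first five entries of the published top-50 table, so the summaries below describe only that small sample.

```python
from statistics import median

def top_n_queue(listings, n=50):
    """Rank listings by score and summarize the top n."""
    queue = sorted(listings, key=lambda row: row["score"], reverse=True)[:n]
    return {
        "queue": queue,
        "hot_rate": sum(row["hot"] for row in queue) / len(queue),
        "median_rent": median(row["rent"] for row in queue),
        "mean_score": sum(row["score"] for row in queue) / len(queue),
    }

# Ranks 1-5 from the top-50 table (hot: 1 = Hot, 0 = Not).
rows = [
    {"address": "122 Gatling Pl", "score": 0.833, "rent": 1500, "hot": 1},
    {"address": "463 78th Street", "score": 0.832, "rent": 1650, "hot": 1},
    {"address": "835 Bay Ridge Avenue", "score": 0.830, "rent": 1750, "hot": 1},
    {"address": "4712 4th Avenue", "score": 0.830, "rent": 1700, "hot": 1},
    {"address": "358 47th St", "score": 0.829, "rent": 1800, "hot": 0},
]
summary = top_n_queue(rows, n=5)
print(summary["hot_rate"], summary["median_rent"])
```

The hit rate here uses the actual held-out labels, which is what makes the 80% figure an honest estimate rather than a self-assessment by the model.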
The queue concentrates in a few segments
Segment concentration is useful for operations and risky for over-generalization.
| Rank | Listing | Segment | Rent | Beds | Baths | Score | Actual |
|---|---|---|---|---|---|---|---|
| 1 | 122 Gatling Pl | Segment 8: Southwest Brooklyn | $1,500 | 2 | 1 | 0.833 | Hot |
| 2 | 463 78th Street | Segment 8: Southwest Brooklyn | $1,650 | 2 | 1 | 0.832 | Hot |
| 3 | 835 Bay Ridge Avenue | Segment 8: Southwest Brooklyn | $1,750 | 2 | 1 | 0.830 | Hot |
| 4 | 4712 4th Avenue | Segment 8: Southwest Brooklyn | $1,700 | 2 | 1 | 0.830 | Hot |
| 5 | 358 47th St | Segment 8: Southwest Brooklyn | $1,800 | 2 | 1 | 0.829 | Not |
| 6 | 409 Westervelt Avenue | Segment 8: Southwest Brooklyn | $1,750 | 2 | 1 | 0.828 | Hot |
| 7 | 6718 14th Avenue | Segment 8: Southwest Brooklyn | $1,550 | 2 | 1 | 0.828 | Hot |
| 8 | 6718 14th Ave #3-R Dyker Heights, Brooklyn, NY 11219 | Segment 8: Southwest Brooklyn | $1,600 | 2 | 1 | 0.826 | Hot |
| 9 | 75-32 67th Rd, | Segment 4: Central Queens | $1,400 | 1 | 1 | 0.825 | Hot |
| 10 | 521 82nd street | Segment 8: Southwest Brooklyn | $1,425 | 1 | 1 | 0.825 | Hot |
| 11 | 644 73rd Street | Segment 8: Southwest Brooklyn | $1,435 | 1 | 1 | 0.825 | Not |
| 12 | 68-12 Clyde St | Segment 4: Central Queens | $1,200 | 1 | 1 | 0.824 | Hot |
This queue should not be fully automated on day one. A reasonable deployment path is:
- Use the score to create a daily candidate list for editorial or marketplace operations review.
- Track whether featured model-ranked listings get faster renter engagement than comparable non-featured listings.
- Add a threshold-profit or capacity curve once RentHop knows the value of a true positive placement and the cost of a false positive.
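The third step can be sketched as a cumulative profit curve over the ranked queue. Everything numeric here is an assumption: the value of a true positive (`value_tp`), the cost of a false positive (`cost_fp`), and the toy scores and labels are invented, since the case does not state RentHop's economics. `cumulative_profit` is a hypothetical helper.

```python
def cumulative_profit(scores, labels, value_tp, cost_fp):
    """Profit from featuring the top k listings, for every k from 1 to n."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve, running = [], 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        running += value_tp if y == 1 else -cost_fp
        curve.append((k, running))
    return curve

# Toy held-out scores and labels; value and cost are hypothetical units.
scores = [0.83, 0.80, 0.75, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]
curve = cumulative_profit(scores, labels, value_tp=10.0, cost_fp=6.0)
best_k, best_profit = max(curve, key=lambda point: point[1])
print(curve)
print(best_k, best_profit)
```

The queue depth that maximizes the curve is one candidate cutoff. If review capacity is smaller than that depth, capacity sets the cutoff instead.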
What This Case Teaches
The RentHop case connects five ideas from Part IV:
- Task design. The model predicts a listing label so RentHop can rank listings.
- Feature engineering. Coordinates and text become business-readable features.
- Generalization. A 70/30 held-out split grades the ranking before the queue is trusted.
- Model comparison. The random forest wins narrowly, which keeps the baseline honest.
- Deployment framing. The score must turn into a queue, a threshold, and a monitoring loop.