§1.2

Data Structures

The regional manager at Bean & Basket has three slides to prepare for Friday's review. The first asks which store is performing best this quarter. The second asks whether the chain is growing. The third asks where the growth is coming from. She has one folder of data and three reasonable-sounding charts in mind, and none of the three will show all three answers — because the shape of the underlying table changes what the eye can see in the chart. Cross-section, time-series, panel, geo, network: these are not five different files. They are five different ways of arranging the same business reality, and each one closes off some questions while opening up others.

The executive question: what questions can this data structure answer?

In Chapter 1 we worried about what one row means. In this chapter we worry about how those rows are arranged. Data structure is the larger pattern: how the unit of observation combines with time and with relationships to other units. Five structures cover almost every dataset a business analyst will see.

A cross-sectional dataset has one row per unit at a single point in time — a customer survey, a snapshot of stores at the end of last week, a leaderboard of products by year-to-date revenue. It compares units to each other, with time fixed. A time-series has one row per period for a single metric — monthly revenue, weekly active users, daily ad spend. It tracks one thing as it changes, with units collapsed. A panel combines them: the same units, observed repeatedly over time. Store-week sales, customer-month engagement, product-quarter unit movement — the panel is the most informative shape because it lets you ask both who is doing better and who is changing fastest.
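To make the three definitions concrete, here is a minimal sketch in plain Python. It assumes a tiny hypothetical panel of two stores over three weeks. The store names, the numbers, and the helper names `cross_section` and `time_series` are all invented for illustration and are not Bean & Basket data. The point is only that a cross-section and a time-series can each be derived from a panel, while the reverse is not possible.

```python
from collections import defaultdict

# Hypothetical panel: one row per (store, week, revenue).
# All values are made up for illustration.
panel = [
    ("North", 1, 100.0),
    ("North", 2, 110.0),
    ("North", 3, 120.0),
    ("South", 1, 80.0),
    ("South", 2, 70.0),
    ("South", 3, 60.0),
]


def cross_section(rows, week):
    """One row per store at a single week: time is held fixed."""
    return {store: revenue for store, w, revenue in rows if w == week}


def time_series(rows):
    """One row per week for the whole chain: stores are summed away."""
    totals = defaultdict(float)
    for _store, week, revenue in rows:
        totals[week] += revenue
    return dict(sorted(totals.items()))


latest_week = max(week for _store, week, _revenue in panel)
snapshot = cross_section(panel, latest_week)
trend = time_series(panel)

print(snapshot)  # {'North': 120.0, 'South': 60.0}
print(trend)     # {1: 180.0, 2: 180.0, 3: 180.0}
```

In this toy case the chain-wide trend is perfectly flat even though one store is rising and the other is falling. Only the panel rows carry both facts at once.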

Two more structures matter, even though we will not generate synthetic data for them in this chapter. Geo-spatial data attaches location to each row — store coordinates, delivery zones, ZIP codes — and supports questions about distance, catchment, and territory. Network data describes relationships between entities — which customers refer which other customers, which products are bought together, which employees report to whom — and supports questions about influence, communities, and flow.

Figure 1 shows the same eight weeks of Bean & Basket sales arranged three ways. The business question is identical across all three tabs: which store is doing best, and is the chain growing? The data is identical too — 24 store-week observations underneath. Only the arrangement changes. So does the answer.

Cross-section view: one row per store, snapshot at week 8 only. Reads as: "Downtown is winning by a wide margin; Suburban is the runner-up; Campus is a distant third." Time has been thrown away.

Store          Week         Revenue   Transactions
A - Downtown   2024-04-22   $145.00   19
B - Campus     2024-04-22   $60.00    10
C - Suburban   2024-04-22   $90.00    15
Figure 1. Three structures, one business. Toggle between the panel (24 rows, 3 stores × 8 weeks), the cross-section (just last week's 3 rows), and the time-series (the chain-wide weekly trend). Each view answers the question 'which store is doing best?' differently — only one of them answers it correctly.

The cross-section in Figure 1 looks decisive. Downtown earned $145 last week; Suburban $90; Campus $60. If a regional manager built her Friday slide from this single table, she would walk into the room with a clean ranking and a quiet recommendation to invest more in Downtown. The time-series tells a different and equally tidy story: chain revenue is up. Both stories are true in their narrow way. Both miss the thing that should actually drive the decision.

The panel makes the gap visible. Downtown's eight-week revenue line is flat: it is large, but it is not growing. Campus is bleeding out, slowly, every single week. Suburban started at $35 and ended at $90, growing more than two and a half times over. A manager who saw only the cross-section would invest in the wrong store; one who saw only the time-series would feel good about a chain-wide trend that is, in reality, the sum of one collapsing store, one stagnant store, and one breakout. The panel is the only structure of the three that lets you ask and answer the right question: where is the growth coming from? Once that question is on the table, the decision is no longer "give Downtown more marketing money." It is "figure out what's happening at Suburban, and what's going wrong at Campus."
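The contrast between ranking by level and ranking by growth can be sketched in a few lines of Python. The week-8 revenues and Suburban's week-1 revenue are taken from the text. The week-1 figures for Downtown and Campus are assumptions made only for this illustration, since the text says just that Downtown is flat and Campus declines. The helper name `growth_ratio` is likewise hypothetical.

```python
# Week-8 revenue per store (from the cross-section table in the text).
last_week = {"Downtown": 145.0, "Campus": 60.0, "Suburban": 90.0}

# Week-1 revenue per store. Suburban's $35 is from the text.
# Downtown (flat at 145) and Campus (95) are ASSUMED values for illustration.
first_week = {"Downtown": 145.0, "Campus": 95.0, "Suburban": 35.0}


def growth_ratio(start, end):
    """Ratio of ending to starting revenue; 1.0 means no change."""
    return end / start


ratios = {
    store: growth_ratio(first_week[store], last_week[store])
    for store in last_week
}

# The cross-section question: who is biggest right now?
ranked_by_level = sorted(last_week, key=last_week.get, reverse=True)

# The panel question: who is changing fastest?
ranked_by_growth = sorted(ratios, key=ratios.get, reverse=True)

# The time-series question: is the chain as a whole up?
chain_change = sum(last_week.values()) - sum(first_week.values())

print(ranked_by_level)   # ['Downtown', 'Suburban', 'Campus']
print(ranked_by_growth)  # ['Suburban', 'Downtown', 'Campus']
for store in ranked_by_growth:
    print(f"{store}: {ratios[store]:.2f}x")
```

Under these assumed starting values the chain total rises, Downtown tops the level ranking, and Suburban tops the growth ranking, which is the pattern the paragraph above describes. Different assumed week-1 values would change the exact ratios but not the logic of the comparison.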

Geographic and network data follow the same logic. A geo-spatial table that records each store's latitude and longitude lets you ask whether the Suburban breakout has anything to do with proximity to a new university campus three miles away, a question the panel alone cannot answer. A network table that records which customers introduced which other customers lets you ask whether the Suburban growth is driven by a single high-influence customer who brings friends, which is again invisible at the store-week level. The deeper rule is that adding structure unlocks questions: each new layer (time, place, relationship) lets you ask the same business evidence something new.