§0.2
How Data Is Stored
The word "database" hides too much. The system that records a customer's payment is not built for the same job as the system that scans five years of transactions for a pricing analysis. The place that stores raw app logs is not the same as the place that supports semantic search over policy documents. A manager does not need to administer these systems, but does need to understand their roles. Otherwise every data conversation becomes vague: "Can we get the data?" Which data? From which system? For what decision? At what latency? With what quality contract?
The executive question: what job is this data system doing?
Modern firms usually store data in several layers. Each layer optimizes for a different job.
An operational database records the next event correctly: a payment, an order, a login, a shipment, a service case. It is built for reliability, identity, permissions, and fast small updates. An analytical database scans many past events: a year of transactions, a panel of stores, a customer cohort, a product assortment, a marketing funnel. It is built for aggregation, history, and comparison. The distinction is not technical trivia. It determines whether the system is meant to run the business or analyze the business.
The storage stack is a division of labor
| System | Primary job | Common examples | Managerial question |
|---|---|---|---|
| Operational SQL | Run the application | Orders, accounts, payments, POS, CRM | Can the business record the next transaction correctly? |
| NoSQL and search | Serve flexible app data | Documents, sessions, profiles, product catalogs, keyword search | Can the app retrieve the right object quickly? |
| Lake and files | Keep raw and semi-raw assets | Logs, Parquet files, PDFs, images, audio, vendor drops | Can the firm preserve data before every use is known? |
| Warehouse or lakehouse | Answer analytical questions | Snowflake, BigQuery, Databricks-style lakehouses | Can managers scan history across customers, products, and time? |
| Local analytics | Let one analyst work quickly | DuckDB, notebooks, local Parquet files, reproducible extracts | Can a small team investigate without waiting on production systems? |
| Vector and graph stores | Find meaning and relationships | Embeddings, semantic search, RAG indexes, product/customer graphs | Can the workflow retrieve related ideas, documents, or entities? |
The practical distinction is transactional versus analytical: one system records the next event; another scans many past events to support a decision.
Figure 1 is the practical map. Operational SQL, NoSQL, lakes, warehouses, local analytical engines, vector databases, graph stores, and search indexes are not competing names for the same thing. They are specialized pieces of a workflow that moves from source activity to decision.
Transactional versus analytical
The most important distinction is transactional versus analytical.
A transactional system answers: can we record and retrieve one business event correctly right now? The point-of-sale system must know the price, charge the customer, update inventory, and create a receipt. The CRM must record a sales interaction. The app database must know which user is logged in. Mistakes here interrupt the business.
An analytical system answers: what pattern emerges across many business events? The warehouse computes weekly revenue by region, demand by product, churn by cohort, margin by promotion, and service quality by store. It is not trying to record the next transaction. It is trying to make history comparable.
| Dimension | Transactional system | Analytical system |
|---|---|---|
| Primary job | Record the next event correctly | Compare many past events |
| Typical questions | Did this order, payment, or login succeed? | Which customers, stores, products, or periods are changing? |
| Data shape | Current records, normalized entities, app state | History, panels, aggregates, derived metrics |
| Latency | Immediate or near-immediate | Batch, near-real-time, or streaming depending on the use case |
| Failure mode | The business cannot operate | The organization makes decisions from stale or inconsistent evidence |
Managers feel this distinction in ordinary meetings. When the CFO asks for margin by promotion over the past six quarters, the answer should not come from the live checkout database. When customer support needs the current status of an order, the answer should not wait for the nightly warehouse refresh. Each system can be excellent and still be wrong for the job.
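The contrast can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table, its columns, and the sample rows are illustrative assumptions, not a real schema. In practice the two functions would run against different systems. They share one table here only to show how different the two access patterns are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, week TEXT, amount REAL)"
)

# Illustrative history: a handful of past orders.
history = [
    ("North", "2024-W01", 120.0),
    ("North", "2024-W01", 80.0),
    ("South", "2024-W01", 200.0),
    ("North", "2024-W02", 50.0),
]
conn.executemany(
    "INSERT INTO orders (region, week, amount) VALUES (?, ?, ?)", history
)
conn.commit()


def record_order(conn, region, week, amount):
    """Transactional pattern: write one event atomically, return its id."""
    with conn:  # commits on success, rolls back if the insert fails
        cur = conn.execute(
            "INSERT INTO orders (region, week, amount) VALUES (?, ?, ?)",
            (region, week, amount),
        )
        return cur.lastrowid


def weekly_revenue_by_region(conn):
    """Analytical pattern: scan every row and aggregate for comparison."""
    return conn.execute(
        "SELECT week, region, SUM(amount) FROM orders "
        "GROUP BY week, region ORDER BY week, region"
    ).fetchall()


# One small write, then one point lookup of that same event.
new_id = record_order(conn, "South", "2024-W02", 75.0)
status = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (new_id,)
).fetchone()

# One scan over all history.
report = weekly_revenue_by_region(conn)
```

The first function touches a single row and must succeed or fail cleanly. The second reads every row and returns a summary. At realistic volumes, a full scan of the live system competes with the writes the business depends on, which is one reason the two jobs usually live in separate systems.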
The major storage roles
SQL operational databases store structured app and business records: customers, orders, payments, products, subscriptions, tickets. They usually enforce relationships and consistency. If one customer has many orders, SQL is good at keeping that relationship explicit.
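A minimal sketch of what "enforce relationships" means, again using Python's built-in `sqlite3` module. The `customers` and `orders` tables and the customer name are hypothetical. The database itself refuses an order that points to a customer who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(customer_id), "
    "amount REAL NOT NULL)"
)

# One customer, many orders: the relationship is explicit in the schema.
conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 40.0), (1, 60.0)],
)

# An order for a customer that does not exist is rejected by the database.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

orders_per_customer = conn.execute(
    "SELECT c.name, COUNT(o.order_id), SUM(o.amount) "
    "FROM customers c JOIN orders o ON o.customer_id = c.customer_id "
    "GROUP BY c.customer_id, c.name"
).fetchall()
```

The consistency rule lives in the database, not in each application that writes to it, so every writer is held to the same contract.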
NoSQL systems serve flexible or high-scale application data: product catalogs, session state, user profiles, event payloads, documents, and other records whose structure changes often. They are often useful when the application needs fast reads and writes over flexible objects.
Data lakes and object storage keep raw or semi-raw assets: logs, vendor files, Parquet tables, documents, images, audio, and historical extracts. The lake is useful when the firm wants to preserve data before every analytical use is known.
Warehouses and lakehouses make history analyzable. Systems such as Snowflake, BigQuery, and Databricks-style lakehouses are used to scan large historical datasets, join source systems, define metrics, and support dashboards, notebooks, and model training.
DuckDB-style local analytics gives analysts a fast, lightweight way to query sizable datasets on a laptop or in a reproducible script. This is useful for teaching, prototyping, case packs, and focused investigation before work becomes shared infrastructure.
Search, vector, and graph systems support retrieval and relationships. Keyword search finds exact or near-exact terms. Vector databases store embeddings so workflows can retrieve semantically related documents, products, customers, or images. Graph stores represent relationships such as referrals, product co-purchases, supply chains, account networks, and organizational structures.
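The difference between keyword and vector retrieval can be sketched in a few lines of plain Python. Everything here is a toy assumption: the three policy snippets are invented, and the three-number "embeddings" are hand-assigned for illustration. A real system would obtain embeddings from a trained model and store them in a vector index.

```python
import math
import re

# Hypothetical policy snippets.
docs = {
    "refund_policy": "Customers may return items within 30 days for a full refund.",
    "shipping_policy": "Standard delivery takes five business days.",
    "privacy_policy": "We do not sell personal data to third parties.",
}

# Hand-assigned toy embeddings (assumption, not model output).
# Dimensions loosely mean: [getting money back, logistics, data protection].
embeddings = {
    "refund_policy": [0.9, 0.2, 0.0],
    "shipping_policy": [0.1, 0.9, 0.0],
    "privacy_policy": [0.0, 0.1, 0.9],
}


def keyword_search(query, docs):
    """Return documents that share at least one exact word with the query."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if terms & words:
            hits.append(doc_id)
    return hits


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vec, embeddings, top_k=1):
    """Return the top_k document ids ranked by cosine similarity."""
    ranked = sorted(
        embeddings, key=lambda d: cosine(query_vec, embeddings[d]), reverse=True
    )
    return ranked[:top_k]


query = "money back"
query_vec = [0.8, 0.1, 0.1]  # hypothetical embedding of the query

keyword_hits = keyword_search(query, docs)
vector_hits = vector_search(query_vec, embeddings)
```

In this toy setup the keyword search finds nothing, because no document contains the words "money" or "back", while the vector search returns the refund policy because its embedding points in a similar direction to the query's. The result is only as good as the embeddings, which is why retrieval quality has to be evaluated and not assumed.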
The point is not to memorize product names. The point is to ask which system is doing which job.
Batch, streaming, and freshness
Data also differs by freshness.
Some workflows are fine with a nightly refresh. A weekly executive KPI dashboard, a monthly pricing review, or a quarterly market expansion analysis does not need every transaction within seconds. Other workflows need near-real-time data: fraud detection, stockout alerts, delivery routing, ad bidding, anomaly detection, or a customer-facing recommendation shown during a session.
Freshness has a cost. Real-time systems are harder to build, harder to monitor, and easier to over-trust. A manager should ask: what decision becomes better if this is refreshed sooner? If the action is weekly, minute-level freshness may only create noise.
| Cadence | Example workflow | Managerial test |
|---|---|---|
| Daily or weekly batch | Executive KPI dashboard, store performance review | Does anyone act on this more often than once a day or once a week? |
| Near-real-time | Inventory alert, fraud flag, support escalation | Does a faster signal prevent loss or improve service immediately? |
| Streaming or session-time | Ad bidding, next-best recommendation, live routing | Is the decision made during the customer or operational interaction? |
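
The managerial test can be written down as a rough rule of thumb. The sketch below compares how often the data refreshes with how often the decision is actually made. The function name, the labels, and the 100x threshold are all hypothetical choices for illustration, not an established standard.

```python
from datetime import timedelta


def freshness_fit(decision_interval, refresh_interval, overkill_factor=100):
    """Compare refresh cadence to decision cadence.

    overkill_factor is an arbitrary illustrative threshold: if the data
    refreshes more than this many times per decision, the extra freshness
    is unlikely to change the action.
    """
    if refresh_interval > decision_interval:
        return "too stale"
    if refresh_interval * overkill_factor < decision_interval:
        return "fresher than the decision needs"
    return "reasonable fit"


# Weekly KPI review fed by a minute-level pipeline.
weekly_kpi = freshness_fit(timedelta(weeks=1), timedelta(minutes=1))

# Fraud decision made within a second, fed by a nightly batch.
fraud_check = freshness_fit(timedelta(seconds=1), timedelta(days=1))

# Weekly store review fed by a daily batch.
store_review = freshness_fit(timedelta(weeks=1), timedelta(days=1))
```

The exact threshold matters less than the habit of stating both intervals before approving a real-time build.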
How storage affects methods
The rest of the book repeatedly depends on storage choices.
- Dashboards need stable metric tables, not ad hoc extracts.
- Causal analysis needs historical data at the right grain, not only summary reports.
- Prediction needs labels, features, and timestamps aligned in a feature table.
- Recommenders need exposure logs as well as purchase logs, or they confuse preference with what the system happened to show.
- Retrieval-augmented generation needs a document store, a search or vector index, source metadata, and a way to evaluate retrieval quality.
- Governance needs lineage: where the data came from, who owns it, how fresh it is, and what changed since last time.
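
The recommender point in the list above is easy to see with numbers. In this sketch the logs are invented: item A was shown to ten users and bought by three, while item B was shown to two users and bought by both. A ranking built from purchase logs alone favors A; a ranking that also uses exposure logs favors B.

```python
from collections import Counter

# Hypothetical logs: each entry is (user_id, item_id).
exposures = [(f"u{i}", "A") for i in range(1, 11)] + [("u1", "B"), ("u2", "B")]
purchases = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u1", "B"), ("u2", "B")]

shown = Counter(item for _, item in exposures)
bought = Counter(item for _, item in purchases)

# Ranking from purchase logs only: rewards whatever was shown most.
by_purchases = sorted(bought, key=lambda item: bought[item], reverse=True)

# Ranking from purchases per exposure: needs the exposure log to exist.
conversion = {item: bought[item] / shown[item] for item in shown}
by_conversion = sorted(conversion, key=conversion.get, reverse=True)
```

With only two exposures, B's rate is a fragile estimate, so this is not a recommendation to rank by raw conversion. It shows only that the second ranking cannot be computed at all unless the system stored what it showed, not just what was bought.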