§2.1

SQL Like Excel

The regional manager at Bean & Basket has been writing pivot tables in Excel for fifteen years. When she joins the new analytics review, her first instinct is to apologize: she does not know SQL. Twenty minutes later she has written her first SQL query. It is a GROUP BY on product_category with a SUM(revenue), and it does exactly what a pivot table does — she just typed the steps instead of dragging columns into a UI. SQL is not a different way of thinking about data; it is the same Excel thinking, written down. The point of this chapter is to make that translation explicit, so the language barrier stops being an excuse not to look at the source data.

The executive question: how do spreadsheet operations translate into reproducible data operations?

Every familiar Excel action is a data operation with a name. Filtering rows by a date range is WHERE. Picking the columns to keep is SELECT. Sorting by a column is ORDER BY. A pivot table is GROUP BY plus an aggregation like SUM or AVG. A VLOOKUP is a JOIN. Adding a calculated column is an expression in the SELECT clause. A spreadsheet manager has been doing all of these for years; the only thing that changes is that SQL writes the steps out as a recipe, so the same recipe can be re-run next month, audited by a colleague, or version-controlled like code.

This is more than a syntax conversion. The reason matters: a pivot table is a one-time artifact, often built by hand on a laptop, often slightly different from the one the colleague in the next office built. A SQL query is a stored procedure for producing the same answer the same way every time. The first lets you answer this week's question; the second lets you reliably answer the same question next week, next quarter, and next year. As soon as a managerial question needs to be answered repeatedly — what is revenue by city by month? — Excel becomes a fragile place to put it, and SQL becomes the natural place.

The same logic extends to R's dplyr and Python's pandas. Both express the same operations as a sequence of verbs: filter, then mutate, then group, then summarize. The choice between SQL, R, and Python is largely about where the data lives and what other tools you want to use afterwards — not about which one can answer the question. All three can answer it, and all three express it in roughly the same shape. Figure 1 lays out the nine most common Excel actions and their direct equivalents in each language.

Figure 1. Excel ↔ SQL ↔ R ↔ Python. Each row is the same data operation expressed four ways. The right column always says the same thing the left column says; only the syntax changes.
Excel actionSQLR (dplyr)Python (pandas)
Filter rowsWHEREfilter().query()
Select columnsSELECTselect()df[[…]]
SortORDER BYarrange().sort_values()
Pivot table (rows)GROUP BYgroup_by().groupby()
Pivot table (values)SUM / AVG / …summarize().agg()
VLOOKUP / XLOOKUPJOINleft_join().merge()
New calculated columnexpression in SELECTmutate().assign()
Pivot wider / longerPIVOT / UNPIVOTpivot_wider() / pivot_longer().pivot() / .melt()
Remove duplicatesSELECT DISTINCTdistinct().drop_duplicates()

Figure 1 is the entire Rosetta stone. Once a manager can read it, the rest of the chapter is a worked example: one business question, four syntaxes, identical answer.

A worked example: revenue by city and product category

A regional director asks: "For 2025 so far, what is the total revenue, broken down by city and product category, sorted from highest to lowest?" This is exactly the kind of question that lands in an inbox on Monday morning and needs an answer by Friday. Below is the question in four languages. The shape of each answer is identical: a small table with one row per city × product-category combination, ordered by revenue. Only the syntax differs.

  1. Filter the transaction sheet to purchase_date >= 2025-01-01.
  2. Add a calculated column: revenue = price × quantity.
  3. Insert a pivot table. Rows: city, product_category. Values: sum of revenue.
  4. Sort the pivot table by the sum-of-revenue column, descending.

Four mouse-and-keyboard steps. Repeatable only if someone writes them down.

Figure 2. The same business question, expressed four ways. Tab through to compare. The underlying logic — filter, then add a column, then group, then summarize, then sort — is identical across all four; the syntax is the only thing that changes.

Look at Figure 2 in any order. The Excel description and the SQL query describe the same operation at the same level of detail. The R and Python expressions read top-to-bottom as a sequence of verbs — almost word-for-word what an English-speaking analyst would describe. None of these is inherently harder than the others; they are the same recipe written for different audiences. A manager who can sketch the Excel steps can read all four.

The reason this matters for executive practice is not that managers need to write SQL themselves. It is that the analytics team will write SQL, and a manager who reads the four-tab equivalence in Figure 2 can read the team's SQL well enough to ask the right question: "Is the filter date correct?", "Are you grouping at the right grain?", "Why is ORDER BY revenue DESC showing this product on top — did we exclude returns?" Those questions require no syntax knowledge. They require knowing which operation each clause names.