§17.2
Talking to the Warehouse: Text-to-SQL and the Semantic Layer
The oldest dream in business intelligence is to ask the database a question in plain English and get the right answer back. For decades it failed, because translating “what was revenue by city last quarter?” into correct SQL requires knowing the schema, the joins, and what revenue even means at this company. Large language models cracked the easy version of this problem almost overnight — and then ran straight into the hard version. This chapter is about both: how good text-to-SQL has actually become, why it still breaks on real databases, and the one architectural idea — the semantic layer — that separates a demo from a system you can trust.
Querying production data in natural language is the most immediately useful thing a data agent does, and the most dangerous to get subtly wrong. A chart that is 90% right looks exactly like a chart that is 100% right. So the question is not “can the model write SQL?” — it obviously can — but “how often is it right on your database, and how would you know?”
What the benchmarks really say
The headline numbers are a tale of two worlds. On Spider 1.0, the sanitized academic benchmark, the best systems exceed 91% execution accuracy — the problem looks solved.2 On BIRD, which uses larger, messier real databases, the top system reaches about 82% against a human-expert baseline of 93% — close, but with a persistent gap.1 Then comes Spider 2.0, built deliberately from real enterprise workflows with thousand-column databases and vendor SQL dialects, and the floor drops out: the best agent solved only 21.3% of tasks, and a raw frontier model managed about 10%.2
Execution accuracy: from solved benchmark to open problem
On clean academic schemas the problem looks nearly solved. On real enterprise databases — thousands of columns, vendor dialects, ambiguous business terms — the best agents still solve only about a fifth of the tasks. The gap between these two bars is the whole story.
Why it breaks: the schema, not the SQL
When text-to-SQL fails on a real database, it usually fails in one of two ways, and neither is about SQL syntax. First, hallucinated schema: faced with hundreds of cryptically named tables, the model invents a plausible-but-nonexistent column or joins on the wrong key. Second, ambiguous business definitions: ask for “revenue” and the model has no way to know whether you mean booked, recognized, gross, or net — so it guesses, and produces SQL that runs cleanly and answers the wrong question.13 The model generates from patterns, not from an understanding of your business. The error is invisible precisely because the query executes.
The fix that works: the semantic layer
The breakthrough idea is not a better model — it is giving the model a contract. A semantic layer is a governed set of definitions sitting between the question and the warehouse: certified metrics (net_revenue means exactly this), the join paths between tables, and the access rules. The agent no longer writes raw SQL against bare tables; it composes queries from definitions a human has already vetted.
The semantic layer — the contract between language and SQL
Who asks
AI data agentBI dashboardAnalyst in chatSemantic layer
Certified metric definitions, join paths, and access rules. One place where “revenue” means one thing.
Where data lives
Warehouse tablesRaw columnsRow/column securityThe effect is large and well-documented. Snowflake reports that routing questions through a semantic model lifted its Cortex Analyst from the 51% a raw model scored on bare tables to over 90% accuracy on real BI questions — the model unchanged, the context transformed.45
This is why the semantic layer has become the industry's center of gravity. In September 2025 a coalition including Snowflake, Salesforce, and dbt Labs launched the Open Semantic Interchange to make metric definitions portable across BI tools and AI agents — an attempt to standardize the contract itself, so the same definition of “churn” works everywhere.10
How an agent actually queries
Underneath a good text-to-SQL agent is a small pipeline, not a single prompt. It links the question to the relevant tables and metrics (schema linking), generates several candidate queries rather than one, runs them, reads the database's own error messages, and self-corrects — a frontier traceable to research like CHASE-SQL, whose multi-candidate approach reached 73% on BIRD, and to schema-linking-and-self-correction work like RSL-SQL.312
How a text-to-SQL agent actually answers a question
The accuracy gains of the last two years come less from a smarter model than from this scaffolding: grounding the question in a governed semantic layer, generating several candidate queries, and letting the database’s own error messages drive correction.
Crucially, the agent reaches the warehouse through a governed path. Snowflake's managed MCP servers, in preview since October 2025, expose the text-to-SQL tool to agents while reusing the warehouse's existing role-based access control, masking, and policies — so an agent can answer questions without becoming a way around security.9
The shipping state of the art
This is no longer a research topic; it is a product category. Snowflake's Cortex Analyst, Databricks' AI/BI Genie (generally available since June 2025, after 4,000-plus customers used the preview), Google's Conversational Analytics grounded in the Looker semantic model, and Amazon Q generative SQL for Redshift all ship the same essential design: natural language in, governed SQL out, with an explanation of how the answer was derived.678 And the payoff in practice is real: Uber's internal QueryGPT cut typical query authoring from about ten minutes to three, with roughly 300 daily users across a platform handling more than a million interactive queries a month.11
Sources
Verified June 2026
- 1BIRD-bench: a Big Benchmark for Large-scale Database Grounded Text-to-SQL Evaluation (leaderboard) · BIRD-bench, 2025. bird-bench.github.io
- 2Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows · arXiv 2411.07763 (ICLR 2025), 2025. arxiv.org/abs/2411.07763
- 3CHASE-SQL: Multi-Path Reasoning and Preference-Optimized Candidate Selection in Text-to-SQL · Google Cloud / Stanford (MarkTechPost summary), 2024. www.marktechpost.com/2024/10/12/google-cloud-and-stanford-researchers-propose-chase-sql-an-ai-framework-for-multi-path-reasoning-and-preference-optimized-candidate-selection-in-text-to-sql
- 4Cortex Analyst: Evaluating Text-to-SQL Accuracy for Real-World BI · Snowflake Engineering Blog, 2024. www.snowflake.com/en/blog/engineering/cortex-analyst-text-to-sql-accuracy-bi
- 5Cortex Analyst — semantic model documentation · Snowflake, 2025. docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst
- 6AI/BI Genie is now Generally Available · Databricks, 2025. www.databricks.com/blog/aibi-genie-now-generally-available
- 7Conversational Analytics in Looker — overview · Google Cloud Documentation, 2025. docs.cloud.google.com/looker/docs/conversational-analytics-overview
- 8General availability of Amazon Q generative SQL for Amazon Redshift · AWS, 2024. aws.amazon.com/about-aws/whats-new/2024/09/amazon-q-generative-sql-amazon-redshift
- 9Introducing Snowflake Managed MCP Servers for Secure, Governed Data Agents · Snowflake, 2025. www.snowflake.com/en/blog/managed-mcp-servers-secure-data-agents
- 10Open Semantic Interchange Initiative · Snowflake / Salesforce / dbt Labs, 2025. www.snowflake.com/en/news/press-releases/snowflake-salesforce-dbt-labs-and-more-revolutionize-data-readiness-for-ai-with-open-semantic-interchange-initiative
- 11QueryGPT — Natural Language to SQL Using Generative AI · Uber Engineering Blog, 2024. www.uber.com/en-CA/blog/query-gpt
- 12RSL-SQL: Robust Schema Linking in Text-to-SQL Generation · arXiv 2411.00073, 2024. arxiv.org/abs/2411.00073
- 13Enterprise Text-to-SQL: What Accuracy Benchmarks Really Mean · Promethium (practitioner guide), 2025. promethium.ai/guides/enterprise-text-to-sql-accuracy-benchmarks-2