§17.5

Trust, Evaluation, and Governance of Data Agents

Once a data agent is in production, the interesting question is no longer “can it reason?” but “can we trust, measure, watch, and govern what it does?” An agent that writes its own SQL, calls its own tools, and acts on the firm's data is a new kind of operational risk: capable, fast, occasionally confidently wrong, and reachable by anyone who can get text in front of it. This chapter covers the discipline that makes what the previous four chapters built safe to deploy. It rests on four pillars (evaluate, observe, secure, govern), and a working data team needs all four.

Four questions every production data agent must answer

Evaluate: Does it do the job?
• Outcome vs. trajectory evals
• LLM-as-judge + human gold set
• Eval-driven development (evals run as CI)

Observe: What did it actually do?
• Trace every tool call and step
• OpenTelemetry GenAI spans
• LangSmith, Arize Phoenix

Secure: Can it be turned against us?
• Break the lethal trifecta
• Input/output guardrails
• OWASP LLM Top 10, prompt injection

Govern: Who is accountable?
• Human approval gates
• Audit trails and ownership
• NIST AI RMF, EU AI Act, ISO/IEC 42001

Figure 1. The four questions every production data agent must answer, with representative current tools and standards under each.

The transcript can lie

Evaluation is harder for agents than for models, and the reason is worth internalizing. As Anthropic puts it, a flight-booking agent might end its transcript with “Your flight has been booked,” but the only outcome that matters is whether a reservation actually exists in the environment's database.¹ The transcript is what the agent says; the outcome is what the agent did, and the two can diverge. So practitioners separate trajectory evals (were the tool choices and reasoning steps sound?) from outcome evals (did the world end up in the right state?). They grade both with LLM-as-judge rubrics calibrated against a human-labeled gold set, and they give the judge explicit permission to answer “Unknown” rather than guess.¹ Academic surveys point to the gaps that remain: methods for evaluating cost, safety, and robustness are still immature.²
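The sketch below shows both kinds of eval plus judge calibration. It is minimal and rests on two assumptions. First, a toy SQLite table named `reservations` stands in for the environment's database. Second, each run is recorded as its final message plus the ordered list of tool names it called. `AgentRun`, `outcome_eval`, `trajectory_eval`, and `judge_agreement` are illustrative names, not part of any eval framework. The judge labels would come from an LLM call that is not shown.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """One recorded run (hypothetical shape): final message plus tools called, in order."""
    final_message: str
    tool_calls: list[str] = field(default_factory=list)


def outcome_eval(db: sqlite3.Connection, booking_ref: str) -> bool:
    """Outcome eval: ignore the transcript entirely and check the environment's state."""
    row = db.execute("SELECT 1 FROM reservations WHERE ref = ?", (booking_ref,)).fetchone()
    return row is not None


def trajectory_eval(run: AgentRun, required_in_order: list[str]) -> bool:
    """Trajectory eval: were the required tools called, in this relative order?"""
    calls = iter(run.tool_calls)
    # `tool in calls` consumes the iterator up to the match, so relative order is enforced.
    return all(tool in calls for tool in required_in_order)


def judge_agreement(judge: list[str], gold: list[str]) -> tuple[float, float]:
    """Calibrate an LLM judge against a human gold set.

    Returns (agreement on the items the judge committed to, share answered 'Unknown').
    """
    if not judge:
        return 0.0, 0.0
    committed = [(j, g) for j, g in zip(judge, gold) if j != "Unknown"]
    unknown_rate = 1 - len(committed) / len(judge)
    agreement = sum(j == g for j, g in committed) / len(committed) if committed else 0.0
    return agreement, unknown_rate


# The transcript claims success, but no booking tool ran and no reservation row exists.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reservations (ref TEXT PRIMARY KEY)")
run = AgentRun("Your flight has been booked.", ["search_flights", "hold_seat"])
print(outcome_eval(db, "ABC123"))                               # False
print(trajectory_eval(run, ["search_flights", "book_flight"]))  # False
```

The outcome check never reads `final_message`, and that is deliberate. The judge score excludes “Unknown” answers from agreement and reports them separately. This keeps a judge that abstains from being mistaken for one that is wrong.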

You can't govern what you can't see

Because an agent's path is chosen at run time, you cannot reconstruct it from the code alone; you have to record what actually happened. Agent observability is converging quickly on OpenTelemetry's GenAI semantic conventions. Under them, every run becomes a span tree: a top-level invoke_agent span with child chat and execute_tool spans that capture the model, the token counts, and the tool calls.³ Platforms such as LangSmith and the open-source Arize Phoenix render that tree so a human can see exactly which query the agent ran and where it went wrong.⁴,⁵
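The sketch below makes the span-tree shape concrete without extra dependencies, using a hand-rolled stand-in for an OpenTelemetry tracer. The attribute keys (`gen_ai.operation.name`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.tool.name`, and so on) follow the names used in the GenAI semantic conventions. Those conventions are still evolving, so check them against the current spec. `Tracer`, `Span`, `render`, and the model name `example-model` are hypothetical. A real deployment would use the OpenTelemetry SDK or an instrumentation library and export to a backend such as LangSmith or Phoenix.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    name: str
    attributes: dict
    parent: Optional["Span"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)
    duration_s: float = 0.0
    status: str = "ok"


class Tracer:
    """Dependency-free stand-in for an OpenTelemetry tracer (illustrative only)."""

    def __init__(self) -> None:
        self.roots: list[Span] = []
        self._current: Optional[Span] = None

    @contextmanager
    def span(self, name: str, attributes: dict) -> Iterator[Span]:
        s = Span(name, attributes, parent=self._current)
        # Attach to the enclosing span, or start a new trace if there is none.
        if self._current is not None:
            self._current.children.append(s)
        else:
            self.roots.append(s)
        self._current = s
        start = time.perf_counter()
        try:
            yield s
        except Exception:
            s.status = "error"  # mark the failing step, then let the error propagate
            raise
        finally:
            s.duration_s = time.perf_counter() - start
            self._current = s.parent


def render(span: Span, depth: int = 0) -> list[str]:
    """Indent each span under its parent, like a trace viewer's tree."""
    lines = [f"{'  ' * depth}{span.name} [{span.status}]"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


tracer = Tracer()
with tracer.span("invoke_agent margin-analyst",
                 {"gen_ai.operation.name": "invoke_agent", "gen_ai.agent.name": "margin-analyst"}):
    with tracer.span("chat example-model",
                     {"gen_ai.operation.name": "chat", "gen_ai.request.model": "example-model",
                      "gen_ai.usage.input_tokens": 1800, "gen_ai.usage.output_tokens": 300}):
        pass  # the model call would go here
    with tracer.span("execute_tool run_sql",
                     {"gen_ai.operation.name": "execute_tool", "gen_ai.tool.name": "run_sql"}):
        pass  # the SQL query would run here
print("\n".join(render(tracer.roots[0])))
```

Nesting comes from the `with` blocks: a span opened inside another becomes its child. An exception marks the failing span, and every enclosing span, as `error` on its way out.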

One agent run as a trace: what observability captures

invoke_agent · answer "why did margin fall in the NE region?" · 4 tool calls, 38.2k tokens, 11.4s
├─ chat · plan the analysis · model, 2.1k tok
├─ execute_tool · run_sql: revenue & cost by region · ok, 0.4s
├─ chat · reflect: drill into product mix · model, 1.8k tok
├─ execute_tool · run_sql: margin by product · error → retry → ok
└─ chat · compose answer + chart · model, 3.0k tok


Figure 2. One agent run as an OpenTelemetry trace. This is how you catch the run that claimed success but silently retried a failed query, or never wrote the row it said it wrote.
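Once spans are exported, catching such runs becomes a query over the trace rather than a careful reading of the transcript. The sketch assumes a hypothetical flattened export in which each span is a dict with `op`, `tool`, and `status` keys. It also assumes the caller knows, for example from structured output, whether the answer claims a write. `audit_trace` and the `write_row` tool name are illustrative.

```python
def audit_trace(spans: list[dict], claims_write: bool,
                write_tools: frozenset = frozenset({"write_row"})) -> list[str]:
    """Flag places where the trace qualifies or contradicts the agent's answer.

    `spans` is a flattened export (hypothetical shape) with keys "op", "tool", "status".
    """
    findings = []
    tool_spans = [s for s in spans if s["op"] == "execute_tool"]
    for i, s in enumerate(tool_spans):
        if s["status"] == "error":
            # A later success can hide this from the final answer, but not from the trace.
            findings.append(f"tool call #{i + 1} ({s['tool']}) returned an error")
    wrote = any(s["tool"] in write_tools and s["status"] == "ok" for s in tool_spans)
    if claims_write and not wrote:
        findings.append("answer claims a write, but no successful write tool call appears in the trace")
    return findings


# The run from Figure 2: the second run_sql call failed and was retried.
spans = [
    {"op": "invoke_agent", "tool": None, "status": "ok"},
    {"op": "chat", "tool": None, "status": "ok"},
    {"op": "execute_tool", "tool": "run_sql", "status": "ok"},
    {"op": "chat", "tool": None, "status": "ok"},
    {"op": "execute_tool", "tool": "run_sql", "status": "error"},
    {"op": "execute_tool", "tool": "run_sql", "status": "ok"},
    {"op": "chat", "tool": None, "status": "ok"},
]
for finding in audit_trace(spans, claims_write=False):
    print(finding)  # tool call #2 (run_sql) returned an error
```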

The lethal trifecta

The security failure mode unique to agents is prompt injection. Because a language model cannot reliably tell the instructions it was given apart from the data it reads, a hostile string hidden in a document or a database field can hijack the agent. The security researcher Simon Willison names the dangerous condition the “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to communicate externally. Remove any one and this attack path closes; combine all three and an attacker can exfiltrate data through the agent itself.⁶ Prompt injection is the number-one risk in the OWASP Top 10 for LLM Applications,⁷ and it is not theoretical. Microsoft's 2025 “EchoLeak” flaw (CVE-2025-32711, rated critical at CVSS 9.3) let a single crafted email exfiltrate data from Microsoft 365 Copilot with no user interaction.⁸,⁹
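Because the defense is to break one leg, a team can enforce it when it configures the agent. This sketch assumes each tool exposed to the agent has been labeled by hand with the three capabilities. `ToolSpec`, `check_agent`, and the `run_sql` and `send_email` tools are hypothetical. The check takes the union across the whole toolset, since an attack can read with one tool and exfiltrate with another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical capability labels a team assigns to each tool it exposes."""
    name: str
    reads_private_data: bool = False
    ingests_untrusted_content: bool = False
    communicates_externally: bool = False


def trifecta_legs(tools: list[ToolSpec]) -> set[str]:
    """Union of capabilities across the whole toolset: an attack can chain tools."""
    legs = set()
    for t in tools:
        if t.reads_private_data:
            legs.add("private data")
        if t.ingests_untrusted_content:
            legs.add("untrusted content")
        if t.communicates_externally:
            legs.add("external communication")
    return legs


def check_agent(tools: list[ToolSpec]) -> set[str]:
    """Refuse to deploy an agent whose toolset completes the lethal trifecta."""
    legs = trifecta_legs(tools)
    if len(legs) == 3:
        names = ", ".join(t.name for t in tools)
        raise ValueError(f"lethal trifecta across tools [{names}]; remove one capability")
    return legs


# run_sql reads a table customers can write to, so it is both private and untrusted.
tools = [
    ToolSpec("run_sql", reads_private_data=True, ingests_untrusted_content=True),
    ToolSpec("send_email", communicates_externally=True),
]
try:
    check_agent(tools)
except ValueError as err:
    print(err)
print(sorted(check_agent(tools[:1])))  # ['private data', 'untrusted content']
```

The labels are the hard part. As the `run_sql` example shows, a query over a table that outsiders can write to is a source of untrusted content, not just private data.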

The “lethal trifecta” — when a data agent is unsafe to run unsupervised

Simon Willison’s formulation: with all three circles present, an attacker can hide an instruction in data the agent reads, then have the agent fetch private records and ship them out. Microsoft’s 2025 “EchoLeak” flaw (CVSS 9.3) followed exactly this pattern.

Figure 3. The lethal trifecta. The defense is architectural: break one of the three circles. As Willison stresses, guardrails that block 95% of attacks still leave a determined adversary a way in.

Guardrails and the human gate

The practical controls are layered. Input guardrails reject disallowed requests before the agent runs; output guardrails validate or redact what the agent produces before it leaves the system; and the most important control of all is the human approval gate. OpenAI's agent tooling lets any tool be flagged so that, instead of executing, it pauses the run into a resumable state for a person to approve or reject.¹⁰ For any action that touches money, customers, or production data, that gate is the difference between a useful agent and an unbounded liability.
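The sketch below shows the approval-gate pattern in a framework-agnostic way. It does not reproduce the OpenAI Agents SDK's API; `GatedToolbox`, `Paused`, `PendingApproval`, `resume`, and the `issue_refund` tool are hypothetical names. A gated tool raises instead of running, and the pending call is serialized. A later human decision then either executes the call or records the rejection.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable


@dataclass
class PendingApproval:
    """Everything needed to resume later; JSON-serializable so the decision can wait."""
    tool: str
    args: dict


class Paused(Exception):
    """Raised instead of executing a gated tool; carries the resumable state."""

    def __init__(self, pending: PendingApproval) -> None:
        super().__init__(f"approval required for {pending.tool}")
        self.pending = pending


class GatedToolbox:
    """Hypothetical tool registry in which gated tools pause instead of running."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self._gated: set[str] = set()

    def register(self, name: str, fn: Callable[..., Any], needs_approval: bool = False) -> None:
        self._tools[name] = fn
        if needs_approval:
            self._gated.add(name)

    def call(self, name: str, args: dict, approved: bool = False) -> Any:
        if name in self._gated and not approved:
            raise Paused(PendingApproval(name, args))
        return self._tools[name](**args)


def resume(box: GatedToolbox, state_json: str, approved: bool) -> dict:
    """Apply a human's decision to a persisted pending call."""
    pending = PendingApproval(**json.loads(state_json))
    if not approved:
        return {"status": "rejected", "tool": pending.tool}
    return {"status": "done", "result": box.call(pending.tool, pending.args, approved=True)}


issued = []


def issue_refund(customer_id: str, amount: float) -> str:
    issued.append((customer_id, amount))  # stands in for an action that touches money
    return f"refunded {amount:.2f} to {customer_id}"


box = GatedToolbox()
box.register("issue_refund", issue_refund, needs_approval=True)
try:
    box.call("issue_refund", {"customer_id": "C-42", "amount": 120.0})
except Paused as paused:
    state = json.dumps(asdict(paused.pending))  # persist this; the run is now suspended
print(issued)                             # []
print(resume(box, state, approved=True))  # {'status': 'done', 'result': 'refunded 120.00 to C-42'}
```

Serializing the pending call is what makes the pause durable. The approver can act minutes or days later, from a different process, without the agent holding the action open.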

The compliance backdrop

None of this happens in a vacuum. Three frameworks now anchor enterprise AI governance: the U.S. NIST AI Risk Management Framework, whose Generative AI Profile enumerates a dozen risk areas and hundreds of suggested actions;¹² the European Union's AI Act, whose obligations for general-purpose models began applying in August 2025 with most high-risk rules following in 2026;¹¹ and ISO/IEC 42001, the first certifiable management-system standard for AI, against which an organization can be independently audited.¹³

The EU AI Act arrives in phases


Figure 4. The EU AI Act arrives in phases through 2027. Together with the voluntary NIST framework and the certifiable ISO 42001 standard, it is the regulatory backdrop for any data agent deployed in or serving the EU.

Sources

Verified June 2026

1. Demystifying Evals for AI Agents · Anthropic (Engineering), 2026. www.anthropic.com/engineering/demystifying-evals-for-ai-agents
2. Survey on Evaluation of LLM-based Agents · arXiv 2503.16416 (Yehudai et al.), 2025. arxiv.org/abs/2503.16416
3. Inside the LLM Call: GenAI Observability with OpenTelemetry · OpenTelemetry, 2026. opentelemetry.io/blog/2026/genai-observability
4. LangSmith: AI Agent & LLM Observability Platform · LangChain, 2026. www.langchain.com/langsmith/observability
5. arize-ai/phoenix: AI Observability & Evaluation · Arize AI (GitHub), 2026. github.com/arize-ai/phoenix
6. The Lethal Trifecta for AI Agents: Private Data, Untrusted Content, and External Communication · Simon Willison's Weblog, 2025. simonwillison.net/2025/Jun/16/the-lethal-trifecta
7. OWASP Top 10 for LLM Applications 2025 · OWASP Gen AI Security Project, 2025. genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025
8. EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System · arXiv 2509.10540, 2025. arxiv.org/abs/2509.10540
9. CVE-2025-32711 Detail (EchoLeak, CVSS 9.3) · NIST National Vulnerability Database, 2025. nvd.nist.gov/vuln/detail/CVE-2025-32711
10. Guardrails and Human Review (Agents Guide) · OpenAI Developer Platform, 2026. developers.openai.com/api/docs/guides/agents/guardrails-approvals
11. EU AI Act Implementation Timeline · EU AI Act information hub, 2025. artificialintelligenceact.eu/implementation-timeline
12. Artificial Intelligence Risk Management Framework: Generative AI Profile (NIST AI 600-1) · NIST (U.S. Dept. of Commerce), 2024. nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
13. ISO/IEC 42001:2023 — Artificial intelligence — Management system · ISO, 2023. www.iso.org/standard/42001