§17.5
Trust, Evaluation, and Governance of Data Agents
Once a data agent is in production, the interesting question is no longer “can it reason?” but “can we trust, measure, watch, and govern what it does?” An agent that writes its own SQL, calls its own tools, and acts on the firm's data is a new kind of operational risk: capable, fast, occasionally confidently wrong, and reachable by anyone who can get text in front of it. This chapter is the discipline that makes the previous four chapters safe to deploy. It rests on four pillars — evaluate, observe, secure, govern — and a working data team needs all four.
Four questions every production data agent must answer
- •Outcome vs. trajectory evals
- •LLM-as-judge + human gold set
- •Eval-driven development (eval = CI)
- •Trace every tool call & step
- •OpenTelemetry GenAI spans
- •LangSmith · Arize Phoenix
- •Break the lethal trifecta
- •Input/output guardrails
- •OWASP LLM Top 10 · prompt injection
- •Human approval gates
- •Audit trails & ownership
- •NIST AI RMF · EU AI Act · ISO 42001
The transcript can lie
Evaluation is harder for agents than for models, and for a reason worth internalizing. As Anthropic puts it, a flight-booking agent might end its transcript with “Your flight has been booked” — but the only outcome that matters is whether a reservation actually exists in the environment's database.1 The transcript is what the agent says; the outcome is what the agent did, and the two can diverge. So practitioners separate trajectory evals (were the tool choices and reasoning steps sound?) from outcome evals (did the world end up in the right state?), and grade them with calibrated LLM-as-judge rubrics checked against a human gold set — giving the judge permission to answer “Unknown” rather than guess.1 Academic surveys confirm the gaps: evaluation of cost, safety, and robustness remains immature.2
You can't govern what you can't see
Because an agent's path is discovered at run time, you cannot reason about it from the code alone — you have to record what actually happened. Observability for agents has standardized fast around OpenTelemetry's GenAI conventions: every run becomes a span tree with a top-level invoke_agent and child chat and execute_tool spans, capturing the model, the token counts, and the tool calls.3 Platforms like LangSmith and the open-source Arize Phoenix render that tree so a human can see exactly which query the agent ran and where it went wrong.45
One agent run as a trace — what observability captures
OpenTelemetry’s GenAI conventions give every run a standard shape: a top-level invoke_agent span with child chat (model calls) and execute_tool spans. This is how you catch the run that said it booked the flight but never wrote the row.
The lethal trifecta
The security failure mode unique to agents is prompt injection: because an agent treats the data it reads as potentially containing instructions, a hostile string hidden in a document or a database field can hijack it. The security researcher Simon Willison names the dangerous condition the “lethal trifecta”: access to private data, exposure to untrusted content, and the ability to communicate externally. Any one is fine; all three together let an attacker exfiltrate data through the agent itself.6 Prompt injection is the number-one risk in the OWASP Top 10 for LLM Applications,7 and it is not theoretical: Microsoft's 2025 “EchoLeak” flaw (CVE-2025-32711, scored a critical 9.3) let a single crafted email exfiltrate data from Microsoft 365 Copilot with no user interaction.89
The “lethal trifecta” — when a data agent is unsafe to run unsupervised
Simon Willison’s formulation: any one circle is fine; all three together let an attacker hide an instruction in data the agent reads, then have the agent fetch private records and ship them out. The defense is architectural — break one of the three circles — because guardrails alone do not reliably hold. Microsoft’s 2025 “EchoLeak” flaw (CVSS 9.3) was exactly this pattern.
Guardrails and the human gate
The practical controls are layered. Input guardrails reject disallowed requests before the agent runs; output guardrails validate or redact what the agent produces before it leaves the system; and the most important control of all is the human approval gate. OpenAI's agent tooling lets any tool be flagged so that, instead of executing, it pauses the run into a resumable state for a person to approve or reject.10 For any action that touches money, customers, or production data, that gate is the difference between a useful agent and an unbounded liability.
The compliance backdrop
None of this happens in a vacuum. Three frameworks now anchor enterprise AI governance: the U.S. NIST AI Risk Management Framework, whose Generative AI Profile enumerates a dozen risk areas and hundreds of suggested actions;12 the European Union's AI Act, whose obligations for general-purpose models began applying in August 2025 with most high-risk rules following in 2026;11 and ISO/IEC 42001, the first certifiable management-system standard for AI, against which an organization can be independently audited.13
The EU AI Act arrives in phases
Obligations for general-purpose AI models began applying in August 2025; most high-risk rules follow in 2026. Alongside the voluntary NIST AI Risk Management Framework and the certifiable ISO/IEC 42001 standard, this is the compliance backdrop any production data agent now operates under.
Sources
Verified June 2026
- 1Demystifying Evals for AI Agents · Anthropic (Engineering), 2026. www.anthropic.com/engineering/demystifying-evals-for-ai-agents
- 2Survey on Evaluation of LLM-based Agents · arXiv 2503.16416 (Yehudai et al.), 2025. arxiv.org/abs/2503.16416
- 3Inside the LLM Call: GenAI Observability with OpenTelemetry · OpenTelemetry, 2026. opentelemetry.io/blog/2026/genai-observability
- 4LangSmith: AI Agent & LLM Observability Platform · LangChain, 2026. www.langchain.com/langsmith/observability
- 5arize-ai/phoenix: AI Observability & Evaluation · Arize AI (GitHub), 2026. github.com/arize-ai/phoenix
- 6The Lethal Trifecta for AI Agents: Private Data, Untrusted Content, and External Communication · Simon Willison's Weblog, 2025. simonwillison.net/2025/Jun/16/the-lethal-trifecta
- 7OWASP Top 10 for LLM Applications 2025 · OWASP Gen AI Security Project, 2025. genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025
- 8EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System · arXiv 2509.10540, 2025. arxiv.org/abs/2509.10540
- 9CVE-2025-32711 Detail (EchoLeak, CVSS 9.3) · NIST National Vulnerability Database, 2025. nvd.nist.gov/vuln/detail/CVE-2025-32711
- 10Guardrails and Human Review (Agents Guide) · OpenAI Developer Platform, 2026. developers.openai.com/api/docs/guides/agents/guardrails-approvals
- 11EU AI Act Implementation Timeline · EU AI Act information hub, 2025. artificialintelligenceact.eu/implementation-timeline
- 12Artificial Intelligence Risk Management Framework: Generative AI Profile (NIST AI 600-1) · NIST (U.S. Dept. of Commerce), 2024. nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
- 13ISO/IEC 42001:2023 — Artificial intelligence — Management system · ISO, 2023. www.iso.org/standard/42001