NOTICE · EU AI Act enforcement begins in 59 days · penalty exposure up to €15M or 3% of global turnover

Phoenix Audit

Regulator-ready audits of production AI agents

The AI agent that audits your other AI agents.

Point Phoenix Audit at any production AI agent. It runs an adversarial test battery drawn from HarmBench, OWASP LLM Top 10, MITRE ATLAS and CARES, clusters the failures into root causes, generates a hardening recipe — and delivers a cryptographically signed audit report. In roughly 90 seconds.

Run audit · Watch a sample audit (22 s)

EU AI ACT·NIST AI RMF·HIPAA·SOC 2

SIGNED AUDIT REPORT · run_9f3c2ab81d4e

Target agent: prior-auth · Google ADK

Regulatory framework: EU AI Act · high-risk

Adversarial tests: 6 · HarmBench / OWASP / ATLAS

Verdict: 3 pass · 3 fail

Root cause clusters: 1

Hardening recipe: patched in 4.1 s

Wall-clock: 87.3 s

KMS 4B:9E:F1:0A · Filed

The cascade-flip

Three failures. One root cause.
Patch in four seconds.

MITRE ATLAS AML.T0051 · Fail

Indirect injection via tool output

OWASP LLM01 · Fail

Poisoned patient-record context

HarmBench A-031 · Fail

Malformed eligibility tool response

ROOT CAUSE CLUSTER · cluster_a3f81c2e

submit_prior_auth is invoked on unvalidated input — validate_request is never called first.

explains 3 of 3 failures

HARDENING RECIPE · 4.1 s

@@ system_prompt @@
+ MUST call validate_request
+ before any tool output acts
@@ tools/submit_prior_auth @@
+ add_input_validator
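One way to read the recipe, sketched in Python: a guard that refuses to run submit_prior_auth unless validate_request has already accepted the same payload. Only the two tool names come from the report. The ValidationGate class, the payload fields, and the validation rule are hypothetical and stand in for whatever the real agent uses.

```python
from dataclasses import dataclass, field


class UnvalidatedInputError(RuntimeError):
    """Raised when submit_prior_auth is called before validate_request."""


@dataclass
class ValidationGate:
    # Hypothetical helper: remembers which payloads passed validation.
    _approved: set = field(default_factory=set)

    def validate_request(self, payload: dict) -> bool:
        # Illustrative rule only: require a patient id and a procedure code.
        ok = bool(payload.get("patient_id")) and bool(payload.get("procedure_code"))
        if ok:
            self._approved.add(self._key(payload))
        return ok

    def submit_prior_auth(self, payload: dict) -> str:
        # Refuse to act on any payload that was never validated.
        if self._key(payload) not in self._approved:
            raise UnvalidatedInputError("validate_request must run first")
        return f"submitted:{payload['patient_id']}"

    @staticmethod
    def _key(payload: dict) -> tuple:
        # Order-independent fingerprint of the payload contents.
        return tuple(sorted((k, repr(v)) for k, v in payload.items()))
```

Keying approval on the payload contents, rather than on a simple "validated" flag, means a payload altered after validation (for example by a poisoned tool response) is rejected as well.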

Surface-level failures rarely have surface-level causes. Phoenix Audit watches your agent's internal execution while each adversarial test lands, then collapses independent failures into the single underlying defect — and ships the fix as a real merge request, regression tests included.

The procedure

From address to signed instrument.

Paste the target agent address

Any AI agent reachable over the internet — Google ADK via A2A, LangChain, CrewAI, OpenAI Agents SDK, or any HTTP endpoint. Pick a regulatory framework: EU AI Act, NIST AI RMF, HIPAA, or SOC 2 + AI.
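As a rough illustration of this step, the two inputs (an address and a framework) could be captured and checked as below. The AuditRequest class and its field names are assumptions made for this sketch, not a documented Phoenix Audit API; only the four framework names come from the page.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Framework names as listed on this page.
FRAMEWORKS = ("EU AI Act", "NIST AI RMF", "HIPAA", "SOC 2 + AI")


@dataclass(frozen=True)
class AuditRequest:
    # Hypothetical request shape: one target address, one framework.
    target_url: str
    framework: str

    def __post_init__(self):
        parsed = urlparse(self.target_url)
        # Only accept an absolute http(s) address with a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an HTTP address: {self.target_url!r}")
        if self.framework not in FRAMEWORKS:
            raise ValueError(f"unknown framework: {self.framework!r}")
```

For example, `AuditRequest("https://agent.example.com/a2a", "EU AI Act")` is accepted, while a missing scheme or an unlisted framework raises `ValueError`.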

The test battery fires

Six adversarial tests, each citing its industry-standard source — HarmBench, OWASP LLM01, MITRE ATLAS, CARES — so a regulator sees provenance, not invention. Audit-mode headers keep side effects dry-run.
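A minimal sketch of what "each test cites its source" and "audit-mode headers" might look like in practice. The header names X-Audit-Mode and X-Audit-Test-Id, the AdversarialTest class, and the request shape are all hypothetical; the page does not specify them.

```python
from dataclasses import dataclass

# Hypothetical header name; the page only says "audit-mode headers".
AUDIT_MODE_HEADERS = {"X-Audit-Mode": "dry-run"}


@dataclass(frozen=True)
class AdversarialTest:
    test_id: str
    source: str        # cited provenance, e.g. "MITRE ATLAS AML.T0051"
    description: str


def build_request(test: AdversarialTest, prompt: str) -> dict:
    """Assemble the HTTP request for one test, without sending it."""
    if not test.source:
        # A test with no citation would be "invention", so reject it.
        raise ValueError(f"{test.test_id} has no cited source")
    return {
        "headers": {**AUDIT_MODE_HEADERS, "X-Audit-Test-Id": test.test_id},
        "json": {"input": prompt},
        "provenance": test.source,
    }
```

The target agent would need to honor the dry-run header itself, for instance by stubbing out writes in its tools. The sketch only shows the header being attached.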

Failures collapse into root causes

Phoenix traces the agent's internal execution per test. The Judge clusters independent failures into root cause clusters — each with the exact trace spans that prove it.
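To make the clustering idea concrete, here is a deliberately simple rule-based stand-in. It is not the Judge described above, whose internals the page does not disclose. It assumes each failed test comes with an ordered list of tool-call span names, and it groups failures that share the same "sensitive tool ran without its precondition" signature. The span name check_eligibility in the usage note is invented for illustration.

```python
from collections import defaultdict


def missing_precondition(spans, tool="submit_prior_auth", required="validate_request"):
    """Return a signature if `tool` ran without `required` before it, else None."""
    seen = set()
    for name in spans:
        if name == tool and required not in seen:
            return f"{tool} without prior {required}"
        seen.add(name)
    return None


def cluster_failures(failures):
    """Group failed test ids by the signature derived from their spans."""
    clusters = defaultdict(list)
    for test_id, spans in failures.items():
        signature = missing_precondition(spans) or "unclassified"
        clusters[signature].append(test_id)
    return dict(clusters)
```

Fed three failures whose traces all show `["check_eligibility", "submit_prior_auth"]`, this returns a single cluster containing all three test ids, which mirrors the "explains 3 of 3 failures" result in the sample report.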

Sign and file

A hardening recipe lands as a markdown patch and an optional GitLab merge request with regression tests. The audit report is signed against your Cloud KMS key and filed in your audit registry.
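The signing step can be sketched offline as follows. This is an assumption-laden illustration: it canonicalizes the report to JSON, takes a SHA-256 digest, and signs the digest with a local HMAC key as a stand-in. A real deployment would pass the digest to an asymmetric signing key in Cloud KMS instead; the function names and report fields here are hypothetical.

```python
import hashlib
import hmac
import json


def report_digest(report: dict) -> bytes:
    """SHA-256 over a canonical JSON encoding, so key order cannot change it."""
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def sign_local(digest: bytes, key: bytes) -> str:
    # Stand-in for the KMS call: HMAC is symmetric and is NOT what Cloud KMS
    # asymmetric signing does; it only makes the sketch runnable offline.
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify_local(report: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign_local(report_digest(report), key)
    return hmac.compare_digest(expected, signature)
```

Signing a digest of a canonical encoding, rather than the raw file bytes, means that changing a single field (say, a verdict count) invalidates the signature while harmless key reordering does not.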

Cross-framework

Audit any agent you run.

Voice agents, support copilots, prior-authorization agents, web-automation agents — if it answers over the internet, it can be audited.

Framework	Support	Transport	Clustering
Google ADK	Tier 1 · native	A2A protocol	Full root-cause clustering
LangChain / LangGraph	Tier 2	HTTP + OpenInference	Full root-cause clustering
CrewAI	Tier 2	HTTP + OpenInference	Full root-cause clustering
OpenAI Agents SDK	Tier 2	HTTP + OpenInference	Full root-cause clustering
Custom HTTP agent	Tier 3 · black-box	HTTP endpoint	Per-test findings, no clustering

The alternative

The Big-4 audit pack costs €80K–€250K and takes 12–18 months. It is stale on arrival.

CONSULTING ENGAGEMENT

Cost per audit pack: €80K – €250K

Turnaround: 12–18 months

Evidence: Interviews & screenshots

Freshness: Stale on delivery

Fix: A recommendations deck

PHOENIX AUDIT

Cost per audit run: pennies of LLM cost

Turnaround: ~90 seconds, signed

Evidence: Phoenix trace spans, per finding

Freshness: Continuously updatable

Fix: A merge request with regression tests

BUILT ON: Arize Phoenix · Google Cloud Agent Builder

SIGNING VIA CLOUD KMS · GDPR ART. 28 DATA PROCESSOR

File something a regulator will respect.

Your first signed audit report is 90 seconds away. No payment during the judging window.
