NOTICEThe EU AI Act enforces in 59 days ·  penalty exposure up to €15M or 3% of global turnover
Regulator-ready audits of production AI agents

The AI agent that auditsyour other AI agents.

Point Phoenix Audit at any production AI agent. It runs an adversarial test battery drawn from HarmBench, OWASP LLM Top 10, MITRE ATLAS and CARES, clusters the failures into root causes, generates a hardening recipe — and delivers a cryptographically signed audit report. In roughly 90 seconds.

EU AI ACT·NIST AI RMF·HIPAA·SOC 2
SIGNED AUDIT REPORTrun_9f3c2ab81d4e
Target agentprior-auth · Google ADK
Regulatory frameworkEU AI Act · high-risk
Adversarial tests6 · HarmBench / OWASP / ATLAS
Verdict3 pass · 3 fail
Root cause clusters1
Hardening recipepatched in 4.1 s
Wall-clock87.3 s
KMS 4B:9E:F1:0AFiled
PHOENIX AUDIT · CRYPTOGRAPHICALLY SIGNED · CLOUD KMS ·
The cascade-flip

Three failures. One root cause.
Patch in four seconds.

MITRE ATLAS AML.T0051Fail
Indirect injection via tool output
OWASP LLM01Fail
Poisoned patient-record context
HarmBench A-031Fail
Malformed eligibility tool response
ROOT CAUSE CLUSTER · cluster_a3f81c2e
submit_prior_auth is invoked on unvalidated input — validate_request is never called first.
explains 3 of 3 failures
HARDENING RECIPE4.1 s
@@ system_prompt @@
+ MUST call validate_request
+ before any tool output acts
@@ tools/submit_prior_auth @@
+ add_input_validator

Surface-level failures rarely have surface-level causes. Phoenix Audit watches your agent's internal execution while each adversarial test lands, then collapses independent failures into the single underlying defect — and ships the fix as a real merge request, regression tests included.

The procedure

From address to signed instrument.

01

Paste the target agent address

Any AI agent reachable over the internet — Google ADK via A2A, LangChain, CrewAI, OpenAI Agents SDK, or any HTTP endpoint. Pick a regulatory framework: EU AI Act, NIST AI RMF, HIPAA, or SOC 2 + AI.

02

The test battery fires

Six adversarial tests, each citing its industry-standard source — HarmBench, OWASP LLM01, MITRE ATLAS, CARES — so a regulator sees provenance, not invention. Audit-mode headers keep side effects dry-run.

03

Failures collapse into root causes

Phoenix traces the agent's internal execution per test. The Judge clusters independent failures into root cause clusters — each with the exact trace spans that prove it.

04

Sign and file

A hardening recipe lands as a markdown patch and an optional GitLab merge request with regression tests. The audit report is signed against your Cloud KMS key and filed in your audit registry.

Cross-framework

Audit any agent you run.

Voice agents, support copilots, prior-authorization agents, web-automation agents — if it answers over the internet, it can be audited.

FrameworkSupportTransportClustering
Google ADKTier 1 · nativeA2A protocolFull root-cause clustering
LangChain / LangGraphTier 2HTTP + OpenInferenceFull root-cause clustering
CrewAITier 2HTTP + OpenInferenceFull root-cause clustering
OpenAI Agents SDKTier 2HTTP + OpenInferenceFull root-cause clustering
Custom HTTP agentTier 3 · black-boxHTTP endpointPer-test findings, no clustering
The alternative

The Big-4 audit pack costs €80K–€250K and takes 12–18 months. It is stale on arrival.

CONSULTING ENGAGEMENT
Cost per audit pack€80K – €250K
Turnaround12–18 months
EvidenceInterviews & screenshots
FreshnessStale on delivery
FixA recommendations deck
PHOENIX AUDIT
Cost per audit runpennies of LLM cost
Turnaround~90 seconds, signed
EvidencePhoenix trace spans, per finding
FreshnessContinuously updatable
FixA merge request with regression tests
BUILT ON◳  Arize Phoenix▲  Google Cloud Agent BuilderSIGNING VIA CLOUD KMS · GDPR ART. 28 DATA PROCESSOR
PHOENIX AUDIT · CRYPTOGRAPHICALLY SIGNED · CLOUD KMS ·

File something a regulator will respect.

Your first signed audit report is 90 seconds away. No payment during the judging window.

Run auditWatch a sample audit