Deep-XPIA tests prompt injection across multi-agent handoffs

Q: Why does registry injection matter?

Tool metadata can influence an agent before ordinary prompt defenses see the malicious instruction, making it a trust-boundary problem.

Q: Can Deep-XPIA prove an agent is safe?

No. It can expose failure modes and compare defenses, but teams still need their own model, tool, and permission tests.

Image: GitHub preview for the Deep-XPIA repository. Source: GitHub

AI & Automation | Jun 16, 2026

@Zachas | Author | ADMIN

Deep-XPIA is an open-source benchmark for cross-prompt injection in multi-agent systems, with live Claude Haiku measurements, a confused-deputy focus, and a clear warning about poisoned tool metadata.

Deep-XPIA is an open-source benchmark for cross-prompt injection that moves through multi-agent delegation chains. The repository describes 300 cases, 8 attack patterns, and 5 defenses, with live Claude Haiku measurements from June 2026. Its core finding is practical: the dangerous point is often the trust boundary where tool metadata or delegated context enters the system, not simply the number of agent hops.

Key takeaways

Deep-XPIA focuses on confused-deputy failures where one agent carries poisoned context into another boundary.
The live run reported 69% attack success without defenses and 12% with all wired defenses, while false positives rose to 31%.
Registry injection was the hardest class in the repository's live notes, especially when poisoned metadata entered before prompt-stream defenses could act.
The project separates measured live results from simulated baselines, which is useful for teams trying to avoid benchmark theatre.
A Show HN thread on June 16, 2026 gives the release an independent early-discovery signal, but the technical evidence is in the repository.
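
The two headline numbers above are rates over different populations: attack success is measured over attack cases, false positives over benign ones. A minimal sketch of how a team might compute both from its own runs follows. The `CaseResult` record shape is hypothetical, not the repository's schema, and it assumes a false positive means a benign case that the defenses blocked, which may differ from the repository's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """One benchmark case outcome (hypothetical shape, not the repository's schema)."""
    case_id: str
    is_attack: bool     # True if the case carries an injection, False if benign
    blocked: bool       # True if the defense stack stopped or flagged the case
    compromised: bool   # True if the agent carried out the injected action


def attack_success_rate(results: list[CaseResult]) -> float:
    """Fraction of attack cases where the injected action was carried out."""
    attacks = [r for r in results if r.is_attack]
    if not attacks:
        return 0.0
    return sum(r.compromised for r in attacks) / len(attacks)


def false_positive_rate(results: list[CaseResult]) -> float:
    """Fraction of benign cases that the defenses blocked anyway (assumed definition)."""
    benign = [r for r in results if not r.is_attack]
    if not benign:
        return 0.0
    return sum(r.blocked for r in benign) / len(benign)


# Illustrative data only: four attack cases and four benign cases.
sample = [
    CaseResult("a1", True, False, True),
    CaseResult("a2", True, True, False),
    CaseResult("a3", True, True, False),
    CaseResult("a4", True, True, False),
    CaseResult("b1", False, True, False),
    CaseResult("b2", False, False, False),
    CaseResult("b3", False, False, False),
    CaseResult("b4", False, False, False),
]

asr = attack_success_rate(sample)
fpr = false_positive_rate(sample)
```

Reporting both rates side by side keeps the trade-off visible. In the published live run, the drop from 69% to 12% attack success is 57 percentage points, and it came with false positives at 31%.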

Practical LinkLoot angle

Agent teams should use Deep-XPIA as a template for their own tests, not as a universal scorecard. The benchmark is most useful if your workflow includes tool registries, MCP-style discovery, handoffs between agents, memory, or delegated task execution. It gives you concrete attack patterns to reproduce against your own stack before adding more autonomy.

Area	What Deep-XPIA helps test	Limitation	Source
Tool metadata	Poisoned descriptions and registry-time injection	Results depend on your actual registry and validator design	GitHub repository
Agent handoffs	Whether stripped or rewritten instructions keep malicious intent	The published live run uses Claude Haiku, so model transfer is not guaranteed	GitHub repository
Defense stacks	Intent checks, taint tracking, scope tokens, DLP, context budgeting	The repository says some defenses are not fully wired in live mode	GitHub repository

For a production agent, the first pass is simple: block tool manifests from directly influencing execution policy, store taint metadata with memory values, and require user-visible approval for actions that cross a permission boundary. Then run a small subset of cases against your own orchestration layer and compare failures, not just aggregate scores.
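
The three first-pass controls can be sketched in a few lines. This is an illustration under stated assumptions, not Deep-XPIA's code or any framework's API: `POLICY`, `register_tool`, `Memory`, and `authorize` are hypothetical names, and the rule that untrusted input escalates an allowed action to approval is one possible design choice.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaintedValue:
    """A memory value stored together with its origin."""
    value: str
    source: str    # e.g. "user", "tool:web_search", "agent:planner"
    trusted: bool


@dataclass
class Memory:
    """Key-value memory that never stores a value without taint metadata."""
    _items: dict[str, TaintedValue] = field(default_factory=dict)

    def put(self, key: str, value: str, source: str, trusted: bool) -> None:
        self._items[key] = TaintedValue(value, source, trusted)

    def get(self, key: str) -> TaintedValue:
        return self._items[key]


# Execution policy lives in operator-owned config (hypothetical values).
# It is never read from tool manifests.
POLICY = {"read_file": "allow", "send_email": "needs_approval"}


def register_tool(registry: dict, manifest: dict) -> None:
    """Keep only descriptive fields; ignore policy-like keys a manifest tries to set."""
    registry[manifest["name"]] = {
        "name": manifest["name"],
        "description": str(manifest.get("description", "")),
    }


def authorize(action: str, inputs: list[TaintedValue], approved_by_user: bool = False) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a requested action."""
    rule = POLICY.get(action, "deny")  # unknown actions are denied by default
    if rule == "deny":
        return "deny"
    tainted = any(not v.trusted for v in inputs)
    if (rule == "needs_approval" or tainted) and not approved_by_user:
        return "needs_approval"
    return "allow"
```

Note what this does not cover: the tool description is still text that reaches the model, so stripping policy keys only stops a manifest from changing permissions directly. Whether the description itself can steer the agent is exactly what the registry-injection cases are meant to test.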

What to verify before you act

Verify the exact commit, dataset version, and live-run settings before citing numbers. Re-run the cases against your target model and framework because the repository's measurements are model-specific. Treat the Show HN thread as launch context only; use the GitHub repository and project page for technical claims.
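
One way to make that verification routine is to store provenance with every run and refuse to compare runs that do not share it. The sketch below assumes a hypothetical `RunRecord` shape and case IDs; none of these names come from the repository. It compares runs case by case instead of by aggregate score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Provenance and outcome of one benchmark run (hypothetical shape)."""
    commit: str                   # benchmark repository commit the cases came from
    dataset_version: str
    model: str                    # model the cases were run against
    failed_cases: frozenset[str]  # IDs of cases where the attack succeeded


def compare_failures(baseline: RunRecord, candidate: RunRecord) -> dict[str, list[str]]:
    """Diff two runs case by case; refuse if the case sets may not line up."""
    same_cases = (baseline.commit, baseline.dataset_version) == (
        candidate.commit,
        candidate.dataset_version,
    )
    if not same_cases:
        raise ValueError("runs use different commits or dataset versions")
    return {
        "fixed": sorted(baseline.failed_cases - candidate.failed_cases),
        "regressed": sorted(candidate.failed_cases - baseline.failed_cases),
        "still_failing": sorted(baseline.failed_cases & candidate.failed_cases),
    }
```

Two runs can share the same aggregate score while failing on different cases, so the `regressed` list is usually the first thing worth reading.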

If you are hardening agent workflows, pair this with LinkLoot's AI agent tools guide. The useful question is where untrusted instructions can enter your agent graph, and whether any later step treats them as authority.
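
That question can be asked mechanically if you write the agent graph down. The sketch below is a plain breadth-first search over a hypothetical graph; the node names and the `authority_paths` helper are illustrative assumptions, not part of Deep-XPIA. It reports, for each privileged node, one path by which content from an untrusted source can reach it.

```python
from collections import deque


def authority_paths(edges, untrusted_sources, privileged_nodes):
    """Map each reachable privileged node to one path from an untrusted source."""
    found = {}
    for source in untrusted_sources:
        parent = {source: None}   # also serves as the visited set for this search
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if node in privileged_nodes and node not in found:
                # Walk parent links back to the source to rebuild the path.
                path = []
                cur = node
                while cur is not None:
                    path.append(cur)
                    cur = parent[cur]
                found[node] = path[::-1]
            for nxt in edges.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
    return found


# Hypothetical agent graph: an edge means "content flows from A into B's context".
edges = {
    "tool_registry": ["planner"],
    "web_page": ["researcher"],
    "researcher": ["planner"],
    "planner": ["executor"],
    "executor": [],
}

paths = authority_paths(edges, ["tool_registry", "web_page"], {"executor", "auditor"})
# paths == {"executor": ["tool_registry", "planner", "executor"]}
```

Every path found this way marks a place where a handoff should strip, taint, or re-validate content before the privileged step acts on it. A node that is missing from the result, such as `auditor` here, is simply not reachable in the graph as drawn.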

FAQ

What is Deep-XPIA?

Deep-XPIA is an open-source benchmark for multi-hop cross-prompt injection in multi-agent AI systems.

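Why does registry injection matter?

Tool metadata can influence an agent before ordinary prompt defenses see the malicious instruction, making it a trust-boundary problem.

Can Deep-XPIA prove an agent is safe?

No. It can expose failure modes and compare defenses, but teams still need their own model, tool, and permission tests.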

Sources & links

References, demos, and supporting links.

Deep-XPIA GitHub repository (github.com), primary source
Deep-XPIA project page (freyzo.github.io)
Hacker News Show HN thread (news.ycombinator.com)