Deep-XPIA tests prompt injection across multi-agent handoffs

GitHub preview image for the Deep-XPIA repository.GitHub
GitHub preview image for the Deep-XPIA repository.GitHub
AI & Automation

Deep-XPIA is an open-source benchmark for cross-prompt injection in multi-agent systems, with live Claude Haiku measurements, a confused-deputy focus, and a clear warning about poisoned tool metadata.

Deep-XPIA is an open-source benchmark for cross-prompt injection that moves through multi-agent delegation chains. The repository describes 300 cases, 8 attack patterns, and 5 defenses, with live Claude Haiku measurements from June 2026. Its core finding is practical: the dangerous point is often the trust boundary where tool metadata or delegated context enters the system, not simply the number of agent hops.

Key takeaways

  • Deep-XPIA focuses on confused-deputy failures where one agent carries poisoned context into another boundary.
  • The live run reported 69% attack success without defenses and 12% with all wired defenses, while false positives rose to 31%.
  • Registry injection was the hardest class in the repository's live notes, especially when poisoned metadata entered before prompt-stream defenses could act.
  • The project separates measured live results from simulated baselines, which is useful for teams trying to avoid benchmark theatre.
  • A Show HN thread on June 16, 2026 gives the release an independent early-discovery signal, but the technical evidence is in the repository.

Practical LinkLoot angle

Agent teams should use Deep-XPIA as a test-shape, not as a universal scorecard. The benchmark is most useful if your workflow includes tool registries, MCP-style discovery, handoffs between agents, memory, or delegated task execution. It gives you concrete attack patterns to reproduce against your own stack before adding more autonomy.

AreaWhat Deep-XPIA helps testLimitationSource
Tool metadataPoisoned descriptions and registry-time injectionResults depend on your actual registry and validator designGitHub repository
Agent handoffsWhether stripped or rewritten instructions keep malicious intentThe published live run uses Claude Haiku, so model transfer is not guaranteedGitHub repository
Defense stacksIntent checks, taint tracking, scope tokens, DLP, context budgetingThe repository says some defenses are not fully wired in live modeGitHub repository

For a production agent, the first pass is simple: block tool manifests from directly influencing execution policy, store taint metadata with memory values, and require user-visible approval for actions that cross a permission boundary. Then run a small subset of cases against your own orchestration layer and compare failures, not just aggregate scores.

What to verify before you act

Verify the exact commit, dataset version, and live-run settings before citing numbers. Re-run the cases against your target model and framework because the repository's measurements are model-specific. Treat the Show HN thread as launch context only; use the GitHub repository and project page for technical claims.

If you are hardening agent workflows, pair this with LinkLoot's AI agent tools guide. The useful question is where untrusted instructions can enter your agent graph, and whether any later step treats them as authority.

FAQ

Deep-XPIA is an open-source benchmark for multi-hop cross-prompt injection in multi-agent AI systems.