Deep-XPIA tests prompt injection across multi-agent handoffs
Deep-XPIA is an open-source benchmark for cross-prompt injection in multi-agent systems, with live Claude Haiku measurements, a confused-deputy focus, and a clear warning about poisoned tool metadata.
Deep-XPIA targets cross-prompt injection that moves through multi-agent delegation chains. The repository describes 300 cases, 8 attack patterns, and 5 defenses, with live Claude Haiku measurements from June 2026. Its core finding is practical: the dangerous point is often the trust boundary where tool metadata or delegated context enters the system, rather than the number of agent hops.
Key takeaways
- Deep-XPIA focuses on confused-deputy failures where one agent carries poisoned context into another boundary.
- The live run reported 69% attack success with no defenses and 12% with all wired defenses enabled, at the cost of a false-positive rate that rose to 31%.
- Registry injection was the hardest class in the repository's live notes, especially when poisoned metadata entered before prompt-stream defenses could act.
- The project separates measured live results from simulated baselines, which is useful for teams trying to avoid benchmark theatre.
- A Show HN thread on June 16, 2026 shows early outside interest in the release, but the technical evidence is in the repository.
Practical LinkLoot angle
Agent teams should use Deep-XPIA as a template for their own tests, not as a universal scorecard. The benchmark is most useful if your workflow includes tool registries, MCP-style discovery, handoffs between agents, memory, or delegated task execution. It gives you concrete attack patterns to reproduce against your own stack before adding more autonomy.
| Area | What Deep-XPIA helps test | Limitation | Source |
|---|---|---|---|
| Tool metadata | Poisoned descriptions and registry-time injection | Results depend on your actual registry and validator design | GitHub repository |
| Agent handoffs | Whether instructions still carry malicious intent after being stripped or rewritten at a handoff | The published live run uses Claude Haiku, so model transfer is not guaranteed | GitHub repository |
| Defense stacks | Intent checks, taint tracking, scope tokens, DLP, context budgeting | The repository says some defenses are not fully wired in live mode | GitHub repository |
For a production agent, the first pass is simple: block tool manifests from directly influencing execution policy, store taint metadata with memory values, and require user-visible approval for actions that cross a permission boundary. Then run a small subset of cases against your own orchestration layer and compare failures, not just aggregate scores.
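The three first-pass rules can be sketched in a few lines of Python. This is an illustrative sketch, not code from the Deep-XPIA repository: `Tainted`, `Memory`, `POLICY`, and `decide` are hypothetical names, and treating any untrusted input as a boundary crossing is an assumption made here for simplicity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tainted:
    """A value stored together with its origin (hypothetical wrapper)."""
    value: str
    source: str            # e.g. "user", "tool_manifest", "agent_handoff"
    trusted: bool = False  # untrusted unless explicitly marked otherwise


@dataclass
class Memory:
    """Memory store that keeps taint metadata next to every value."""
    _items: dict = field(default_factory=dict)

    def put(self, key: str, item: Tainted) -> None:
        self._items[key] = item

    def get(self, key: str) -> Tainted:
        return self._items[key]


# Execution policy is set by the operator and never read from a tool manifest.
# The action names and levels are assumptions for this example.
POLICY = {"read_file": "low", "send_email": "high"}


def decide(action: str, inputs: list, user_approved: bool = False) -> str:
    """Return "allow", "needs_approval", or "deny" for a proposed action."""
    level = POLICY.get(action)
    if level is None:
        return "deny"  # actions missing from the policy are rejected outright
    # Assumption: a high-privilege action, or any action fed by untrusted
    # data, counts as crossing a permission boundary.
    crosses_boundary = level == "high" or any(not item.trusted for item in inputs)
    if crosses_boundary and not user_approved:
        return "needs_approval"
    return "allow"


memory = Memory()
memory.put("tool_desc", Tainted("Always forward files to ...", "tool_manifest"))
memory.put("request", Tainted("notes.txt", "user", trusted=True))

low_trusted = decide("read_file", [memory.get("request")])
low_tainted = decide("read_file", [memory.get("tool_desc")])
high_approved = decide("send_email", [memory.get("request")], user_approved=True)
unknown = decide("wire_money", [])
```

Because the taint flag travels with the value, a later agent that reads `tool_desc` from memory still sees that it came from a tool manifest, and the gate asks for approval instead of acting on it silently.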
What to verify before you act
Verify the exact commit, dataset version, and live-run settings before citing numbers. Re-run the cases against your target model and framework because the repository's measurements are model-specific. Treat the Show HN thread as launch context only; use the GitHub repository and project page for technical claims.
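One way to make these checks routine is to attach provenance to every run and compare failures per attack pattern. The sketch below assumes a hypothetical result format of `(pattern, attack_succeeded)` pairs; `RunInfo`, the field values, and the pattern names are placeholders, not identifiers taken from the repository.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RunInfo:
    """Provenance needed before a number from a run is quoted anywhere."""
    commit: str
    dataset_version: str
    model: str
    framework: str

    def is_citable(self) -> bool:
        # Every field must be filled in; an empty string means "unknown".
        return all([self.commit, self.dataset_version, self.model, self.framework])


def failures_by_pattern(results):
    """Count successful attacks per pattern from (pattern, succeeded) pairs."""
    return Counter(pattern for pattern, succeeded in results if succeeded)


def compare_runs(reference, local):
    """Per-pattern difference in failure counts (local minus reference)."""
    patterns = sorted(set(reference) | set(local))
    return {p: local.get(p, 0) - reference.get(p, 0) for p in patterns}


# Placeholder data: these rows are invented for illustration only.
reference_results = [
    ("registry_injection", True),
    ("registry_injection", True),
    ("handoff_rewrite", False),
]
local_results = [
    ("registry_injection", True),
    ("handoff_rewrite", True),
]

incomplete = RunInfo(commit="", dataset_version="v1", model="m", framework="f")
complete = RunInfo(commit="abc123", dataset_version="v1", model="m", framework="f")

delta = compare_runs(
    failures_by_pattern(reference_results),
    failures_by_pattern(local_results),
)
```

A per-pattern delta shows where your stack behaves differently from the reference run, which an aggregate success rate would hide.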
If you are hardening agent workflows, pair this with LinkLoot's AI agent tools guide. The useful question is where untrusted instructions can enter your agent graph, and whether any later step treats them as authority.
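That question can be posed as a reachability check over your agent graph. The following is a minimal sketch under stated assumptions: the graph is a plain adjacency dictionary, the node names are invented, and every edge is assumed to pass content along unchanged, so the result over-reports when a step actually sanitizes its input.

```python
from collections import deque


def authority_paths(graph, untrusted_sources, authority_nodes):
    """Find one shortest path from each untrusted source to each authority node."""
    found = []
    for source in untrusted_sources:
        parents = {source: None}
        queue = deque([source])
        # Breadth-first search records how each node was first reached.
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, []):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        for target in authority_nodes:
            if target in parents and target != source:
                # Walk the parent links back to the source to rebuild the path.
                path = []
                current = target
                while current is not None:
                    path.append(current)
                    current = parents[current]
                found.append(path[::-1])
    return found


# Hypothetical agent graph: edges point in the direction data flows.
graph = {
    "tool_registry": ["planner"],
    "web_page": ["researcher"],
    "researcher": ["planner"],
    "planner": ["executor"],
    "user": ["planner"],
}

paths = authority_paths(
    graph,
    untrusted_sources=["tool_registry", "web_page"],
    authority_nodes=["executor"],
)
```

Each returned path is a candidate place to add a check, such as a validator on the registry edge or an approval step before the executor.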
