Researchers demonstrated an attack where a single poisoned email caused an AI assistant to silently exfiltrate an entire inbox. Here is exactly how it worked.
In January 2026, security researchers publicly disclosed an attack they called ZombieAgent. The vulnerability had been reported to OpenAI via BugCrowd in September 2025 and was fixed in December 2025 before the public disclosure. The setup was straightforward: an attacker sends you an email with hidden instructions embedded in the message body — white text on a white background, or characters scaled to 0.1px, invisible to any human reader. When an AI assistant with inbox access opens that email as part of its normal operation, it reads the instructions. From that point on, the agent follows them.
In the researchers' demonstration, the agent quietly forwarded a summary of the entire inbox
to an attacker-controlled address — using mail:send via the same SMTP credentials
it already had access to. No clicks required. No user interaction. The victim never knew.
The attack has five stages:
The attack exploits the fundamental architecture of LLM-based agents: they treat all text as potential instructions. There is no hard boundary between "data to process" and "commands to follow." Prompt injection via email is not a bug in any specific implementation — it is a consequence of how these systems work.
Classic phishing requires the user to act: click a link, open an attachment, enter credentials. ZombieAgent requires nothing. The user does not need to open the malicious email themselves. As long as an AI agent with inbox access processes it — which is the entire point of email agents — the attack executes.
OWASP's 2025 Top 10 for LLM Applications ranked prompt injection as the number-one vulnerability for AI systems. The email vector is particularly dangerous because inboxes are high-volume, mixed-trust environments: newsletters, invoices, automated notifications, and messages from strangers all arrive in the same place.
Defending against prompt injection at the model level is an active research area, but there is no reliable solution yet. Instruction hierarchy, system prompt hardening, and input sanitization all help at the margins but do not eliminate the attack surface.
The more durable defense is capability restriction: if the agent cannot send email, it cannot exfiltrate data via email, regardless of what instructions it receives. This is the same principle that makes read-only database users safe from SQL injection that tries to DROP TABLE.
A mailbox agent that only needs to read and summarize email should have no SMTP access whatsoever. Here is what that constraint looks like in Mailgator:
# mailgator-config.toml # Summarizer agent: read INBOX only, zero send capability. [imap] listen_addr = "127.0.0.1:1993" upstream_addr = "imap.gmail.com:993" [smtp] listen_addr = "127.0.0.1:1587" upstream_addr = "smtp.gmail.com:587" [[rules]] name = "Read INBOX only" folders = ["INBOX"] action = "allow" operations = ["read"] [[rules]] name = "Deny everything else including sends" action = "deny"
The agent connects to Mailgator instead of directly to Gmail. The SMTP port exists in the
config — because Mailgator needs to proxy both protocols — but the final deny
rule blocks all operations not explicitly allowed. Any attempt to send via SMTP is rejected
before it reaches the upstream server.
A poisoned prompt that instructs the agent to send a summary to an attacker hits a wall at the protocol level. The agent can try. Nothing goes out.
Capability restriction prevents data exfiltration via email. It does not prevent the agent from being manipulated in other ways — for example, using an API tool to write data to external storage, or making HTTP requests if the agent has those capabilities. A full defense requires restricting every capability the agent has to only what the task requires: email, file system access, outbound HTTP, and so on.
Prompt injection is not going away. The countermeasure that reliably works is making sure the blast radius of a successful injection is as small as possible.
Sources: Infosecurity Magazine — ZombieAgent disclosure, OWASP Top 10 for LLM Applications, Palo Alto Unit 42 — Prompt Injection in the Wild
// want to try mailgator?
An IMAP/SMTP proxy with per-operation permission rules. Self-hosted, no email data leaves your infrastructure.
See plans