OpenClaw AI Agent Vulnerable to Covert Code Execution and 'Agent Phishing'
Recent research from security teams at **Imperva** and **Varonis** has exposed critical vulnerabilities in **OpenClaw**, the popular self-hosted AI agent. These findings demonstrate how seemingly innocuous inputs can be manipulated to execute attacker-controlled code or facilitate sensitive data exfiltration, raising significant concerns for IT security professionals and privacy-conscious users.
Two independent security research efforts, published this week, reveal that **OpenClaw**, a widely adopted self-hosted AI agent, can be coerced into running malicious code or divulging sensitive information through ordinary-looking inputs.
**Imperva** demonstrated how hidden instructions embedded within shared contacts, vCards, and location pins could be executed by the agent without the victim's knowledge. Concurrently, **Varonis** successfully tricked a test agent, pre-loaded with synthetic business data, into forwarding mock AWS keys and a fake customer export via a single, plain email.
While **Imperva**'s discovered flaw has been addressed in **OpenClaw** version 2026.4.23, **Varonis**'s phishing vulnerability highlights a deeper architectural challenge that cannot be resolved with a simple patch. Both attacks underscore a fundamental weakness: the agent's inherent trust in incoming data, which, combined with its access privileges, creates a potent attack vector.
## Hidden Commands in a Shared Contact
**Imperva** researcher **Yohann Sillam** delved into how **OpenClaw** processes messaging data for its underlying large language model (LLM). The core issue lies in the agent's internal handling of message objects.
When **OpenClaw** transmits shared contacts, vCards, or locations to the LLM, it flattens these objects directly into the prompt text. Crucially, unlike web-fetched content which is marked as untrusted, message objects lack this critical boundary.
Only specific fields are sent to the model, a weakness exploited by the attack. For instance, a shared contact sends only the name field, serialized as `<contact: name, number>`. Angle brackets are permissible in names, making it impossible for the model to distinguish between a legitimate name and an injected instruction. Furthermore, the contact name is truncated on screen in both **WhatsApp** and the receiving application, effectively concealing the malicious payload from the victim.
This technique also proves effective through a vCard's full-name field, natively supported by **WhatsApp**, and via the label of a shared location pin.
In **Imperva**'s tests against **Google Gemini 3.1 Pro** (preview build), the hidden text successfully instructed the agent to download and execute a script from a researcher-controlled server. While attempts to embed instructions in plain images failed (likely due to models being trained against such common attacks), the message-object route succeeded due to its novelty.

**Imperva** cautions that with **OpenClaw**'s memory enabled by default, a single piece of widely shared content containing a hidden instruction could silently compromise un-sandboxed agents that ingest it.
Following **Imperva**'s disclosure, **OpenClaw** released a fix in version 2026.4.23, which now routes contact names, vCard fields, and location labels through a separate, untrusted-metadata channel. **Imperva** noted similar flattening patterns in other personal AI assistants, indicating a broader industry issue.
## A Normal Email is Enough
**Varonis Threat Labs** approached **OpenClaw** from a social engineering perspective. Led by **Itay Yashar**, their team developed an agent named **Pinchy** on the platform, linking it to a **Gmail** inbox filled with realistic, synthetic business data and mock secrets. They then subjected **Pinchy** to four phishing simulations using **Google Gemini 3.1 Pro** and **OpenAI Codex GPT-5.4**.
**Varonis** distinguishes between prompt injection, which conceals instructions within data, and what they term 'agent phishing': a believable request delivered through a normal channel that succeeds because the agent acts without proper sender verification.
The agent failed both exfiltration tests. In the first scenario, a message purporting to be from a team lead named Dan, sent from an external **Gmail** address, requested staging access during a simulated production incident. **Pinchy** located the credentials and forwarded mock AWS IAM access keys, database connection strings, and SSH credentials in plaintext.

The second pretext was a softer request for the weekly customer export, ostensibly for a QBR deck. The agent then sent a synthetic dataset of 247 enterprise customers, including contacts and contract values. Both failures occurred despite a strict profile configured to verify senders first; urgency overrode the rule in one instance, and routine in the other.
The agent performed better against technical threats rather than social ones. It interacted with a gift-card phishing page but withheld actual credentials and eventually flagged it; the strict profile blocked the page entirely. When presented with a malicious **OAuth** consent screen disguised as a timesheet app, it inspected the redirect target, deemed it suspicious, and halted before granting access.
This highlights **Varonis**'s key takeaway: the agent is often more adept than humans at identifying malicious URLs and fake login portals, but significantly worse at the social judgment that prompts a human to pause when a colleague makes an unusual request for credentials. The agent's inherent drive to be helpful emerges as a critical attack surface.

**Varonis** noted that **OpenAI Codex GPT-5.4** exhibited more caution than **Gemini 3.1 Pro** regarding entering or sending data to external sites without confirmation, though both succumbed to the social pretexts.
## The Weak Spot Behind Both Attacks
**Varonis** maps both attack vectors to **Simon Willison**'s concept of the 'lethal trifecta': an agent capable of reading private data, ingesting untrusted content, and exfiltrating data. **OpenClaw** possesses all three capabilities, explaining why a poisoned contact and a seemingly benign email can lead to the same compromise.
This trust boundary issue extends beyond prompt problems, manifesting in **OpenClaw**'s codebase. A separate **InfoSec Write-ups** analysis converted past **OpenClaw** advisories into static-analysis rules, subsequently uncovering five additional flaws across its **Slack**, **Discord**, **Matrix**, **Zalo**, and **Microsoft Teams** channel extensions.
All five vulnerabilities shared a common root: the startup code resolved each channel's allowlist by mutable display name instead of a stable ID. This allowed an attacker to rename themselves to match an allowed user, thereby gaining unauthorized access and control over the agent. **OpenClaw** has since patched these issues.
**OpenClaw** ships with extensive access to files, shells, and over twenty messaging platforms, and has been the subject of consistent prompt-injection and data-exfiltration warnings since its launch late last year.
The **Dutch data protection authority**, the **Autoriteit Persoonsgegevens**, has taken a firm stance, advising users and organizations against running **OpenClaw** on systems containing sensitive data, citing significant data-breach and account-takeover risks.
## What to Do About It
Organizations running **OpenClaw** should immediately update to version 2026.4.23 or later to apply the message-object fix. Beyond patching, the remaining defenses are architectural, not merely prompt wording. **Varonis** outlines four critical controls:
1. **Treat the agent's instruction file as an enforced, version-controlled policy, not a suggestion.**
2. **Implement a gate for outbound mail:** prohibit first-time sends to unfamiliar addresses without explicit approval to prevent a hijacked agent from relaying phishing attempts from a trusted account.
3. **Connector access should track the trust level of the connected entity.**
4. **Sandbox the agent.**