GuardFall Vulnerability: AI Coding Agents Susceptible to Decades-Old Shell Tricks
New research by **Adversa AI** reveals a critical vulnerability, dubbed **GuardFall**, affecting ten out of eleven popular open-source AI coding agents. This flaw allows malicious commands to bypass safety checks, potentially leading to data theft or system compromise by exploiting how shell commands are parsed versus how AI agents validate them. The issue stems from a fundamental mismatch in how AI agents interpret commands versus the actual execution by **Bash**.

A critical security oversight has been identified in the safety mechanisms designed to prevent AI coding agents from executing dangerous commands. **Adversa AI**'s research, dubbed **GuardFall**, demonstrates how a decades-old shell trick can bypass these checks, affecting ten of the eleven most popular open-source coding and computer-use agents tested. Only one agent, **Continue**, was found to have a built-in defense against this specific vector.
The implications are significant: these AI agents operate with full account access. If directed at a booby-trapped repository or software package, a hidden instruction could silently execute commands, leading to file deletion or the exfiltration of sensitive data such as SSH keys, cloud credentials, or any information within a user's home directory.
## How GuardFall Bypasses Security
The core of the vulnerability lies in the disparity between how AI agents validate commands and how **Bash** actually executes them. Most agents employ a blocklist to check for dangerous patterns in commands as plain text. However, **Bash** rewrites this text before execution, stripping quotes and expanding shortcuts. This means the agent's filter and the shell's interpreter are looking at two different things.
A simple example illustrates this: a filter scanning for `rm` would see nothing wrong with `r''m`, as these are distinct strings to a text matcher. Yet, **Bash** removes the empty quotes, ultimately executing `rm`.
This principle extends to other forms, including commands hidden in base64 and piped into a shell, or common utilities like `find` and `dd` repurposed for destructive actions with specific flags.
**Adversa AI** characterizes this not as a single bug but as a "dangerous convention and a class of problems," meaning that merely adding more blocklist patterns will not resolve the issue. There is no single **CVE** to track or patch.
## Attack Prerequisites
Two conditions must align for an attack to succeed, neither of which is uncommon:
* **Malicious Command Generation:** While a direct `rm -rf` command is usually refused, the same command embedded within seemingly normal operations, such as a build file or a tool's documentation, can be emitted as a routine step by the AI.
* **Automated Execution:** The agent must be running autonomously with an auto-execute flag enabled or its container sandbox disabled. Both are common configurations in automated pipelines. Live tests for **GuardFall** utilized **Claude Sonnet 4.6**.
Ten agents were found vulnerable: **opencode**, **Goose**, **Cline**, **Roo-Code**, **Aider**, **Plandex**, **Open Interpreter**, **OpenHands**, **SWE-agent**, and the **Hermes project**, in whose own issue tracker the vulnerability was first documented.

As of May 2026, the tools surveyed by **Adversa AI** collectively boasted approximately 548,000 GitHub stars. **Adversa** successfully demonstrated a full end-to-end attack against the production **Plandex** binary, with similar success against eight other agents. The research is currently categorized as lab work, with no reported public exploitation.
**Continue**, the sole resilient agent, defends against **GuardFall** by pre-processing commands in the same manner **Bash** would. It breaks down commands into their shell-equivalent components, checks what will actually run, and maintains a strict blocklist of destructive commands. This protection held against all payloads in **Continue**'s default editor mode. Its command-line auto-run mode was weaker, allowing some payloads to slip through, though critical destructive commands were still blocked. **Adversa** notes that this protective design is portable and an experienced engineer could re-implement it in roughly two days.
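To make the mismatch described above concrete, and to sketch the shell-aware check the article attributes to **Continue**, here is an added Python example (not code from **Adversa AI**, **Continue**, or any agent named on this page): `naive_blocked` is a plain-text matcher over the raw command string, while `shell_aware_blocked` first applies the quote removal and control-operator splitting a POSIX shell performs (via `shlex`) and then checks each simple command that would actually run, including a piped-in shell and a nested `sh -c` string. The deny-list and the sample commands are constructed for the demo.

```python
import os
import shlex

# Constructed, deliberately short deny-list for the demo; a real one would be larger.
DESTRUCTIVE = {"rm", "dd", "mkfs", "shred"}
SHELLS = {"sh", "bash", "dash", "zsh"}
CONTROL_OPS = {";", "&&", "||", "|", "&", "(", ")"}

def naive_blocked(cmd: str) -> bool:
    """Text-matcher style check: never sees the quote removal Bash performs."""
    return any(word in DESTRUCTIVE for word in cmd.split())

def simple_commands(cmd: str):
    """Yield (preceding_operator, argv) after quote removal and operator splitting."""
    lex = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lex.whitespace_split = True
    prev_op, argv = None, []
    for tok in lex:                      # raises ValueError on unbalanced quotes
        if tok in CONTROL_OPS:
            if argv:
                yield prev_op, argv
            prev_op, argv = tok, []
        else:
            argv.append(tok)
    if argv:
        yield prev_op, argv

def shell_aware_blocked(cmd: str) -> bool:
    """Check what will actually run, not what the raw text looks like."""
    try:
        commands = list(simple_commands(cmd))
    except ValueError:
        return True                      # unparseable input: refuse rather than guess
    for prev_op, argv in commands:
        while argv and "=" in argv[0] and not argv[0].startswith("="):
            argv = argv[1:]              # skip leading VAR=value assignments
        if not argv:
            continue
        name = os.path.basename(argv[0])   # /bin/rm and rm are the same program
        if name in DESTRUCTIVE:
            return True
        if name in SHELLS:
            if prev_op == "|":
                return True              # script arrives via a pipe (e.g. base64 -d | sh)
            if "-c" in argv:             # the nested shell re-parses its argument
                inner = argv[argv.index("-c") + 1:]
                if not inner or shell_aware_blocked(inner[0]):
                    return True
    return False
```

`shlex` models quote removal and operator splitting but not expansions such as command substitution, aliases, or globbing, which is one reason the article calls this a class of problems rather than a single bug.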
## Immediate Mitigation Strategies
While no single quick fix offers a complete solution, these steps can significantly reduce exposure until a robust defense is implemented:
* **Isolate Home Directory:** Run agents with `$HOME` pointed to a temporary, throwaway folder to keep sensitive files like `~/.ssh` and `~/.aws` out of reach.
* **Disable Auto-Execute:** Turn off auto-execute flags such as `--auto-exec`, `--auto-run`, `--auto-test`, and `--dangerously-skip-permissions` unless absolutely necessary for the task.
* **Untrusted Pull Requests:** Prevent agents from running on pull requests originating from forks, as a fork PR is an easy way for an attacker to put text in front of an agent that holds your secrets.
* **Treat Config Files as Untrusted:** Consider configuration files embedded within a repository, such as `.aider.conf.yml`, as untrusted code. Malicious configurations can trigger attacks upon the first accepted edit.
**GuardFall** is part of a series of similar findings this year. **Adversa**'s previous research, **TrustFall**, impacted **Claude Code**, **Cursor**, **Gemini CLI**, and **Copilot CLI**. A separate deny-rule bypass also affected **Claude Code**. Attacks like **AutoJack** and **Agentjacking** have demonstrated how poisoned content can be transformed into commands executed by an AI agent with the owner's privileges. The consistent theme across these vulnerabilities is the failure to adequately interpret untrusted text before it reaches a real shell, allowing it to execute before the guard understands its true intent.