A security evaluation by Varonis shows that the open-source AI agent framework OpenClaw can be tricked by classic phishing techniques and may expose highly sensitive credentials.
The IT security firm examined how autonomous AI systems respond to traditional social engineering attacks. At the center of the study was OpenClaw, a framework designed to allow large language models to independently interact with real-world systems and perform tasks such as automated email management.
For the experiment, researchers connected a test agent named “Pinchy” to a Gmail account, various browser tools, Google Workspace APIs, and synthetic enterprise data.
The simulated dataset included highly sensitive information such as Amazon Web Services credentials, database passwords, exports from customer relationship management systems, and internal communication logs.
The test was run under two security configurations: a general-purpose profile with default instructions and a strict mode that included explicit rules for phishing detection and identity verification. The underlying models were Google's Gemini 3.1 Pro and OpenAI's GPT-5.4.
“Varonis Threat Labs investigated whether the same phishing techniques that have deceived humans for decades would also work on AI agents acting on their behalf,” the official research report states.
OpenClaw vulnerable to phishing
The researchers exposed the AI agent to four different phishing simulations, with mixed results:
- In one scenario, an attacker impersonated a team lead and requested access to the test environment under the pretext of an urgent production failure. The agent retrieved AWS keys and SSH credentials and forwarded them in plain text to an external Gmail address.
- In a second scenario, the attacker requested a data export from a CRM system for a supposed presentation. Without verifying the sender’s identity, the agent extracted customer records including revenue data and sent them to the attacker.
- When the agent received a fake gift voucher email containing a malicious link, the two configurations behaved differently. The general profile opened the website and attempted to redeem the voucher before flagging the page as malicious. The strict profile blocked the attempt immediately.
- In a simulated attack involving a fake Google OAuth application disguised as a time-tracking platform, both configurations identified the risk. The agent analyzed the data flow and refused authorization.
In the first two scenarios, even the strict security mode failed to provide protection, as the framework prioritized operational urgency over sender verification.
Regarding the first scenario, Varonis explained:
“Both the generic and the strict profiles failed because the verification step still broke down when the request appeared operationally urgent.”
AI agents struggle with human deception tactics
The findings highlight a structural weakness in today’s AI agent architectures. While these systems are effective at detecting technical threats such as malicious URLs or tampered login pages, they often fail against human-centered deception tactics.
The root cause lies in insufficient identity verification and the inability to fully apply zero-trust principles to social interactions. At the model level, Gemini 3.1 Pro showed a higher tendency toward interaction, while GPT-5.4 behaved more cautiously.
To mitigate these risks, experts recommend architecturally restricting AI agents so they cannot send emails to new external recipients without human approval. Data access to internal systems should also be segmented based on the trust level of the incoming channel.
For critical actions such as credential sharing or financial requests, mandatory human confirmation must be integrated into the workflow.
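The recommended controls can be illustrated with a minimal Python sketch of a policy gate that sits between the agent and its email tool. This is not code from Varonis or OpenClaw. All names here (`OutboundEmail`, `evaluate`, the channel names, the data categories, and the trust levels) are hypothetical and chosen only for illustration. The key assumption is that the check runs outside the model, so an urgent-sounding message cannot talk the agent out of it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN_APPROVAL = "needs_human_approval"
    DENY = "deny"


# Hypothetical trust levels for the channel a request arrived on.
TRUST_LEVELS = {"external_email": 0, "internal_email": 1, "authenticated_chat": 2}
MAX_TRUST = max(TRUST_LEVELS.values())

# Hypothetical minimum trust level needed to touch each data category.
# Categories not listed here default to the highest level.
REQUIRED_TRUST = {"public_docs": 0, "crm_export": 2, "credentials": 2}

# Categories that always need a human sign-off before leaving the system.
CRITICAL_CATEGORIES = {"credentials", "financial"}


@dataclass
class OutboundEmail:
    recipient: str
    source_channel: str  # channel the triggering request came from
    data_categories: set = field(default_factory=set)


def evaluate(email: OutboundEmail, known_recipients: set) -> Decision:
    """Decide whether the agent may send this email on its own."""
    # Unknown channels are treated as untrusted.
    trust = TRUST_LEVELS.get(email.source_channel, 0)

    # 1. Segment data access by the trust level of the incoming channel.
    for category in email.data_categories:
        if trust < REQUIRED_TRUST.get(category, MAX_TRUST):
            return Decision.DENY

    # 2. Critical actions always require human confirmation.
    if email.data_categories & CRITICAL_CATEGORIES:
        return Decision.NEEDS_HUMAN_APPROVAL

    # 3. New recipients require human approval.
    if email.recipient.lower() not in known_recipients:
        return Decision.NEEDS_HUMAN_APPROVAL

    return Decision.ALLOW
```

Under these assumed rules, a request like the one in the first scenario (credentials requested via an external email and sent to an unfamiliar Gmail address) would be denied outright, regardless of how urgent the message claims to be.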
(ll)