
A strange thing happens when teams add AI agents to real software workflows. At first, the agent looks like a better chatbot. It answers questions, summarizes tickets, drafts replies, and searches documentation. The security model feels familiar. Keep the prompt clean, filter bad inputs, redact sensitive data, add guardrails, and log conversations.

Then someone gives the agent a tool. Now it can reset a password, update an email address, open a support case, issue a refund, query a database, call an internal API, trigger a deployment, or invoke an MCP server. At that point, it is no longer just producing text. It is acting inside your system.

That is where a lot of AI security conversations become too narrow. We keep asking whether the model can be tricked. That matters. But once an agent can act, the more important question is what the surrounding system allows to happen after the model produces an instruction.

The recent Meta AI support incident is a useful example, but only if we are precise about it. According to reporting from The Verge, Meta said up to 20,225 Instagram accounts may have been affected by an exploit involving its AI support tool. That number should not be read as a confirmed count of compromised accounts. Meta said some reset requests in that set may have been legitimate.

The confirmed issue was not that prompt injection made a model authorize attackers. The reporting describes abuse of an AI-assisted account recovery flow, combined with a code path that failed to verify whether the email address requesting a reset matched the email attached to the Instagram account. KrebsOnSecurity also described attackers sharing instructions for using Meta’s AI support assistant during the recovery process.

So the clean lesson is not “AI agents approved access.” The better lesson is that AI-assisted workflows need deterministic checks before sensitive actions run.
Email matching, argument validation, business rules, authorization, and audit logging cannot be left to conversational flow.

The risk starts when the agent can act

A chatbot that gives a wrong answer can cause damage. But the blast radius is usually bounded by the fact that the chatbot only emits text. An agent with tools has a different risk profile.

A support assistant with a password-reset tool does not just explain recovery. It can start recovery. An internal engineering assistant with deployment access does not just explain the release pipeline. It can trigger it. A finance agent connected to company systems does not just summarize invoices. It may be able to approve, edit, or export them.

That is the architectural shift. The LLM is a probabilistic model. The planning, tool selection, routing, memory, and execution logic usually live in the agent system around it. That whole system can be useful for interpreting intent, but it should not be the sole or final authority for sensitive actions.

This is close to the risk OWASP describes in its 2025 guidance on Excessive Agency, where LLM-based systems are given too much functionality, too many permissions, or too much autonomy. Prompt injection can trigger this failure class. So can hallucination, ambiguous instructions, confusing multi-agent handoffs, broken validation, stale context, and ordinary application bugs.

The root issue is not always the text that fooled the model. Often, the deeper issue is that the system gave an AI-assisted workflow enough authority to make a bad step matter.

Authentication is not authorization

A common mistake in agent architectures is treating authentication as if it solves the hard part.

The user is logged in. The agent has an API key. The MCP server runs inside trusted infrastructure. The gateway knows which application made the request. Good. None of that answers the real question.
Should this user, acting through this agent, perform this action, on this resource, with these arguments, under this context, right now?

A support agent may be allowed to issue refunds. That does not mean it should issue a $2,000 refund on a ticket assigned to someone else. An engineering assistant may be allowed to read logs. That does not mean it should export production logs containing user emails. A recovery assistant may be allowed to help with password resets. That does not mean it should send a reset link to a new email address without deterministic verification.

Some of these checks are authorization. Some are validation. Some are application logic. Some are business invariants. They are related, but they are not the same thing.

That distinction matters. A policy engine should not be presented as a magic shield for every workflow bug. It can answer whether a principal may perform an action on a resource under known conditions. It does not replace input validation, email matching, fraud checks, state-machine correctness, or well-designed recovery flows.

The practical point is simpler. Sensitive actions need deterministic controls outside the model response.

The agent requests, policy decides, the tool executes only if allowed

A useful agent security model starts with a boring flow.

user identity
agent identity
action
resource
arguments
business context

policy decision: allow or deny

if allowed: tool executes

audit log records:
    who requested the action
    which agent was involved
    which resource was targeted
    which arguments were used
    why the decision was allowed or denied

That is the model teams should aim for. The agent can interpret the user’s intent and request an action. A deterministic policy or application control decides whether the action is allowed. The tool executes only after that decision. The result is logged in a way that can be audited later.

This sounds less exciting than autonomous agents. That is the point. Sensitive operations should be boring.
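To make the flow above concrete, here is a minimal Python sketch of it, using the refund example from earlier. Everything in it is an assumption made for illustration: the names `ActionRequest`, `Decision`, `decide`, `guarded_call`, and `issue_refund`, the `refund.issue` action string, and the 500 refund limit are hypothetical, and the in-memory list stands in for a real audit store. A production system would more likely call a dedicated policy service than an inline function.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ActionRequest:
    """What the agent asks for. The agent fills this in; it does not decide."""
    user_id: str
    agent_id: str
    action: str
    resource: dict[str, Any]
    arguments: dict[str, Any]
    context: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


REFUND_LIMIT = 500  # hypothetical business rule, not from the article

# Stand-in for a durable audit store.
audit_log: list[dict[str, Any]] = []


def decide(req: ActionRequest) -> Decision:
    """Deterministic check that runs outside the model. Denies by default."""
    if req.action != "refund.issue":
        return Decision(False, "action not covered by policy")
    if req.resource.get("assignee") != req.user_id:
        return Decision(False, "ticket is not assigned to the requesting user")
    amount = req.arguments.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    if not is_number or not 0 < amount <= REFUND_LIMIT:
        return Decision(False, "amount missing, invalid, or over the limit")
    return Decision(True, "assigned ticket and amount within limit")


def guarded_call(req: ActionRequest, tool: Callable[..., Any]) -> Optional[Any]:
    """Record the decision, then run the tool only if it was allowed."""
    decision = decide(req)
    audit_log.append({
        "user": req.user_id,
        "agent": req.agent_id,
        "action": req.action,
        "resource": req.resource,
        "arguments": req.arguments,
        "allowed": decision.allowed,
        "reason": decision.reason,
    })
    if not decision.allowed:
        return None
    return tool(**req.arguments)


def issue_refund(amount: float) -> str:
    """Placeholder for the real refund tool."""
    return f"refunded {amount}"
```

In this sketch, a request from `rep-1` for a 2,000 refund on a ticket assigned to `rep-2` is denied and logged with its reason, and `issue_refund` is never called. The design choice worth noticing is that the log entry is written before the tool runs, so denied attempts are recorded as well as successful ones.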
A tool call can look coherent in logs and still be unauthorized. For sensitive actions, the system needs something outside the conversation that can say no.

Gateways help, but they should not be the only enforcement point

AI gateways are becoming a natural place to add controls. If agent traffic goes through a shared layer, that layer can authenticate callers, route model requests, apply rate limits, restrict model access, filter tool exposure, and log activity.

That is useful infrastructure. It is not enough if the gateway becomes the only place where sensitive actions are checked.

Gateway-level checks can reduce risk before a request reaches the tool. But the final enforcement should happen as close as possible to the tool or resource, immediately before the action runs. Otherwise the gateway becomes a single chokepoint that the rest of the system silently trusts. If another path reaches the tool, if a service calls the tool directly, if a workflow bypasses the gateway, or if an MCP server exposes the same capability elsewhere, the control can disappear.

Tool hiding is useful, but it is not an authorization boundary. If the model never sees the refund tool, it is less likely to call it. That reduces risk. It does not prove the refund action is protected. Every sensitive tool invocation still needs a check at execution time.

MCP makes this more urgent

The Model Context Protocol makes tool access more standardized and composable. That is good for developer experience, and it makes authorization more important. If an MCP server exposes tools for CRM records, cloud resources, internal documents, account recovery, or customer operations, then its enforcement model matters as much as its API design.

The risky pattern is giving the MCP server a broad service account and letting agents act through it. Downstream systems then see the server, not the human user, agent, tenant, or workflow behind the request. The chain of identity becomes blurry.
The safer pattern is contextual enforcement. The MCP server or underlying tool should receive enough context to make a decision, including user identity, agent identity, action, resource, arguments, tenant, environment, and relevant business state. The action should execute only after the check succeeds.

Cerbos has written about this pattern in its guide to MCP authorization, and also showed how a gateway can use policy checks for model access, tool visibility, and argument-level MCP tool calls in a LiteLLM integration. The useful idea is not vendor-specific. Policy should live outside the model, and sensitive tools should enforce decisions before execution.

A minimum security model for tool-using agents

If an agent can touch real systems, “we added a system prompt” is not enough. A reasonable baseline is smaller and more concrete.

Give agents scoped identities. Do not let every agent act through the same broad service account.

Carry user context through the workflow. Downstream tools should know who is being represented, not just which server called them.

Minimize tool access. Expose only the tools needed for the task.

Validate arguments. A password-reset email, refund amount, customer ID, tenant ID, file path, or deployment target must be checked deterministically.

Authorize sensitive invocations. The check should happen outside the model and as close as possible to the tool or resource.

Bind actions to resource ownership and business state. A user may update their own record, not someone else’s. A support rep may act only on assigned cases. A workflow may draft an action without being allowed to execute it.

Require human review for high-impact operations. This matters for money movement, account recovery, data export, permission changes, destructive actions, and production changes.

Log decisions, not just conversations. You need to know who attempted what, through which agent, against which resource, with which arguments, and why it was allowed or denied.

Fail closed. If context is missing, policy is unavailable, or the request is ambiguous, the sensitive action should not proceed.

Prompt-injection defenses, input filtering, output validation, model evaluation, and secure workflow design still matter. They just should not be the only thing standing between a conversational mistake and a real-world action.

The conclusion

AI agents are useful because they turn messy human intent into software actions. That is also why they need stronger boundaries than chatbots.

The Meta incident does not prove that an AI model made an authorization decision. It does show why AI-assisted workflows need deterministic checks around sensitive operations. A normal validation bug can become more dangerous when natural language becomes the front door to account recovery, refunds, customer data, internal systems, or operational tools.

The question is not whether the agent can be made perfect. The question is whether the system around the agent can verify, constrain, deny, and audit what the agent asks to do.

If your agent can reset passwords, move money, export data, modify permissions, approve refunds, update production systems, or call powerful MCP tools, then it is no longer just a chatbot. It is an actor in your architecture. Actors need controls.

LLM output must not be the sole or final authority for sensitive actions. The agent requests. Policy and application checks decide. The tool executes only if allowed.
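
As a closing illustration, two items from the baseline above, deterministic argument validation and failing closed, can be sketched in a few lines of Python. This is not a description of how any real recovery flow is implemented. The names `may_send_reset`, `fail_closed`, and `PolicyUnavailable` are hypothetical, and comparing addresses after trimming and case-folding is a simplifying assumption that a real system may need to tighten.

```python
from typing import Any, Callable, Optional


class PolicyUnavailable(Exception):
    """Raised when a (hypothetical) policy backend cannot be reached."""


def normalize_email(value: str) -> str:
    # Simplifying assumption: trim whitespace and ignore case.
    return value.strip().casefold()


def may_send_reset(requested_email: Optional[str],
                   email_on_file: Optional[str]) -> bool:
    """Allow a reset only when the requested address matches the stored one."""
    if not requested_email or not email_on_file:
        return False  # missing context: do not proceed
    return normalize_email(requested_email) == normalize_email(email_on_file)


def fail_closed(check: Callable[..., Any], *args: Any) -> bool:
    """Treat errors and anything other than an explicit True as a denial."""
    try:
        return check(*args) is True
    except Exception:
        return False
```

With these definitions, `fail_closed(may_send_reset, requested, on_file)` returns `True` only for a matching pair. A missing address, a mismatch, or a check that raises `PolicyUnavailable` all result in a denial, so the sensitive action does not run on incomplete information.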
View original source — Hacker Noon ↗



