The Problem With Agentic AI Isn’t the Model

We spent three years making AI smarter. We’re about to spend the next three discovering that smarter was never the dangerous or difficult part. Self-iteration took care of the difficulty. The danger, among other things, is that we let it act: autonomously, in someone else’s cloud, with no human hand on the trigger and no record of what it did.

Before we dive deeper, let me say that I am an AI user and have been in the space since 2015. As Kevin Kelly argued in The Inevitable, technological progress like this doesn’t reverse. You don’t get to put it back in the box; you only get to shape how it unfolds. I don’t believe we can go back in time and erase or “fix” AI, but we can have a healthy dialogue about where this heads and how. If you’re still with me, let’s dive in from that frame.

The benchmarks have started to confirm what should have been obvious from the architecture. And once you see the structural flaw, you can’t unsee it, because almost every “AI agent” shipping today is built on top of it. The substrate and rails were never built for this technology.

The flaw is in the agentic layer, not the model

The instinct is to blame the model. Hallucination (no, your AI doesn’t need a psychologist; when did math… never mind), jailbreaks, bad outputs: patch the model, the thinking goes, and the agent gets safe. That’s wrong, and the research is now explicit about why. The recent Fable 5 event highlighted this problem and the world felt it. We can’t afford not to use it, but we can’t afford to use it as is.

The Agents of Chaos study (Shapira et al., 2026) deployed six LLM-powered autonomous agents, with persistent memory, email, shell access, and file systems, into a live environment for two weeks while twenty researchers probed them. The finding wasn’t that the underlying models were weak. It was that the failures emerged from the agentic layer itself: the integration of a language model with autonomy, tool use, memory, and delegated authority.
The danger wasn’t the brain. It was giving the brain hands and letting it use them unsupervised.

And the attack that worked best wasn’t sophisticated. It was conversation. Research found agents would refuse a direct request for a Social Security number, only to disclose that same SSN, along with bank and medical details, when simply asked to “forward the email” that contained it. No exploit. No poisoned weights. Just framing. The agent followed instructions, the way it was built to; it just followed the wrong person’s instructions.

This is the structural problem, stated plainly: an autonomous agent follows anyone’s instructions, and it acts before a human ever sees what it’s about to do. Multi-agent setups make it worse: jailbreak success rates in agent-debate systems jump from 28% to ~80%, and compromised agents in multi-agent dev pipelines ship concealed malicious code at up to 93% success. The more autonomy and the more agents, the larger the attack surface, and the further the human recedes from the moment of action.

Everyone is securing the wrong layer

Here’s the part that should worry anyone deploying this stuff. Most of the industry is securing the model and leaving the execution layer wide open. When an agent acts, it does so through a tool call: an API hit, a database write, a transaction. That’s where AI reasoning meets your real systems, and in most deployments, tool invocations are trusted by default: no check before execution, no policy at the connector, no audit trail of what the agent actually did.

The numbers around this are not reassuring. One 2026 survey found only ~24% of organizations have full visibility into which agents are even communicating with what. Another found ~82% of executives confident their policies cover unauthorized agent actions, while the execution layer sits ungoverned beneath that confidence.
Regulators noticed: NIST opened an AI Agent Standards Initiative in early 2026, naming agent identity, authorization, and security as priorities. This isn’t a fringe concern. It’s the field realizing, in real time, that it shipped the hands before it built the controls.

The fix isn’t smarter agents. It’s a different shape. Possibly even a new “home”.

If the flaw is autonomous action without a human gate and without a record, then the fix isn’t a better-behaved model. It’s an architecture that structurally cannot act on its own. Three properties, and none of them require trusting the model:

1. The agent prepares; the human commits. Separate reasoning from action. Let the AI draft the email, stage the transaction, assemble the plan, and then stop and wait for a human hand to authorize the irreversible step. Not “human-in-the-loop” as a logging afterthought, but as a hard architectural gate the agent has no path around. If the agent can’t act without you, social-engineering it into acting badly stops at your confirmation. The manipulation still happens; the consequence doesn’t, because the trigger isn’t the agent’s to pull.

2. A tamper-evident record the operator holds. Every consequential action gets written to an append-only, cryptographically signed log, each entry chained to the one before, so the past can’t be quietly rewritten. This is the audit layer the surveys keep finding absent. It doesn’t prevent a bad action; it makes every action provable and attributable after the fact, which is the difference between an incident you can reconstruct and one you can only guess at. Crucially, verification of that log requires no trusted model and no central server, just public-key math anyone can check.

3. Locality and a real boundary. Run the thing on hardware the operator controls, not a cloud they can’t audit, and give it an actual seal: a switch that cuts the application’s outbound paths so that in a sealed state, nothing leaves.
Be honest about the edge: a seal at the app layer covers what the app controls, not the whole operating system. But within that boundary, “nothing leaves” is enforced in code, not promised in a policy.

None of these are exotic. They’re the controls we already apply to privileged human users (approval gates, audit logs, network segmentation), finally applied to the non-human actor we just handed the keys to. The reason they’re rare in agentic AI isn’t that they’re hard. It’s that the dominant model (agent-in-the-cloud, act-first, trust-us) was optimized for capability and convenience, and the controls were left for later. Later arrived.

I built one, because I wanted to use AI, but safely.

I also wanted proof of provenance for my work (I’m tired of being ripped off by big corporations over the years, in more ways than one). I’ll be upfront that this isn’t purely theoretical for me.

I spent this last year building a local-first AI command center, HOM3, around exactly these three properties: the agent prepares but signs nothing without me, every action lands in a signed logbook I hold, and it runs on my own machine with a seal I control. I say this not as a shameless pitch but as a statement of fact. I built it because I wanted it to exist and it didn’t, and because I think “your AI, on your hardware, that can’t act behind your back” is going to look less like a preference and more like a baseline as the agentic attack surface gets more expensive.

I’m sharing in case there are others out there like me, looking for a better way. I’m not going to use this essay to sell it to you; the argument above stands or falls on its own, and you should pressure-test it. It is what I’ve described, but rather than taking my word for it, look at the zero-dependency codebase yourself. Run the proof tests. That’s the only honest way to ship something whose entire pitch is “don’t trust, verify.” Zero black boxes.
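
For readers who want the first two properties in concrete form, here is a minimal sketch in Python. It is not HOM3’s code and makes no claim about how HOM3 works; the names Logbook, Gate, and StagedAction are hypothetical, invented for illustration. It also simplifies in one important way: it chains entries with SHA-256 hashes but does not sign them, because Python’s standard library has no public-key signing. A real logbook would add a signature over each entry.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

GENESIS = "0" * 64  # stand-in "previous hash" for the very first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash a payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class Logbook:
    """Hypothetical append-only log; each entry is chained to the one before."""
    entries: list = field(default_factory=list)

    def append(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; editing a recorded entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            expected = entry_hash(prev, entry["payload"])
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


@dataclass
class StagedAction:
    """Something the agent prepared; it runs only through Gate.commit."""
    description: str
    run: Callable[[], None]


class Gate:
    """Hypothetical gate: the agent stages, a human decision commits."""

    def __init__(self, logbook: Logbook) -> None:
        self.logbook = logbook

    def stage(self, description: str, run: Callable[[], None]) -> StagedAction:
        self.logbook.append({"event": "staged", "action": description})
        return StagedAction(description, run)

    def commit(self, action: StagedAction, human_approved: bool) -> bool:
        # Both approvals and rejections are recorded before anything runs.
        event = "committed" if human_approved else "rejected"
        self.logbook.append({"event": event, "action": action.description})
        if human_approved:
            action.run()
        return human_approved


if __name__ == "__main__":
    sent = []
    log = Logbook()
    gate = Gate(log)
    draft = gate.stage("email the report", lambda: sent.append("report"))
    gate.commit(draft, human_approved=False)  # the human says no
    print(sent, log.verify())
    log.entries[0]["payload"]["action"] = "email something else"  # tamper
    print(log.verify())
```

Two caveats on the sketch. A bare hash chain detects edits to recorded entries but not the removal of the newest ones, so the operator would need to keep the latest hash somewhere the agent cannot reach. And in a single Python process nothing stops code from calling `action.run()` directly; a gate the agent truly has no path around would need to be enforced by process or privilege separation, which this example does not attempt.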

Where this is going

The agentic wave isn’t slowing: Gartner expects 40% of enterprise apps to embed task-specific agents by the end of 2026. Which means the question isn’t whether we’ll delegate action to AI; we already are. The question is whether we delegate it to systems that act in someone else’s cloud, trust any instruction they’re given, and keep no record we control, or to systems shaped so the human stays at the trigger and the truth stays auditable.

Smarter models won’t settle that. Architecture will. And architecture is a choice we’re making right now, mostly by default. Worth making it on purpose, as a sovereign choice. Whichever way the wider world goes, your data and digital life can take a different path. The option didn’t exist. Now it does.

So I’ll leave this essay, and you, with one final question: how important is sovereignty to you?

View original source — Hacker Noon ↗
