The Production Gap: Your AI Model Isn’t as Reliable as You Think [Interview]

We sat down with Harsh Verma of Palo Alto Networks to discuss the evolution of AI in security, the specific challenges of securing autonomous agents, and how technical judgment is becoming the critical differentiator for engineers as AI begins to automate more of the coding workflow.

Question: You spent your early career as a research assistant in NLP and computer vision, then moved into building large-scale ML systems in production. What's the gap between AI that works in research and AI that holds up in production that engineers consistently underestimate?

The most underestimated gap isn’t model performance; it’s system reliability under adversarial, dynamic, and imperfect real-world conditions.

“The biggest gap isn’t intelligence, it’s survivability in the real world.”

My career started in research, working across machine learning, system design, and software engineering for AI, where success meant pushing model accuracy on structured datasets. Like most engineers early on, I believed that if a model performed well in evaluation, it was ready. That assumption broke the first time I had to deploy AI into a live, large-scale system.

When I transitioned into building production ML systems, eventually leading AI-driven initiatives in cybersecurity, I saw something very different: models don’t fail loudly in production; they fail quietly, and at scale. I remember deploying a high-performing anomaly detection model that looked excellent offline, but in production it started flagging normal behavior as threats while missing coordinated, low-signal attacks entirely. Nothing was technically “wrong” with the model; the issue was that the system needed context over time, not point-in-time predictions.

That’s when the shift became clear: in research, you optimize for accuracy; in production, you engineer for uncertainty. The world doesn’t stay still: data drifts, users evolve, and attackers adapt, so a production model effectively starts decaying the moment it’s deployed.
At the same time, models don’t ship; systems do, and in a system, reliability, observability, and fail-safe design matter more than raw performance, especially when the real question becomes not “Is the model right?” but “What happens when it’s wrong?” In adversarial domains, inputs aren’t even benign; they’re intentionally designed to bypass detection, and the most dangerous attacks are the ones that look completely normal.

That transition, from research to production and then into adversarial environments, forced me to move beyond thinking about models and toward designing resilient, intent-aware systems. What changes is the unit of innovation. In research, the unit is the model. In production, the unit is the end-to-end system operating under uncertainty. This is where I’ve focused much of my work: building AI systems that are not just accurate, but resilient, adaptive, and secure at scale.

Ultimately, the future of AI won’t be defined by smarter models, but by systems we can trust under real-world pressure, and that’s the gap most engineers only truly understand after deployment.

Question: You work at the intersection of machine learning and big data in security. For someone outside that world, what's the genuinely hard technical problem in using ML for threat detection, the part that doesn't reduce to "train a classifier on known attacks"?

“The hardest problem in threat detection isn’t classification; it’s understanding intent in a system where everything can look normal.”

Working at the intersection of machine learning, big data, and cybersecurity, the real challenge isn’t building models that recognize known attacks; we’ve largely solved that. The hard part is detecting unknown, low-signal, and coordinated behaviors that don’t resemble anything you’ve seen before. In production systems, especially at scale, most malicious activity doesn’t appear as a clear anomaly.
It shows up as a sequence of perfectly valid actions spread across users, services, and time that only become suspicious when you connect the dots. I’ve seen systems where every individual event passed validation, but the overall pattern represented a sophisticated attack. That’s where the traditional “train a classifier on labeled data” approach breaks down, because the problem isn’t static classification; it’s dynamic reasoning over behavior.

“Attackers don’t break systems, they learn how your system thinks and operate just below its thresholds.”

This creates a fundamentally different technical challenge: you’re not just modeling data, you’re modeling adaptive adversaries. Data is highly imbalanced, labels are sparse or delayed, and ground truth is often ambiguous. At the same time, you’re operating at massive scale, with streaming telemetry, distributed systems, and strict latency constraints, where decisions often need to be made in real time.

The result is that the hardest problems sit at the intersection of sequence modeling, graph-based reasoning, and system design, not just machine learning. You need to correlate signals across time, detect weak patterns across billions of events, and continuously adapt as attackers evolve.

“In security, the absence of evidence is often the strongest signal.”

That’s why the field is shifting toward behavioral, intent-driven, and multi-agent systems, where detection isn’t based on what something looks like, but on what it’s trying to achieve. In my experience, building these systems requires thinking less like a data scientist optimizing a model, and more like a systems engineer designing for uncertainty, scale, and adversarial pressure. That’s the part that doesn’t show up in most ML discussions, but it’s where the real complexity lives.

Question: Security ML has a property most ML doesn't: an adversary is actively trying to defeat your model. How does that change the way you build, compared to a domain where the data isn't fighting back?
“In most ML, you learn from data. In security, you learn against an opponent.”

That changes how you build at a fundamental level. When I moved from traditional ML into cybersecurity systems, I saw that models don’t just degrade; they get actively probed and bypassed. Attackers don’t break your model; they study it and operate just below its thresholds.

“A model that’s predictable is a model that’s vulnerable.”

So you stop relying on single classifiers and start building layered, adaptive systems that combine models with behavioral signals, sequence analysis, and continuous monitoring. Deployment isn’t the end; it’s where the real learning begins. You also shift from identity to intent. In many attacks I’ve seen, every action looked valid in isolation; the threat only emerged when you connected behavior over time.

“In adversarial environments, the goal isn’t perfect accuracy, it’s being hard to evade.”

That’s the real difference: ML stops being just a modeling problem and becomes a systems problem under continuous attack.

Question: Over your decade in the field, what's a technical approach you and the industry believed in early that turned out to be wrong or overrated, and what replaced it?

“The belief that anomaly detection would catch attackers. It mostly caught noise.”

Early in my career, I relied heavily on anomaly detection: learn “normal,” flag deviations. It looked great in research, but in production I saw two consistent failures: too many false positives from harmless deviations, and missed attacks that stayed within normal patterns. In one system I worked on, the model flagged routine system spikes as threats, while a coordinated attack spread across small, valid actions over time passed undetected.

“Not all anomalies are threats, and not all threats are anomalous.”

What replaced it was a shift to behavioral and sequence-based detection: connecting signals across time, users, and systems, and focusing on intent, not deviation.
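To make the contrast between point-in-time and sequence-based detection concrete, here is a minimal sketch. It is a deliberate simplification, not the system described in the interview: the `Event` type, the per-event `score`, and all three thresholds are hypothetical, and a production detector would use far richer sequence or graph models than a sliding-window sum.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    actor: str        # user or service that produced the event
    timestamp: float  # seconds since an arbitrary epoch
    score: float      # hypothetical per-event risk score in [0, 1]


POINT_THRESHOLD = 0.8    # assumed: alert on any single event at or above this
WINDOW_SECONDS = 600.0   # assumed: correlate events over ten minutes
WINDOW_THRESHOLD = 1.5   # assumed: alert when a window's total reaches this


def point_in_time_alerts(events):
    """Flag events whose individual score crosses the point threshold."""
    return [e for e in events if e.score >= POINT_THRESHOLD]


def windowed_alerts(events):
    """Flag actors whose summed score inside a sliding window crosses the threshold."""
    recent = defaultdict(deque)   # actor -> events still inside the window
    totals = defaultdict(float)   # actor -> running sum of scores in the window
    flagged = []
    for e in sorted(events, key=lambda ev: ev.timestamp):
        window = recent[e.actor]
        window.append(e)
        totals[e.actor] += e.score
        # Evict events that have aged out of the window.
        while window and e.timestamp - window[0].timestamp > WINDOW_SECONDS:
            totals[e.actor] -= window.popleft().score
        if totals[e.actor] >= WINDOW_THRESHOLD and e.actor not in flagged:
            flagged.append(e.actor)
    return flagged


# Five low-signal events from one actor, one minute apart,
# each well below the point threshold.
events = [Event("svc-a", t * 60.0, 0.4) for t in range(5)]
```

With this input, the point-in-time check raises nothing, while the windowed check flags `svc-a` once the accumulated score inside the ten-minute window passes the assumed threshold. The same events spread more than ten minutes apart would not be flagged, which is exactly the kind of evasion a fixed window invites.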
“We stopped asking ‘Is this unusual?’ and started asking ‘What is this trying to do?’”

That shift made systems far more resilient in real-world, adversarial environments.

Question: You advise startups on AI strategy. When a founder says they want to "add AI" to their product, what's the question that most reliably reveals whether there's a real problem there or just hype?

“What decision gets better or automated because of using AI?”

That’s the question I ask every founder. If they can’t point to a specific, repeatable decision that improves with data, it’s usually hype. In my experience advising and mentoring startup founders and building production AI systems, real use cases sound concrete: reduce false positives in alerts, prioritize tickets, detect anomalous behavior across sessions. Weak ones sound like features: add a chatbot, add recommendations.

“AI isn’t a feature, it’s a decision engine.”

The second signal I look for is feedback loops. If there’s no way to measure outcomes and improve the system over time, it’s not an AI problem; it’s a static feature with a model attached.

“No feedback loop, no real AI, just automation theater.”

The founders who get it right are very clear on three things: the decision, the data that drives it, and how the system learns. Everything else is usually noise.

Question: "Agent security" is becoming a crowded term. In your framing, what's the actual crisis, and where specifically does the existing security model break when you put autonomous agents into a system?

“The crisis isn’t agent security, it’s that identity-based security breaks when software starts making decisions.”

In traditional systems, we secure who is acting (users, services, API keys) and assume anything within that boundary is safe. That model held up until we started deploying autonomous agents.
In my work on AI systems in cybersecurity, I’ve seen agents with valid credentials and correct permissions still create risk, not by breaking rules, but by executing multi-step actions that were individually allowed but collectively harmful. For example, an automated system performing routine actions (querying data, modifying configs, triggering workflows) can unintentionally create security gaps simply by chaining decisions in ways no human explicitly designed.

“Agents don’t break access: they misuse it perfectly.”

Where it breaks:

Identity ≠ Intent → A trusted agent can still produce unsafe outcomes
Static permissions vs dynamic behavior → Policies don’t adapt to evolving goals
Single actions vs sequences → Risk emerges across steps, not individual requests

“The system says ‘allowed’ at every step but the outcome is still wrong.”

What the crisis really is: we built security around who can act. Agents force us to secure what is being attempted over time.

“We’re moving from identity-based security to intent-aware, behavior-driven systems.”

That’s the shift, and most current systems aren’t built for it.

Question: There's a growing argument that the real exposure with AI agents isn't the model, it's identity: agents acting with credentials and access designed for humans. Do you buy that framing, and if so, what does a security team have to do differently starting now?

“I do buy it. The real risk isn’t the model, it’s what the model is allowed to do.”

In systems I’ve built, I’ve seen agents with perfectly valid, human-level credentials create risk without any compromise. In one case, an automated workflow had access to query data, update configs, and trigger downstream jobs, and each step was individually approved. But chained together, it created an unintended exposure. The system said “allowed” at every step; the outcome was still wrong.
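The pattern of "allowed at every step, wrong overall" can be sketched as a check that looks at the session's action history rather than a single request. This is an illustrative sketch only: the action names, the `RISKY_SEQUENCE` rule, and the `SessionGuard` class are hypothetical, and a real policy engine would express sequence rules far more flexibly.

```python
# Hypothetical permission set: every action below is individually allowed.
ALLOWED_ACTIONS = {"query_data", "modify_config", "trigger_workflow"}

# Hypothetical sequence rule: these actions are fine alone, but a session that
# performs all three in this order is held for review.
RISKY_SEQUENCE = ("query_data", "modify_config", "trigger_workflow")


def step_allowed(action):
    """Classic per-request check: is this single action permitted?"""
    return action in ALLOWED_ACTIONS


def contains_subsequence(history, pattern):
    """True if pattern occurs in history in order, not necessarily contiguously."""
    remaining = iter(history)
    # `step in remaining` consumes the iterator, so order is enforced.
    return all(step in remaining for step in pattern)


class SessionGuard:
    """Authorizes actions against both the step check and the sequence rule."""

    def __init__(self):
        self.history = []

    def authorize(self, action):
        if not step_allowed(action):
            return "deny"
        candidate = self.history + [action]
        if contains_subsequence(candidate, RISKY_SEQUENCE):
            # Do not record the action; it has not been executed.
            return "needs_review"
        self.history.append(action)
        return "allow"
```

A session that queries data and then modifies a config gets `"allow"` twice, but its next `trigger_workflow` request returns `"needs_review"`, even though `step_allowed` would approve it. A fresh session calling `trigger_workflow` on its own is simply allowed.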
“Agents don’t break access, they use it exactly as designed, just faster and at scale.”

What needs to change:

Scoped, short-lived access → agents shouldn’t inherit broad human roles
Continuous authorization → re-evaluate every step, not just at login
Sequence-level security → monitor actions over time, not in isolation
Intent-aware controls → focus on what the agent is trying to achieve

“We built security for users making requests. Now we need security for agents executing goals.”

Question: If you were advising a company that's already shipped autonomous agents into production without an identity strategy, what's the first thing you'd tell them to fix, and what's the failure mode you'd worry about most?

“First fix: remove broad, human-level access from agents immediately.”

In systems I’ve worked on, the biggest issues didn’t come from compromised models, but from agents operating with over-permissioned credentials. In one case, an agent inherited a service role that allowed it to read data, modify configurations, and trigger downstream workflows. It began as a simple automation, but over time it chained actions: pulling sensitive data for context, adjusting configs to optimize execution, and triggering dependent systems. Every step was authorized, yet the outcome was unintended data exposure and system drift.

In another instance, an agent repeatedly retried a failing operation and, with valid access, kept triggering downstream jobs, eventually causing resource exhaustion and operational instability. Nothing was flagged because each action was technically allowed.

“Nothing was compromised, just too much access used exactly as designed.”

The first thing I tell teams is to move to task-scoped, short-lived credentials: agents should only have access for the next action, not the entire workflow. The failure mode I worry about most isn’t a visible breach; it’s a silent, compounding cascade, where individually safe actions accumulate into system-level risk.
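One way to picture task-scoped, short-lived credentials is a broker that issues a single-use token bound to one action and a short expiry. The sketch below is an assumption-laden illustration, not a description of any particular product: `CredentialBroker`, `TaskCredential`, and the 30-second default are all hypothetical, the store is in memory, and the caller passes `now` explicitly so the expiry logic is easy to follow.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str
    agent_id: str
    action: str        # the one action this credential is valid for
    expires_at: float  # absolute expiry time, in seconds


class CredentialBroker:
    """Hypothetical in-memory broker; a real one would sit behind an identity provider."""

    def __init__(self, ttl_seconds=30.0):
        self.ttl_seconds = ttl_seconds
        self._active = {}  # token -> TaskCredential

    def issue(self, agent_id, action, now):
        """Issue a credential valid for one action until now + ttl_seconds."""
        cred = TaskCredential(
            token=secrets.token_urlsafe(16),
            agent_id=agent_id,
            action=action,
            expires_at=now + self.ttl_seconds,
        )
        self._active[cred.token] = cred
        return cred

    def redeem(self, token, action, now):
        """Authorize one use of the token, only for the action it was issued for."""
        # Single use: the token is removed whether or not the check passes.
        cred = self._active.pop(token, None)
        if cred is None:
            return False
        return cred.action == action and now < cred.expires_at
```

Because each credential covers only the next action, an agent cannot reuse it for a different step or replay it after expiry. A broker like this would also be a natural place to cap how many credentials an agent may request in a given period, which could bound the retry cascade described above; that cap is not shown here.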
“The most dangerous failures aren’t unauthorized, they’re fully authorized and still wrong.”

Question: Where do you think the industry consensus on AI security is currently wrong, or dangerously incomplete? What's a risk people underrate because it's harder to demo than a flashy prompt-injection attack?

“The industry is over-focused on how inputs can break models and under-focused on how outputs can safely operate systems.”

Right now, a lot of attention is on prompt injection and input manipulation because they’re easy to demo. But in the systems I’ve worked on, the bigger risk isn’t a clever prompt; it’s what happens after the model produces an answer and starts acting on it. I’ve seen production workflows where an agent generates a response that looks reasonable, and that response directly triggers actions: querying internal data, modifying configs, or initiating downstream processes. No prompt injection, no obvious exploit, just a plausible but slightly wrong output being executed with real permissions.

“The most dangerous failures don’t look malicious, they look correct.”

This creates a blind spot: we secure inputs, but we don’t sufficiently validate outputs before they become actions.

Another underrated risk is cross-step drift. Individually, each step in an agent workflow is valid, but over time the system drifts from the original intent. I’ve seen agents start with a benign goal and, through intermediate reasoning steps, take actions that no human explicitly approved, because each step locally made sense.

“Nothing is wrong at any step until you look at the outcome.”

What I feel is incomplete:

Too much focus on prompt-level attacks, not enough on execution-level risk.
Treating outputs as answers, not as actions with consequences.
Evaluating safety at a single step, instead of across multi-step workflows.
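Validating outputs before they become actions can be sketched as a gate between the model and the executor. The example below is a minimal illustration under stated assumptions: the JSON proposal format, the `ACTION_POLICY` table, and the three decision labels are hypothetical and not taken from any specific framework.

```python
import json

# Hypothetical policy: action name -> (exact argument names, needs human approval)
ACTION_POLICY = {
    "query_data": ({"dataset"}, False),
    "modify_config": ({"key", "value"}, True),
}


def gate(model_output):
    """Treat model output as untrusted input.

    Returns (decision, proposal), where decision is one of
    'execute', 'hold_for_review', or 'reject'.
    """
    try:
        proposal = json.loads(model_output)
    except json.JSONDecodeError:
        return "reject", None
    if not isinstance(proposal, dict):
        return "reject", None

    name = proposal.get("action")
    args = proposal.get("args")
    if not isinstance(name, str) or name not in ACTION_POLICY:
        return "reject", None
    if not isinstance(args, dict):
        return "reject", None

    required, needs_review = ACTION_POLICY[name]
    # Reject missing or unexpected arguments rather than guessing intent.
    if set(args) != required:
        return "reject", None

    return ("hold_for_review" if needs_review else "execute"), proposal
```

Under this policy, a well-formed `query_data` proposal is executed, a well-formed `modify_config` proposal is held for a human, and free text, unknown actions, or proposals with extra arguments are rejected. The gate only checks shape and policy; it does not judge whether a plausible proposal is actually correct, which is why higher-impact actions are routed to review.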
The risk is “execution without verification.” If a model’s output can trigger real-world actions, then every output is effectively untrusted code, and most systems aren’t treating it that way yet.

“The real risk isn’t that the model says something wrong, it’s that the system does something wrong because of it.”

That’s harder to demo than a prompt injection but far more dangerous in production.

Question: Without getting into anything specific to your work, describe a moment, yours or one you watched, where the gap between "the agent works" and "the agent is safe to trust with access" became real. What did it teach you?

“The moment an agent works is not the moment it’s safe; it’s the moment the real risk begins.”

I remember a system where an agent was automating a multi-step workflow: pulling data, making a decision, and triggering downstream actions. In testing, it worked beautifully: accurate outputs, clean execution, clear efficiency gains. So we gave it broader access.

The issue only showed up later. The agent encountered a slightly ambiguous scenario and made a reasonable but wrong decision. Then it acted on it. It pulled the wrong dataset, triggered a follow-up process, and propagated that error across systems. No alert fired, because every step was valid. The system did exactly what it was designed to do.

“Nothing broke. That was the main problem.”

What made it real for me was that the gap wasn’t about model accuracy; it was about unverified execution. We had validated that the agent could work, but not that it would always act safely under uncertainty.

What I learned: “Working is about correctness.
Trust is about control.”

It changed how I think about deploying AI systems:

Outputs are not answers; they’re actions that need validation
Access shouldn’t be granted based on capability alone
Safety has to be enforced across the entire sequence, not just the decision point

The major lesson: “An agent becomes dangerous the moment you stop treating its output as a suggestion and start treating it as an instruction.”

That’s the gap, and most teams only see it after something goes wrong quietly.

Question: Your thesis is that as AI generates more of the code, the differentiator moves to judgment, systems thinking, and the ability to frame the right problem. Make that concrete: what does a senior engineer do in a day that AI can't, and won't soon?

In my day-to-day life, the work that still matters isn’t writing functions; it’s making decisions under uncertainty. A senior engineer spends time framing the problem: Are we solving the right thing? What are the failure modes? What happens when this runs at scale or under adversarial conditions? AI can suggest implementations, but it doesn’t own the consequences.

I’ve seen this clearly while building production AI systems. The hardest parts weren’t the models or the code; they were decisions like: should this system act autonomously or require a human checkpoint? What access should it have? What’s the blast radius if it’s wrong? Those are judgment calls, not coding tasks.

“Junior engineers write code.
Senior engineers define the boundaries within which code is allowed to operate.”

A typical day involves:

Defining system boundaries → what the AI is allowed to do vs not do
Anticipating failure modes → especially the silent, compounding ones
Connecting systems → understanding how one decision propagates across services
Balancing trade-offs → speed vs safety, automation vs control

For example, I’ve worked on systems where the technically correct solution was to automate a workflow, but the right decision was to keep a human in the loop because the cost of a rare failure was too high. That’s not something AI optimizes for; it optimizes for patterns, not consequences.

“The real skill isn’t building the system, it’s knowing where it shouldn’t be trusted.”

Here is what I feel won’t get automated soon. AI can generate code, but it won’t replace contextual judgment across systems, accountability for decisions and outcomes, or designing for unknown unknowns.

“As AI writes more code, engineering becomes less about implementation and more about responsibility.”

And that shift, from writing code to owning outcomes, is what makes engineers durable.

Question: If AI handles more of the coding, the worry for engineering leaders is the pipeline: junior engineers normally learn the craft by writing the code that's now being automated. What does developing real technical judgment look like when the entry-level work is the first thing to go?

“If juniors write less code, they need to think more about what the code actually does.”

The pipeline isn’t broken; it’s shifting. Early in my career, I learned by writing and debugging code, but the real growth came when things failed in production and I had to understand why. Today, that learning can happen faster. I’ve seen junior engineers grow quickly when they review AI-generated code and catch issues like hidden assumptions or edge cases the model missed.
In one case, a junior flagged that an automated workflow would work in testing but fail at scale due to retry loops, something the code itself didn’t reveal. That’s judgment. The focus needs to move from writing small tickets to owning outcomes, debugging incidents, analyzing failures, and making trade-offs like whether to automate a workflow or add guardrails.

“You don’t build judgment by writing more code, you build it by understanding where code breaks.”

The engineers who become strong now aren’t the ones typing the most; they’re the ones who question, connect systems, and anticipate failure before it happens.

Question: There's a counterargument to "engineering beyond the code", that it's an excuse to drift away from technical depth into vague strategy work, and that the engineers who stay closest to the system still win. Where do you think that critique is right?

“That critique is right: the moment ‘beyond the code’ means losing touch with the system, you’re no longer engineering.”

I’ve seen this go wrong when high-level designs looked clean but failed under real conditions, like a workflow that seemed efficient on paper but caused retry storms and cascading failures once it hit scale. The issue wasn’t strategy; it was lack of proximity to the system. The best engineers I’ve worked with don’t write every line of code, but they can jump in, debug, and understand exactly where things break.

“You can’t design systems you can’t debug.”

The balance is staying close enough to the code to understand constraints and failure modes, while thinking beyond it for system design and trade-offs. Engineers who drift too far into abstraction lose depth; those who stay grounded in the system build judgment that actually holds up in production.

Question: You're writing a book on this. In one sentence, what's the central claim you'd be willing to be proven wrong about in five years?
“Even as AI writes most of the code, the engineers who win will still be the ones with the deepest system-level judgment, not the ones closest to the keyboard. They will need to be super engineers who use multiple AI tools and play roles across multiple domains to solve real-world problems.”

Question: For a strong engineer who buys the argument and wants to act on it, what's the one thing worth doing in the next year that compounds, and what's the thing everyone does that's wasted effort?

Spend a year owning a system end-to-end in production; that compounds more than anything. The highest-ROI move is to take responsibility for a real system, something that runs, breaks, scales, and impacts users, and follow it through failures, incidents, and iterations. That’s where judgment forms. I’ve seen engineers grow fastest when they debug a production issue at 2am, trace it across services, and redesign the system so it doesn’t happen again. That experience compounds because it builds intuition about failure, trade-offs, and real-world constraints, things AI won’t teach you.

“You don’t build judgment by building features, you build it by owning outcomes.”

The wasted effort is chasing volume: writing more code, more side projects, more tutorials, without exposure to real constraints. I’ve seen engineers ship dozens of clean projects that never face scale, ambiguity, or failure, and they plateau quickly. If nothing breaks, you’re probably not learning the right things.

One real system you own deeply beats ten you only touch superficially. That’s the compounding move, and the way to go.

View original source — Hacker Noon ↗
