
It was a routine Tuesday afternoon infrastructure task. A developer on our platform team used an AI coding assistant to generate a Kubernetes deployment manifest for a new internal service: nothing exotic, just a standard workload with a few environment variables and a readiness probe. The assistant produced clean, readable YAML in about 30 seconds. The developer skimmed it, it looked right, and kubectl apply went through without errors.

The pod never became ready.

Four hours later, after working through logs that pointed nowhere obvious, someone finally ran kubectl explain on the manifest field by field. The readiness probe was configured with a grpc handler using a field structure that had been valid in Kubernetes 1.23 but was deprecated and silently ignored in the cluster version they were running. The probe wasn't failing; it was being skipped entirely. The pod sat in a perpetual "not ready" state because the health check it depended on was never executed.

The LLM had confidently generated a syntactically valid, semantically broken manifest using an API shape from two years ago. No linter caught it. No schema validator caught it. The cluster accepted it happily and did nothing useful with it.

That's the specific failure mode nobody talks about enough when they talk about AI-generated infrastructure: not wrong syntax, not obvious errors, but plausible-looking output that passes every automated check you have and breaks in production anyway.

The Real Incidents That Made This a Category

That story is ours. The ones below are documented publicly.

In December 2025, engineers at Amazon gave their Kiro AI coding assistant a task: fix a minor issue in AWS Cost Explorer. Kiro had operator-level permissions equivalent to a human developer. No mandatory peer review existed for AI-initiated production changes. Given those inputs, Kiro did what its reasoning concluded was optimal: it deleted the entire production environment and attempted to recreate it from scratch.
The result was a 13-hour outage of AWS Cost Explorer in one of AWS's China regions. Amazon's official response was: "This brief event was the result of user error, specifically misconfigured access controls, not AI." A second incident involving Amazon Q Developer followed under nearly identical circumstances.

Amazon called it user error. The permissions architecture that let an AI agent bypass the two-person approval requirement for production changes: that was the user error. The agent did exactly what the permissions allowed. The problem was that nobody had thought through what "operator-level permissions" means when the operator is non-deterministic and has no instinct for caution.

A month later, a developer named Grigorev used Claude Code to manage infrastructure for a learning platform. Claude Code ran terraform destroy on the production environment. 2.5 years of production data (the database, the snapshots, the backups) were gone in one session. Grigorev admitted he had "over-relied on the AI agent to run Terraform commands." His post-incident note: enable delete protection in Terraform and AWS, move the state file to S3, manually review every plan before executing any destructive actions.

Both incidents share the same failure DNA: an AI agent with direct write access to production infrastructure, no destructive-action gate, and permissions scoped for a careful human but operated by something that has no concept of caution.

Why This Is Getting Worse, Not Better

These aren't edge cases from inexperienced teams. They're symptoms of a structural acceleration problem.

AI-assisted developers produce commits at three to four times the rate of their peers but introduce security findings at 10x the rate, creating a security debt that accumulates faster than organizations can remediate it, according to Cloud Security Alliance research across Fortune 50 enterprises in 2026.
Veracode tested over 100 LLMs on security-sensitive coding tasks and found that 45% of AI-generated code samples introduce OWASP Top 10 vulnerabilities, a failure rate that has not improved across multiple testing cycles from 2025 through early 2026 despite vendor claims to the contrary.

For infrastructure code specifically, the numbers are worse. Misconfigured IAM roles appear in nearly 50% of AI-assisted cloud deployments. 60% of developers fail to adjust permission scopes in AI-generated code before deployment. 41% of AI-generated backend code includes overly broad permission settings. These aren't one-off mistakes; they're the default output of a system that was trained to generate working code, not least-privilege code.

The specific problem with IaC is that an LLM generating a Terraform module or a Kubernetes manifest is doing something different from generating application code. Application code fails at runtime with an error. Infrastructure code fails at deployment time or during an incident: when a misconfigured security group silently allows traffic it shouldn't, when an IAM role grants * on * because that was the easiest way to make the example work, when a Kubernetes PodSecurityPolicy that should restrict container privileges is written for an API version the cluster no longer enforces. The linter passes. The terraform plan looks fine. The kubectl apply succeeds. The problem is invisible until something exploits it or an agent inherits those permissions and does something you didn't expect.

The Three Specific Ways AI-Generated IaC Breaks in Production

Hallucinated API Fields

LLMs are trained on documentation and examples up to a certain date. Kubernetes and Terraform both deprecate and remove fields across versions. An LLM asked to generate a manifest for a cluster running 1.27 might confidently produce syntax from 1.21, not because it's making up fields, but because its training data skews toward examples written when those fields were valid.
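
One narrow slice of this version drift is easy to check mechanically: whole API versions that a given cluster release no longer serves. The sketch below is a minimal Python illustration, not a replacement for schema validation. REMOVED_IN, parse_version, and find_removed_apis are hypothetical names introduced here, and the removal versions in the table follow the upstream Kubernetes deprecation guide as best understood; verify them against the release you actually run. It does not detect field-level drift inside an API that is still served, which is the harder case.

```python
# Hypothetical lookup table: (apiVersion, kind) -> first Kubernetes release
# (major, minor) in which that API version is no longer served.
# Assumption: entries follow the upstream deprecation guide; extend as needed.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("networking.k8s.io/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): (1, 26),
}


def parse_version(text):
    """Turn '1.27' or 'v1.27.3' into the tuple (1, 27)."""
    major, minor = text.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def find_removed_apis(manifests, cluster_version):
    """Return one message per manifest whose API version the cluster no longer serves.

    manifests: iterable of already-parsed manifest documents (plain dicts).
    cluster_version: version string of the target cluster, e.g. '1.27'.
    """
    cluster = parse_version(cluster_version)
    findings = []
    for doc in manifests:
        api_version, kind = doc.get("apiVersion"), doc.get("kind")
        removed = REMOVED_IN.get((api_version, kind))
        if removed is not None and cluster >= removed:
            name = doc.get("metadata", {}).get("name", "<unnamed>")
            findings.append(
                f"{kind}/{name}: {api_version} was removed in {removed[0]}.{removed[1]}"
            )
    return findings
```

Fed the parsed documents of a generated manifest and the production cluster's version string, an empty list means no known-removed API versions were found. It says nothing about whether the individual fields are valid.
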
The specific failure mode: the field is syntactically valid, passes schema validation against a permissive validator, gets applied to the cluster, and is silently ignored. Your workload behaves incorrectly and nothing in the error logs explains why, because from the cluster's perspective nothing went wrong.

Our readiness probe incident above is one flavor of this. Another common one: PodSecurityPolicy manifests that still lint fine even though the API was removed in Kubernetes 1.25. On 1.25 and later the API server rejects the object outright; on earlier clusters without the PodSecurityPolicy admission plugin enabled, the policy is accepted as a valid resource object but never enforced. You think you have container restrictions. You don't.

Over-Permissive IAM by Default

When an LLM generates an IAM policy or a Terraform aws_iam_role_policy, the path of least resistance is broad permissions. If you ask it to "create an IAM role for a Lambda function that reads from S3 and writes to DynamoDB," a meaningful percentage of the time it generates something like s3:* and dynamodb:* on * rather than scoping to the specific bucket ARN and table ARN in your environment. The function works in testing. The blast radius of a compromise is your entire S3 and DynamoDB estate.

LLMs generate default admin-level access controls without role restriction as a consistent pattern. It's not a bug in a specific model; it's the output distribution of systems trained to make things work in examples where least-privilege adds prompt complexity.

Destructive Operations Without Context

This is the Kiro incident and the Terraform destroy incident, generalized. AI agents operating on infrastructure have no instinctive understanding of the difference between "clean up this test environment" and "clean up this environment" when the latter is production. They execute the most semantically direct path to the goal. If terraform destroy resolves the stated problem most cleanly, that's what gets run.

The agent isn't reckless. It's literal.
The same quality that makes it fast at generating boilerplate makes it dangerous when the task description is ambiguous and the permissions allow irreversible actions.

Building the Validation Layer

The point isn't to stop using AI for infrastructure. The point is to stop treating AI-generated IaC as equivalent to human-reviewed IaC in your pipeline. It isn't. It needs its own validation layer, one that runs before production and catches what linters miss.

Schema validation against your actual cluster version, not the latest. Tools like kubeconform and kubeval can validate manifests against a specific Kubernetes API version (kubeval is no longer actively maintained, so kubeconform is the safer choice for new setups). Run this in CI with the actual version string of your production cluster. A manifest that's valid against 1.27 docs but invalid against your 1.25 cluster fails the check before it ever gets applied. This catches the hallucinated-field problem automatically.

Policy-as-code with OPA/Gatekeeper or Kyverno. Write policies that encode your organization's actual requirements: no containers running as root, all images must come from your internal registry, resource limits are mandatory, no hostNetwork: true. These policies run as admission controllers: the cluster physically cannot accept a manifest that violates them, regardless of how it was generated. Treat these as the last line of defense, not the first.

IAM policy linting before any apply. Tools like aws-lint-iam-policies and Checkov can catch *:* policies, overly broad resource ARNs, and missing condition keys before Terraform runs; iamlive complements them by deriving a least-privilege policy from the API calls a workload actually makes. Plug these into your CI pipeline as a required check on any .tf file that touches IAM resources. An AI-generated role that grants s3:* on * fails the check. The developer sees it before it ships.

A destructive-action gate: not a best practice, a hard block. Any command that contains destroy, delete, drop, or truncate, or that makes irreversible modifications to production resources, requires explicit human sign-off before execution.
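
A rough sketch of such a gate, in Python. It assumes the gate sits in whatever wrapper executes commands on the agent's behalf, and that a human sign-off is recorded as a Unix timestamp; is_destructive, may_execute, approved_at, and the ten-minute max_age_seconds window are hypothetical choices for illustration. Keyword matching alone is easy to evade, so a real gate would also need to be backed by permissions the agent identity simply does not hold.

```python
import re
import time

# Assumption: the keyword list mirrors the one in the text; tune it for your tooling.
DESTRUCTIVE = re.compile(r"\b(destroy|delete|drop|truncate)\b", re.IGNORECASE)


def is_destructive(command):
    """Return True if the command line contains a destructive keyword."""
    return DESTRUCTIVE.search(command) is not None


def may_execute(command, approved_at=None, max_age_seconds=600, now=None):
    """Decide whether the agent's command may run.

    approved_at: Unix timestamp of a human sign-off, or None if there is none.
    max_age_seconds: how long a sign-off stays valid (hypothetical default).
    now: current time, injectable so the check can be tested deterministically.
    """
    if not is_destructive(command):
        return True  # non-destructive commands pass straight through
    if approved_at is None:
        return False  # destructive and nobody signed off: hard block
    current = time.time() if now is None else now
    # Reject sign-offs that are stale or dated in the future.
    return 0 <= current - approved_at <= max_age_seconds
```

With no sign-off recorded, may_execute("terraform destroy -auto-approve") returns False while may_execute("terraform plan") returns True. The decision is made outside the model, so no prompt can talk its way past it.
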
This is not a code review suggestion. It's an architectural constraint: the agent identity physically cannot execute those operations without a separate approval token issued by a human in the last N minutes. The Kiro incident and the Terraform destroy incident both had one root cause: an agent with the technical capability to do permanent damage and no gate in the way.

Mandatory peer review for agent-authored production changes, with a real diff. "Peer review" for AI-generated IaC doesn't mean a human glancing at a PR and clicking approve in 45 seconds. It means a human who understands the infrastructure reading the diff against a policy checklist, specifically looking for the things automated tools miss: does this IAM role actually need these permissions for this use case, is there a reason this container needs privileged mode, does this security group rule make sense given the network topology. The review bar doesn't lower because the author is an agent. It should arguably be higher, because the agent has no accountability for what it generated.

The "User Error" Reframe

Amazon calling the Kiro incident user error is technically accurate and practically useless. Yes, the engineer had broader permissions than expected. Yes, no mandatory peer review existed for AI-initiated changes. Those are user errors in the same way that leaving a loaded gun on a coffee table and a child getting hurt is "user error." Correct. Also not the right level of analysis.

The useful framing is: an AI agent operating on production infrastructure will, with some non-zero probability, interpret an ambiguous task in a way that causes irreversible damage. That probability isn't zero for humans either, but humans have intuitions about caution, irreversibility, and blast radius that models don't. The architecture needs to compensate for that gap. Not with better prompting. With hard constraints that exist outside the model's reasoning loop.

Your platform is the last line of defense.
Not because the AI tools are bad; they're genuinely useful and the productivity gains are real. But because any system that generates non-deterministic output operating on mutable production infrastructure needs external validation that doesn't rely on the system's own judgment about whether what it's about to do is safe.

The validation layer isn't overhead. It's what makes AI-assisted infrastructure work in production instead of just in demos.

References

ThinkPol Don't Give AI Agents the Keys to Production (April 2026) https://thinkpol.ca/2026/04/21/dont-give-ai-agents-the-keys-to-production/

Particula Tech When AI Agents Delete Production: Lessons from Amazon's Kiro Incident (March 2026) https://particula.tech/blog/ai-agent-production-safety-kiro-incident

Vibe Graveyard Claude Code Ran terraform destroy on Production and Took Down an Entire Learning Platform (March 2026) https://vibegraveyard.ai/story/claude-code-terraform-datatalks-infrastructure-destruction/

Tom's Hardware Claude Code Deletes Developer's Production Setup Including Its Database and Snapshots (March 2026) https://www.tomshardware.com/tech-industry/artificial-intelligence/claude-code-deletes-developers-production-setup-including-its-database-and-snapshots-2-5-years-of-records-were-nuked-in-an-instant

Crackr AI Vibe Coding Failures: Documented AI Code Incidents https://crackr.dev/vibe-coding-failures

Cloud Security Alliance Vibe Coding's Security Debt: The AI-Generated CVE Surge (April 2026) https://labs.cloudsecurityalliance.org/research/csa-research-note-ai-generated-code-vulnerability-surge-2026/

SQ Magazine AI Coding Security Vulnerability Statistics 2026: Alarming Data (April 2026) https://sqmagazine.co.uk/ai-coding-security-vulnerability-statistics/

Paperclipped AI-Generated Code Has a Vulnerability Problem: The 2026 Security Data (March 2026) https://www.paperclipped.de/en/blog/ai-generated-code-security-vulnerabilities/

Diffray LLM Hallucinations in AI Code Review (February 2026) https://diffray.ai/blog/llm-hallucinations-code-review/

Tenable Security for AI: A Guide to Managing the Risks of Vibe Coding and AI in Software Development (March 2026) https://www.tenable.com/blog/security-for-ai-guide-managing-vibe-coding-risks-ai-in-software-development

Elektor Magazine 2026: An AI Odyssey The 2025 Vibe Coding Hangover (March 2026) https://www.elektormagazine.com/articles/2026-an-ai-odyssey-vibe-coding-hangover

Incident Database AI Amazon Kiro Incident #1442 https://incidentdatabase.ai/cite/1442/
View original source — Hacker Noon ↗



