
An LLM agent optimizes for plausible, not correct. It will hand you confident, well-formatted code that does the wrong thing — and tell you it works. Prompts, system messages, and house rules shape behavior. They don't verify anything. The only thing an agent can't argue past is an executable oracle: a test, a type, a contract, a property check. The new unit of engineering work is the spec that judges the code, not the code itself. The test suite is a prompt that runs forever. The catch: agents will game weak oracles. If your verification can be satisfied by deleting a test or weakening an assertion, it isn't an oracle — it's a suggestion. Generation is now free. Verification is the moat. Spend your scarce human attention there.
View original source — Hacker Noon ↗



