AI’s usefulness depends on whether we can trust it to act alone

The views expressed by contributors are their own and not the view of The Hill

A major shift in the approach to AI governance is underway at the federal level, though the matter is far from settled. In early June, the White House issued an executive order on AI innovation and security and National Security Presidential Memorandum-11 on AI in the national security enterprise, as a new generation of models gained powerful cyber capabilities.

Before the order’s own review framework was operational, the administration invoked export-control authority to restrict Anthropic’s Fable 5 and the Mythos model behind it. OpenAI, meanwhile, limited the release of GPT-5.6 pending government sign-off. This upheaval is unfolding even as agencies and companies race to integrate AI agents into everyday workflows. But far beyond the national security headlines lie questions of secure deployment and whether these systems can be trusted to act.

Agentic AI is ultimately about delegation. A system that can draft an email, search a database, file a form, write code, monitor a network, or route a request is no longer merely producing information. It is being trusted to act, often across multiple steps before a human reviews the result. And these systems are improving fast.

The Model Evaluation and Threat Research organization, which evaluates frontier AI systems, tracks progress by the length of tasks, measured by how long they would take a human expert, that AI can complete reliably. In 2025, that threshold was doubling every seven months. More recent estimates suggest it’s closer to four months. The trajectory is clear. Now, institutions must build the capacity to govern agents while meaningful human control is still possible.

Properly deployed, agents could transform the relationship between citizens and government. A small business navigating licensing could spend more time serving customers and less on paperwork. A veteran filing a benefits claim could shave weeks off a process that today takes months. Agencies could use agents to remove unnecessary steps, reduce backlogs, and deliver higher-quality services.

But these possibilities depend on trust, reliability, and security. A poorly governed agent could move information to the wrong place, act outside its authority, or bury an error in a chain of automated steps no one can later reconstruct. The results? Misallocation of benefits, crippling of critical infrastructure, even conflict escalation. AI decision-support systems are already generating target recommendations for military commanders and being incorporated into sensitive systems, while guidance and technical standards lag behind.

Much of the AI policy debate focuses on access: who gets the models, chips, data, and energy. As AI systems begin to act, however, policymakers also face the urgent question of what makes delegation to these systems trustworthy: whether institutions can use them reliably, securely, accountably, and in ways understood by government and the public.

To build the infrastructure that guides responsible use, we need trained personnel, sound procurement, clear lines of authority, audit logs, and the ability to reconstruct decisions after the fact. That deployment infrastructure will determine whether agentic AI strengthens public institutions or makes them more brittle.

Cybersecurity makes the stakes concrete. Anthropic’s Mythos model, which has sharply outperformed prior systems at finding software vulnerabilities, shows how quickly agentic capabilities can serve both defenders and attackers. Industry has responded: Anthropic’s Project Glasswing and OpenAI’s Daybreak programs extend access to advanced tools to vetted defenders, an approach known as differential access.

As noted above, the White House is focusing on access. This alone, however, will not help the hospitals, utilities, state agencies, and municipal systems most exposed to cyber threats if they lack the staff, standards, integrations, and operational practices to use those tools responsibly. The Cybersecurity and Infrastructure Security Agency and the National Security Agency, along with Australian and allied cyber agencies, recently issued guidance on the careful adoption of agentic AI services, highlighting permissions, segmentation, monitoring, accountability, and human oversight.

Two priorities stand out. First, policymakers should invest in evaluation and auditing capacity. For the government to rely on agentic systems, it needs to understand how those systems use tools, whether they stay within authorized boundaries, how they behave under stress, how they perform on cyber-relevant tasks, and what risks appear with multi-agent interactions.

The June executive order operationalized a version of this, asking developers to submit their most capable models for government review before broad release, but how it works in practice is still developing. Evaluation science must also keep pace with the frontier: models that know they are being evaluated, for example, complicate measurement and demand new tests.

The Center for AI Standards and Innovation is responsible for much of this work, including the evaluation capacity any such review depends on. It has agreements to support pre-deployment evaluations and targeted research on frontier AI capabilities, including collaborations with OpenAI and Anthropic. That is the right direction, but the center lacks congressional authorization, and its $10 million budget is too small. Its deep evaluation expertise should complement the NSA’s in national security deployments.

Second, policymakers should clarify and strengthen export controls and enforcement measures that protect U.S. and allied advantages in frontier AI. The question is not only who has the most capable models, but whose institutions and values shape how agentic systems are built, used, and relied upon. Recent bipartisan bills from the House Foreign Affairs Committee, including the Chip Security and Stop Stealing Our Chips Acts, are part of a serious strategy to maintain America’s lead. That lead will matter most if it is matched by a lead in evaluations, secure deployment, and institutional integrity.

Today’s agents are still manageable. That is the point. The deployments happening now are building, or failing to build, the institutional habits the country will need as agentic systems become more capable. The next phase of AI policy should focus not only on who gets powerful AI, but on building the governance frameworks necessary for reliable, accountable, and secure use.

Jenny Marron is Executive Director of the Institute for AI Policy and Strategy. She previously served at the White House National Security Council and in the U.S. Department of State.