
TL;DR
Anthropic reveals that Claude now authors more than 80% of the code merged into its production codebase, with the typical engineer merging roughly 8x as much code per day as in 2024. The company’s new Anthropic Institute paper maps the path to recursive self-improvement and calls for a verifiable global pause mechanism.
One of Anthropic’s engineers hasn’t written a line of code in five months. Not because the work dried up, but because Claude does it now. As of May 2026, more than 80% of the code merged into Anthropic’s production codebase was authored by Claude, up from a low single-digit percentage when Claude Code launched in February 2025.
That figure, published Wednesday in a new Anthropic Institute paper titled “When AI builds itself,” is not the headline the company wants you to focus on. The headline is what comes next: AI that can design and train its own successor. Anthropic says it isn’t there yet, but it might be closer than most institutions are prepared for.
The numbers behind the shift
The productivity gains are stark. In Q2 2026, the typical Anthropic engineer merged eight times as much code per day as in 2024. In an internal poll of 130 research staff, the median respondent estimated producing roughly four times as much output with Anthropic’s latest model, Mythos Preview, as when working without AI.
On the most complex, open-ended engineering problems, Claude’s success rate climbed to 76% in May 2026, up 50 percentage points in six months. Anthropic gives a concrete example: when a routine upgrade began crashing tens of thousands of training jobs, an engineer pointed Claude at the live incident with little more than a text description and cluster access. Claude isolated an obscure debugging flag, reproduced the crash, and confirmed a fix in about two hours, work that would normally take two to three days.
The code quality gap is closing, too. Anthropic staff say that Claude-written code was “somewhat worse” than human-written code in late 2025, is at rough parity today, and is expected to be strictly better within the year. An automated Claude reviewer now checks every proposed change to Anthropic’s codebase before it can merge. A retrospective analysis found it would have caught roughly a third of the bugs behind past claude.ai incidents before they reached production.
From coding to research
Writing code is the easy part. The harder question is whether Claude can do research, the kind of open-ended scientific reasoning that drives AI forward.
Anthropic’s evidence here is more preliminary but still striking. In April 2026, the company published a demonstration of Claude running an open-ended AI safety research project end to end. Nine parallel agents were given a problem, left to propose hypotheses, run experiments, share findings through a common forum, and iterate. Over 800 cumulative hours and roughly $18,000 in compute, the agents recovered 97% of the performance gap on the task. Two human researchers, working for a week, recovered 23%.
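The article does not define “performance gap recovered.” A common definition, used in weak-to-strong generalization research, is the fraction of the distance between a baseline score and a ceiling score that a method closes. The sketch below assumes the paper uses something like that; the function name and the scores are hypothetical, chosen only so the outputs reproduce the quoted 97% and 23%. The cost line uses only the article’s rounded figures.

```python
def performance_gap_recovered(baseline: float, ceiling: float, achieved: float) -> float:
    """Fraction of the baseline-to-ceiling gap that a method closes.

    Assumption: mirrors the 'performance gap recovered' (PGR) metric from
    weak-to-strong generalization work; the article does not define it.
    """
    if ceiling == baseline:
        raise ValueError("ceiling and baseline must differ")
    return (achieved - baseline) / (ceiling - baseline)


# Hypothetical scores, picked only to illustrate the 97% vs 23% comparison.
baseline, ceiling = 0.60, 0.80
agents = performance_gap_recovered(baseline, ceiling, achieved=0.794)
humans = performance_gap_recovered(baseline, ceiling, achieved=0.646)
print(f"agents: {agents:.0%}, humans: {humans:.0%}")  # agents: 97%, humans: 23%

# Rough compute cost per cumulative agent-hour, from the article's rounded figures.
cost_per_agent_hour = 18_000 / 800
print(f"${cost_per_agent_hour:.2f} per agent-hour")  # $22.50 per agent-hour
```

Normalizing against a baseline and a ceiling is what lets a single percentage compare the agents and the human researchers on the same task, whatever the raw score scale.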
Another internal experiment measured whether Claude could pick a better “next step” than a human researcher at difficult junctures during real research sessions. In November 2025, Claude matched the human’s judgment 51% of the time. By April 2026, that rose to 64%. The day-to-day work of research is largely a chain of these next-step decisions. If that trend continues, the gap between AI-as-assistant and AI-as-researcher narrows fast.
The task horizon curve
Anthropic’s internal data aligns with a broader pattern tracked by METR, a nonprofit that benchmarks AI capabilities. The length of tasks AI can reliably complete on its own has been doubling roughly every four months, faster than the earlier doubling time of about seven months.
In March 2024, Claude 3 Opus could handle tasks that take a human about four minutes. By early 2025, Claude 3.7 Sonnet managed tasks of about an hour and a half. Today, Claude Opus 4.6 handles 12-hour tasks, and METR found that Mythos Preview could sustain work for at least 16 hours, the upper end of what the current benchmark suite can measure. If the trend holds, tasks requiring days of skilled human work come into range this year, and weeks-long tasks could follow in 2027.
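The projection in this paragraph is simple exponential arithmetic. A minimal sketch, assuming the four-month doubling time holds exactly and starting from the 16-hour Mythos Preview figure; the 8-hour workday and 40-hour work week used to convert hours into “days” and “weeks” of skilled work are illustrative assumptions, not METR definitions.

```python
def projected_horizon_hours(start_hours: float, months_ahead: float,
                            doubling_months: float = 4.0) -> float:
    """Extrapolate a task-length horizon that doubles every `doubling_months`."""
    return start_hours * 2 ** (months_ahead / doubling_months)


WORKDAY_HOURS = 8    # assumption: one day of skilled human work
WORKWEEK_HOURS = 40  # assumption: one week of skilled human work

for months in (0, 4, 8, 12, 16):
    hours = projected_horizon_hours(16, months)
    print(f"+{months:2d} months: {hours:5.0f} h "
          f"= {hours / WORKDAY_HOURS:4.1f} workdays "
          f"= {hours / WORKWEEK_HOURS:3.1f} work weeks")
```

Under these assumptions the horizon reaches about four workdays after one doubling and passes a full work week before the second doubling completes, which is consistent with the “days this year, weeks in 2027” framing. Because the growth is exponential, a shift in the doubling time (for example back to seven months) moves those milestones substantially.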
The infrastructure is buckling
The downstream effects are already visible. GitHub, the platform where much of the world’s software is developed, saw roughly one billion code commits in all of 2025. By mid-2026, it was processing 275 million commits per week, on pace for about 14 billion over the year. Claude Code alone accounts for 4.5% of all public commits on GitHub, about 2.6 million per week.
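The annualized figure follows directly from the weekly rate. The sketch below reproduces it from the article’s numbers, assuming a 52-week year. The last calculation is an inference, not a reported figure: dividing Claude Code’s 2.6 million weekly commits by its 4.5% share gives the implied volume of *public* commits, which is well below 275 million, so the two figures appear to describe different populations (the weekly total presumably includes private repositories).

```python
WEEKS_PER_YEAR = 52  # assumption: simple 52-week annualization

weekly_commits_2026 = 275_000_000   # all commits per week, mid-2026 (article figure)
total_commits_2025 = 1_000_000_000  # "roughly one billion" in 2025 (article figure)
claude_weekly_public = 2_600_000    # Claude Code commits per week (article figure)
claude_public_share = 0.045         # Claude Code share of public commits (article figure)

annual_pace = weekly_commits_2026 * WEEKS_PER_YEAR
print(f"annualized pace: {annual_pace / 1e9:.1f} billion commits")  # 14.3 billion
print(f"vs. 2025 total: {annual_pace / total_commits_2025:.1f}x")   # 14.3x

# Inference only: public-commit volume implied by the two Claude Code figures.
implied_public_weekly = claude_weekly_public / claude_public_share
print(f"implied public commits/week: {implied_public_weekly / 1e6:.0f} million")  # 58 million
```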
GitHub’s COO has said the company is “pushing incredibly hard” on capacity just to keep up. Inside Anthropic, the bottleneck has already shifted: as Claude generates more code, human code review has become the constraint. The company describes this as a textbook example of Amdahl’s law: the overall speedup of a process is capped by the parts that are not accelerated, so speeding up one stage simply exposes the next slowest link.
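Amdahl’s law has a compact form: if a fraction p of the work is sped up by a factor s, the overall speedup is 1 / ((1 − p) + p / s). The sketch below applies it to a hypothetical split in which writing code takes 70% of the end-to-end time per change and review takes the remaining 30%, unaccelerated. The 70/30 split is an illustrative assumption, not a figure from the paper.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work gets `factor`x faster.

    Amdahl's law: S = 1 / ((1 - p) + p / s).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    p, s = accelerated_fraction, factor
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical split: writing is 70% of the time per change; review (30%) is not sped up.
WRITING_FRACTION = 0.7
for s in (2, 8, 100):
    print(f"writing {s}x faster -> overall {amdahl_speedup(WRITING_FRACTION, s):.2f}x")
print(f"limit as writing becomes instant: {amdahl_speedup(WRITING_FRACTION, float('inf')):.2f}x")
```

Under this made-up split, an 8x faster writing step yields only about 2.58x overall, and no amount of acceleration gets past about 3.33x until review itself speeds up. That is the bottleneck shift the article describes.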
The pause question
The paper’s most significant section is not about productivity. It is a call for a verifiable global mechanism to slow or temporarily pause frontier AI development.
Anthropic is careful with the framing. A unilateral pause by one lab would simply change who leads, not create the deliberative process the company says is missing. What Anthropic proposes instead is a system where multiple frontier labs, in multiple countries, could agree to stop under the same conditions and verify that the others had actually done so. It draws a parallel to nuclear arms control but acknowledges the differences: training runs are far easier to conceal than missile silos, the inputs are general-purpose, and the incentive to defect quietly is enormous.
“If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing,” the paper states. The AI coding market is now worth tens of billions of dollars. Asking the industry to pause is asking it to leave money on the table while trusting that competitors, including those in China, will do the same.
What recursive self-improvement would mean
The paper lays out three possible futures. In the first, the trend stalls, but even today’s capabilities reshape the economy. In the second, AI development becomes substantially automated while humans still set research direction, meaning 100-person companies could do the work of 100,000-person organisations. In the third, AI systems achieve full recursive self-improvement and begin designing their own successors.
Anthropic says it does not have “good intuitions” for what that third scenario looks like. But it offers one observation: even recursive intelligence cannot speed up everything. It cannot learn what a drug does over decades of use, hold elections sooner than a constitution dictates, or turn a stranger into an old friend in a weekend. The felt pace of this future, for most people, would still be set by the bottlenecks.
The company’s growing enterprise push makes the timing of this paper notable. Anthropic is simultaneously selling Claude as a productivity revolution and warning that the trajectory it enables could require a global emergency brake. Whether that tension is principled transparency or strategic positioning depends on what happens next.
Original source: The Next Web


