
A Miami startup says it has cracked a maths problem that has made AI models slow and power-hungry for almost a decade. The claim was bold enough to draw comparisons with Theranos. Now, though, the company has independent test results that back much of it up.
The startup is called Subquadratic. It came out of stealth in May with $29mn in seed funding and a new language model named SubQ. According to the company, SubQ is faster, cheaper, and far less energy-hungry than today’s leading models. It can also read up to 12 times as much text at once.
The decade-old bottleneck
To see why that matters, it helps to know how most large language models work. At their core sits a “transformer”, introduced by Google researchers in 2017. The transformer runs a process called dense attention.
Dense attention is thorough, but it is expensive. It compares every word in a text with every other word. So when you double the length of the text, the work roughly quadruples. That “quadratic” scaling is the main reason LLMs guzzle so much compute and power.
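The arithmetic behind that scaling can be sketched in a few lines of Python. This is an illustration only: the function name `dense_attention_pairs` is made up for this example, and it counts comparisons rather than measuring real compute or power.

```python
def dense_attention_pairs(num_tokens: int) -> int:
    """Count the token-to-token comparisons dense attention makes.

    Every token is compared with every token (itself included),
    so the count is num_tokens squared.
    """
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Doubling the text length each time multiplies the work by four.
    for n in (1_000, 2_000, 4_000):
        print(f"{n} tokens -> {dense_attention_pairs(n)} comparisons")
```

Going from 1,000 to 2,000 tokens takes the count from one million to four million comparisons.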
Subquadratic’s fix
Subquadratic’s answer is to drop dense attention for “sparse attention”. Instead of comparing every word with every other, sparse attention keeps only the pairs that matter. The idea is old, and plenty of teams have tried it. Until now, however, none had matched dense attention’s quality.
The company says its version finally does. Crucially, it picks which words to focus on dynamically, based on the content rather than a fixed pattern. “That’s kind of where the secret sauce is,” says co-founder and chief technology officer Alex Whedon.
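Subquadratic has not published how SubQ chooses its pairs, so the following is not its method. It is a minimal sketch of one generic form of content-based sparse attention, often called top-k attention, in which each word keeps only the `k` other words it scores highest against. The function names and the parameter `k` are assumptions made for illustration. The sketch also still scores every pair before selecting, so it shows the selection idea, not a speed-up.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(queries, keys, values, k):
    """Generic top-k attention (illustrative, not SubQ's algorithm).

    For each query, keep only the k keys with the highest scores,
    then average the matching values using softmax weights.
    """
    outputs = []
    dim = len(values[0])
    for q in queries:
        # Content-based scores: this step still touches every key.
        scores = [dot(q, key) / math.sqrt(len(q)) for key in keys]
        # Selection depends on the content, not on a fixed pattern.
        keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        weights = softmax([scores[i] for i in keep])
        outputs.append(
            [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    queries = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 20.0]]
    # With k=1 the query attends only to its best-matching key.
    print(topk_sparse_attention(queries, keys, values, k=1))
```

Setting `k` equal to the number of keys recovers ordinary dense attention. The hard part, which this sketch does not attempt, is finding the pairs that matter without first scoring all of them.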
At first, the claims rested on a handful of self-published scores. Naturally, the reaction was sceptical. One AI engineer summed it up on X: SubQ is “either the biggest breakthrough since the Transformer … or it’s AI Theranos”.
So the company brought in a third party. It asked Appen, a firm that evaluates other companies’ models, to run the tests. The results were striking. On a raw speed test, SubQ ran 56 times faster than FlashAttention, a leading existing method. On a tough coding benchmark, it scored 89.7 per cent, close to the best models around.
The cost gap looks just as wide. By the startup’s account, running one long-context test on Anthropic’s top model costs about $2,600. On SubQ, it says, the same test cost $8, roughly 325 times less.
Still too good to be true?
Even so, there are reasons for caution. Benchmarks are not the same as real-world use. SubQ is also not widely available yet. Tens of thousands have joined the waitlist, but only a handful have access.
There is a wrinkle in the origin story, too. Rather than train SubQ from scratch, Subquadratic started from an existing open-weight model and swapped in its new attention method. That is common practice. However, it sits awkwardly next to the claim of fully reinventing how LLMs work.
“They may have built something real and useful,” says Will Depue, an independent researcher who used to work at OpenAI. “But the public evidence does not yet justify the stronger claim that they have solved the quadratic attention bottleneck.”
Why it matters
If the results hold, the payoff is large. Cheaper, faster long-context models could read entire codebases, contract sets, or document troves in one pass. They would also cut the cost and energy of running AI.
That prize is one the whole industry is chasing. The sector is already straining against the spiralling economics of AI agents, and other startups, such as Thomas Reardon’s Flourish, are attacking efficiency from other angles. Subquadratic, though, is betting the whole field will follow it. “We don’t think anybody will be building on transformers in a few years,” says chief executive Justin Dangel.
View original source — The Next Web ↗


