And why that’s good
For the last five years, AI has moved so blazingly fast that every new model felt like a plot twist. GPT-3, GPT-4, Claude, Gemini — each one arrived with bigger numbers, bigger claims, and bigger expectations. It felt like the JavaScript ecosystem ten years ago, when a new framework shipped every couple of weeks.
Alas, technological progress never grows in a straight line. It grows in S-curves: explosive acceleration, followed by a long flattening. LLMs — specifically, dense, transformer-based, text-trained models — are now entering that flattening stage.
Not because AI is slowing down — far from it. The current slump most likely comes from the architectural limits of this generation of models, not from the field itself. Let's continue the JS parallel. In JS, performance breakthroughs didn't come from changing the language; they came from rewriting the tooling in faster languages. Blazingly fast tooling:
- Bun implemented parts of its runtime in Zig
- Vite unlocked speed by rewriting its core in Rust
- TypeScript got 10x performance gains by porting its compiler to Go
I would venture to say that future AI gains will come not from making transformers bigger, but from new architectures that redefine how models reason, plan, and execute.
The real meaning of “plateau”
People imagine a plateau as a wall, but the science paints a different picture. Scaling-law research shows that model improvement follows “a smooth power law that approaches an irreducible minimum” (Epoch AI).
Translation:
- Early scaling produces huge jumps.
- Later scaling produces tiny gains.
- Each improvement costs dramatically more compute, data, and energy.
This is the plateau: diminishing returns long before we hit a true ceiling. And we are now inside that phase.
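To make those diminishing returns concrete, here is a toy TypeScript sketch. Every constant is invented purely to show the shape of "a smooth power law that approaches an irreducible minimum"; none of them model a real training run.

```ts
// Toy scaling law: loss approaches an irreducible minimum as compute grows.
// All constants are made up for illustration.
const L_MIN = 1.7;  // hypothetical irreducible loss
const A = 10;       // hypothetical scale coefficient
const ALPHA = 0.05; // hypothetical power-law exponent

const loss = (compute: number): number => L_MIN + A * Math.pow(compute, -ALPHA);

// Each 10x jump in compute buys a smaller improvement than the last.
for (let exp = 21; exp <= 26; exp++) {
  const gain = loss(10 ** (exp - 1)) - loss(10 ** exp);
  console.log(`10^${exp} FLOPs -> loss ${loss(10 ** exp).toFixed(3)} (gain from last 10x: ${gain.toFixed(3)})`);
}
```

Run it and the "gain from last 10x" column shrinks on every row. That shrinking column is the plateau.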
Data exhaustion: the fuel source is finite
Every LLM is trained on human-generated text — books, articles, code, conversations. But we are running out.
As MixFlow explains: “the stock of high-quality human-generated text data is finite and projected to be exhausted in the coming years.” Without fresh, high-quality text, models hit a natural boundary. You can pour more compute into training, but if the dataset is tapped out, you stop getting breakthrough leaps.
Architectural limits: transformers can predict, but not understand
Transformers are incredible pattern recognizers — but they do not possess grounding, causality, or real reasoning.
This is why hallucinations persist at every scale.
Researchers describe this stage as the plateau of “non-reasoning LLMs,” noting that “progress on text-only tasks is flattening as models converge toward GPT-4-level performance on many public benchmarks” (Leena AI).
You can scale prediction. You cannot brute-force understanding.
Compute, cost, and the AI brick wall
The next leap in LLM performance would require:
- Vastly larger clusters of GPUs — or TPUs
- Tons of electricity
- More expensive training runs
- Massive operational budgets
SemiAnalysis calls this the AI brick wall — the point at which scaling dense transformers becomes economically irrational.
This isn’t a theoretical problem. It’s a financial one.
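A back-of-envelope sketch shows the scale of the problem. It leans on the widely used approximation that training takes roughly 6 × parameters × tokens FLOPs; the model size, token budget, throughput, and hourly price below are all assumptions picked for illustration, not quotes from any vendor.

```ts
// Rough training-cost estimate using the ~6 * params * tokens FLOPs rule
// of thumb. Every constant here is an illustrative assumption.
const PARAMS = 1e12;            // hypothetical 1T-parameter dense model
const TOKENS = 20e12;           // ~20 tokens per parameter (Chinchilla-style)
const FLOPS_PER_GPU = 1e15;     // assumed sustained FLOP/s per accelerator
const DOLLARS_PER_GPU_HOUR = 3; // assumed rental price

const totalFlops = 6 * PARAMS * TOKENS;               // ~1.2e26 FLOPs
const gpuHours = totalFlops / (FLOPS_PER_GPU * 3600); // ~3.3e7 GPU-hours
const cost = gpuHours * DOLLARS_PER_GPU_HOUR;

console.log(`~${gpuHours.toExponential(1)} GPU-hours, ~$${(cost / 1e6).toFixed(0)}M`);
```

With these made-up numbers, a single run lands around $100M, and the power-law math above says the next meaningful capability jump needs a multiple of that compute, not a bit more.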
Are we already seeing the slowdown?
Yes — and the evidence is everywhere.
Benchmark improvements between model generations have shrunk into single-digit percentages. Analysts describe this as “convergence toward a performance ceiling” as improvements on MMLU, GSM8K, and HumanEval flatten out (Gary Marcus).
In plain English: GPT-4-level performance is becoming the default across the industry.
The gaps between models are now refinements, not revolutions.
Why this is not the end of AI
A plateau in transformers does not mean a plateau in AI. Scaling-law updates like Chinchilla show there is still headroom when data and compute are used more efficiently (LifeArchitect).
But more importantly, the breakthroughs ahead will come from new capabilities, not bigger text models:
- Multimodality
- Long-term memory and retrieval systems
- Reasoning modules
- Tool-using agents (sketched after this list)
- Hybrid symbolic–neural systems
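As a taste of where that value lives, here is a minimal sketch of a tool-using agent loop in TypeScript. Everything in it is a stand-in: `callModel` stubs out a real LLM API with function calling, and the tool registry is hypothetical. The point is that the loop and the tools are the product; the model inside is swappable.

```ts
// Minimal tool-using agent loop. `callModel` is a stub standing in for
// any LLM API with function calling; the tools are hypothetical.
type ToolCall = { name: string; args: Record<string, string> };
type ModelReply = { text: string; toolCall?: ToolCall };

// Stubbed model: first turn requests a tool, second turn answers.
let turn = 0;
async function callModel(messages: string[]): Promise<ModelReply> {
  turn += 1;
  return turn === 1
    ? { text: "Let me look that up.", toolCall: { name: "search", args: { query: "LLM pricing trends" } } }
    : { text: `Final answer, based on: ${messages[messages.length - 1]}` };
}

// Tools are plain async functions; this registry is where differentiation lives.
const tools: Record<string, (args: Record<string, string>) => Promise<string>> = {
  search: async ({ query }) => `top results for "${query}"`, // stub
};

async function runAgent(task: string, maxSteps = 5): Promise<string> {
  const messages = [task];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages);
    if (!reply.toolCall) return reply.text; // no tool requested: done
    const tool = tools[reply.toolCall.name];
    const observation = tool
      ? await tool(reply.toolCall.args)
      : `unknown tool: ${reply.toolCall.name}`;
    messages.push(reply.text, `observation: ${observation}`); // feed result back
  }
  return "step limit reached";
}

runAgent("How are LLM prices trending?").then(console.log);
```

Swap the stubbed `callModel` for any vendor SDK and the loop does not change. That interchangeability is what commoditization looks like from the application side.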
The next era will not be “GPT-5 but bigger.” It will be GPT-5 but smarter.
Why LLMs will become a commodity
When a technology stops delivering exponential returns, it stabilizes. It becomes predictable. Standardized. Ubiquitous.
That’s when it becomes a commodity.
And commoditization is not a downgrade — it’s the foundation for an explosion of innovation.
Costs drop dramatically
Once models stabilize, they become cheaper to run, cheaper to host, and cheaper to self-deploy. This unlocks on-prem LLMs for:
- Banks
- Hospitals
- Governments
- Enterprises that cannot send data to the cloud
Value moves up the stack
When the model becomes a commodity, the real differentiation shifts to:
- Agents
- Workflows
- Orchestration
- Fine-tuning
- Integrations
- Reasoning layers
- UX
This is where actual business value lives.
Competition increases
Vendors can no longer rely on model size as a differentiator. They compete on:
- Speed
- Price
- Privacy
- Reliability
- Developer experience
Healthy markets lead to better tools, which lead to better outcomes.
The big idea: commoditization is a win
LLMs are not the final form of AI. They are the steam engines of this era — powerful, transformative, but destined to become infrastructure.
- They will plateau.
- They will stabilize.
- They will commoditize.
And that is good.
Because commodities are building blocks. The breakthrough innovations of the 2030s will not come from scaling prediction models.
They will come from people who know how to:
- Orchestrate models
- Build agents
- Encode workflows
- Combine reasoning with tools
- Build real systems that solve real problems
The future doesn’t belong to bigger models. The future belongs to those who know how to use them well.