
AI Tools Reinvent the Wheel Every Day

A thought experiment: Imagine a craftsman who rebuilds his tools from scratch every morning. Absurd? That is exactly how the world's most intelligent AI coding tools operate. A look at the Groundhog Day problem of the AI industry, and the uncomfortable question of why nobody is fixing it.

Guido Mitschke
6 Min. read

A thought experiment: Imagine a craftsman. Every morning, he walks into his workshop. He looks around as if he's never been there before. He checks whether there's wood. He checks whether there are nails. Then he builds himself a saw. He cuts a board. Next morning? Same ritual. Yesterday's saw? Gone.

Absurd? That's exactly how the world's most intelligent AI coding tools operate.

The Groundhog Day Problem

I've been watching this for months across every major AI coding tool — Claude Code, OpenAI Codex, Cursor, Windsurf, Aider. The pattern is always the same.

Session starts. The AI agent orients itself. ls — what's here? cat — what's in this file? grep — where's the relevant code? It reads its environment like an amnesia patient waking up every morning with no idea where he is.

Then it writes a script. A small Python program. Disposable. It runs it, delivers the result, and the script vanishes into the digital trash.

Next request, next session: Same ritual from scratch. Same bash commands, same orientation phase, same throwaway script — just with slightly different variables this time.

What's missing here is so obvious it almost hurts: Reuse.

Every junior developer learns this in their first week: If you need something twice, make it a function. If you need it ten times, make it a module. If you need it a hundred times, make it a library.

The supposedly most intelligent code generators on the planet haven't internalized this principle.

What Should Actually Happen

Let's imagine an AI agent working like a seasoned developer — say, one with 30 years of experience. What would he do differently?

He'd maintain a toolbox. The first time he needs to parse a CSV file, he writes a solid parse tool. Parameterized. Tested. With error handling. The second time, he takes it off the shelf, adjusts the parameters, done.
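Concretely, such a shelf tool might be nothing more than a small, parameterized function with explicit error handling. A minimal sketch (all names and parameters are illustrative, not taken from any actual agent):

```python
import csv
import io

def parse_csv(text, delimiter=",", skip_header=False, required_columns=0):
    """Parse CSV text into a list of rows, with basic validation.

    A hypothetical 'shelf tool': parameterized so the next session only
    adjusts arguments instead of rewriting the parser from scratch.
    """
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for i, row in enumerate(reader):
        if skip_header and i == 0:
            continue  # drop the header row on request
        if required_columns and len(row) < required_columns:
            raise ValueError(
                f"row {i}: expected {required_columns} columns, got {len(row)}"
            )
        rows.append(row)
    return rows
```

The second time around, the agent would call `parse_csv(data, delimiter=";", skip_header=True)` instead of generating twenty lines of throwaway parsing code.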

He'd build application structures. Not loose scripts in /tmp/, but an organized project. /tools/, /utils/, /templates/ — a growing, versioned toolchain that gets richer with every solved problem.

He'd learn from mistakes. Did the parse tool fail on a specific CSV variant? Fix it, bump the version, move on. Not write the same script from zero tomorrow and hit the same bug again.

He'd codify his domain knowledge. The AI has enormous knowledge about programming languages, frameworks, best practices. But instead of distilling that knowledge into stable, reusable tools, it sprinkles it into throwaway code every single time.

This sounds so self-evident that you have to wonder: Why isn't it happening?

Three Reasons — and One of Them Is Uncomfortable

1. The Context Window Problem (the technical reason)

Large Language Models have a limited working memory. Everything the agent knows has to fit inside its context window — a kind of short-term memory of typically 128,000 to 200,000 tokens. If it's not in there, it doesn't exist.

This means the agent simply can't remember yesterday's script. Nobody hands it over. There's no persistent tool shelf it could reach for.

This is real. This is a technical limitation. But it's not an unsolvable one. You could give the agent a toolchain, a versioned repository of its own tools, an index it can query. The technology exists — it's just not being deployed consistently.
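What would such a queryable tool index look like? A deliberately simple sketch, assuming a JSON file on disk as the persistent shelf (the file location and field names are hypothetical):

```python
import json
from pathlib import Path

INDEX = Path("tools/index.json")  # assumed location of the tool shelf

def register_tool(name, path, description, version="1.0.0"):
    """Add or update a tool entry in the persistent index."""
    index = json.loads(INDEX.read_text()) if INDEX.exists() else {}
    index[name] = {"path": path, "description": description, "version": version}
    INDEX.parent.mkdir(parents=True, exist_ok=True)
    INDEX.write_text(json.dumps(index, indent=2))

def find_tool(query):
    """Return tools whose name or description matches the query string."""
    if not INDEX.exists():
        return []
    index = json.loads(INDEX.read_text())
    q = query.lower()
    return [(name, meta) for name, meta in index.items()
            if q in name.lower() or q in meta["description"].lower()]
```

Before writing a new script, the agent would call `find_tool("csv")` and only build something if the shelf comes back empty. A few kilobytes of index fit into any context window; the tool bodies stay on disk.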

2. The Trust Problem (the cautious reason)

When an agent reaches for self-built tools, a legitimate question arises: Is the tool still correct? Has the environment changed? Does the CSV parser from last week still work with the new data format?

Providers would rather accept the overhead of rewriting than risk silent misbehavior from stale tools. Freshly written code is guaranteed to be tailored to the current context — even if it's 90% identical to yesterday's code.

That's defensive, but not unreasonable. There's a saying in software engineering: "Known unknown > unknown unknown." Better to consciously rewrite than unconsciously rely on something outdated.

However: Humans have been solving this problem for decades with versioning, tests, and dependency management. It's not an unsolved problem. It's just not being addressed.

3. The Token Problem (the uncomfortable reason)

And here's where it gets awkward.

Every bash command burns tokens. Every ls, every cat, every freshly written script — those are input and output tokens. Tokens are the currency of the AI industry. More tokens mean more revenue.

An agent that works efficiently — that grabs its tool from the shelf, adjusts one parameter, and finishes in a handful of tokens — would be an economic disaster for the provider. The current agent, which first orients itself (500 tokens), then writes a script (2,000 tokens), runs it (500 tokens), and interprets the result (500 tokens), is a token multiplier.

I'm not claiming providers deliberately designed this system to burn tokens. But I am claiming there's no strong economic incentive to change it.

What Exists — and Why It's Not Enough

There are approaches. Anthropic's Model Context Protocol (MCP) is exactly the right idea at its core: standardized, reusable tool interfaces that the agent can use without rebuilding them every time. A "tool shelf protocol," if you will.

So-called skills systems point in the same direction — pre-built best-practice guides for recurring tasks. Memory systems attempt to give the agent long-term recall.

But none of the major providers have taken the consequential step: Allowing the agent to maintain its own growing toolchain. A collection of self-built, tested, versioned tools that it retrieves and adapts as needed.
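The fix-and-bump cycle described above is trivial to model. A sketch of versioned tool storage, assuming semantic versioning and an in-memory shelf (all names are illustrative):

```python
# name -> list of (version, source) tuples, oldest revision first
shelf = {}

def store(name, source, version="1.0.0"):
    """Put a newly built tool on the shelf."""
    shelf.setdefault(name, []).append((version, source))

def fix(name, new_source):
    """Store a corrected revision under a bumped patch version."""
    version, _ = shelf[name][-1]
    major, minor, patch = version.split(".")
    bumped = f"{major}.{minor}.{int(patch) + 1}"
    shelf[name].append((bumped, new_source))
    return bumped

def latest(name):
    """Retrieve the most recent known-good revision."""
    return shelf[name][-1]
```

When yesterday's parser fails on a new CSV variant, the agent calls `fix()`, and every future session gets the repaired revision via `latest()` instead of re-hitting the same bug.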

Instead, we get: Stateless agents with amnesia, reinventing the wheel every session while burning an impressive number of tokens in the process.

The Elephant in the Room

The AI industry is selling us "intelligent" coding assistants that fail at the most fundamental engineering virtue: efficiency through reuse.

It's as if Toyota built its cars on an assembly line, but reinvented the tools needed to build each car from scratch. No automaker would work like that. No experienced software developer would work like that. But the supposedly most advanced code generators in the world do exactly that.

The solution isn't complicated. It doesn't require breakthroughs in AI research. It only requires the consistent application of principles that every IT professional has known since the 1970s:

  • Modularization instead of throwaway scripts
  • Persistent toolchains instead of stateless amnesia
  • Versioning instead of rewriting
  • Parameterization instead of hardcoding
  • Growing toolboxes instead of daily Groundhog Day

The question isn't whether it can be done better. The question is why it isn't being done better.

And there are only two possible answers: Either the providers haven't recognized the problem. Or they have no incentive to solve it.

Both would be concerning.


About the Author

Guido Mitschke

Digital nomad and entrepreneur. Founder of Today is Life. Lives on Crete for several months a year and writes about life, travel, and entrepreneurship in Greece.

Frequently Asked Questions

Three reasons: the limited context window (technical), the trust problem with self-built tools (cautious), and the lack of economic incentives for efficiency, since more tokens mean more revenue (uncomfortable, but true).
The Groundhog Day problem describes how AI coding agents like Claude Code, Cursor, and Copilot start every session from zero, rebuilding scripts, helper functions, and tools they already built yesterday. Without persistent tool memory, every session is day one.
Not deliberately, but the incentive structure creates an alignment problem. AI providers earn per token consumed. A tool that reuses yesterday's code is a less profitable tool. That doesn't mean waste is intended, but efficiency isn't rewarded.
AI coding tools burn tokens not only on the actual coding, but also on rebuilding scaffolding, diagnostic scripts, and helper functions in every session. Since tokens cost money and AI companies profit from consumption, a structural misalignment arises between user efficiency and provider revenue.
The solution is a persistent, versioned tool repository the agent can query before building anything new. Tools built and tested once are stored, indexed, and reused, following the same principles software developers have known since the 1970s: modularization, versioning, parameterization. MCP (Model Context Protocol) is a first step, but it requires consistent implementation.
