The Span of Control Problem Hasn’t Gone Anywhere
There’s a persistent fantasy floating around right now that AI coding agents will let a single engineer do the work of ten. Or fifty. Or an entire team. The maths sounds seductive: if one agent can produce code at 10x speed, just spin up five of them and you’ve got a 50x engineer. Ship it 🚀
Except anyone who’s actually managed people knows this logic collapses almost immediately.
The manager’s ceiling
There’s a concept in organisational theory called “span of control.” It dates back to the early twentieth century, formalised by management thinkers like Lyndall Urwick and later reinforced by decades of military and corporate experience. The short version: a single manager can effectively oversee somewhere between five and nine direct reports. Push beyond that and things start to slip. Context gets lost. Decisions get made without adequate information. The manager becomes a bottleneck or, worse, irrelevant, rubber-stamping work they don’t actually understand.
This isn’t some dusty academic curiosity. It’s a pattern that plays out repeatedly across any sufficiently complex organisation. The best engineering managers know their teams deeply. They understand the technical context, the trade-offs being made, the risks hiding in each pull request. The worst ones have fifteen direct reports and can’t tell you what half of them are working on. They smile and approve things. Everything looks productive right up until it isn’t.
Now replace “direct reports” with “AI agents” and the problem doesn’t change. It gets worse.
Agents don’t push back
A good engineer will tell you when your plan is stupid. They’ll raise an eyebrow at the architecture diagram, ask uncomfortable questions in the RFC review, or flat-out refuse to build something they know will fall over at 2am on a bank holiday. AI agents do none of this. They comply. They generate. They produce plausible-looking code at remarkable speed and move on to the next prompt.
This is the fundamental danger of scaling AI agent output without scaling human oversight. Each agent is doing what you asked, but nobody is checking whether what you asked was the right thing. And the more agents you run in parallel, the less time you have to review any single one’s output with the rigour it deserves.
Claude Code is genuinely impressive for certain classes of work. Terraform modules, boilerplate service scaffolding, writing tests against well-understood interfaces. But it also makes decisions that look perfectly reasonable in isolation yet are quietly catastrophic in context. An IAM policy that’s too permissive. A database migration that would lock a table for twenty minutes in production. Stuff that passes a code review if you’re skimming. Stuff that kills you if you’re not paying attention.
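To make the migration case concrete, here’s a minimal sketch in the shape of an Alembic-style migration against Postgres. Every table, column, and index name is invented for illustration; the point is that nothing about it looks wrong at a glance, which is exactly the problem.

```python
"""A hypothetical migration, roughly what an agent might produce for
'add a fulfilment status to orders and make lookups fast'."""
from alembic import op
import sqlalchemy as sa


def upgrade():
    # Harmless on its own: adding a nullable column is a cheap change on Postgres.
    op.add_column("orders", sa.Column("fulfilment_status", sa.String(32), nullable=True))

    # This is the quiet catastrophe. On Postgres, a plain CREATE INDEX holds a
    # lock that blocks all writes to "orders" until the index build finishes;
    # on a big table, that is your twenty minutes of stalled checkouts.
    op.create_index("ix_orders_fulfilment_status", "orders", ["fulfilment_status"])

    # The non-blocking version is one keyword away, but it has to run outside
    # a transaction, so it rarely appears unprompted:
    #
    # with op.get_context().autocommit_block():
    #     op.create_index("ix_orders_fulfilment_status", "orders",
    #                     ["fulfilment_status"], postgresql_concurrently=True)


def downgrade():
    op.drop_index("ix_orders_fulfilment_status", table_name="orders")
    op.drop_column("orders", "fulfilment_status")
```

A reviewer who knows the table and the traffic pattern spots this in seconds. A reviewer juggling five agent outputs probably doesn’t.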
The more agents you could theoretically run, the less attention each one gets. That’s the trade-off nobody wants to talk about.
History rhymes
This pattern isn’t new. It shows up every time a force multiplier enters a system without a corresponding increase in oversight capacity.
The Industrial Revolution is the obvious parallel. Factories could suddenly produce goods at extraordinary scale, but early factory management was a disaster. Workers were treated as interchangeable units. Quality collapsed. Safety was an afterthought. It took decades of painful iteration, and a fair number of catastrophic failures, before management science caught up with production capability. Frederick Taylor’s “scientific management” was a direct response to the chaos of unmanaged scale.
The financial sector tells a similar story. The proliferation of derivatives and automated trading in the early 2000s let individual traders take on enormous positions at speed. The models were sophisticated. The throughput was incredible. And then 2008 happened, in large part because the humans nominally overseeing these systems had long since lost the ability to understand what was actually happening inside them. The span of control had been exceeded by orders of magnitude, and the feedback loops that might have caught the problem were buried under layers of abstraction.
Software engineering has its own version. Microservices. We went from monoliths to hundreds of services because each one was independently deployable and theoretically simpler. True enough. But the system-level complexity exploded, and most organisations didn’t invest in the observability, platform tooling, or architectural governance needed to manage it. You end up in a microservices mess where nobody can draw the dependency graph from memory anymore. Each service is fine. The system is chaos.
AI agents are the next iteration of this exact dynamic. More output. Same human brain trying to hold it all together.
Loosening the reins
Here’s where it gets properly dangerous. When you can’t keep up with review, you have two choices: slow down the agents, or loosen your standards. Guess which one most teams under delivery pressure will pick.
Every loosening is a small bet. You skip the detailed review on that generated Terraform plan because it “looks right” and you’ve got four other agent outputs waiting. You merge the auto-generated test suite without reading every assertion because the coverage number went up. Each individual decision is defensible. The accumulated risk is not.
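The auto-generated test suite case is worth spelling out, because it’s where the coverage metric actively lies to you. Here’s a minimal sketch, with every function and test name invented: both tests execute every line of the function, so the coverage tool reports it as fully tested, and neither would catch a broken tax calculation.

```python
# A sketch of the failure mode, not real code; all names are hypothetical.

def calculate_invoice(customer_id, line_items, tax_rate=0.20):
    """Stand-in for the kind of billing logic an agent might write or modify."""
    subtotal = sum(item["qty"] * item["unit_price"] for item in line_items)
    return {"customer_id": customer_id, "total": round(subtotal * (1 + tax_rate), 2)}


# Generated tests in the "coverage went up" style.
def test_calculate_invoice_returns_something():
    result = calculate_invoice(42, [{"qty": 3, "unit_price": 9.99}])
    assert result is not None          # still passes if the total is nonsense


def test_calculate_invoice_has_total():
    result = calculate_invoice(42, [{"qty": 1, "unit_price": 100.0}])
    assert "total" in result           # still passes if tax or rounding is broken


# The assertion a reviewer actually needs checks the value itself:
#     assert calculate_invoice(42, [{"qty": 3, "unit_price": 9.99}])["total"] == 35.96
```

The coverage dashboard can’t tell the difference. Only a human reading the assertions can.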
This is exactly how incident debt builds. Not through single dramatic failures, but through the slow erosion of rigour at the edges. The moment you start treating generated code as trusted code by default, you’ve fundamentally changed your risk profile. And unlike a human engineer who might flag their own uncertainty, the agent delivers everything with the same confident formatting.
The human isn’t obsolete. The human’s job is changing.
So is it over for software engineers? No. But the honest answer is more nuanced than the cheerleaders or the doomsayers will admit.
Most experienced engineers write less code day-to-day than they did five years ago. Significantly less. A chunk of what used to take real time (boilerplate Kubernetes manifests, CloudFormation templates, repetitive CRUD endpoints, test fixtures) now gets generated in seconds. And nobody misses writing any of it. Nobody became an engineer because they love hand-typing the same try/except block for the four hundredth time. That work was always a tax, not a craft.
What changes is how you spend the time you get back. More of it goes to deciding what should be built, reviewing what was generated, understanding system-level implications, and making judgment calls that require context an agent simply doesn’t have. The ratio shifts from 80% writing / 20% thinking to something closer to 30% writing / 70% thinking. And thinking, it turns out, is the harder job.
The engineers who’ll thrive are the ones who understand that managing AI output is a skill in itself. It requires taste, system-level awareness, and the discipline to actually read what was generated rather than assuming it’s correct because it compiled. It requires knowing when to trust the agent and when to slow down and think. That’s not a lesser form of engineering. It might be a more demanding one.
The uncomfortable middle ground
The reality is that we’re in a transition period that’s going to be messy and uncomfortable. AI agents will absolutely increase individual throughput for well-scoped, repetitive tasks. They will not magically solve the coordination, oversight, and architectural judgment problems that make software engineering hard in the first place.
If anything, those problems get harder. Because now you’ve got more code being produced faster, with fewer humans reviewing it, deployed into systems that were already complex enough to be difficult to reason about.
The span of control problem is real. It applies to people, it applies to agents, and it applies to the intersection of the two. The organisations that figure out how to scale oversight alongside output will win. The ones that just crank up the agent count and hope for the best will learn the same lesson every over-leveraged system eventually teaches.
The hard way.