Migration of Software Development Bottlenecks: From Compute to Cognition to AI
Introduction: Bottlenecks Move
The fundamental insight of the Theory of Constraints is simple: at any given moment, a system's throughput is limited by a single binding constraint, and when you resolve it, the next one emerges. Bottlenecks don't disappear; they migrate.
The history of software development can be reread as a history of this bottleneck migration. And when we look at where that migration has brought us today, we find that humans and AI face remarkably similar kinds of constraints.
When Compute Was the Bottleneck
From the 1960s through the 1990s, the central constraint of software development was physical computer resources. Memory was expensive, CPU cycles were precious, and storage was limited.
The primary concern of developers in this era was "how to make things work with fewer resources." Choosing data structures and algorithms meant optimizing for computational complexity and memory usage. The difference between bubble sort and quicksort wasn't an academic curiosity—it was a matter of whether your program would finish in a practical timeframe.
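The gap between a quadratic and an O(n log n) sort is easy to demonstrate. A minimal sketch (pure Python; absolute timings will vary by machine, but the ratio is what matters):

```python
import random
import time

def bubble_sort(xs):
    """O(n^2): repeatedly swap adjacent out-of-order pairs."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

data = [random.randint(0, 10**6) for _ in range(2000)]

t0 = time.perf_counter()
bubble_sort(data)
t_bubble = time.perf_counter() - t0

t0 = time.perf_counter()
sorted(data)  # Timsort: O(n log n)
t_builtin = time.perf_counter() - t0

print(f"bubble: {t_bubble:.4f}s  built-in: {t_builtin:.4f}s")
```

Even at a modest 2,000 elements the quadratic version is orders of magnitude slower; on 1960s-era hardware that difference decided whether a program was usable at all.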
The practices born in this era were management strategies for compute as a finite resource:
- Algorithmic complexity analysis (Big-O notation) — How to allocate finite CPU time
- Data structure optimization — How to structure finite memory
- Caching strategies — How to reduce access to slow storage
- Database normalization and index design — How to optimize finite storage and I/O
What these practices share is a common thinking pattern: designing what to keep, what to discard, and how to structure things within finite resources. This pattern reappears in every subsequent era.
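That shared pattern is visible in something as small as an LRU cache: a fixed capacity forces an explicit policy for what to keep and what to evict. A minimal sketch using the standard library:

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` entries; evict the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key, default=None):
        if key not in self._store:
            return default
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the oldest entry

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # capacity exceeded: evicts "b"
print(list(cache._store))  # peeking at internals for illustration: ['a', 'c']
```

The interesting part is not the code but the decision it encodes: *recency* is the retention criterion. Every finite-resource strategy in this article makes some version of that choice.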
When Cognition Became the Bottleneck
From the 2000s onward, the landscape shifted dramatically. Exponential growth in computing power under Moore's Law, the rise of cloud computing, and plummeting storage costs meant that, for most software, compute resources were no longer the primary constraint.
Yet productivity didn't scale proportionally with compute improvements. As Fred Brooks pointed out in "No Silver Bullet" (1986), the essential complexity of software development doesn't vanish with technological progress. With the compute bottleneck resolved, the next bottleneck emerged: human cognitive capacity.
Human working memory can hold only about four chunks (4 ± 1) at a time. No matter how fast computers get, it's still humans who design, understand, and modify systems. When codebases ballooned to millions of lines, the bottleneck shifted from "can the computer run this?" to "can a human understand this?"
The practices born in this era were management strategies for cognition as a finite resource:
- Modularization — Divide systems into cognitively manageable units. Limit the scope that must be understood at once
- Domain-Driven Design — Reduce translation overhead through Ubiquitous Language. Minimize the conceptual conversion cost between domain and code
- Team Topologies — Design team structure based on cognitive load. Leverage Conway's Law in reverse, splitting teams along cognitively manageable boundaries
- Separation of concerns, interface design — Explicitly define "what you don't need to know"
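The last point, explicitly defining "what you don't need to know," can be illustrated with a small interface sketch. The names here (`PaymentGateway`, `StripeGateway`, `checkout`) are hypothetical, invented for the example:

```python
from typing import Protocol

class PaymentGateway(Protocol):
    """The interface is everything a caller needs to understand."""
    def charge(self, amount_cents: int) -> bool: ...

class StripeGateway:
    """Hypothetical implementation; its internals stay out of the caller's head."""
    def charge(self, amount_cents: int) -> bool:
        # In reality: network calls, retries, idempotency keys, ...
        return amount_cents > 0

def checkout(gateway: PaymentGateway, amount_cents: int) -> str:
    # The caller reasons about one method signature,
    # not the whole payments domain.
    return "paid" if gateway.charge(amount_cents) else "declined"

print(checkout(StripeGateway(), 1999))
```

The interface caps the cognitive load of using the module at one method, regardless of how complex the implementation behind it becomes.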
What's notable here is that these practices share an isomorphic structure with compute-era practices:
| | Compute Resource Management | Cognitive Resource Management |
|---|---|---|
| Partitioning | Memory space partitioning, paging | Modularization, separation of concerns |
| Retention & Eviction | Cache strategy (what to keep in memory) | Abstraction (what to push out of awareness) |
| Structuring | Data structure design | Domain model design |
| Locality of access | Memory access locality | Cohesion and coupling |
Both are fundamentally tackling the same problem: what to keep, what to discard, and how to structure things within finite resources. The only change is that the subject shifted from CPU cycles and memory to working memory and attention.
The Isomorphic Constraint in AI
Now, AI has entered software development. AI is dramatically easing the "writing code" bottleneck. But AI, too, cannot escape finite resource constraints.
The context window of LLMs has a remarkably similar constraint structure to human working memory:
| | Working Memory | Context Window |
|---|---|---|
| Capacity | ~4±1 chunks | Thousands to millions of tokens |
| Behavior under pressure | Increased error rate, degraded judgment | Degraded response quality, increased hallucination |
| Forgetting | Information not attended to fades | Information beyond the window is lost |
| Chunking | Grouping information into meaningful units to save capacity | Summarization and compression to increase effective information density |
| Externalization | Recording in notes and documents to free the mind | Saving to file systems and databases |
The quantitative scale differs, but the structure of the constraint is isomorphic. Both face the same problem: "within finite processing capacity, which information to retain, which to externalize, and how to structure it all."
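The chunking and externalization rows of the table can be sketched as a toy context-fitting routine: keep the most recent messages verbatim and compress everything older into a single summary entry. Here `summarize` is a stand-in for what would be an LLM call in practice, and token counts are approximated by word counts:

```python
def fit_to_budget(messages, budget, summarize):
    """Keep recent messages verbatim; compress older ones into one chunk."""
    cost = lambda m: len(m.split())  # crude proxy for token count
    kept, used = [], 0
    for msg in reversed(messages):   # walk newest-first
        if used + cost(msg) > budget:
            break
        kept.append(msg)
        used += cost(msg)
    older = messages[:len(messages) - len(kept)]
    kept = list(reversed(kept))
    if older:
        # Chunking: many items collapse into one meaningful unit
        kept.insert(0, summarize(older))
    return kept

history = ["alpha beta", "gamma delta epsilon", "zeta", "eta theta"]
window = fit_to_budget(
    history, budget=4,
    summarize=lambda ms: f"[summary of {len(ms)} msgs]",
)
print(window)  # ['[summary of 2 msgs]', 'zeta', 'eta theta']
```

A person taking notes during a long meeting does the same thing: the last few sentences verbatim, everything earlier as a compressed gist.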
What this similarity suggests is that practices developed in the cognitive resource era can be directly applied to AI context management:
- Modularization → Context isolation: Give sub-agents independent context windows, returning only result summaries. Just as humans don't need to know a module's internal implementation, a parent agent doesn't need to know a sub-agent's details
- Ubiquitous Language → Shared vocabulary design: Reduce conceptual drift between humans and AI, or between AI agents, to minimize translation overhead
- Team Topologies → Multi-agent design: Partition agent responsibilities based on context load
- Progressive disclosure → Load only needed information: Instead of stuffing everything into the context, retrieve information on demand
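The first of these, context isolation, can be sketched in a few lines. The orchestrator and sub-agent below are hypothetical stand-ins, not any particular framework's API; the point is that each sub-agent works in a fresh context and only its summary flows back to the parent:

```python
def run_subagent(task, context):
    """Hypothetical sub-agent: does its work in an isolated context
    and returns only a short result summary, never its full trace."""
    trace = [f"step {i}: working on {task}" for i in range(3)]  # internal detail
    return f"{task}: done in {len(trace)} steps"                # summary only

def orchestrator(tasks):
    parent_context = []
    for task in tasks:
        summary = run_subagent(task, context=[])  # fresh, isolated context
        parent_context.append(summary)            # only summaries accumulate
    return parent_context

print(orchestrator(["analyze schema", "write migration"]))
```

The parent's context grows with the number of summaries, not with the total work performed, exactly as a team lead's mental model grows with the number of modules, not with their combined line counts.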
Finite Resource Management: The Invariant Problem Structure
Looking back at the discussion so far, we can see that the same problem structure has repeatedly emerged throughout the history of software development:
Finite resource → Development of management strategies → Resolution of bottleneck → Emergence of the next finite resource
Algorithms and data structures arose for compute constraints. Modularization and domain design arose for cognitive constraints. And isomorphic management strategies are being applied for context window constraints.
This means that software development has essentially been working on the same question throughout: "how to manage complexity within finite resources." The type of resource changes, but the structure of the question remains constant.
And this perspective has practical implications. When thinking about AI tool design and collaboration methods with AI, there's no need to invent a new paradigm from scratch. The decades of accumulated software engineering practices can be applied almost directly just by substituting the target resource. What we've learned about managing cognitive load also applies to managing context windows. This is, I think, good news for engineers.
The Question Beyond: What Happens When Context Constraints Disappear?
One caveat, however.
Just as compute constraints were (effectively) resolved, context window constraints may eventually be dramatically relaxed. When windows expand to effectively infinite size and AI can simultaneously process arbitrary amounts of information, what becomes the next bottleneck?
The experience of the compute era teaches us that resolving a constraint doesn't resolve the problem; it moves the problem. Abundant compute alone didn't make software better. The next bottleneck, human cognitive capacity, simply emerged.
Similarly, even with effectively infinite context windows, software won't improve on its own. Beyond that lie questions that no amount of quantitative capacity can answer: "what should we build?", "whose needs should we serve?", and "how should we design shared concepts?"
Related Articles
- The "Specification" Problem in the AI Era: Translation Asymmetry and the Location of SSoT — The translation problem between intent and implementation
- The True Nature of "Specifications": Understanding Software Through Requirements, Contract, and Structure — The multi-layered structure of specifications
