
Skepticism on Specification-Driven Development

Introduction: A Familiar Argument

I've started seeing the term "Specification-Driven Development" on social media and in tech articles. In the context of development with coding agents, people make claims like "finalize all specifications before development," "humans should only read and modify specs while coding agents read specs and modify code," and "humans shouldn't touch code."

The moment I saw these claims, I felt a strong sense of unease. Isn't this just Waterfall?

Looking back at the history of software development, we've transitioned from Waterfall to Agile and Extreme Programming (XP). The importance of feedback from implementation, the interaction between specification and implementation, continuous improvement—to ignore these insights and return to "write all specs first" seems like moving backwards through history.

What is Specification-Driven Development?

"Specification-Driven Development (SDD)" is a development methodology proposed with "Kiro," an integrated development environment released by Amazon Web Services (AWS) in July 2025. In September of the same year, GitHub announced "GitHub Spec Kit," which is based on similar principles, and in Japan, KDDI-affiliated companies began adopting it.

Kiro's workflow consists of three stages: Requirements → Design → Implementation. It positions specifications as the "Source of Truth" and adopts a "documentation-first" approach. Specifications serve as a common language between humans and AI, from which code is generated.

This methodology has attracted attention against the backdrop of concern about "vibe coding." Giving vague instructions to an AI and letting it generate code causes chaos in large codebases and raises concerns about quality and maintainability. Specification-driven development is often framed as one side of a binary opposition: "chaotic AI usage" versus "specification-driven development."

Historical Irony: Why Did Waterfall Revive in 2025?

Looking back at software development history, the Waterfall model was mainstream for a long time. The sequential process of Requirements → Design → Implementation → Testing was an approach borrowed from construction and manufacturing.

However, this model had a fatal problem. Too many issues are only discovered during the implementation phase. Specifications may be ambiguous, technically infeasible, or misaligned with user needs—these problems only become apparent when you actually write code.

The reason Agile and XP flourished was precisely this point. There's a complex feedback loop between specification and implementation, and improving while going back and forth between them was understood to be the key to making good software.

Yet in 2025, a Waterfall-like workflow of "Requirements → Design → Implementation" has emerged again. This can only be described as historical irony.

Why Do People Repeatedly Fall for the Illusion That "Writing Specs First Will Solve Everything"?

I believe human psychology is behind this phenomenon.

Fear of complexity—Software development is inherently complex. Confronting that complexity means enduring uncertainty and chaos. The approach of "writing all specs first" provides the illusion that this uncertainty can be eliminated.

Craving for predictability—Project managers and stakeholders want predictable processes. The formula "once specs are finalized, implementation is just mechanical" appears very attractive.

Swinging from extreme to extreme—As a reaction to the chaos of "vibe coding," an escape to the order of "specification-driven development" occurs. However, the appropriate way to deal with complexity often lies in the middle ground.

Martin Fowler's Skepticism: Criticism from Authority

Interestingly, Martin Fowler's blog features a critical evaluation of specification-driven development by Birgitta Böckeler.

She actually tried Kiro and spec-kit and pointed out the following problems:

Excessive Complexity

For a small bug fix, Kiro generated 4 user stories and 16 acceptance criteria. She described this as "using a sledgehammer to crack a nut."

In other words, it's excessive for small tasks and insufficient for large ones—the applicable range is extremely narrow.

The Problem of Non-determinism

Even more fatal is the non-determinism of LLMs (Large Language Models). No matter how detailed the specifications, there's a possibility that the AI will generate different code each time it runs.

"Because of the non-deterministic nature of this technology, there will always remain a very non-negligible probability that it does things that we don't want"

This observation is crucial. If, because of LLM non-determinism, reproducibility cannot be guaranteed no matter how perfect the specifications are, then a feedback loop of examining and correcting the implementation becomes indispensable.

Past Failures: Model-Driven Development (MDD)

The article also mentions Model-Driven Development (MDD), a past attempt. MDD was also an approach of "automatically generating code from models (specifications)," but it never became widely adopted.

History repeats itself.

Evolution of Abstraction: Lessons from Next.js

Abstraction cannot be perfected in one go. It improves gradually by receiving feedback from implementation.

There's a maxim from React developer Sebastian Markbåge:

"It's easier to recover from no abstraction than the wrong abstraction"

This insight is vividly apparent in the evolution of Next.js's caching strategy.

Next.js's Three-Stage Evolution

v13-14: Implicit Cache (abstraction developers don't think about)

Initially, Next.js embraced the ideal that "developers don't need to think about caching; it will be automatically optimized." But in reality:

  • Unexpected data freshness issues
  • Difficulty debugging
  • Increased cognitive load

v14-15: Gradual Improvements (documentation, default changes)

Receiving community feedback, gradual improvements were made:

  • Removal of default fetch() caching
  • Improved documentation transparency
  • Addition of staleTimes option

However, these were symptomatic fixes rather than a cure.
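For concreteness, the staleTimes option mentioned above is configured in next.config.js roughly like this (the numbers are illustrative, not recommendations):

```javascript
// next.config.js — sketch of the staleTimes option (Next.js 14.2+),
// which tunes how long the client-side router cache reuses segments.
// Values are in seconds and purely illustrative.
module.exports = {
  experimental: {
    staleTimes: {
      dynamic: 30,  // reuse window for dynamically rendered segments
      static: 180,  // reuse window for statically rendered segments
    },
  },
};
```

Note that this is still a tuning knob on top of an implicit cache; it adjusts behavior without making caching visible at the call site.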

v16: Explicit Abstraction with "use cache"

As a fundamental design change, the "use cache" directive was introduced:

```javascript
export async function getData() {
  "use cache";
  return fetch("...").then((res) => res.json());
}
```

The transition from implicit (opt-out) to explicit (opt-in) meant caching only works where developers intend it.

Lesson: Perfect Specifications Cannot Be Written in Advance

The Next.js team initially embraced the specification (design philosophy) that "developers don't need to be aware of caching." However, it was only after implementing it and having users actually use it that the problems with this specification became apparent.

Designs that work in the abstract reveal contradictions and ambiguities when you try to make them concrete—this is something many developers know from experience.

After years of receiving community feedback and discovering PPR (Partial Pre-Rendering), they finally arrived at a superior abstraction.

If the Next.js team had taken the approach of "completely finalizing specifications before implementation," this evolution would not have happened.

Digging Deeper into Non-determinism and Scale Issues

At this point, I also wondered, "Is it really impossible?" Could specification-driven development work with some ingenuity?

Full Generation vs Differential Migration

For example, instead of declaratively generating all code from specifications, what if we approached it as migration of existing code in response to specification differences? The AI performs migrations to reflect specification changes in the existing codebase.

This seems somewhat achievable.
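To make the idea concrete, here is a hypothetical sketch (all names and the spec shape are illustrative, not any tool's actual API) of turning a spec diff into scoped migration tasks rather than regenerating everything:

```javascript
// Hypothetical "differential migration": diff two versions of a spec
// and emit one migration task per change, instead of regenerating the
// whole codebase from the new spec.
function diffSpec(oldSpec, newSpec) {
  const tasks = [];
  for (const [key, value] of Object.entries(newSpec)) {
    if (!(key in oldSpec)) {
      tasks.push({ op: "add", key, value });
    } else if (oldSpec[key] !== value) {
      tasks.push({ op: "change", key, from: oldSpec[key], to: value });
    }
  }
  for (const key of Object.keys(oldSpec)) {
    if (!(key in newSpec)) tasks.push({ op: "remove", key });
  }
  return tasks;
}

const v1 = { login: "email+password", session: "localStorage token" };
const v2 = { login: "email+password", session: "httpOnly cookie", mfa: "TOTP" };
console.log(diffSpec(v1, v2)); // one "change" task, one "add" task
```

Each task would become a scoped prompt for the agent; but note that nothing in the diff says whether a change conflicts with the existing implementation structure, which is exactly where human verification re-enters.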

However, problems quickly become apparent. What do you do when specification changes conflict with existing implementation structures? There's a risk of fatal structural contradictions, and ultimately human verification becomes essential.

What About Microservice Level?

What if we limit the scale? For example, at the granularity of a single service in an Event-Driven architecture—a sufficiently small, highly independent component with clear boundaries.

At this size, might specification-driven work?

But this is precisely Fowler's "scale problem": excessive for small tasks, insufficient for large ones. The applicable range is extremely narrow.

The psychology of wanting to think "but what if..."—this itself might be the desire for a silver bullet. The desire to escape complexity repeatedly leads us to the same illusion.

Essential Question: What Has Changed and What Hasn't?

The advent of coding agents has certainly changed something. But not everything has changed.

To answer this question, let's consider the developer's cognitive structure. Developer cognition consists of two models: Concept Model (how things should be) and Production Model (understanding of current state). The "specification" in specification-driven development refers precisely to this Concept Model.

The problem with specification-driven development is trying to perfectly construct the Concept Model first and then reflect it all at once onto the Production Model. This means no feedback is obtained, the gap between both widens, and uncertainty is maximized—the very reason Waterfall failed.

What Has Changed

The cost of converting specification → implementation has dramatically decreased.

Implementation that previously took hours now completes in minutes. In other words, the cycle of reflecting the Concept Model onto the Production Model has become blazingly fast.

Prototyping has accelerated and the trial-and-error cycle has shortened.

What Hasn't Changed

However, the following hasn't changed:

  • The difficulty of writing good specifications (Concept Model) — Building an unambiguous, implementable model remains difficult
  • The value of feedback from implementation — Many problems only become apparent when you see actual working code (Production Model)
  • The difficulty of building domain models — Appropriately modeling complex domains is a problem that precedes writing code
  • LLM non-determinism as a new constraint — Rather, new uncertainty has been added
  • The structural problem that gaps will always exist between Concept Model and Production Model

Essence: Rapid Iteration Between Concept and Production

I believe the essence of coding agents is not "finalizing the Concept Model first" but "being able to rapidly iterate between Concept Model and Production Model."

Previously, it took time to update implementation (Production Model) after changing specifications (Concept Model). That's why "finalizing specifications first" was emphasized.

But now, we can iterate between both as many times as we want at high speed. That's why an approach of co-evolving both while running the feedback loop has become possible.

Reducing the gap to zero is fundamentally impossible (new feedback constantly comes from outside, and the Concept Model moves ahead). What's important is keeping the gap small and manageable.

DDD and Context Constraints

From a Domain-Driven Design (DDD) perspective, there are also questions about specification-driven development. The Ubiquitous Language that Eric Evans emphasized is shared between domain experts, developers, and code. This means specifications and code are not separated.

When both the codebase and the documentation are maintenance targets, a source of cognitive load emerges: "which one is correct?" From this perspective, the Source of Truth should be limited to the codebase. Design proposals and ADRs can be kept as records of past deliberation, but making clear that they are not ongoing maintenance targets prevents the burden of keeping two artifacts in sync from growing.

The same applies to dialogue with coding agents. LLMs have context size constraints, and having them read both specifications and code wastes context. If the codebase expresses sufficient structure and intent, the AI can understand the Production Model by reading the codebase.

The Third Way: Spec-Implementation Co-evolution Approach

There is a third way that is neither "vibe coding" nor "specification-driven development." The approach of rapidly iterating between specification and implementation.

Consider adding an authentication feature. In a Waterfall approach, you would design "OAuth 2.0, JWT, refresh tokens, session management..." all upfront. With spec-implementation co-evolution, you instead start with a minimal Concept Model: "Users can log in with email and password." Have the coding agent implement it, look at the generated code, and notice: "Tokens stored in localStorage? Shouldn't they be in httpOnly cookies for XSS protection?" Update the Concept Model and reimplement. By cycling through this rapidly, you refine gradually while keeping the gaps small.
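The localStorage-to-httpOnly revision in that loop can be sketched as follows; the helper name and cookie attributes are illustrative, not a prescribed implementation:

```javascript
// After the feedback step, the revised Concept Model says the session
// token must be delivered as an httpOnly cookie, keeping it out of
// reach of document.cookie (and thus of XSS payloads).
function buildSessionCookie(token) {
  return [
    `session=${token}`,
    "HttpOnly",          // invisible to client-side JavaScript
    "Secure",            // only sent over HTTPS
    "SameSite=Strict",   // not sent on cross-site requests
    "Path=/",
    `Max-Age=${60 * 60}`, // one hour, illustrative
  ].join("; ");
}

// A login handler would then respond with
//   res.setHeader("Set-Cookie", buildSessionCookie(token));
// instead of returning the token in the JSON body for localStorage.
console.log(buildSessionCookie("abc123"));
```

The point is not this particular fix but that the need for it only surfaced from looking at working code, the Production Model, not from the spec.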

Without feedback from the Production Model, you cannot write a good Concept Model.

Conclusion: Feedback Loops Are the Essence

What proponents of specification-driven development overlook is that complexity exists precisely in the interaction between specification and implementation. Complexity doesn't disappear by detailing specifications.

There is no silver bullet. This truth, which Fred Brooks pointed out in 1986, remains unchanged in 2025. Having acquired the powerful tool of coding agents, we are tempted by the illusion that "writing specs will automatically produce the code." But history teaches that swinging from one extreme to the other fails.

The evolution of Next.js's caching strategy, the transition from Waterfall to Agile, and this specification-driven development debate—what all these share is the importance of feedback loops. Receive feedback from implementation to improve specifications, reflect specification intent into implementation, and receive feedback again. This circulation is the key to creating good software.

