The Specification Problem in the AI Era: Translation Asymmetry and the Location of SSoT
Introduction: AI Development Reinvented "Specification"
In a previous article, The True Nature of "Specification", I organized the problem of the word "specification" being used indiscriminately to refer to different layers—Requirements, Contract, and Implementation. Building on that, I had a suspicion that Spec-Driven Development (SDD) was a revival of waterfall methodology and that its core problem was a neglect of feedback loops.
In this article, I take one more step forward to examine the structural problems in spec-driven development and consider where SSoT (Single Source of Truth) should be placed.
The impetus was a sense of unease about how "spec-driven" was being used in AI development communities. Architecture, interface design, and design philosophy—everything was being called a "specification," with the claim that writing it all down would make development work. But upon closer examination, "specification" in the AI development context turned out to be fundamentally different from "specification" in software engineering terms (the polysemy of specification).
What Is Kiro's "Specification"?
Let's look at the definition of spec-driven development in Kiro, released by AWS. Marc Brooker (VP/Distinguished Engineer at AWS) states:
> A specification is simply a description of what a program should do, and what needs it should meet.
At first glance, this sounds like a discussion at the Requirements level. But in Kiro's actual workflow, user stories (Requirements), system architecture and tech stack (Implementation structure), and interface design (Contract) are all included within the spec.
In other words, a "spec" in Kiro is not a "specification" in the software engineering sense. It is "the totality of written-down intent, as an antithesis to vibe coding." The Kiro official blog highlights the greatest value of specification as: "persisting by being written down," "being version-controllable," and "becoming a North Star that AI agents can reference"—which is essentially nothing more than a structured super-prompt.
A ThoughtWorks analysis on MartinFowler.com also notes that "all three tools (Kiro, spec-kit, Tessl) claim to implement spec-driven development, but they are quite different from each other," stating that the definition of SDD itself is still fluid.
The AI development world reinvented the word "specification" with a different meaning.
Why Does Layer Distinction Dissolve?
The reason layer distinctions dissolve in the AI development context lies in the medium: prompts to LLMs. Prompts can contain:
- Requirements-level "I want this kind of result"
- Contract-level "the input/output should be in this format"
- Implementation-level "use this algorithm"
Because all of these ride on the same medium—text—layer distinctions become inherently ambiguous. In programming languages, type systems and module boundaries forcibly separate layers, but natural language has no such mechanism.
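To make this concrete, here is a minimal TypeScript sketch (the `Product` and `searchProducts` names are hypothetical, not from any real system) of how a type system separates the layers that a prompt flattens: the exported signature is the Contract, while the function body is an Implementation detail that can change without touching it.

```typescript
// Contract layer: callers may rely only on this shape and this signature.
interface Product {
  id: string;
  name: string;
}

function searchProducts(query: string, catalog: Product[]): Product[] {
  // Implementation layer: linear scan is an internal choice; replacing it
  // with an index would change Implementation without touching the Contract.
  const q = query.toLowerCase();
  return catalog.filter((p) => p.name.toLowerCase().includes(q));
}
```

The Requirements-level "I want users to find products" stays outside the code entirely; the Contract and Implementation, by contrast, get hard boundaries that the compiler enforces.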
The Inherent Incompleteness of Natural Language SSoT
The core of Kiro-style SDD is the approach of making "specs written in natural language" the SSoT (Single Source of Truth). However, this is inherently incomplete.
Natural language cannot eliminate ambiguity. There will always be an interpretation gap between text written in requirements.md and the actual code, and the gap is ultimately filled by the AI's "best judgment." This is a structured version of vibe coding, not a rigorous SSoT.
On the other hand, Implementation contains intent. Code written in a programming language is not merely an implementation; it formally expresses intent through type constraints and invariants.
```typescript
// The type communicates Structure constraints = intent is formalized
type Order =
  | { status: "draft" }
  | { status: "paid"; paidAt: Date }
  | { status: "shipped"; paidAt: Date; shippedAt: Date };
```

The intent that "shipped orders must always have a payment timestamp" is embedded in the code itself. Writing this in a natural language spec is just documentation with no enforcement power. Code enforces constraints and detects violations at compile time. Natural language cannot do that.
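A minimal sketch of that enforcement (the `Order` type is repeated so the sketch is self-contained, and `paymentDate` is an illustrative helper): the commented-out line is the kind of value the compiler rejects, and narrowing on `status` lets the compiler apply the invariant for us.

```typescript
type Order =
  | { status: "draft" }
  | { status: "paid"; paidAt: Date }
  | { status: "shipped"; paidAt: Date; shippedAt: Date };

// The compiler rejects a shipped order with no payment timestamp:
// const bad: Order = { status: "shipped", shippedAt: new Date() };
//   → error: property 'paidAt' is missing

// Narrowing on `status` proves that every non-draft order has a paidAt.
function paymentDate(order: Order): Date | null {
  return order.status === "draft" ? null : order.paidAt;
}
```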
The Asymmetry Between Intent and Implementation
Here lies the basis for skepticism about spec-driven development. There is a decisive directional asymmetry between intent and implementation. The essence of this asymmetry has traditionally been described as "a difference in information content," but more precisely it is a difference in determinism and falsifiability.
Intent → Implementation: Non-Deterministic Transformation
Transforming natural language intent into Implementation is inherently non-deterministic. From the same intent, infinitely many different implementations can be generated. LLM non-determinism compounds this, producing different Structures each time, each implying different Behaviors.
Even with a spec as SSoT, the projection from it is unstable. Natural language intent lacks the information needed to uniquely determine a concrete implementation; the shortfall must be filled from somewhere. That "somewhere" becomes the LLM's probabilistic reasoning.
Furthermore, natural language has no determinism of behavior. From "users can search for products," no concrete behavior is determined. Different behaviors can be imagined with each interpretation.
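As a sketch, here are two implementations (illustrative, not from any real product) that both satisfy "users can search for products" yet behave observably differently:

```typescript
// Two readings of the same intent; their Behaviors diverge observably.
const bySubstring = (query: string, names: string[]): string[] =>
  names.filter((n) => n.includes(query));

const byPrefix = (query: string, names: string[]): string[] =>
  names.filter((n) => n.startsWith(query));
```

Given `["hat", "that"]` and the query `"hat"`, `bySubstring` returns both entries while `byPrefix` returns only `"hat"`. The intent sentence alone cannot say which was meant.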
Implementation → Intent: Deterministic But Not Unique
So what about the reverse direction—reading intent from Implementation?
There is something to honestly acknowledge here. The transformation from Implementation to intent is also not uniquely determined. The same sort function might be used "for UI readability" or "as pre-processing for binary search." From the code, we can tell "it's sorting," but why it's sorting cannot be uniquely read.
Therefore, the simple picture of "Implementation→intent is many-to-one and convergent, intent→Implementation is many-to-many and non-deterministic" is not accurate. Both directions are many-to-many.
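A small sketch of this many-to-many relation: the same sorted output serves two unrelated intents, and nothing in the sorting code itself distinguishes them.

```typescript
// The same sorting code serves two unrelated intents.
const sortAsc = (xs: number[]): number[] => [...xs].sort((a, b) => a - b);

// Intent A: a readable display order for a UI list.
const displayOrder = sortAsc([30, 10, 20]);

// Intent B: establishing the precondition for binary search.
function binarySearch(sorted: number[], target: number): boolean {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] === target) return true;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}
```

Reading `sortAsc` alone, nothing reveals whether A or B was the motivation; the call sites constrain the answer but do not uniquely determine it.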
The Essence of Asymmetry: Determinism and Falsifiability
So where does Implementation's advantage lie? In behavioral determinism and falsifiability.
Behavioral determinism. Run the code and the behavior is uniquely determined. Natural language interpretation is never settled. Even if reading intent is not unique, "what this specific code does concretely" is beyond dispute.
- Natural language: both behavior and intent are indeterminate
- Code: behavior is deterministic, intent is indeterminate (but constrained)
Falsifiability. With code, you can answer Yes/No to the question "does this behavior match the intent?" With only natural language specs, interpretation diverges at the stage of "what does this spec even say?" It is not that intent extraction is easy, but that detecting deviations from intent is easy.
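A toy example of this falsifiability, using a hypothetical discount rule: once an implementation exists, the claim "members get 10% off" reduces to a check that returns Yes or No.

```typescript
// Hypothetical discount rule, for illustration only.
function discountedPrice(price: number, isMember: boolean): number {
  return isMember ? (price * 9) / 10 : price; // 10% off for members
}

// The intent "members get 10% off" becomes a falsifiable, Yes/No claim:
const matchesIntent = discountedPrice(100, true) === 90;
```

With only the sentence "members should get a fair discount," there is no equivalent check to run; interpretation diverges before verification can begin.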
Concreteness as a feedback origin. With an implementation, you can say "try using this and tell me what feels wrong." With a natural language spec, you can only say "read this and tell me what feels wrong." The quality of the former feedback is overwhelmingly higher.
In other words, Implementation's advantage is not "the ability to accurately convey intent," but rather "serving as a feedback origin from which deviations from intent can be discovered."
Inferability: A Speed Parameter for the Feedback Loop
Here, the quality of Implementation becomes an issue.
Behavioral determinism and falsifiability do not depend on implementation quality. Even with the messiest code, running it determines behavior deterministically, and deviations from intent can be detected. The direction of asymmetry is maintained regardless of Implementation quality.
However, feedback speed depends greatly on implementation quality.
| Contract Inferability | Feedback Speed |
|---|---|
| High (discriminated unions, declarative structures, clear naming) | Structure is clear from types/interfaces → fast judgment on "does this match intent?" |
| Low (procedural descriptions, implicit state management) | Must run and trace to understand → slower judgment. But judgment is still possible |
Contract inferability is a parameter that determines feedback loop speed. Higher-inferability Implementations run feedback loops faster, enabling earlier detection and correction of deviations between intent and implementation.
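A sketch of the table's two rows, with illustrative order states: in the high-inferability version the question "does a paid order always have a timestamp?" is answered by reading the type, while the flag-based version forces you to trace every writer of the fields.

```typescript
// High inferability: the type enumerates the states, so the invariant
// "a paid order has a timestamp" is readable directly from the Contract.
type OrderState =
  | { kind: "draft" }
  | { kind: "paid"; paidAt: Date };

function describe(state: OrderState): string {
  return state.kind === "paid" ? `paid at ${state.paidAt.toISOString()}` : "draft";
}

// Low inferability: the same states as loose flags. To answer the same
// question you must run the code or trace every place these fields are set.
interface OrderFlags {
  isPaid: boolean;
  paidAt?: Date; // may or may not be set when isPaid is true
}
```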
- Implementation's advantage (why it suits being SSoT) ← behavioral determinism and falsifiability (independent of implementation quality)
- Feedback speed (efficiency as SSoT) ← Contract inferability (dependent on implementation quality)
The conclusion this leads to is: feedback loops are the essence. The reason Implementation should be the SSoT is not that intent extraction is easy, but that it is the best feedback loop origin.
Where AI Is Most Effective
If we accept this framework, the most effective use of AI is not "generating code from intent," but rather:
- Translating feedback — observing code behavior, detecting and verbalizing deviations from intent
- Analyzing code Structure to detect deviations from Contract and Requirements
- Filling in Behavior within the constraints of Structure designed by humans
The last use case lies on the extension of an approach where specification and implementation co-evolve rather than oppose each other—what might be called "spec-implementation co-evolution." Humans design the structure, and AI implements behavior within the constraints of that structure. The principle: "structure first, behavior second."
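A minimal illustration of "structure first, behavior second" (the cart types are hypothetical): the human fixes the types and the signature, and the body, which an AI might fill in, is constrained to fit them.

```typescript
// Structure, designed by the human: types and signature fix the Contract.
interface CartItem {
  unitPrice: number;
  quantity: number;
}

// Behavior, filled in (e.g. by an AI) within those structural constraints.
function cartTotal(items: CartItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPrice * item.quantity, 0);
}
```

Any generated body that violates the structure fails to compile, so deviations are caught at the boundary the human designed rather than discovered in production.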
Where Should SSoT Be Placed?
From the above discussion, a conclusion about the location of SSoT emerges.
SSoT should be placed in Implementation, not in natural language specs.
To summarize the reasons:
- Behavioral determinism — Code run produces uniquely determined behavior. Natural language lacks this determinism
- Falsifiability — "Does this behavior match the intent?" can be answered Yes/No. With natural language specs, the question itself becomes ambiguous
- Feedback origin — A concrete artifact is what enables the feedback loop of "try it and point out what's different"
- Enforcement — Compilers and runtimes detect constraint violations. Natural language has no such power
- Avoiding dual maintenance — Maintaining both spec and code creates cognitive load of "which one is correct?"
This is also consistent with the philosophy of Domain-Driven Design: a ubiquitous language shared among domain experts, developers, and the code presupposes that specification and code are not separated.
The Vision of Specification Languages and Their Difficulties
That said, current programming languages are not perfect at fulfilling the SSoT role. While they can serve as feedback origins, low-inferability implementations take time to verify intent.
Ideally, a language where specification and implementation become the same artifact would be desirable. Code that self-explains "why it is this way" at the Contract level for readers, and is structured enough that AI translations to and from natural language don't lose information. A language where Contract inferability is extremely high—meaning the feedback loop runs at maximum speed.
Historical Attempts and Partial Successes
Historically, formal specification languages like TLA+, Alloy, and Z notation tackled this problem, but high learning costs and their distance from everyday development practice kept them from widespread adoption. However, some attempts at integrating specification and implementation have partially succeeded:
- Dependently typed languages (Idris, Agda) — can express invariants at the type level; specifications like "a list of length N" are embedded in types
- Refinement Types (Liquid Haskell, F*) — attach predicates to existing types, putting constraints in types like `{x: Int | x > 0}`
- Design by Contract (Eiffel, Ada/SPARK) — pre-conditions, post-conditions, and invariants are supported at the language level
Today, TypeScript's discriminated unions and Rust's algebraic data types partially fill this role in a lighter-weight way.
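As an illustration of that lighter-weight direction, the common TypeScript "branded type" idiom can approximate the refinement type `{x: Int | x > 0}`: the predicate is checked once at a boundary function, and the brand then carries the guarantee through signatures. This is a sketch of a known pattern, not a library API.

```typescript
// Approximating {x: Int | x > 0} with a branded type.
type PositiveInt = number & { readonly __brand: "PositiveInt" };

// The predicate is checked once, at the boundary.
function toPositiveInt(x: number): PositiveInt {
  if (!Number.isInteger(x) || x <= 0) {
    throw new Error(`${x} is not a positive integer`);
  }
  return x as PositiveInt;
}

// Signatures can now demand the refinement; passing a raw,
// unchecked number here is a compile-time error.
function takeFirst<T>(xs: T[], n: PositiveInt): T[] {
  return xs.slice(0, n);
}
```

The guarantee is weaker than a true refinement type (it holds only if every cast goes through the boundary), but it moves a class of checks from prose into signatures.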
Necessary Conditions and Inherent Difficulties
For a specification language to be practically viable, at least the following conditions are needed:
1. Executable — the description runs as-is; if description and implementation are separate artifacts, the dual maintenance problem remains
2. Sufficiently rich type system — domain invariants expressible at the type level, with violations detectable at compile time
3. Learning cost comparable to existing languages — historically the biggest bottleneck; dependently typed languages are powerful but the burden of writing proofs is too great
4. Integration with existing ecosystems — isolated languages don't get adopted
5. High-quality bidirectional AI translation — conversion between natural language ↔ this language without information loss
The tradeoff between conditions 2 and 3 is the inherent difficulty. Increasing expressiveness raises learning costs. Reducing learning costs makes the language barely distinguishable from existing programming languages.
However, condition 5 could change this tradeoff. Traditionally, learning cost was a barrier on the assumption that humans write directly; but if AI handles the conversion, the language itself only needs to be readable by humans—writeability becomes less important. Languages that balance readability and expressiveness have a wider design space than those also required to be easily writable.
That said, no such intermediate language exists at this point. In that case, placing SSoT in Implementation and writing high-inferability code to accelerate feedback loops is the current best approach.
When Cognitive Constraints Ease
The discussion so far assumes current technical constraints. But the fundamental constraint is cognitive limitations—both human cognitive capacity and AI context windows. What changes when these constraints ease significantly?
Reduction of Non-Determinism
The reason intent→Implementation is non-deterministic is that intent alone lacks the information to uniquely determine an implementation. But in actual development, there are vast constraints beyond intent:
- Existing codebase structure and conventions
- Architecture patterns and technology stack choices
- Dependency library APIs
- Team's history of design decisions
- Behavioral constraints defined by the test suite
If all of these constraints fit in context, the solution space for intent→Implementation narrows dramatically. Expanding context length could improve the precision of intent→implementation conversion, potentially mitigating the non-determinism weakness of SDD. Note, however, that the majority of referenced context is Implementation-side artifacts (existing code, tests, history of design decisions). The structure of SSoT residing in Implementation does not change; what improves is not the precision of natural language specs, but the practical conversion accuracy as AI can reference Implementation more broadly.
Two Problems That Remain
However, two problems remain even as context expands infinitely:
- The greenfield problem — In new development without an existing codebase, constraints are few and non-determinism remains large. At the first step, "intent→Implementation" remains inherently non-deterministic
- The inherent ambiguity of natural language — Requirements like "easy to use for the user" cannot be uniquely interpreted no matter how much context is added. As long as natural language is the medium for specification, this problem persists
Dissolving the Opposition: Beyond Spec-Driven vs. Implementation-Driven
An interesting possibility opens up in a world where cognitive constraints ease sufficiently.
A world where Implementation remains the SSoT while AI extracts and explains layer-level information from it in real time on demand. AI instantly decomposes and presents Requirements, Contract, and Implementation distinctions. "Specification documents" as independent artifacts become unnecessary; the codebase itself becomes a state of "read it and understand everything."
Ironically, expanding context length simultaneously eases the conditions for specification languages to exist while also reducing the need for them. If AI can sufficiently grasp structure and behavior from existing programming languages, the motivation to design a new intermediate language diminishes.
Furthermore, the "spec-driven vs. implementation-driven" opposition itself may be a distinction arising from human cognitive limitations. The Requirements, Contract, Implementation framework is a cognitive tool for humans to divide and understand complex software; for an intelligence that can grasp all at once, this distinction becomes unnecessary.
In that future, the value of the framework organized in this article will remain not as "description of truth" but as "a tool to assist human cognition."
Conclusion
The "spec-driven development" of the AI development context reinvented the concept of "specification" in software engineering with a different meaning. Its essence is a "structured super-prompt," and it has no interest in distinguishing the layer hierarchy of Requirements, Contract, and Implementation.
However, the cost of not distinguishing layers is high. The transformation from intent to Implementation is inherently non-deterministic, and making natural language specs the SSoT does not stabilize the projection from it.
Implementation's advantage is not "the ability to accurately read intent." It lies in behavioral determinism and falsifiability—that running the code uniquely determines what happens, and deviations from intent can be judged Yes/No. This makes Implementation function as a feedback origin.
And feedback speed depends on Contract inferability—the ability to infer structure from types and interfaces. Higher-inferability Implementations run feedback loops faster.
Place SSoT in Implementation and write high-inferability code to maximize feedback loops. Use AI not in the direction of "generating code from intent," but in the direction of "observing code behavior, detecting deviations from intent, and implementing behavior within the constraints of structure." That will be the foundation for a third path—where specification and implementation co-evolve—distinct from both vibe coding and spec-driven development.
