AI Code Governance

AI Isn't Creating Technical Debt. It's Creating Intent Debt.

Clarvia Team
Apr 18, 2026
10 min read

The outage lasted four hours. Not because the fix was hard -- because nobody could explain what the system was supposed to do.

Three engineers, three different theories about the intended behavior. The code was clean. Tests passed. The architecture was reasonable. But when the edge case hit, nobody could explain why the service worked the way it did, what assumptions it encoded, or what would break if they changed it.

They didn't have a code problem. They had an intent problem.


The Debt Nobody's Measuring

We've spent the last 18 months talking about AI-generated code quality. The 1.7x issue multiplier. The security vulnerabilities. The maintenance costs.

Those are real problems. But they're the visible ones.

The invisible problem is worse: teams are losing the ability to explain why their systems behave the way they do.

Margaret Storey at the University of Victoria coined the term "cognitive debt" in February 2026 -- the erosion of shared understanding across a team as AI generates code faster than humans can absorb it. Her arXiv paper with colleagues formalized a Triple Debt Model: technical debt in code, cognitive debt in people, and intent debt in externalized knowledge.

Addy Osmani called the related phenomenon "comprehension debt" -- the growing gap between how much code exists in your system and how much of it any human genuinely understands.

But the most dangerous layer isn't what developers understand or don't understand about the code. It's what nobody recorded about why the code exists.

That's intent debt.


Defining Intent Debt

Technical debt is code that's costly to change.

Intent debt is code that's costly to explain and justify.

Intent debt is the gap between what the system does and the team's ability to explain three things: why it does it, what constraints it must satisfy, and what evidence supports the current approach.

This is not documentation debt. Documentation describes what. ADRs capture big architectural decisions made in committee. Intent debt lives in the everyday behavioral changes -- the small PRs that quietly reshape system invariants without recording why. ADRs don't cover a 40-line refactor that changed retry logic. PR descriptions don't capture the threat model assumptions behind an auth flow. Wiki pages don't track which constraints are sacred versus which were expedient.

Intent debt can exist in clean code. It can exist in well-tested code. It can exist in code that passes every linter, scanner, and review tool you have.

It shows up when requirements shift, when an auditor asks "why," when a new team inherits the system, or when an incident hits and nobody can reason about invariants under pressure.

AI amplifies intent debt because code generation optimizes for plausibility and local correctness -- not organizational memory. Pull requests become diff-shaped, not decision-shaped. The code says what it does. Nobody recorded why it does it that way.


The Numbers Behind the Gap

Three data points tell the story.

CodeRabbit's analysis of 8.1 million pull requests -- across thousands of repositories, measuring actual merge patterns -- found that teams using AI merged 98% more PRs year-over-year, and those PRs were 154% larger. The output of code exploded. The capacity to understand that code did not.

An Anthropic study of 52 engineers learning an unfamiliar codebase found those using AI assistance scored 17% lower on comprehension tests. They produced code faster but understood it less. The largest declines were in debugging and code reading -- the skills most critical during incidents.

And 61% of developers in CodeRabbit's survey reported that AI-generated code "looks correct but is unreliable." Only 48% consistently review it before committing.

The bottleneck was never writing code. It was knowing why the code exists. And that bottleneck is getting worse, not better.


What Intent Debt Actually Costs

Intent debt doesn't announce itself through failing builds or breaking tests. It announces itself through slow, expensive consequences:

Incident response. When the system behaves unexpectedly, the first question is "what is it supposed to do?" If nobody can answer that quickly, mean time to resolution multiplies. You aren't debugging code. You're reverse-engineering intent.

Fragile refactors. Engineers won't touch code they can't justify. Unknown coupling, unknown invariants, unknown assumptions -- these create no-go zones in the codebase. The system calcifies around decisions nobody remembers making.

Security regressions. Auth flows that accreted without a threat model. Permission scopes that nobody can justify. "Temporary" exceptions that became permanent because nobody recorded why they were temporary.

Audit and compliance. SOC 2 now includes AI governance criteria. Auditors are beginning to ask not just "what controls exist" but "can you demonstrate the rationale for this design?" Intent debt turns every audit into an archeological excavation.

75% of technology leaders are projected to face moderate or severe technical debt problems by 2026, per Forrester. But the teams I've talked to aren't struggling with code quality in isolation. They're struggling because they can't explain their own systems.


A 10-Minute Diagnostic

Run this on your last three merged PRs. For each one, answer five questions:

1. Intent sentence. Can the author write one sentence: "We changed X to achieve Y under constraints Z"?

2. Invariant list. Are there 2-5 things that must remain true? Idempotency. Ordering guarantees. Auth boundaries. Latency budgets.

3. Evidence link. Is there a test, metric, or experiment proving the approach works? A benchmark. An A/B result. A unit test covering the critical path.

4. Blast radius. What breaks if you revert? What breaks if you keep it? Dependencies, data shape, contracts.

5. Owner. Who is accountable for the intent staying true 90 days from now?

Score each PR:

0-1 answered = high intent debt. 2-3 = moderate. 4-5 = low.

If you can't answer these for recent PRs, you don't have missing documentation. You have unowned intent.
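The scoring above is simple enough to automate as a checklist. A minimal sketch in Python, assuming each PR's answers are recorded as yes/no fields (the field names and thresholds come from this article, not from any existing tool):

```python
def score_pr(checklist: dict) -> str:
    """Classify a PR's intent debt from the five yes/no diagnostic answers."""
    questions = [
        "intent_sentence",  # "We changed X to achieve Y under constraints Z"
        "invariants",       # 2-5 things that must remain true
        "evidence",         # test, metric, or experiment
        "blast_radius",     # what breaks on revert or keep
        "owner",            # accountable for the intent at 90 days
    ]
    answered = sum(1 for q in questions if checklist.get(q, False))
    if answered <= 1:
        return "high intent debt"
    if answered <= 3:
        return "moderate intent debt"
    return "low intent debt"
```

For example, a PR with only an intent sentence and a named owner scores as moderate; answering all five brings it to low.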


The Intent Ledger

Intent debt requires a specific remedy: a lightweight, durable record of why the code exists. Not a wiki. Not a novel. A ledger.

Each entry covers one meaningful behavior change and contains:

Decision: "We are doing X because Y."

Constraints: performance, security, compliance, latency, cost, data retention.

Invariants: what must never change.

Tradeoffs: what you're explicitly not optimizing.

Evidence: tests, metrics, incident links, benchmarks, tickets.

Expiry: when to re-validate the assumptions.

Owner: team or role responsible for the intent staying true.

Where it lives: an /intent/ directory in the repo, ADR-style but shorter. Or PR template fields that auto-append to the ledger on merge. For regulated environments, tie it to the change management ticket.

The rules that keep it lightweight: one entry per meaningful behavior change. Ten lines maximum unless it's a high-risk area. No merge without an intent sentence and invariants.
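One possible shape for a ledger entry, sketched as a Python dataclass. The field names follow the article's list; the validation rule encodes "no merge without an intent sentence and invariants." Everything else here is an assumption about how a team might wire this up:

```python
from dataclasses import dataclass, field

@dataclass
class IntentEntry:
    """One ledger entry per meaningful behavior change."""
    decision: str                 # "We are doing X because Y."
    invariants: list[str]         # what must never change
    constraints: list[str] = field(default_factory=list)  # perf, security, cost...
    tradeoffs: list[str] = field(default_factory=list)    # explicitly not optimized
    evidence: list[str] = field(default_factory=list)     # tests, metrics, tickets
    expiry: str = ""              # when to re-validate the assumptions
    owner: str = ""               # team or role accountable for the intent

    def mergeable(self) -> bool:
        # The gate the article proposes: no merge without an
        # intent sentence and at least one invariant.
        return bool(self.decision.strip()) and len(self.invariants) > 0
```

A CI check could refuse to merge any behavior-changing PR whose entry fails `mergeable()`, while leaving the optional fields to the reviewer's judgment.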

The critical distinction: let AI generate code. Require humans to own intent.

AI can draft ledger entries. But the reviewer must sign off. The rationale must be human-accountable. The system knows what it does. The ledger records why.


Making It Work Without Slowing Down

The objection writes itself: "This will slow teams down."

It won't -- if you tier it by risk.

Low-risk changes (docs, examples, UI copy): intent sentence only. One line. Ten seconds.

Medium-risk changes (business logic, API contracts): intent sentence plus invariants. Three minutes.

High-risk changes (auth, payments, data migrations, crypto): full ledger entry. Five minutes at merge time, hours saved at incident time.
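The tiers above can be expressed as a small lookup plus a classifier. The tier names and required fields come from the article; the path-based heuristics are a hypothetical convention a team would adapt to its own repo layout:

```python
# Required ledger fields per risk tier, as described above.
REQUIRED_BY_TIER = {
    "low":    ["intent_sentence"],
    "medium": ["intent_sentence", "invariants"],
    "high":   ["intent_sentence", "invariants", "constraints",
               "tradeoffs", "evidence", "expiry", "owner"],
}

# Hypothetical path conventions; tune these to your repository.
HIGH_RISK_HINTS = ("auth", "payments", "migrations", "crypto")
LOW_RISK_HINTS = ("docs/", "examples/", "copy/")

def risk_tier(changed_path: str) -> str:
    """Classify a changed file into a risk tier by path convention."""
    if any(hint in changed_path for hint in HIGH_RISK_HINTS):
        return "high"
    if any(changed_path.startswith(hint) for hint in LOW_RISK_HINTS):
        return "low"
    return "medium"
```

So a diff touching `src/auth/session.py` demands a full entry, while `docs/readme.md` needs only the one-line intent sentence.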

Three practices that compound:

Intent-first PRs. Start the PR description with intent and invariants. Code second. This changes how reviewers read the diff -- they evaluate against the stated purpose, not just syntax.

The "explain it back" rule. The reviewer asks: "What would break if we removed this?" If the author can't answer, the intent isn't understood well enough to ship.

Intent refactoring sprints. Like tech debt sprints, but for rationale. Pick a critical service. Walk through it. Record the intent for each major decision. Rebuild shared understanding deliberately.


What This Means for Leaders

If you manage an engineering team, three things change:

Measure it. Track the percentage of PRs with an intent sentence, invariants, and evidence. This is a leading indicator. You won't see the debt until an incident forces you to, unless you measure the inputs.
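The leading indicator above is cheap to compute from PR descriptions. A minimal sketch, assuming a PR template with hypothetical `Intent:` / `Invariants:` / `Evidence:` headings:

```python
# Hypothetical PR-template headings; swap in whatever your template uses.
MARKERS = ("Intent:", "Invariants:", "Evidence:")

def intent_coverage(pr_descriptions: list[str]) -> float:
    """Fraction of recent PRs whose description carries all three fields."""
    if not pr_descriptions:
        return 0.0
    complete = sum(
        1 for desc in pr_descriptions
        if all(marker in desc for marker in MARKERS)
    )
    return complete / len(pr_descriptions)
```

Sample the last N merged PRs each week and chart the trend; the absolute number matters less than whether it is rising.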

Redefine "done." Done is not shipped. Done is shipped, explainable, and owned. If the team can't explain why a change exists, it's not done -- it's deferred understanding.

Use it as audit evidence. The Intent Ledger becomes the rationale trail that compliance and security teams need. It accelerates incident response because the invariants are already documented. It reduces onboarding time because new engineers can read the "why" alongside the "what."

Mean time to explain -- MTTX -- may become as important as mean time to resolve. If your on-call engineer can't explain the system's intended behavior within minutes, your incident response is already compromised.


The Bet

Simon Willison wrote in February 2026: "Code is now inexpensive."

He's right. The bottleneck moved permanently from writing to understanding.

The teams that navigate the next two years of AI-assisted development will not be the ones that generate the most code. They will be the ones that maintain the clearest record of why their systems behave the way they do.

Intent debt compounds silently. It doesn't break builds. It doesn't fail tests. It just makes every future decision slower, every incident longer, every refactor riskier, until the system is too fragile to change and too opaque to trust.

The question isn't whether your team has intent debt.

It's whether you start measuring it before the next outage forces you to.


Sources: Margaret Storey, "How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt" (UVic, Feb 2026); Storey et al., "From Technical Debt to Cognitive and Intent Debt" (arXiv:2603.22106, Mar 2026); Addy Osmani, "Comprehension Debt" (addyosmani.com, Mar 2026); CodeRabbit, State of AI vs Human Code Generation report (2026); Anthropic engineering comprehension study (2026); Stack Overflow, "AI Can 10x Developers... In Creating Tech Debt" (Jan 23, 2026); Simon Willison, Agentic Engineering Patterns (Feb 2026); Forrester (2026); SOC 2 AI governance criteria (2026).
