Core Loop

AI-first engineering at scale


Evidence-Driven Development

The missing discipline in AI-driven engineering

Daniel Leblond March 2026

Teams adopted AI coding tools, saw short-term velocity spikes, and then paid the verification tax later in debugging, regressions, and production incidents.

The gap is not generation. The gap is proof. Evidence-Driven Development closes that gap with a repeatable workflow and explicit gates.

Perceived Speed vs. Actual Speed (METR): developers expected AI tools to make them 24% faster; measured, they were 19% slower.

The Loop

The model is simple: define intent, prove the gap, capture a baseline, implement, prove the pass, capture the result, and verify quality dimensions before review.

The critical constraint is sequence. Steps before implementation create reliability; steps after implementation create trust.

| Phase | What happens | Why it matters |
| --- | --- | --- |
| Document | Write what done means before implementation. | Prevents drifting requirements and vague success criteria. |
| Test: Fail | Define and run tests that prove the gap exists. | Confirms you are testing behavior, not assumptions. |
| Capture: Before | Record baseline outputs before touching implementation. | Provides non-negotiable proof for reviewers and future audits. |
| Implement | Apply the change with AI assistance under constraints. | Execution stays fast while the bar remains human-defined. |
| Test: Pass | Run targeted tests and confirm behavior now passes. | Validates the change solves the exact acceptance criteria. |
| Capture: After | Collect equivalent post-change artifacts. | Enables clear before/after comparison. |
| Verify | Audit security, accessibility, performance, docs, and drift. | Catches failure modes tests alone miss. |
| Review | Human reviewer accepts or rejects based on evidence. | Keeps accountability with engineers, not prompts. |
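Because the critical constraint is ordering, the loop can be modeled as a small state machine. The sketch below is a minimal illustration, not a tool the article prescribes: `Phase` and `LoopTracker` are hypothetical names, and the phase order is taken from the table above. Completing a phase out of order raises an error instead of silently continuing.

```python
from enum import IntEnum
from typing import List, Optional


class Phase(IntEnum):
    """Loop phases, numbered in the order they must happen (hypothetical names)."""
    DOCUMENT = 1
    TEST_FAIL = 2
    CAPTURE_BEFORE = 3
    IMPLEMENT = 4
    TEST_PASS = 5
    CAPTURE_AFTER = 6
    VERIFY = 7
    REVIEW = 8


class LoopTracker:
    """Hypothetical tracker that only accepts phases in sequence."""

    def __init__(self) -> None:
        self.completed: List[Phase] = []

    def next_phase(self) -> Optional[Phase]:
        """The phase that must come next, or None once Review is done."""
        n = len(self.completed)
        return Phase(n + 1) if n < len(Phase) else None

    def complete(self, phase: Phase) -> None:
        """Record a phase, rejecting anything other than the next one in order."""
        expected = self.next_phase()
        if phase != expected:
            wanted = expected.name if expected else "nothing (loop finished)"
            raise ValueError(f"cannot complete {phase.name}: next phase is {wanted}")
        self.completed.append(phase)


loop = LoopTracker()
loop.complete(Phase.DOCUMENT)
loop.complete(Phase.TEST_FAIL)
try:
    loop.complete(Phase.IMPLEMENT)  # skips Capture: Before
except ValueError as err:
    print(err)  # cannot complete IMPLEMENT: next phase is CAPTURE_BEFORE
```

Encoding the order as integer values keeps the "what comes next" rule in one place, so a skipped baseline capture fails loudly at the moment it happens rather than surfacing later in review.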

:::graphic name: ImplementationLoopDiagram caption: The implementation loop: human-defined constraints, AI-assisted execution. :::


Before-Evidence Is Irreversible in Practice

Teams can theoretically reconstruct a baseline after implementation starts, but almost nobody does. Momentum shifts to fixing forward.

That is why missing before-evidence is treated as a reset condition in disciplined loops.
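Treating missing before-evidence as a reset condition is straightforward to automate. The sketch below is one way to express it under stated assumptions: `Evidence`, `Change`, and `before_evidence_gate` are hypothetical names, and the rule that a baseline captured after implementation started counts as reconstructed (and therefore also triggers a reset) is an illustrative policy, not one the article specifies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Evidence:
    """A captured artifact (command output, query result, report) and when it was taken."""
    label: str
    captured_at: datetime


@dataclass
class Change:
    """Hypothetical record of one change moving through the loop."""
    implementation_started_at: Optional[datetime]
    before: Optional[Evidence] = None
    after: Optional[Evidence] = None


def before_evidence_gate(change: Change) -> str:
    """Return 'ok' if a genuine baseline exists, otherwise 'reset'."""
    if change.before is None:
        # No baseline at all: the loop cannot prove the gap existed.
        return "reset"
    started = change.implementation_started_at
    if started is not None and change.before.captured_at > started:
        # Assumed policy: a baseline taken after implementation began is a reconstruction.
        return "reset"
    return "ok"


started = datetime(2026, 3, 2, 10, 0)
late = Change(started, before=Evidence("curl baseline", datetime(2026, 3, 2, 11, 0)))
print(before_evidence_gate(late))  # reset
```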

:::graphic name: MaturityLadder caption: Maturity model: ad-hoc to audit-verified engineering. :::



The Audit: Ten Dimensions

| Dimension | What it catches |
| --- | --- |
| Build | Compilation, lint, and suite integrity |
| Telemetry | PII leaks and unsafe logging payloads |
| Accessibility | Landmarks, keyboard flow, heading hierarchy |
| Security | Secrets, injection risk, dependency flaws |
| Performance | N+1 paths, unbounded loops, memory leaks |
| Documentation | Spec and implementation drift |
| Test Coverage | Behavior changes without matching tests |
| TODO Debt | Skipped follow-ups and unresolved placeholders |
| Error Handling | Swallowed errors and leaked internals |
| AI Verbosity | Redundant comments and unnecessary abstractions |

:::graphic name: AuditRadarChart caption: Audit posture before and after evidence-driven checks. :::
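Each dimension reduces to a check that takes changed code and returns findings. The sketch below implements two of the ten (TODO Debt and Error Handling) with Python's standard `re` and `ast` modules; the function names and the registry shape are hypothetical, and a real audit would need language-specific tooling for the remaining eight. The Error Handling check covers only the narrowest case, an `except` block whose body is nothing but `pass`.

```python
import ast
import re
from typing import Callable, Dict, List

# Placeholder markers treated as unresolved follow-ups (assumed list).
TODO_PATTERN = re.compile(r"\b(TODO|FIXME|XXX)\b")


def check_todo_debt(source: str) -> List[str]:
    """Flag lines that still carry placeholder markers."""
    return [
        f"line {n}: {line.strip()}"
        for n, line in enumerate(source.splitlines(), start=1)
        if TODO_PATTERN.search(line)
    ]


def check_error_handling(source: str) -> List[str]:
    """Flag except blocks whose only statements are `pass` (swallowed errors)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and all(isinstance(s, ast.Pass) for s in node.body):
            findings.append(f"line {node.lineno}: exception swallowed")
    return findings


# Hypothetical registry: one entry per audit dimension.
CHECKS: Dict[str, Callable[[str], List[str]]] = {
    "TODO Debt": check_todo_debt,
    "Error Handling": check_error_handling,
}


def run_audit(source: str) -> Dict[str, List[str]]:
    """Run every registered dimension and return findings keyed by dimension."""
    return {name: check(source) for name, check in CHECKS.items()}


sample = '''
def load(path):
    try:
        return open(path).read()
    except OSError:
        pass  # TODO: surface this to the caller
'''
print(run_audit(sample)["Error Handling"])  # ['line 5: exception swallowed']
```

Keeping every dimension behind the same signature means the audit output can be pasted into a PR as one report, with an empty list per dimension serving as the evidence that it passed.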

PR template enforcing observable evidence, audit output, and explicit test plans.
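A template only helps if it is enforced. As a hedged sketch, a CI step could reject pull request descriptions that omit the evidence sections or leave them empty. The section headings (`Before`, `After`, `Audit`, `Test Plan`) and the `missing_sections` function are hypothetical; a real template would define its own headings, and this version assumes they are written as `## Heading` lines in Markdown.

```python
import re
from typing import List

# Hypothetical section headings the PR template requires.
REQUIRED_SECTIONS = ["Before", "After", "Audit", "Test Plan"]


def missing_sections(pr_body: str) -> List[str]:
    """Return required sections that are absent or empty in a Markdown PR body."""
    missing = []
    for name in REQUIRED_SECTIONS:
        # Match a '## Name' heading line, then capture text up to the next '## ' heading or the end.
        pattern = rf"^##[ \t]+{re.escape(name)}[ \t]*$(.*?)(?=^##[ \t]|\Z)"
        match = re.search(pattern, pr_body, flags=re.MULTILINE | re.DOTALL)
        if match is None or not match.group(1).strip():
            missing.append(name)
    return missing


body = "## Before\ncurl: 500\n\n## After\n\n## Test Plan\npytest\n"
print(missing_sections(body))  # ['After', 'Audit']
```

An empty section counts as missing on purpose: a heading with nothing under it is exactly the kind of unobservable claim the template exists to prevent.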

Evidence Examples by Domain

| Domain | Before the change | After the change |
| --- | --- | --- |
| API endpoint | curl response with wrong status | curl response with expected status and schema |
| Database migration | Query before migration | Query showing new columns and populated values |
| Infrastructure | Current plan output | Desired plan and apply output |
| Performance | Benchmark baseline | Benchmark delta after optimization |
| Security patch | Scanner finding | Scanner clean report |
Same loop, different artifacts, one quality standard.
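For the API endpoint row, the before/after pair only counts as proof if the baseline actually showed the gap and the after-capture closes it. The sketch below assumes each capture has already been reduced to a status code and the top-level keys of a JSON body (for example, from curl output); `ApiCapture`, `capture`, and `proves_fix` are hypothetical names, and checking top-level keys is a simplification of full schema validation.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class ApiCapture:
    """One observation of an endpoint: status code plus top-level JSON keys."""
    status: int
    keys: FrozenSet[str]


def capture(status: int, body: Dict[str, Any]) -> ApiCapture:
    """Reduce a raw response to the fields the evidence comparison needs."""
    return ApiCapture(status=status, keys=frozenset(body))


def proves_fix(before: ApiCapture, after: ApiCapture,
               expected_status: int, required_keys: FrozenSet[str]) -> bool:
    """True only if the baseline showed the gap and the after-capture closes it."""
    gap_existed = before.status != expected_status or not required_keys <= before.keys
    gap_closed = after.status == expected_status and required_keys <= after.keys
    return gap_existed and gap_closed


before = capture(500, {"error": "internal"})  # baseline captured before the change
after = capture(200, {"id": 7, "name": "widget"})
print(proves_fix(before, after, 200, frozenset({"id", "name"})))  # True
```

Requiring `gap_existed` mirrors the Test: Fail phase: a passing after-capture alone does not show that the change did anything.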

The Burden of Proof in Pull Requests


References

  1. Kent Beck (2025) Augmented Coding: Beyond the Vibes
  2. ThoughtWorks (2025) AI-Aided Test-First Development
  3. METR (2025) AI Tools Made Experienced Developers 19% Slower
  4. Addy Osmani (2026) AI Writes Code Faster. Your Job Is Still to Prove It Works.
  5. Microsoft .NET (2026) Ten Months with Copilot Coding Agent in dotnet/runtime