Evidence-Driven Development
The Missing Discipline in AI-Assisted Engineering
Teams adopted AI coding tools, saw short-term velocity spikes, and then paid the verification tax later in debugging, regressions, and production incidents.
The gap is not generation. The gap is proof. Evidence-Driven Development turns that gap into a repeatable workflow with explicit gates.
The Loop
The model is simple: define intent, prove the gap, capture a baseline, implement, prove the fix passes, capture the result, and verify quality dimensions before review.
The critical constraint is sequence. Steps before implementation create reliability; steps after implementation create trust.
:::graphic name: ImplementationLoopDiagram caption: The implementation loop: human-defined constraints, AI-assisted execution. :::
| Phase | What happens | Why it matters |
|---|---|---|
| Document | Write what "done" means before implementation. | Prevents drifting requirements and vague success criteria. |
| Test: Fail | Define and run tests that prove the gap exists. | Confirms you are testing behavior, not assumptions. |
| Capture: Before | Record baseline outputs before touching implementation. | Provides non-negotiable proof for reviewers and future audits. |
| Implement | Apply the change with AI assistance under constraints. | Execution stays fast while the bar remains human-defined. |
| Test: Pass | Run targeted tests and confirm behavior now passes. | Validates the change solves the exact acceptance criteria. |
| Capture: After | Collect equivalent post-change artifacts. | Enables clear before/after comparison. |
| Verify | Audit security, accessibility, performance, docs, and drift. | Catches failure modes tests alone miss. |
| Review | Human reviewer accepts or rejects based on evidence. | Keeps accountability with engineers, not prompts. |
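The ordering constraint in the table can be made mechanical. The following is a minimal sketch, not a prescribed implementation: the `Phase`, `Loop`, and `OutOfOrderError` names are hypothetical, and the only rule it encodes is that a phase cannot complete until every earlier phase has.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The eight phases from the table, numbered in their required order."""
    DOCUMENT = 1
    TEST_FAIL = 2
    CAPTURE_BEFORE = 3
    IMPLEMENT = 4
    TEST_PASS = 5
    CAPTURE_AFTER = 6
    VERIFY = 7
    REVIEW = 8


class OutOfOrderError(RuntimeError):
    """Raised when a phase is attempted before its predecessors are done."""


class Loop:
    """Hypothetical tracker that only lets phases complete in sequence."""

    def __init__(self) -> None:
        self.completed: list[Phase] = []

    def complete(self, phase: Phase) -> None:
        if len(self.completed) == len(Phase):
            raise OutOfOrderError("loop is already complete")
        expected = Phase(len(self.completed) + 1)
        if phase != expected:
            raise OutOfOrderError(
                f"cannot complete {phase.name}; next required phase is {expected.name}"
            )
        self.completed.append(phase)
```

With this in place, calling `loop.complete(Phase.IMPLEMENT)` before `Phase.CAPTURE_BEFORE` raises instead of silently proceeding, which is the point of treating sequence as a gate rather than a guideline.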
Before-Evidence Is Irreversible in Practice
Teams can theoretically reconstruct a baseline after implementation starts, but almost nobody does. Momentum shifts to fixing forward.
That is why missing before-evidence is treated as a reset condition in disciplined loops.
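One way to express the reset condition is a check that runs whenever work moves past the baseline step. This sketch assumes a hypothetical layout in which baseline artifacts are stored as files under `<evidence_dir>/before/`; the function name and the three status strings are illustrative only.

```python
from pathlib import Path


def loop_status(evidence_dir: Path, implementation_started: bool) -> str:
    """Return 'ok', 'blocked', or 'reset' depending on before-evidence.

    Assumption (hypothetical layout): baseline artifacts are files stored
    directly inside <evidence_dir>/before/.
    """
    before = evidence_dir / "before"
    has_baseline = before.is_dir() and any(p.is_file() for p in before.iterdir())
    if has_baseline:
        return "ok"
    # Without a baseline, work that has not started is merely blocked.
    # Once implementation has started, the loop has to be reset.
    return "reset" if implementation_started else "blocked"
```

The distinction between `blocked` and `reset` mirrors the text: before implementation the baseline can still be captured, afterwards it realistically cannot.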
:::graphic name: MaturityLadder caption: Maturity model: ad-hoc to audit-verified engineering. :::
The Audit: Ten Dimensions
| Dimension | What it catches |
|---|---|
| Build | Compilation, lint, and suite integrity |
| Telemetry | PII leaks and unsafe logging payloads |
| Accessibility | Landmarks, keyboard flow, heading hierarchy |
| Security | Secrets, injection risk, dependency flaws |
| Performance | N+1 paths, unbounded loops, memory leaks |
| Documentation | Spec and implementation drift |
| Test Coverage | Behavior changes without matching tests |
| TODO Debt | Skipped follow-ups and unresolved placeholders |
| Error Handling | Swallowed errors and leaked internals |
| AI Verbosity | Redundant comments and unnecessary abstractions |
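Some of these dimensions can be partially automated with simple text checks. The sketch below covers only two of the ten (TODO Debt and Security) and uses deliberately crude regular expressions; the patterns, function names, and registry structure are assumptions for illustration, not a substitute for real scanners.

```python
import re
from typing import Callable

Check = Callable[[str], list[str]]

# Crude heuristics, assumed for illustration only.
TODO_PATTERN = re.compile(r"\b(TODO|FIXME|XXX)\b")
SECRET_PATTERN = re.compile(
    r"(api[_-]?key|secret|password)\s*=\s*[\"'][^\"']+[\"']",
    re.IGNORECASE,
)


def _matching_lines(pattern: re.Pattern[str], source: str) -> list[str]:
    """Return 'line N: text' for every source line the pattern matches."""
    return [
        f"line {number}: {line.strip()}"
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def check_todo_debt(source: str) -> list[str]:
    return _matching_lines(TODO_PATTERN, source)


def check_security(source: str) -> list[str]:
    return _matching_lines(SECRET_PATTERN, source)


# One entry per automated dimension; the other eight would be added here.
CHECKS: dict[str, Check] = {
    "TODO Debt": check_todo_debt,
    "Security": check_security,
}


def audit(source: str) -> dict[str, list[str]]:
    """Run every registered check and group findings by dimension."""
    return {dimension: check(source) for dimension, check in CHECKS.items()}
```

Keeping the checks in a registry keyed by dimension name means the report has the same shape as the table, with an empty list signalling that a dimension found nothing.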
:::graphic name: AuditRadarChart caption: Audit posture before and after evidence-driven checks. :::
Evidence Examples by Domain
| Domain | Before evidence | After evidence |
|---|---|---|
| API endpoint | curl response with wrong status | curl response with expected status and schema |
| Database migration | Query before migration | Query showing new columns and populated values |
| Infrastructure | Current plan output | Desired plan and apply output |
| Performance | Benchmark baseline | Benchmark delta after optimization |
| Security patch | Scanner finding | Scanner clean report |
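Most rows in this table reduce to the same mechanic: run a command, save its output, and later compare the two saved outputs. A minimal sketch of that mechanic follows, assuming evidence is stored as plain text files; the `capture` and `compare` helpers and the file layout are hypothetical.

```python
import difflib
import subprocess
from pathlib import Path


def capture(command: list[str], label: str, evidence_dir: Path) -> Path:
    """Run a command and save its exit code and output as <label>.txt."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    evidence_dir.mkdir(parents=True, exist_ok=True)
    artifact = evidence_dir / f"{label}.txt"
    artifact.write_text(
        f"$ {' '.join(command)}\n"
        f"exit code: {result.returncode}\n"
        f"{result.stdout}{result.stderr}"
    )
    return artifact


def compare(before: Path, after: Path) -> str:
    """Return a unified diff between two captured artifacts."""
    diff = difflib.unified_diff(
        before.read_text().splitlines(),
        after.read_text().splitlines(),
        fromfile=before.name,
        tofile=after.name,
        lineterm="",
    )
    return "\n".join(diff)
```

For the API endpoint row, the before step might be `capture(["curl", "-i", "https://api.example.test/orders/42"], "before", Path("evidence"))`, where the URL is a placeholder. The same call with the label `"after"` produces the matching artifact once the change is in, and `compare` gives reviewers the delta.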
The Burden of Proof in Pull Requests