Core Loop

AI-first engineering at scale


AI-First Postmortems and Passive Learning

Why AI Teams Keep Repeating the Same Mistakes

Daniel Leblond, April 2026

Most AI incidents do not come from one catastrophic prompt. They come from a chain of small misses that nobody writes down while they are happening.

Passive learning is the missing muscle. You only get it when every incident leaves behind artifacts that are easy to find, compare, and reuse.

Five Whys outputs split into repair actions and organizational learning signals: what is fixed immediately vs. what becomes guardrail policy.

Five Elements of a Useful AI-Era Postmortem

| Element | What to Capture | Why It Matters |
| --- | --- | --- |
| Trigger | The exact user path, prompt, or commit that exposed the issue | Removes hindsight storytelling |
| Observed behavior | Logs, traces, screenshots, and failing checks | Prevents memory drift |
| Decision record | What was considered, rejected, and accepted | Makes tradeoffs visible |
| Remediation evidence | Proof that the selected fix works in conditions that failed before | Stops 'fixed in theory' claims |
| Guardrail update | New test, lint rule, runbook step, or policy gate | Converts one-time pain into repeatable prevention |

Automated Postmortem Handling Lifecycle: trigger collection, evidence capture, pattern classification, and automated prevention deployment.
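
One way to make the five elements enforceable is to store each postmortem as a structured record and refuse to close it while any element is empty. The sketch below is only an illustration: the `Postmortem` class, its field names, and `missing_elements` are hypothetical names chosen to mirror the table, not part of any existing tool.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Postmortem:
    """Hypothetical incident record; one field per element in the table."""
    trigger: str = ""  # exact user path, prompt, or commit
    observed_behavior: list[str] = field(default_factory=list)  # links to logs, traces, screenshots
    decision_record: str = ""  # what was considered, rejected, accepted
    remediation_evidence: list[str] = field(default_factory=list)  # links proving the fix works
    guardrail_update: str = ""  # new test, lint rule, runbook step, or policy gate


def missing_elements(pm: Postmortem) -> list[str]:
    """Return the names of elements that are still empty, in table order."""
    return [f.name for f in fields(pm) if not getattr(pm, f.name)]


draft = Postmortem(trigger="commit abc123", observed_behavior=["logs/run-1.txt"])
print(missing_elements(draft))
# prints: ['decision_record', 'remediation_evidence', 'guardrail_update']
```

A review bot or template check could call `missing_elements` and block closure until the list is empty.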

Passive Learning Is a System, Not a Meeting

The phrase 'we learned from this' is only true when the learning survives personnel change and time.

A practical passive-learning loop: capture a timeline, snapshot failing and corrected states side by side, classify the failure pattern, attach one mandatory prevention mechanism, and verify that mechanism in the next similar change.
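
The loop above can be sketched as a small state check that reports which step an incident is still waiting on. This is a minimal sketch under assumptions: `LearningRecord`, its fields, and `next_step` are hypothetical names, and a real system would likely hold links to stored artifacts rather than plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LearningRecord:
    """Hypothetical record tracking one incident through the loop."""
    incident_id: str
    timeline: list[str] = field(default_factory=list)  # ordered event notes
    failing_snapshot: Optional[str] = None    # link to the failing state
    corrected_snapshot: Optional[str] = None  # link to the corrected state
    pattern: Optional[str] = None             # failure pattern label
    prevention: Optional[str] = None          # the one mandatory mechanism
    verified_in: Optional[str] = None         # later change where it was verified


def next_step(record: LearningRecord) -> str:
    """Return the first loop step that has not been completed yet."""
    if not record.timeline:
        return "capture timeline"
    if not (record.failing_snapshot and record.corrected_snapshot):
        return "snapshot failing and corrected states"
    if not record.pattern:
        return "classify failure pattern"
    if not record.prevention:
        return "attach prevention mechanism"
    if not record.verified_in:
        return "verify prevention in next similar change"
    return "closed"


record = LearningRecord(incident_id="INC-1", timeline=["09:02 agent opened PR"])
print(next_step(record))  # prints: snapshot failing and corrected states
```

Keeping "closed" unreachable until `verified_in` is set reflects the last step of the loop: the incident stays open until the prevention mechanism has been seen working on a later change.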

Autonomy decision matrix: when to automate incident response, when to require human review, and when to escalate for policy updates.

What Mature AI Postmortems Look Like

Mature teams treat incident evidence as a first-class artifact, not a cleanup task. They distinguish model error from human process error. They promote recurring failures into automated gates quickly.

When one postmortem element is missing, the postmortem becomes historical fiction. The patterns below recur across teams and cloud providers.

| Pattern | Symptom | Root Cause | Strong Countermeasure |
| --- | --- | --- | --- |
| Prompt scope leak | AI changes files outside intended boundaries | Loose task framing and weak review surface | Scoped diff checks and explicit file allowlists |
| False green tests | CI passes but behavior is wrong | Assertions test implementation details, not outcomes | Contract-level assertions and fail-first checks |
| Unsafe fallback logic | Silent fallback hides errors | 'Keep running' branches without observability | Structured error budgets and mandatory telemetry |
| Drift after merge | Codebase quality regresses days later | Fix merged without policy or docs synchronization | Post-merge verification plus docs gate |

Passive learning scorecard: measuring how well postmortem artifacts survive, how findable they are, and how often they prevent recurrence.
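
As an illustration of the first countermeasure in the table, a scoped diff check can compare the files a change touches against an explicit allowlist. This is a sketch under assumptions: `out_of_scope` and the example paths are hypothetical, and the list of changed files is assumed to come from whatever diff tooling the team already uses.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files: list[str], allowlist: list[str]) -> list[str]:
    """Return changed paths that match no allowlist pattern.

    Patterns use shell-style wildcards. Note that `*` in fnmatch also
    matches `/`, so "src/billing/*" covers nested directories too.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowlist)
    ]


changed = [
    "src/billing/invoice.py",
    "src/auth/session.py",
    "tests/billing/test_invoice.py",
]
allowed = ["src/billing/*", "tests/billing/*"]
print(out_of_scope(changed, allowed))  # prints: ['src/auth/session.py']
```

A CI step could fail the build whenever the returned list is non-empty, which turns the "prompt scope leak" pattern into an automated gate instead of a review-time surprise.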

Build a Postmortem Library Developers Actually Use

If finding prior incidents takes longer than recreating the bug, nobody will consult the archive.

A usable library supports search by failure pattern, short 'what to copy' sections with ready-to-use checks, links from runbooks and PR templates, and a closure condition that confirms prevention landed in tooling.
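
A library with these properties can start very small. The sketch below assumes a hypothetical `LibraryEntry` shape with a pattern label, a short "what to copy" note, and a flag for the closure condition. It shows one possible layout, not a description of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    """Hypothetical postmortem library entry."""
    incident_id: str
    pattern: str             # failure pattern label used for search
    what_to_copy: str        # ready-to-use check or snippet reference
    prevention_landed: bool  # closure condition: prevention exists in tooling


def find_by_pattern(entries: list[LibraryEntry], pattern: str) -> list[LibraryEntry]:
    """Return all entries filed under the given failure pattern."""
    return [entry for entry in entries if entry.pattern == pattern]


def not_yet_closed(entries: list[LibraryEntry]) -> list[str]:
    """Return incident ids whose prevention has not landed in tooling."""
    return [entry.incident_id for entry in entries if not entry.prevention_landed]


library = [
    LibraryEntry("INC-1", "prompt scope leak", "file allowlist check", True),
    LibraryEntry("INC-2", "false green tests", "contract-level assertion", False),
    LibraryEntry("INC-3", "prompt scope leak", "scoped diff check", True),
]
print([e.incident_id for e in find_by_pattern(library, "prompt scope leak")])
# prints: ['INC-1', 'INC-3']
print(not_yet_closed(library))  # prints: ['INC-2']
```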

A Practical Starting Kit for Teams

  • A postmortem template that requires evidence links.
  • A taxonomy with fewer than 12 failure patterns.
  • A policy that each incident must produce one prevention action.
  • A monthly scan for how often each failure pattern repeats.
  • A lightweight quality review to retire stale lessons.
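
Two items from the kit are easy to automate: the cap on taxonomy size and the monthly scan for repeats. The following is a minimal sketch, assuming each incident is tagged with exactly one pattern label. The function names and the `min_count` threshold are hypothetical.

```python
from collections import Counter

MAX_PATTERNS = 12  # the taxonomy should stay below this size


def validate_taxonomy(taxonomy: set[str]) -> None:
    """Raise if the taxonomy has 12 or more failure patterns."""
    if len(taxonomy) >= MAX_PATTERNS:
        raise ValueError(
            f"taxonomy has {len(taxonomy)} patterns; keep it under {MAX_PATTERNS}"
        )


def repeated_patterns(incident_patterns: list[str], min_count: int = 2) -> dict[str, int]:
    """Count pattern labels for the period and keep those seen min_count+ times."""
    counts = Counter(incident_patterns)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


this_month = ["false green tests", "prompt scope leak", "false green tests"]
print(repeated_patterns(this_month))  # prints: {'false green tests': 2}
```

Any pattern that shows up in the result is a candidate for promotion into an automated gate.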
