Last month I merged a PR that looked clean on first pass: tests green, linter happy, Claude review comments looked thoughtful. Two days later we discovered duplicate billing events caused by a retry path that wasn’t idempotent.

That was a useful reminder: AI code review is great at coverage, but still weak at risk weighting. It can tell you 20 things; only 2 might matter.

So I stopped treating Claude Code as a “smart approver” and started using it like a structured risk scanner.

This is the checklist I now run before approval.

How I Use This Checklist

I ask Claude Code to review the diff with this exact prompt:

Review this PR diff for:
1) correctness bugs
2) security/privacy risk
3) reliability issues (timeouts/retries/idempotency)
4) maintainability issues that will hurt in 3+ months

Return only concrete findings with file + line context and severity.
Skip style nits.

Then I run the 12 checks below manually, with Claude as an assistant, not the decision-maker.

The 12 Checks

1) Idempotency on money/state-changing paths

If retries happen, does the same request create duplicate side effects?

Look for:

  • webhook handlers
  • billing writes
  • status transitions
  • job workers without dedup keys

If this fails, it’s usually a release blocker.
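A minimal sketch of the dedup-key pattern, using an in-memory set as a stand-in for a durable store (in production this would be a DB table with a unique constraint, so the check-and-record step is atomic). All names here are illustrative:

```python
# Dedup store: stands in for a DB table with a unique constraint on event_id.
processed_events = set()

def handle_billing_webhook(event_id: str, amount_cents: int, charges: list) -> bool:
    """Apply the charge exactly once, even if the webhook is redelivered."""
    if event_id in processed_events:
        return False  # duplicate delivery: skip the side effect entirely
    charges.append(amount_cents)   # the state-changing side effect
    processed_events.add(event_id) # record the key only after success
    return True
```

The review question is whether the key check and the side effect are atomic; if a crash can land between them, retries still duplicate.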

2) Timeout propagation across call chains

Most outages aren’t from one failure—they’re from hanging dependencies.

Check that timeout values are:

  • explicit
  • passed through downstream clients
  • aligned with retry budgets

“Default timeout” is often “no timeout.”
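One way to check this is to look for a deadline that flows through the call chain, so each downstream call gets the remaining budget rather than a fresh timeout. A sketch, with `fetch` as a hypothetical downstream client:

```python
import time

def remaining_budget(deadline: float) -> float:
    """Convert an absolute deadline into a per-call timeout."""
    budget = deadline - time.monotonic()
    if budget <= 0:
        raise TimeoutError("deadline already exceeded")
    return budget

def handle_request(fetch, total_timeout: float = 2.0):
    """Each downstream call gets the *remaining* budget, not a fresh one."""
    deadline = time.monotonic() + total_timeout
    a = fetch("serviceA", timeout=remaining_budget(deadline))
    b = fetch("serviceB", timeout=remaining_budget(deadline))
    return a, b
```

If every hop instead starts its own 2-second clock, three hops can hold a request for 6 seconds while the caller gave up at 2.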

3) Retry safety (not just retry existence)

A retry without jitter/backoff can create thundering herds.

Check:

  • exponential backoff + jitter
  • bounded retry count
  • no retry on non-retryable errors

And confirm retry won’t replay unsafe writes.
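The shape I look for is roughly this: bounded attempts, exponential backoff with full jitter, and a whitelist of retryable errors so validation failures propagate immediately. A sketch with illustrative constants:

```python
import random
import time

# Only transient failures are worth retrying; anything else propagates.
RETRYABLE = (TimeoutError, ConnectionError)

def call_with_retry(fn, max_attempts: int = 4, base: float = 0.05, cap: float = 1.0):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RETRYABLE:
            if attempt == max_attempts - 1:
                raise  # bounded: give up after the retry budget
            # exponential backoff with full jitter, capped
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```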

4) Auth boundary leaks

Claude catches obviously missing auth checks, but subtle context leaks still slip through.

Verify:

  • tenant/org scoping on every query path
  • no fallback to global scope
  • permission checks happen before side effects

5) Error handling that preserves signal

Many PRs “handle” errors by swallowing detail.

Check:

  • logs preserve actionable context (request id, key identifiers)
  • user-facing messages are safe but informative
  • no broad catch that hides root cause

6) Null/optional assumptions in changed paths

AI tools are decent here, but still miss cross-file assumptions.

Check:

  • nullable fields from external APIs
  • schema mismatch between model and runtime object
  • optional chaining that masks broken data contracts

7) Data race / ordering risk in async flows

If two workers process the same entity at the same time, what happens?

Check:

  • lock or CAS strategy
  • transaction boundaries
  • event ordering assumptions

If ordering matters and isn’t enforced, note it explicitly before merge.

8) Test intent vs real risk

“More tests” ≠ “better protection.”

Check that tests cover:

  • failure branches
  • replay/retry scenarios
  • boundary conditions tied to business impact

A perfect happy-path suite can still hide production-grade bugs.

9) Config drift and default changes

Small config edits can create large operational changes.

Check:

  • changed defaults
  • environment-specific behavior
  • feature flags with safe rollback

Ask: if this flag is wrong in prod, can we recover quickly?

10) Observability readiness

If this breaks at 2am, can on-call diagnose in 10 minutes?

Check:

  • metrics added/updated
  • dashboards/alerts still valid
  • structured logging around new critical paths

No observability = blind deploy.

11) Migration/backfill safety

Any schema or data migration should be reversible or staged.

Check:

  • backward compatibility window
  • dual-write/read strategy where needed
  • explicit rollback plan in PR description

12) Scope creep disguised as “cleanup”

Claude sometimes suggests broad refactors during review.

Check:

  • does the PR still do one thing?
  • are unrelated refactors increasing merge risk?
  • can cleanup be split into follow-up PR?

Small, focused PRs ship safer and faster.

What I Track After Merge

The checklist is only useful if you measure outcomes.

I track three metrics monthly:

  • false-positive rate from AI review comments
  • escaped defect count in AI-reviewed PRs
  • median review time for medium/high-risk PRs

If false positives rise, I tighten the prompt and reduce the tool surface. If escaped defects rise, I add explicit checklist guards.

Common Failure Pattern (and Fix)

Failure pattern:

  • team enables AI review
  • comment volume increases
  • reviewers feel “more covered”
  • critical risk still passes because signal-to-noise worsens

Fix:

  • use one primary AI reviewer + one fallback
  • enforce severity labeling (P0/P1/P2)
  • require risk summary in PR comment before approval

Use this PR risk summary template to standardize that step:

PR Risk Summary
- Scope: [what changed]
- Highest risk: [P0/P1/P2 + one-line scenario]
- Evidence: [file/line + test/log]
- Rollback plan: [how to revert safely]
- Decision: [approve / request changes]

AI should reduce reviewer fatigue, not cosmetically increase review output.

Final Take

Claude Code is excellent at accelerating review prep. It is not a substitute for risk ownership.

If you adopt one thing from this post: standardize a checklist that prioritizes idempotency, timeout chains, and auth boundaries over style and micro-refactors.

That alone will prevent more incidents than adding another review bot.

If this checklist style is useful, you might also like "I Ran 5 AI Review Agents on the Same PR. Only 2 Were Useful" and "How I Use AI to Review Code 3x Faster."