Code review is one of those tasks that’s critically important but easy to fall behind on. I review something like 15-20 PRs a week across a few teams, and for a long time my approach was just “read harder, read faster.” It wasn’t working great.

Here’s how I’ve been using AI to meaningfully speed up my reviews — not a silver bullet, but a real improvement.

The Problem With Traditional Code Review

Most engineers review code the same way:

  1. Open the PR
  2. Read every changed file top to bottom
  3. Try to keep the full context in your head
  4. Leave comments
  5. Repeat

This is slow because you’re reading linearly through something that isn’t linear, you’re trying to spot bugs, style issues, architecture problems, and performance concerns all in one pass, and context switching between files erodes your working memory. By the time you hit your fifth PR of the day, you’re catching half of what you caught in the first.

The AI-Assisted Review Workflow

I now review in three passes, using an LLM for the first two. The examples below use claude (Claude Code’s CLI), but the same approach works with any LLM CLI tool — sgpt, llm, Copilot Chat, or even piping diffs into a ChatGPT window.

Pass 1: AI Bug Scan (a few minutes)

git diff main..feature-branch | claude "Scan this diff for:
1. Potential bugs (null checks, off-by-one, race conditions)
2. Security issues (injection, auth bypass, data exposure)
3. Error handling gaps (unhandled exceptions, missing retries)

Format: List each issue with file, line, severity (HIGH/MED/LOW), and explanation."

This catches the kind of mechanical bugs that humans frequently miss — especially in large PRs where your eyes start glazing over after a few hundred lines.

In my experience, AI is good at spotting missing null/undefined checks, unclosed resources (files, connections, streams), common injection and XSS patterns, inconsistent error handling, and potential race conditions in concurrent code.
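To make those categories concrete, here's a contrived example of the kind of fix an AI bug-scan finding usually turns into. The function and file names are invented for illustration; the two patterns (a context manager for the unclosed-resource class, an explicit check for the missing-check class) are the point.

```python
import json

def load_config(path):
    # Unclosed-resource fix: "with" guarantees the file handle is closed
    # even if json.load raises partway through.
    with open(path) as f:
        data = json.load(f)
    # Missing-check fix: don't assume the key exists; fail with a clear error
    # instead of a KeyError three stack frames later.
    if "api_key" not in data:
        raise KeyError(f"'api_key' missing from {path}")
    return data
```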

Where it consistently falls short: business logic errors (it doesn’t know your domain), subtle architectural problems, and performance issues that require system-level context. I’ve learned not to expect much from it on those fronts.

Pass 2: AI Style & Pattern Check (a few minutes)

git diff main..feature-branch | claude "Review this code for:
1. Deviations from common Python/TypeScript patterns
2. Unnecessary complexity (could be simpler)
3. Missing or misleading comments
4. Naming issues (unclear variable/function names)

Don't flag minor style preferences. Only flag things that affect readability or maintainability."

That last instruction — “Don’t flag minor style preferences” — is important. Without it, the LLM will nitpick every formatting choice and the output becomes noise.
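To avoid retyping the prompts for every PR, the two AI passes can be wrapped in a small script. Here's a sketch in Python: the claude CLI name and the convention of passing the prompt as an argument with the diff on stdin come from the shell examples above, but the function names and structure are my own, and any stdin-reading CLI can be swapped in via the cli parameter.

```python
import subprocess

BUG_SCAN_PROMPT = """Scan this diff for:
1. Potential bugs (null checks, off-by-one, race conditions)
2. Security issues (injection, auth bypass, data exposure)
3. Error handling gaps (unhandled exceptions, missing retries)

Format: List each issue with file, line, severity (HIGH/MED/LOW), and explanation."""

STYLE_PROMPT = """Review this code for:
1. Deviations from common Python/TypeScript patterns
2. Unnecessary complexity (could be simpler)
3. Missing or misleading comments
4. Naming issues (unclear variable/function names)

Don't flag minor style preferences. Only flag things that affect readability or maintainability."""

def get_diff(base="main", branch="feature-branch"):
    # Same diff the shell examples pipe in.
    return subprocess.run(
        ["git", "diff", f"{base}..{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout

def run_pass(prompt, diff, cli="claude"):
    # Feed the diff on stdin and the prompt as the argument, mirroring
    # `git diff ... | claude "..."` above.
    return subprocess.run(
        [cli, prompt], input=diff,
        capture_output=True, text=True, check=True,
    ).stdout
```

Usage is then `print(run_pass(BUG_SCAN_PROMPT, get_diff()))` for pass 1 and the same call with STYLE_PROMPT for pass 2.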

Pass 3: Human Architecture Review (10-15 minutes)

Now I read the PR myself, but I’m focused on the things AI can’t really help with: Does this approach make sense? Does it fit with the rest of the system? Is this the right level of abstraction, or is someone over-engineering (or under-engineering) the solution? Will it hold up at production scale? Will the next person who touches this code understand it?

This is the part that actually requires experience and judgment. By getting the mechanical checks out of the way first, I find I can give this part more focused attention instead of splitting my brain between “is there a null check missing” and “is this the right architecture.”

Real Example

Here’s a real (anonymized) review I did last week. The PR added retry logic to external API calls — 847 lines changed across 12 files, so not small.

The AI bug scan flagged a few things I probably would have caught eventually, but not quickly: the retry loop used a fixed delay instead of exponential backoff, the HTTP client had no timeout set (so retries could hang forever), the exception handler was catching bare Exception instead of requests.RequestException, and the retry count was a magic number that should have been in config.
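For concreteness, here's roughly what addressing all four findings looks like. This is my reconstruction, not the actual PR code: the helper signature and config constant names are invented, but each comment maps to one of the flagged issues.

```python
import time

import requests

# Finding 4: retry count and delays pulled into named config values
# instead of magic numbers scattered through the code.
MAX_RETRIES = 5
BASE_DELAY_SECONDS = 0.5
REQUEST_TIMEOUT_SECONDS = 10

def call_with_retry(url, **kwargs):
    """Call an external API, retrying up to MAX_RETRIES times."""
    for attempt in range(MAX_RETRIES):
        try:
            # Finding 2: an explicit timeout, so a retry can't hang forever.
            response = requests.get(
                url, timeout=REQUEST_TIMEOUT_SECONDS, **kwargs
            )
            response.raise_for_status()
            return response
        # Finding 3: catch the library's exception type, not bare Exception,
        # so programming errors still surface immediately.
        except requests.RequestException:
            if attempt == MAX_RETRIES - 1:
                raise
            # Finding 1: exponential backoff instead of a fixed delay.
            time.sleep(BASE_DELAY_SECONDS * (2 ** attempt))
```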

The style/pattern pass caught that the main function (do_api_call_with_retry) was 67 lines long and could use extraction, that a variable r should be response, and that the docstring said “retries 3 times” while the code actually retried 5 times. That last one is the kind of thing I’d almost certainly have missed reading through manually.

What I added on my own pass was higher-level: this retry logic was duplicated across 3 services and should be a shared library, a circuit breaker pattern would be better here to prevent cascade failures, and there were no metrics or alerting on retries — we’d be blind in production.
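The circuit breaker idea is simple enough to sketch. This is a hypothetical, trimmed to the core state machine; production implementations add half-open probing, per-endpoint state, and the metrics I was asking for in the review.

```python
import time

class CircuitBreaker:
    """Stop calling a failing dependency for a cooldown period, so retries
    from many callers can't pile up into a cascade failure."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Fail fast: don't even touch the struggling dependency.
                raise RuntimeError("circuit open: skipping call")
            # Cooldown elapsed: reset and let a call through to probe recovery.
            self.opened_at = None
            self.failures = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```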

Total time was around 18 minutes. Without the AI passes, this kind of PR usually takes me 35-40 minutes to review thoroughly, so it was a genuine time saver — though I wouldn’t call it life-changing.

Tips for Better AI Code Reviews

1. Be Specific About What to Look For

Bad:

"Review this code"

Good:

"Review this Python code for: bugs, security issues, and error handling gaps. This is a payment processing service, so security is critical."

2. Provide Context

# Include the full file for context, not just the diff
cat src/payment_service.py | claude "Review this file. Recent changes add
Stripe webhook handling (lines 45-120). Focus your review there but flag
issues anywhere."

Again, any LLM CLI works here — the key is giving context, not the specific tool.

3. Use Checklists for Consistency

claude "Review this diff against this checklist:
- [ ] All new functions have docstrings
- [ ] All API endpoints validate input
- [ ] All database queries use parameterized statements
- [ ] All errors are logged with context
- [ ] No secrets or credentials in code
- [ ] Tests cover the happy path and at least one error case"

4. Don’t Trust AI Blindly

AI will sometimes flag correct code as buggy, miss domain-specific issues entirely, or suggest “improvements” that are actually worse. This happens regularly enough that you should always verify AI findings before commenting on the PR. Nothing erodes trust faster than leaving a comment that turns out to be wrong because you didn’t double-check what the LLM told you.

The Numbers

After roughly 3 months of this workflow, here’s what I’m seeing (these are ballpark figures, not precise measurements):

| Metric | Before AI | After AI | Change |
| --- | --- | --- | --- |
| Avg review time | ~40 min | ~15-20 min | ~50-60% faster |
| Bugs caught pre-merge | 2-3/week | 5-6/week | roughly 2x |
| Review turnaround | next day | same morning | much faster |
| False positives from AI | N/A | ~2-3/week | worth filtering |

The biggest win honestly isn’t even the time per review — it’s the turnaround. PRs used to sit until I had a long enough block to really dig in. Now the AI passes give me a head start, so I can knock out a first pass much sooner and come back for the architecture review when I have time. That unblocking effect compounds across the team.

What I Don’t Use AI For

I should be clear about the boundaries I’ve set for myself:

  • I don’t let AI approve PRs. It assists my review — it doesn’t replace my judgment.
  • I write all review comments myself. Pasting in AI-generated comments feels impersonal and I think it would erode trust on the team.
  • I’m extra careful reviewing AI-generated code. Using AI to review AI-written code feels circular, and in practice it misses different things than a human would. These PRs get more of my time, not less.

This workflow has been a genuine improvement for me, but I want to be honest: it’s not magic. You still need to actually understand the code. AI just helps you get through the mechanical parts faster so you can focus your attention where it matters.

