Six months ago, my development workflow was: think → code → test → review → deploy. AI was something I used occasionally — paste a stack trace here, generate a test there. It was a tool I reached for, not a part of my process.

Now AI is embedded in every phase. Not because I’m trying to automate myself out of a job, but because I found that each phase has specific tasks where AI genuinely helps and specific tasks where it gets in the way.

Here’s the workflow I’ve built, phase by phase.

Phase 1: Planning

What AI Does Well Here

Breaking down requirements into tasks:

Here's the feature spec:
"Users can export any dashboard to CSV or PDF.
The export should include all widget data.
The largest dashboards have up to 50 widgets."

Break this down into:
- Tasks of roughly 2-3 hours each
- Dependencies between tasks
- Estimated size (S/M/L) for each task
- Technical risks
- Edge cases to consider

AI produces a decent first-pass breakdown. It catches things I might skip in an initial decomposition — like “add export format validation” or “handle timeout for large dashboards.” I’d estimate it gets about 75% of the tasks right. I always add, remove, or adjust tasks before they go into the backlog.

Identifying edge cases early:

We're building a dashboard export feature.
Exports cover dashboards with up to 50 widgets, as CSV or PDF.
What edge cases should we consider? Think about:
- Data volume and mixed time granularities across widgets
- Format differences between CSV and PDF
- User behavior (concurrent exports, retries)
- Error handling and partial failures

This is surprisingly valuable. AI will list 15-20 edge cases, and usually 3-4 of them are things I wouldn’t have thought of until they became bugs. Last time, it flagged “widgets with different time zones will produce confusing data when merged into a single CSV” — a real issue we’d have discovered in QA.
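
That time zone flag implies a concrete kind of fix: normalize every widget's timestamps to a single zone before merging rows into one CSV. A minimal sketch, with hypothetical names (WidgetRow, mergeRows — not our real code):

```typescript
interface WidgetRow {
  timestamp: string; // ISO 8601, possibly with a local offset like "+05:30"
  value: number;
}

// Convert any ISO timestamp to a canonical UTC string so rows from
// widgets in different time zones sort and compare correctly.
function toUtcIso(timestamp: string): string {
  return new Date(timestamp).toISOString();
}

// Merge rows from several widgets into one list, normalized to UTC
// and sorted chronologically, ready for CSV serialization.
function mergeRows(widgets: WidgetRow[][]): WidgetRow[] {
  return widgets
    .reduce<WidgetRow[]>((all, rows) => all.concat(rows), [])
    .map((r) => ({ ...r, timestamp: toUtcIso(r.timestamp) }))
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}
```

Without the normalization step, "2024-01-01T02:00:00+02:00" and "2024-01-01T01:00:00Z" look out of order as strings even though the first happens earlier.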

What AI Does Poorly Here

Estimating effort. AI doesn’t know your codebase, your team’s velocity, or whether your CSV library is well-maintained or a nightmare. Its time estimates are useless. Don’t even ask.

Prioritization. “Should we build the export feature or fix the performance regression?” requires business context, stakeholder relationships, and judgment that AI simply doesn’t have.

My Planning Toolkit

  • Claude Code for requirement analysis and task breakdown
  • Cursor for quick prototyping when I need to validate feasibility
  • A plain text file for my own thinking (AI-free zone)

Phase 2: Design

What AI Does Well Here

Generating architecture options:

I need to design the export system. Requirements:
- Must handle dashboards with up to 50 widgets
- Exports can be CSV or PDF
- Expected load is ~200 concurrent exports
- Users need progress feedback
- Synchronous generation takes 20-30 seconds for large dashboards

Give me 3 architecture options. For each one cover:
1. Components and data flow
2. Pros and cons
3. Scaling characteristics and failure modes

AI lays out the options clearly, with reasonable tradeoffs. This doesn’t replace a design discussion with the team, but it gives me a structured starting point.

Writing the design document:

I covered this in detail in my post about writing technical documents with AI, but the short version: AI takes my bullet points and produces a well-structured RFC in minutes. I spend my time editing and adding context rather than staring at a blank page.

What AI Does Poorly Here

Making the actual design decision. It can’t weigh “our team has more experience with approach A” or “approach B aligns with the platform migration we’re doing next quarter.” Design decisions require context that exists in your head and your organization, not in the prompt.

Phase 3: Coding

This is the phase most people think of when they say “AI-powered development,” and it’s where the tooling is most mature.

My Coding Workflow

Starting a new file: I describe what I need in Cursor’s Composer or Claude Code and get a scaffolded implementation. Then I iterate.

claude "Create a TypeScript export service with these specs:
- ExportService class with methods: startExport, getProgress, downloadResult
- Uses Bull queue for async processing
- Redis for progress tracking
- S3 for result storage
- Include error handling and retry logic

Use our existing patterns:
[paste example service from our codebase]"
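
The scaffold that comes back varies run to run, but its shape is roughly this. A sketch with the Bull, Redis, and S3 pieces stubbed as in-memory stand-ins; the method names come from the prompt, everything else is illustrative:

```typescript
type ExportStatus = "queued" | "running" | "done" | "failed";

interface ExportJob {
  id: string;
  status: ExportStatus;
  progress: number; // 0-100
  resultUrl?: string;
}

class ExportService {
  private jobs = new Map<string, ExportJob>(); // stand-in for Redis

  startExport(dashboardId: string): string {
    const id = `export-${dashboardId}-${Date.now()}`;
    this.jobs.set(id, { id, status: "queued", progress: 0 });
    // Real version: enqueue a Bull job that renders the dashboard,
    // updates progress in Redis, and uploads the result to S3,
    // with retry/backoff configured on the queue.
    return id;
  }

  getProgress(id: string): ExportJob {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`Unknown export: ${id}`);
    return job;
  }

  downloadResult(id: string): string {
    const job = this.getProgress(id);
    if (job.status !== "done" || !job.resultUrl) {
      throw new Error(`Export ${id} is not finished`);
    }
    return job.resultUrl; // real version: a presigned S3 URL
  }
}
```

The value of the scaffold is the structure, not the internals — I replace the stubs with our actual queue and storage clients during iteration.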

Working on existing code: I use Cursor’s inline edit (Cmd+K) for targeted changes. Highlight a function, describe the change, review the diff.

Complex logic: I use AI pair programming — think out loud with the AI, scaffold together, implement incrementally.

Boring boilerplate: This is where AI saves the most time. Type definitions, validation schemas, API endpoint handlers, database models — all of this follows patterns that AI replicates well.
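
As a concrete example of the boilerplate I mean, here's a hypothetical request type plus its validator — names are illustrative, not from our codebase, but this is exactly the pattern-heavy code AI reproduces reliably:

```typescript
interface ExportRequest {
  dashboardId: string;
  format: "csv" | "pdf";
  widgetIds?: string[];
}

// Validate untrusted input and narrow it to a typed request,
// throwing a descriptive error on the first violation.
function validateExportRequest(input: unknown): ExportRequest {
  const o = input as Record<string, unknown>;
  if (typeof o?.dashboardId !== "string" || o.dashboardId.length === 0) {
    throw new Error("dashboardId must be a non-empty string");
  }
  if (o.format !== "csv" && o.format !== "pdf") {
    throw new Error('format must be "csv" or "pdf"');
  }
  if (
    o.widgetIds !== undefined &&
    (!Array.isArray(o.widgetIds) ||
      !o.widgetIds.every((w) => typeof w === "string"))
  ) {
    throw new Error("widgetIds must be an array of strings");
  }
  return o as unknown as ExportRequest;
}
```

Nothing here requires judgment; it just requires consistency, which is what AI is good at.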

The 70/30 Rule

I’ve settled on a rough rule: AI writes about 70% of the first draft, and I write (or rewrite) about 30%. That 30% is the part that matters most — the business logic, the edge case handling, the integration with existing code.

If I find the AI writing more than 70% of the final code unchanged, I get nervous. It means either the task is too simple to need AI, or I’m not reviewing carefully enough.

Code That AI Shouldn’t Write

  • Security-critical code (auth, encryption, access control) — I write this myself and use AI only for review
  • Core business logic that encodes domain knowledge — AI doesn’t know our domain
  • Performance-sensitive hot paths — AI writes correct-but-naive code that may not meet latency requirements
  • Anything I don’t understand — if I can’t review it, I can’t ship it

Phase 4: Testing

What AI Does Well Here

Generating test cases from implementation:

cat src/export-service.ts | claude "Generate comprehensive test cases for this service.
Use Jest. Include:
- Happy path tests
- Error cases (network failure, invalid input, timeout)
- Edge cases (empty dashboard, maximum widgets, concurrent exports)
- Integration test structure (mock S3 and Redis)

For each test, add a comment explaining WHAT we're testing and WHY."

AI-generated tests are usually 80% usable. The main issues:

  • Tests that test implementation details rather than behavior
  • Missing domain-specific edge cases
  • Overly optimistic mocks that don’t reflect real failure modes
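
The first issue is the one I fix most often. A contrived illustration with a hypothetical formatPrice helper — the brittle version pins how the function is built, while the rewrite asserts only on what callers can observe:

```typescript
function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// What AI often writes: assertions pinned to internals, e.g. spying on
// the exact rounding call. That test breaks if you refactor to
// Intl.NumberFormat even though behavior is unchanged:
//   expect(toFixedSpy).toHaveBeenCalledWith(2)

// What I rewrite it to: table-driven checks on observable output only.
const cases: Array<[number, string]> = [
  [1999, "$19.99"],
  [0, "$0.00"],
  [5, "$0.05"],
];
const allPass = cases.every(([cents, want]) => formatPrice(cents) === want);
```

The behavior-style version survives any refactor that preserves the contract, which is the whole point of having the test.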

Property-based test generation:

Generate property-based tests for our CSV formatter.
Properties to check:
- Round trip: parsing the formatted output reproduces the input
- All rows have the same number of columns (no fields lost)
- Special characters (quotes, commas, newlines) are properly escaped
- Empty values are preserved

This is one area where AI consistently outperforms my own test writing. It generates more thorough property tests than I would manually.
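
To make the idea concrete, here's a minimal hand-rolled version of the round-trip property for a toy CSV field escaper — escapeField and parseField are illustrative stand-ins, not our real formatter:

```typescript
// Quote a field if it contains a comma, quote, or newline,
// doubling any embedded quotes (standard CSV escaping).
function escapeField(field: string): string {
  return /[",\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
}

// Inverse of escapeField for a single field.
function parseField(escaped: string): string {
  return escaped.startsWith('"')
    ? escaped.slice(1, -1).replace(/""/g, '"')
    : escaped;
}

// The round-trip property: for every sample, parse(escape(x)) === x.
function roundTripHolds(samples: string[]): boolean {
  return samples.every((s) => parseField(escapeField(s)) === s);
}

const samples = ["plain", "a,b", 'say "hi"', "line\nbreak", ""];
```

A property-based library would generate the samples randomly instead of hand-picking them; the property itself is the same.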

What AI Does Poorly Here

Knowing what to test. AI will test every function with equal thoroughness. It doesn’t know that formatCurrency is used in billing and needs exhaustive tests while formatLogMessage is used in debugging and needs basic coverage. You still need to prioritize.

Phase 5: Code Review

I’ve written about this in detail in how I use AI for code review, so I’ll just summarize: AI runs a bug scan and style check before I do my human review. The three-pass approach (AI bug scan → AI style check → human architecture review) has cut my review time roughly in half.

Phase 6: Deploy & Monitor

Pre-Deploy Checklist

claude "Generate a pre-deploy checklist for this diff:
$(git diff main..feature-branch)

Consider:
- Database migrations needed?
- Environment variables to add?
- Feature flags to set?
- Services to restart?
- Monitoring to add?
- Rollback procedure?"

This has caught missing environment variables twice and a forgotten database migration once. Worth the 30 seconds it takes to run.

Post-Deploy Log Analysis

After deploying, I watch logs for 15-20 minutes. If anything looks off:

kubectl logs deployment/export-service --since=15m | tail -200 | \
claude "Analyze these logs from a freshly deployed service.
Flag anything that looks like:
- Errors or warnings
- Performance degradation
- Unexpected behavior
- Missing expected log lines (e.g., startup, health checks)"

Incident Response

When things go wrong, AI helps with rapid diagnosis. I covered this in depth in how I debug production issues with AI.

The Full Workflow in Practice

Here’s what a typical feature development cycle looks like with this workflow:

Phase     | Time Without AI | Time With AI | AI Contribution
----------|-----------------|--------------|-------------------------------------
Planning  | 2-3 hours       | 1-1.5 hours  | Task breakdown, edge cases
Design    | 3-4 hours       | 1.5-2 hours  | RFC drafting, options analysis
Coding    | 8-16 hours      | 5-10 hours   | Scaffolding, boilerplate, iteration
Testing   | 3-5 hours       | 1.5-3 hours  | Test generation, edge cases
Review    | 1-2 hours       | 30-60 min    | Bug scan, style check
Deploy    | 30 min          | 30 min       | Pre-deploy checklist

Total: roughly 35-45% time reduction on a typical feature. The variance is high — simple CRUD features see bigger savings, complex distributed systems features see less.

What I’d Do Differently

If I were building this workflow from scratch today:

  1. Start with code review. It has the highest ROI and lowest risk. You’re using AI to find problems, not create them.
  2. Add testing next. Test generation is the second-best use case. AI-generated tests catch real bugs.
  3. Then documentation. Huge time saver, low downside.
  4. Coding assistance last. This is where the biggest time savings are, but also where the biggest risks are. You need enough experience with AI output to review it critically.

Don’t try to adopt everything at once. Pick one phase, build the habit, and expand from there.

The Honest Assessment

Is this workflow perfect? No. Here’s what still bothers me:

  • Dependency on AI availability. When Claude or OpenAI has an outage, my workflow slows down noticeably. I’ve built a dependency I can’t easily unwind.
  • Review fatigue. Reading AI-generated code is tiring in a different way than writing code. You’re scanning for subtle errors rather than building understanding. After a long session, my review quality drops.
  • The learning question. Am I learning as much as I would writing everything from scratch? Probably not. I try to compensate by understanding every line of AI-generated code, but I’m honestly not sure it’s equivalent.
  • Cost. I’m spending ~$60/month on AI tools (Cursor + Claude Pro). Worth it for me, but it adds up across a team.

Despite all that, I’m not going back. The productivity gain is real and consistent. The key is staying in the driver’s seat — you direct the AI, not the other way around.

