The migration looked clean. My engineer ran it in dev — fast, no errors. Ran it in staging — same. We reviewed the SQL, it made sense. We scheduled the prod deployment for Tuesday morning.
Two hours before the deployment window, I was doing a final review and asked a simple question: how big is that table in production?
Two hundred million rows.
The migration added a column with a NOT NULL constraint and a default. On our Postgres version, that single statement meant a full table rewrite. At our data volume, that meant an exclusive lock for somewhere between 45 minutes and two hours during business hours. The AI had generated a migration that was technically correct and would have taken down a production system.
Table of Contents
- What AI Gets Right About Migrations
- The Table Lock Problem
- Missing Rollback Strategy
- Wrong Column Defaults
- The Dev/Prod Gap AI Can’t Bridge
- The Honest Tradeoffs
- A Migration Review Checklist
- What I Actually Use AI For Now
What AI Gets Right About Migrations
Before the critique: AI is genuinely useful for generating migration boilerplate. The syntax for adding a column, creating an index, renaming a constraint — all of that is tedious to look up and easy to get wrong. AI gets it right quickly.
For small tables and simple schemas, AI-generated migrations are fine. The problem is that “small” and “simple” are relative to the system, not to the migration, and AI doesn’t know your system.
The Table Lock Problem
When you add a column with a NOT NULL constraint in Postgres, every existing row has to satisfy that constraint. With no default, the statement simply fails on a populated table. With a default, it succeeds — but before Postgres 11, or with a non-constant default on any version, the database rewrites every existing row to include the new value. On a large table, that rewrite runs under an ACCESS EXCLUSIVE lock — nothing can read or write the table while it happens.
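As a concrete sketch (table and column names are hypothetical), the dangerous single-statement form looks like this:

```sql
-- Hypothetical example of the dangerous single-statement migration.
-- With no DEFAULT, this fails outright on a populated table, because
-- existing rows would violate NOT NULL. With a DEFAULT it succeeds --
-- but before Postgres 11, or with a non-constant default on any
-- version, it rewrites every row while holding an ACCESS EXCLUSIVE
-- lock: every SELECT, INSERT, UPDATE, and DELETE queues behind it.
ALTER TABLE orders
    ADD COLUMN status text NOT NULL DEFAULT 'pending';
```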
The correct pattern for zero-downtime migrations on large tables is well-documented:
- Add the column as nullable first (fast, no lock)
- Backfill the value in batches (slow, no lock)
- Set the default
- Add the NOT NULL constraint (this never rewrites the table, but it does scan it to validate — unless, on Postgres 12+, a matching CHECK (column IS NOT NULL) constraint has already been validated, in which case it's instant)
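In plain SQL, the four steps might look like the sketch below (names hypothetical; batch size tuned to your write load):

```sql
-- Step 1: add the column as nullable -- metadata-only, near-instant.
ALTER TABLE orders ADD COLUMN status text;

-- Step 2: backfill in batches, committing between each, so no single
-- statement holds row locks on a large range for long.
UPDATE orders SET status = 'pending'
WHERE id IN (
    SELECT id FROM orders WHERE status IS NULL LIMIT 10000
);
-- ...repeat until it updates zero rows.

-- Step 3: set the default for new rows -- metadata-only.
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';

-- Step 4 (Postgres 12+): validate without a long exclusive lock.
ALTER TABLE orders
    ADD CONSTRAINT orders_status_not_null
    CHECK (status IS NOT NULL) NOT VALID;         -- instant
ALTER TABLE orders
    VALIDATE CONSTRAINT orders_status_not_null;   -- full scan, but under
                                                  -- a lock that permits
                                                  -- reads and writes
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;  -- instant: reuses
                                                      -- the valid check
ALTER TABLE orders DROP CONSTRAINT orders_status_not_null;
```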
Our AI-generated migration collapsed all of this into a single statement. Clean SQL. Completely correct for a dev database with 10,000 rows. A potential outage on a production database with 200 million.
The AI had no idea about our table sizes because we hadn’t told it. And even if we had, that knowledge needs to translate into migration strategy, not just syntax — a subtler judgment call that AI handles inconsistently.
Missing Rollback Strategy
The second issue: the migration had no rollback.
In Rails or Alembic or similar frameworks, a migration with a proper down method lets you roll back if something goes wrong. AI often generates the up direction and leaves the down as a comment, a stub, or a raise statement that says “this migration is irreversible.”
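For the add-column case, the rollback is usually just the inverse DDL — cheap to write and cheap to run. A minimal, framework-agnostic sketch (names hypothetical):

```sql
-- up
ALTER TABLE orders ADD COLUMN status text;

-- down: the inverse exists and is trivial. There is no reason to
-- leave it as a stub or an "irreversible" raise.
ALTER TABLE orders DROP COLUMN status;
```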
Some migrations genuinely are irreversible. But many aren’t, and a missing rollback isn’t a declaration of irreversibility — it’s just an incomplete migration. If you deploy and hit a problem, you want options.
The AI-generated migration we reviewed had no rollback path. Nobody caught it because the up looked fine and we were focused on the forward path.
Wrong Column Defaults
Third issue, less catastrophic but still a bug: the default value was wrong.
We were adding a status column. The AI picked NULL as the default because the column was initially defined as nullable. The correct default for new rows was 'pending' — a business logic requirement that wasn’t expressed anywhere in the schema.
The AI had no access to the product spec, the existing application logic, or the conversation we’d had three months ago about how this status field should work. It generated the safest syntactic default, which was wrong for our use case.
This is a category of error that’s easy to miss in review because the migration runs fine. The wrong-default bug shows up later, when you query for rows in 'pending' state and get fewer than expected, and you spend an afternoon figuring out why.
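The failure mode is easy to reproduce (hypothetical schema, mirroring the status column above):

```sql
-- Existing rows were backfilled with NULL instead of 'pending',
-- so this query silently undercounts:
SELECT count(*) FROM orders WHERE status = 'pending';

-- NULL never compares equal or unequal, so legacy rows escape
-- the negated query too:
SELECT count(*) FROM orders WHERE status <> 'pending';

-- The fix is an explicit backfill that encodes the business rule:
UPDATE orders SET status = 'pending' WHERE status IS NULL;
```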
The Dev/Prod Gap AI Can’t Bridge
All three of these problems have the same root cause: AI doesn’t know the difference between your dev environment and your production environment.
Dev databases are small. Prod databases are large. Dev doesn’t have load. Prod does. Dev can tolerate a 30-second lock. Prod can’t. AI generates migrations that work in dev because that’s the environment where “it works” is easy to achieve.
This isn’t a solvable problem with better prompting. You could tell Claude “our production table has hundreds of millions of rows” and it might generate a safer migration. But you have to know to say that. You have to know all the things that differ between your environments that matter for this specific migration. That’s exactly the knowledge that’s easy to forget to include.
I’ve seen similar issues documented when using AI for other production-facing tasks — the pattern shows up in AI debugging in production and AI incident response as well.
The Honest Tradeoffs
I’m not arguing that AI has no place near database migrations. The tradeoffs are:
AI saves time on syntax. Writing migration SQL by hand is slow. AI is fast. That speed is real and worth something.
AI cannot substitute for production awareness. The engineer running the migration needs to know the table sizes, the lock behavior of the specific database version, the deployment window constraints, and the rollback plan. None of that comes from AI.
The review bar needs to be higher. When an engineer writes a migration themselves, there’s implicit accountability — they thought through it. When AI generates it, there’s a temptation to review it less critically because it “looks right.” That’s the opposite of the right behavior.
The risk is proportional to table size and query volume. For a new table with no data, AI-generated migrations are low-risk. For existing tables with significant traffic, treat every migration as potentially dangerous and review it as such.
A Migration Review Checklist
Before approving any migration to production:
- What’s the table size in production? Look it up. Don’t estimate.
- Does this operation acquire a lock? For how long?
- Is there a zero-downtime alternative? (nullable first, batch backfill, etc.)
- Is there a rollback path?
- Are the defaults correct for existing rows, not just new rows?
- Has this been tested against a production-scale data snapshot?
- What’s the deployment window? Can we tolerate downtime in that window?
If you can’t answer every item on this list, the migration isn’t ready.
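Several items on this list can be answered with a query rather than an estimate. The catalog views below are standard Postgres; the table name is hypothetical:

```sql
-- Table size: look it up, don't guess.
SELECT pg_size_pretty(pg_total_relation_size('orders')) AS total_size,
       (SELECT reltuples::bigint
        FROM pg_class
        WHERE relname = 'orders') AS approx_rows;

-- A safety net for the deployment window: if the migration can't
-- acquire its lock quickly, fail fast instead of queueing every
-- other query on the table behind it.
SET lock_timeout = '5s';
SET statement_timeout = '15min';
```

Running the migration with a lock_timeout set turns a potential outage into a retryable error — a cheap insurance policy for any DDL against a busy table.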
What I Actually Use AI For Now
I still use AI in the migration workflow, just differently:
- Generating the first draft of the migration SQL — syntax, column types, constraint names
- Asking “what are the locking implications of this operation on Postgres 15?” — AI is good at explaining database behavior in general
- Reviewing the migration for obvious issues — not as a substitute for my own review, but as a second pass
I don’t use AI to make judgment calls about production safety. That stays with the engineer who knows the system.
The Tuesday near-miss cost us nothing except a delayed deployment and an uncomfortable conversation. The lesson was cheap. Not all of them are.