I’m bullish on AI coding tools. I use them daily. They’ve made me meaningfully faster.
But they also generate bugs. Specific, predictable, recurring bugs. After reviewing hundreds of AI-generated code snippets (both my own and my team’s), I’ve identified patterns in what goes wrong. Understanding these patterns makes you much better at catching problems before they reach production.
Here are the 8 most common mistakes I see in AI-generated code, with real examples and workarounds.
1. The “Works in Isolation” Problem
What happens: AI generates code that works perfectly as a standalone function but breaks when integrated into your system.
Real example: I asked Claude to write a function that fetches user preferences from our API:
```typescript
// What AI generated
async function getUserPreferences(userId: string): Promise<Preferences> {
  const response = await fetch(`/api/users/${userId}/preferences`);
  if (!response.ok) throw new Error(`Failed: ${response.status}`);
  return response.json();
}
```
Looks fine. Except:

- Our API requires an auth token in headers (AI didn't know this)
- We use `axios` everywhere else, not `fetch` (inconsistency)
- Our error handling pattern wraps errors in a custom `ApiError` class
- We have a centralized HTTP client with retry logic that this bypasses

The function works. It just doesn't fit.
Workaround: Always provide an existing example from your codebase. Paste a similar, working call site into the prompt so the AI can match your auth, error-handling, and HTTP-client conventions.
This simple change eliminates about 60% of integration issues.
2. The Optimistic Error Handling
What happens: AI generates code that handles the happy path thoroughly and error cases superficially.
```python
# What AI generated for a file processing function
def process_file(filepath: str) -> ProcessResult:
    with open(filepath) as f:
        data = json.load(f)
    validated = validate_schema(data)
    transformed = transform_data(validated)
    result = save_to_database(transformed)
    return ProcessResult(success=True, records=len(result))
```
What’s missing:
- What if the file doesn’t exist?
- What if it’s not valid JSON?
- What if `validate_schema` fails partially (some records valid, some not)?
- What if `save_to_database` fails mid-batch?
- What if the file is 10GB and doesn't fit in memory?
AI tends to generate what I call “conference talk code” — it demonstrates the concept clearly but would crash in production within hours.
Workaround: Explicitly ask for pessimistic error handling. List the failure modes you care about (missing file, malformed input, partial failure, oversized data) and tell the AI to handle each one.
Adding “Be paranoid. Production is hostile.” to prompts has noticeably improved the error handling AI generates. It’s a silly trick, but it works.
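To make the contrast concrete, here is a sketch of the same function with pessimistic handling. The `ProcessResult` fields, the list-of-records shape, and the `"id"` validation rule are illustrative assumptions, not the original code (which also called helpers like `save_to_database` that are out of scope here):

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcessResult:
    success: bool
    records: int = 0
    error: Optional[str] = None

def process_file(filepath: str) -> ProcessResult:
    # Each step gets its own failure mode instead of one happy path.
    try:
        with open(filepath) as f:
            data = json.load(f)
    except FileNotFoundError:
        return ProcessResult(success=False, error=f"file not found: {filepath}")
    except json.JSONDecodeError as e:
        return ProcessResult(success=False, error=f"invalid JSON: {e}")

    if not isinstance(data, list):
        return ProcessResult(success=False, error="expected a JSON array of records")

    # Partial validation: keep the good records and report the rest,
    # instead of dying on the first bad one.
    valid = [r for r in data if isinstance(r, dict) and "id" in r]
    if not valid:
        return ProcessResult(success=False, error="no valid records in file")

    return ProcessResult(success=True, records=len(valid))
```

The point isn't this exact structure; it's that every external interaction (filesystem, parsing, validation) gets an explicit failure path.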
3. The Memory-Ignorant Code
What happens: AI writes code that’s correct but loads everything into memory, ignoring data volume.
```python
# AI-generated log analyzer
def analyze_logs(log_dir: str) -> dict:
    all_logs = []
    for file in os.listdir(log_dir):
        with open(os.path.join(log_dir, file)) as f:
            all_logs.extend(json.loads(line) for line in f)

    # Process all logs in memory
    error_count = sum(1 for log in all_logs if log['level'] == 'ERROR')
    avg_latency = sum(log['latency'] for log in all_logs) / len(all_logs)
    # ... more aggregations
    return {"errors": error_count, "avg_latency": avg_latency}
```
This works for 1,000 log entries. With our production logs (millions of entries per day, ~8GB per day), it’ll eat all available memory and get OOM-killed.
AI consistently underestimates data volume because it doesn’t know your scale. When you say “analyze logs,” it thinks “a few files.” You’re thinking “terabytes.”
Workaround: State the data volume explicitly, e.g. "this needs to handle millions of entries per day without loading everything into memory." Or, even more directly, name the technique you want: stream the files line by line and keep only running aggregates.
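For reference, a streaming rewrite of the analyzer above (assuming the same line-delimited JSON format) keeps memory flat regardless of log volume:

```python
import json
import os

def analyze_logs(log_dir: str) -> dict:
    # Stream line by line; hold only running aggregates, never the raw logs.
    error_count = 0
    latency_sum = 0.0
    total = 0
    for name in os.listdir(log_dir):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                log = json.loads(line)
                total += 1
                if log.get("level") == "ERROR":
                    error_count += 1
                latency_sum += log.get("latency", 0.0)
    return {
        "errors": error_count,
        "avg_latency": latency_sum / total if total else 0.0,
    }
```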
4. The Phantom Dependency
What happens: AI uses libraries or features that don’t exist, are deprecated, or are the wrong version.
I’ve seen AI:
- Import modules that were renamed 3 versions ago
- Use API methods that exist in the docs but were never actually implemented
- Mix APIs from different library versions
- Invent entirely plausible-sounding packages that don’t exist
```python
# AI-generated code using a "real" library
from redis.asyncio import RedisCluster
from redis.asyncio.cluster import ClusterPipeline

# RedisCluster exists... but ClusterPipeline doesn't have
# the .watch() method AI is about to use
```
Workaround: Always verify imports and API usage against the versions actually installed in your environment before trusting the rest of the code.
And honestly, just check the import statements first when reviewing AI code. If you see an import that looks unfamiliar, verify it exists before reading the rest.
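You can even script that first-pass check. This small helper (my own sketch, stdlib only) confirms a module is installed and actually exposes the attribute the generated code is about to call:

```python
import importlib
import importlib.util

def api_exists(module_name: str, attr_path: str = "") -> bool:
    """Return True if module_name is importable and exposes attr_path.

    attr_path is a dotted chain of attributes, e.g. "path.join".
    """
    # find_spec returns None for packages that don't exist, without importing.
    if importlib.util.find_spec(module_name) is None:
        return False
    obj = importlib.import_module(module_name)
    for part in filter(None, attr_path.split(".")):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True
```

A quick `api_exists("redis.asyncio", "RedisCluster")` in a REPL settles a phantom-dependency question in seconds, before you've read fifty lines built on top of it.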
5. The Security Afterthought
What happens: AI generates functionally correct code that’s insecure.
```javascript
// AI-generated user search endpoint
app.get('/api/users/search', async (req, res) => {
  const { query } = req.query;
  const users = await db.query(
    `SELECT * FROM users WHERE name LIKE '%${query}%'` // SQL injection
  );
  res.json(users); // Exposes all columns, including password_hash
});
```
Two critical issues: SQL injection and data exposure. Both are well-known vulnerabilities. AI knows about them — if you ask “is this code secure?” it’ll flag both issues. But when generating code, it often takes the shortest path to functionality without proactively applying security best practices.
This is especially dangerous because the code works. All your tests pass. The SQL injection only shows up when someone sends a malicious query parameter.
Workaround: Separate generation from security review. Once the code works, run a second pass asking specifically about injection, authorization, and data exposure. Better yet, have specific security patterns you always follow (parameterized queries, explicit column lists, input validation at the boundary) and provide them as context.
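As a baseline pattern, here is the same search written against Python's stdlib `sqlite3` (illustrative schema; placeholder syntax differs by driver, e.g. `%s` for psycopg2). Note that it applies both fixes: parameter binding and an explicit column list:

```python
import sqlite3

def search_users(conn: sqlite3.Connection, query: str) -> list:
    # Placeholder binding means `query` is always treated as data, never SQL,
    # and the explicit column list keeps password_hash out of the response.
    rows = conn.execute(
        "SELECT id, name, email FROM users WHERE name LIKE ?",
        (f"%{query}%",),
    ).fetchall()
    return [{"id": r[0], "name": r[1], "email": r[2]} for r in rows]
```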
6. The Copy-Paste Architecture
What happens: AI generates code that looks like it was copied from a tutorial and lightly modified. Because it was (statistically speaking).
Signs of copy-paste architecture:
- Variable names like `data`, `result`, `temp`, `item` (generic tutorial names)
- Comment blocks that explain what the code does line by line (tutorial style)
- Patterns that make sense for a blog post but not for production (single-file apps, hardcoded config, console.log for observability)
- Architectures that map to common tutorial structures rather than your system
Workaround: Provide your actual architecture as context (module layout, config strategy, naming conventions, logging approach) so generated code lands in the shape your system expects.
7. The Test That Tests Nothing
What happens: AI generates tests that pass but don’t actually verify meaningful behavior.
```python
# AI-generated "test"
def test_process_payment():
    processor = PaymentProcessor(mock_gateway)
    result = processor.process(amount=100, currency="USD")
    assert result is not None  # This tells us nothing
    assert isinstance(result, PaymentResult)  # Still nothing useful
```
This test will pass whether the payment was processed, declined, or set on fire. It tests type correctness, not behavior.
I see this pattern constantly in AI-generated tests. The tests look comprehensive (high line count, many test functions), but the assertions are weak. They verify that the function returns something rather than verifying it returns the right thing.
Workaround: Ask for tests that assert on specific expected values and side effects, then review the assertions themselves, not just the coverage numbers.
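A stronger version pins down actual behavior. The `PaymentResult` fields and gateway interface below are stand-ins I invented so the example runs; the point is the shape of the assertions:

```python
from dataclasses import dataclass

# Minimal stand-ins so the example is self-contained; your real classes differ.
@dataclass
class PaymentResult:
    status: str
    amount_charged: int
    transaction_id: str

class FakeGateway:
    def __init__(self):
        self.charges = []  # record every charge so tests can inspect it

    def charge(self, amount, currency):
        self.charges.append((amount, currency))
        return "txn_123"

class PaymentProcessor:
    def __init__(self, gateway):
        self.gateway = gateway

    def process(self, amount, currency):
        txn = self.gateway.charge(amount, currency)
        return PaymentResult(status="succeeded", amount_charged=amount,
                             transaction_id=txn)

def test_process_payment_charges_the_right_amount():
    gateway = FakeGateway()
    result = PaymentProcessor(gateway).process(amount=100, currency="USD")
    # Assert on behavior, not just types:
    assert result.status == "succeeded"
    assert result.amount_charged == 100
    assert gateway.charges == [(100, "USD")]  # exactly one charge, right values
```

This test fails if the amount is wrong, the currency is dropped, or the gateway is charged twice. The original would pass in all three cases.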
8. The Concurrency Illusion
What happens: AI generates concurrent code that works under low load but has race conditions or deadlocks under production concurrency.
```python
# AI-generated cache with "thread safety"
class Cache:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get_or_compute(self, key: str, compute_fn):
        if key in self._data:  # Check without lock!
            return self._data[key]
        with self._lock:
            # Doesn't double-check after acquiring lock
            result = compute_fn()
            self._data[key] = result
            return result
```
Two bugs: the initial check is outside the lock (TOCTOU race), and there’s no double-check inside the lock (thundering herd — multiple threads might all compute the value). Both are classic concurrency bugs that AI generates regularly because the “correct” versions appear less frequently in training data than the buggy ones.
Workaround: Treat AI-generated concurrent code as guilty until proven innocent. Ask the AI to enumerate the races its code guards against, and trace every lock acquisition path yourself.
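For contrast, here is the cache with both bugs addressed (a sketch; a production version would likely also want per-key locks so expensive computations for different keys don't serialize on one lock):

```python
import threading

class SafeCache:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get_or_compute(self, key, compute_fn):
        # Unlocked read is only an optimistic fast path;
        # the authoritative check happens while holding the lock.
        if key in self._data:
            return self._data[key]
        with self._lock:
            # Double-check: another thread may have filled this key
            # while we were waiting on the lock.
            if key not in self._data:
                self._data[key] = compute_fn()
            return self._data[key]
```

The double-check inside the lock is what prevents the thundering herd: losers of the race find the value already cached instead of recomputing it.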
The Meta-Lesson
All 8 mistakes share a common root cause: AI optimizes for correctness in isolation, not correctness in context. It generates code that would work in a tutorial, a test, or a small project. Production systems are different — they have history, constraints, conventions, scale, and adversarial inputs that tutorials don’t.
Your job as the human in the loop is to supply that context. The better you are at communicating your system’s reality to the AI, the fewer of these mistakes you’ll encounter.
This doesn’t mean AI code generation isn’t valuable — it absolutely is. It means you need to review AI-generated code with specific patterns in mind rather than a general “does this look right?” If you know what to look for, you’ll catch problems that would otherwise make it to production.