Last month, an AI-generated function passed every test and looked flawless. It also called an API method that doesn't exist.
The code compiled. The types checked out. The logic read beautifully. But `stripe.customers.archiveBatch()` is a hallucination -- Stripe has never offered that endpoint. A human reviewer caught it in 30 seconds. Without that review, it would have crashed in production at 2 AM on a Saturday.
AI-generated code demands a fundamentally different review mindset. You're not checking for typos anymore. You're hunting for plausible lies.
Why AI Code Review Is Different
Humans and AI fail in opposite directions. Understanding this pattern is worth more than any checklist:

Humans tend to:
- Make mistakes that look like mistakes: typos, obvious logic slips, inconsistent formatting that jumps out in review

AI tends to:
- Make mistakes that look like working code: hallucinated APIs, confident misuse of libraries, security gaps hidden behind clean syntax

The scary part: AI mistakes look more professional than human mistakes. That's what makes them dangerous.
The AI Code Review Checklist
1. Verify It Actually Works
Never trust code you haven't executed. AI generates plausible-looking code that fails silently in 12-15% of cases:
- Run the code. Don't just read it -- execute it with real inputs.
- Test the happy path. Does the basic flow complete without errors?
- Test edge cases. Empty inputs, nulls, boundary values, Unicode strings.
- Check error scenarios. Kill the database connection mid-request. What happens?
Looking correct and being correct are two different things entirely.
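The gap between looking correct and being correct fits in a few lines. Here `truncate` is a hypothetical AI-generated helper (not from any real codebase): it reads cleanly and passes the happy path, but the Unicode edge case breaks it, because `String.prototype.slice` counts UTF-16 code units, not characters.

```typescript
// Hypothetical AI-generated helper that reads correctly at a glance.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max) + "...";
}

// Happy path passes:
console.log(truncate("hello world", 20)); // "hello world"

// Edge cases from the checklist expose problems:
console.log(truncate("", 5));       // empty input is fine
console.log(truncate("🎉🎉🎉", 1)); // slices through a surrogate pair: mojibake
```

Only the last call reveals the bug -- and only if you actually run it.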
2. Check for Hallucinations
This is the AI-specific threat that has no equivalent in human code review. AI will confidently reference packages, methods, and patterns that have never existed:
- Verify imports exist. Is that npm package real? Has it been updated since 2023?
- Check method signatures. Does `response.data.flattenDeep()` actually exist on that object?
- Validate library usage. Cross-reference against the actual docs, not your memory.
- Watch for version mismatches. AI often generates code for older API versions.
Hallucination rates spike with niche libraries. The more obscure the dependency, the more you verify.
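One cheap defense is a runtime guard at the integration boundary. This is a sketch, not a library API: `assertMethodExists` and the stubbed `client` are hypothetical names, standing in for whatever third-party SDK the AI generated calls against.

```typescript
// Sketch: fail fast on hallucinated SDK methods instead of crashing mid-request.
function assertMethodExists(obj: unknown, path: string): void {
  const target = path
    .split(".")
    .reduce<any>((o, key) => (o == null ? undefined : o[key]), obj);
  if (typeof target !== "function") {
    throw new Error(`Method ${path} does not exist -- possible hallucination`);
  }
}

// Usage against a stubbed client object:
const client = { customers: { create: () => ({ id: "cus_1" }) } };
assertMethodExists(client, "customers.create");          // passes
// assertMethodExists(client, "customers.archiveBatch"); // would throw
```

A guard like this is no substitute for reading the docs, but it turns a silent 2 AM crash into a loud, early one.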
3. Evaluate Security
AI may not prioritize security without explicit prompting. Check for:
- Input validation: Is user input sanitized?
- SQL injection: Are queries parameterized?
- XSS vulnerabilities: Is output properly escaped?
- Authentication/authorization: Are access checks present?
- Sensitive data handling: Is PII properly protected?
- Hardcoded secrets: Any API keys or passwords in code?
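The SQL injection check is the easiest to make concrete. The sketch below builds the statement and parameters separately so the database driver handles escaping; `findUserQuery` and the `users` table are illustrative names, and the `$1` placeholder syntax follows node-postgres (mysql2 uses `?`).

```typescript
// ✅ What the review should insist on: statement and parameters travel separately.
function findUserQuery(email: string): [string, unknown[]] {
  return ["SELECT * FROM users WHERE email = $1", [email]];
}

// ❌ Typical AI output when security isn't in the prompt:
// the user's input becomes part of the executable SQL text.
function findUserQueryUnsafe(email: string): string {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

const payload = "' OR '1'='1";
console.log(findUserQueryUnsafe(payload)); // injected condition is now SQL
console.log(findUserQuery(payload)[0]);    // SQL text never changes
```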
For more on automated security scanning, see The AI Security Audit Process.
4. Assess Performance
AI optimizes for correctness, not always for performance:
- Query efficiency: N+1 queries? Missing indexes?
- Memory usage: Loading everything into memory?
- Algorithmic complexity: O(n²) when O(n) is possible?
- Caching opportunities: Repeated expensive operations?
- Async handling: Blocking operations that should be async?
Don't optimize prematurely, but catch obvious performance issues.
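The N+1 pattern is worth a concrete illustration. This sketch uses in-memory data and a counter standing in for database round trips; the `Order` shape and function names are invented for the example.

```typescript
type Order = { id: number; userId: number };
const orders: Order[] = [
  { id: 1, userId: 10 },
  { id: 2, userId: 10 },
  { id: 3, userId: 11 },
];
let queryCount = 0; // stands in for round trips to the database

// ❌ N+1: one "query" per order.
function loadUserIdsNaive(): number[] {
  return orders.map((o) => {
    queryCount++; // SELECT ... WHERE id = o.userId, once per row
    return o.userId;
  });
}

// ✅ Batched: one "query" for all distinct ids (WHERE id IN (...)).
function loadUserIdsBatched(): number[] {
  queryCount++;
  return Array.from(new Set(orders.map((o) => o.userId)));
}
```

Three orders means three round trips in the naive version and one in the batched version; with three thousand orders, the difference is what pages you at night.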
5. Verify Project Conventions
AI learned to code from millions of repositories. Yours wasn't special to it:
- Naming conventions: Does it match your project's `camelCase` or `snake_case` standard?
- File structure: Is the new file in the right directory, following your module layout?
- Architecture patterns: Does it respect your hexagonal/clean/MVC boundaries?
- Error handling: Does it use your custom `AppError` class or generic `Error`?
- Logging conventions: Are logs structured, tagged, and leveled like your existing code?
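The error-handling convention is the one AI misses most reliably. Here is a sketch of what the check looks like, assuming a project-specific `AppError` class like the one named above (the class shape and the `notFound` helper are hypothetical):

```typescript
// A hypothetical project convention: errors carry a machine-readable
// code and an HTTP status, not just a message.
class AppError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly status: number,
  ) {
    super(message);
    this.name = "AppError";
  }
}

// ❌ What AI typically generates:
// throw new Error("User not found");

// ✅ What this codebase expects:
function notFound(resource: string): AppError {
  return new AppError(`${resource} not found`, "NOT_FOUND", 404);
}
```

A bare `Error` compiles and even works -- until your error middleware goes looking for `code` and `status` and finds neither.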
6. Check Business Logic
AI has zero business context. This is where you earn your salary:
- Requirements match: Does this implement what the ticket actually requested?
- Domain rules: A 90-day return window isn't the same as 3 months. Are the rules precise?
- Edge cases: What happens to a subscription if payment fails on February 29th?
- User experience: Does the behavior make sense to someone who isn't a developer?
No AI model will ever understand your business better than you do. That's your moat.
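The 90-days-versus-3-months distinction above is exactly the kind of thing that survives every test the AI writes for itself. A sketch of the two interpretations (both function names are illustrative):

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// ✅ What the policy says: exactly 90 days.
function withinReturnWindow(purchased: Date, now: Date): boolean {
  return now.getTime() - purchased.getTime() <= 90 * DAY_MS;
}

// ❌ What AI often writes: "3 months", which is 89-92 days
// depending on which months the window spans.
function withinThreeMonths(purchased: Date, now: Date): boolean {
  const cutoff = new Date(purchased);
  cutoff.setMonth(cutoff.getMonth() + 3);
  return now <= cutoff;
}
```

Buy on November 1 and the calendar version quietly grants a 92-day window: day 91 is inside "3 months" but outside the 90-day policy. Only someone who knows the business rule catches that.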
7. Evaluate Maintainability
Consider the future developer (often yourself) who will maintain this:
- Readability: Can you understand it at a glance?
- Complexity: Is it more complex than needed?
- Comments: Are tricky parts explained?
- Testing: Is it testable? Are tests provided?
- Documentation: For public APIs, is there documentation?
Common AI Code Patterns to Watch
The Over-Engineered Solution
AI loves abstraction the way a new grad loves design patterns. You asked for a function; it gave you a factory-builder-strategy-observer:
Watch for:
- Interfaces with exactly one implementation
- Factories that only ever construct one thing
- Configuration options that nothing configures

Fix by: Asking yourself one question: "Could a junior dev understand this in 5 minutes?"
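An illustration of the pattern, using an invented discount-calculation task -- the class names are made up, but the shape will look familiar:

```typescript
// ❌ A strategy interface and a factory -- for one discount rule:
interface DiscountStrategy {
  apply(total: number): number;
}
class PercentDiscount implements DiscountStrategy {
  constructor(private pct: number) {}
  apply(total: number): number {
    return total * (1 - this.pct / 100);
  }
}
class DiscountFactory {
  static create(pct: number): DiscountStrategy {
    return new PercentDiscount(pct);
  }
}

// ✅ The function that was actually asked for:
function applyDiscount(total: number, pct: number): number {
  return total * (1 - pct / 100);
}
```

Both versions compute the same number. One of them can be understood in five seconds.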
The Generic Naming
AI defaults to naming things like a robot that's never attended a standup:
Common culprits:
- Variables: `data`, `result`, `item`, `value`
- Functions: `handleSubmit`, `processData`, `doSomething`
- Classes: `Utils`, `Helper`, `Manager`

Fix by: Renaming to be specific: `userProfile`, `validationResult`, `orderLineItem`.
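A before/after on a hypothetical checkout helper shows how much the rename carries:

```typescript
// ❌ Before: function processData(data: any) { const result = ... }

// ✅ After: every name says what the value is in the domain.
function buildOrderSubtotal(order: { items: { price: number }[] }): number {
  return order.items.reduce((subtotal, lineItem) => subtotal + lineItem.price, 0);
}
```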
The Incomplete Error Handling
AI handles the happy path thoroughly but may skimp on errors:
Watch for:
- Empty or log-only `catch` blocks
- No input validation before the happy path begins
- Generic error messages that hide the actual failure

Fix by: Adding specific error handling and validation.
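A sketch of the fix on a hypothetical signup endpoint -- the AI version would be one line (`JSON.parse(body).email`), which throws on bad JSON and silently yields `undefined` on a missing field:

```typescript
// Parse and validate explicitly; every failure mode gets a specific error.
function parseSignup(body: string): { email: string } {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    throw new Error("Invalid JSON in request body");
  }
  const email = (data as any)?.email;
  if (typeof email !== "string" || !email.includes("@")) {
    throw new Error("Missing or malformed email");
  }
  return { email };
}
```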
The Subtle Type Error
In TypeScript/typed languages, AI sometimes uses `any` or overly broad types:
Watch for:
- `any` types that should be specific
- Missing null/undefined handling on optional values

Fix by: Tightening types and adding proper null handling.
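What the tightened version looks like, on an invented example (the AI version, `function getName(user: any) { return user.profile.name.toUpperCase(); }`, crashes the moment `profile` is absent):

```typescript
interface User {
  profile?: { name?: string };
}

function getName(user: User): string {
  // Optional chaining + a fallback instead of trusting every level to exist.
  return user.profile?.name?.toUpperCase() ?? "UNKNOWN";
}
```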
Review Workflow Recommendations
1. Read the Prompt First
Understanding what was asked helps you evaluate if the code meets requirements. Many issues stem from ambiguous prompts, not AI limitations.
2. Run Before Reading
Execute the code first. If it doesn't work, you've found the issue without wasting time on detailed review.
3. Focus on the Unusual
AI handles common patterns well. Spend review time on:
- Business logic and domain rules
- Integration points with your existing code
- Anything the original prompt left ambiguous
4. Use Automated Tools
Let machines check what machines check well:
- Linters and formatters for style
- Type checkers for signatures and null safety
- Security scanners for known vulnerability patterns
- The test suite for regressions
5. Document Patterns
When you find recurring issues, document them. This improves future prompts and helps the team learn.
When to Regenerate vs. Fix
The 60-second rule: if you can't fix it in 60 seconds, regenerate with a better prompt.
Regenerate when:
- The fundamental approach is wrong
- Problems are scattered throughout the code
- A better prompt would prevent the whole class of issue

Fix manually when:
- The issue is small and local: a rename, a missing null check, a tightened type
- Writing a better prompt would take longer than making the edit
Building Review Skills
AI code review is a skill. Like any skill, it improves with deliberate practice:
- Start conservative: Review everything line-by-line for the first 2 weeks
- Track patterns: Keep a log of recurring AI mistakes -- you'll find 80% fall into 5 categories
- Improve prompts: Every review finding is feedback for better prompts next time
- Calibrate trust: After 50+ reviews, you'll know exactly where AI excels and where it lies
- Share learnings: A team that shares review patterns improves 3x faster than individuals
Frequently Asked Questions
How much time should AI code review take?
Typically 10-30% of the time it took to generate, depending on complexity. Simple CRUD: 5 minutes. Complex business logic: 30+ minutes.
Should junior developers review AI code?
Yes, with appropriate support. AI code review is an excellent learning opportunity -- they see many patterns quickly. Pair them with seniors for complex reviews.
How do we handle disagreements about AI code quality?
Same as traditional code review: discuss the tradeoffs, consider the context, and reach consensus. The code's origin doesn't change review principles.
What if the AI keeps making the same mistake?
Improve the prompt. Add explicit constraints. Provide example patterns. If it's a project-specific convention, document it in your prompts or system context.
Further Reading
- How AI Writes Clean, Maintainable Code
- AI Testing: How We Achieve 90% Faster QA Cycles
- The AI Security Audit Process
Contact us if you'd like help building AI code review processes that catch hallucinations before your users do.
