
AI Code Review: What Human Reviewers Should Look For

Clarvia Team
May 22, 2025
6 min read

Last month, an AI-generated function passed every test and looked flawless. It also called an API method that doesn't exist.

The code compiled. The types checked out. The logic read beautifully. But stripe.customers.archiveBatch() is a hallucination -- Stripe has never offered that endpoint. A human reviewer caught it in 30 seconds. Without that review, it would have crashed in production at 2 AM on a Saturday.

AI-generated code demands a fundamentally different review mindset. You're not checking for typos anymore. You're hunting for plausible lies.

Why AI Code Review Is Different

Humans and AI fail in opposite directions. Understanding this pattern is worth more than any checklist:

Humans tend to:

  • Make typos and syntax errors (easy to catch)
  • Forget edge cases they didn't think of
  • Write inconsistent code when tired at 11 PM
  • Take shortcuts under deadline pressure

AI tends to:

  • Generate syntactically perfect but semantically wrong code
  • Miss project-specific conventions it was never taught
  • Over-engineer simple problems into enterprise-grade abstractions
  • Hallucinate APIs, methods, or patterns that don't exist anywhere
  • Handle the happy path beautifully while ignoring every edge case

The scary part: AI mistakes look more professional than human mistakes. That's what makes them dangerous.

    The AI Code Review Checklist

    1. Verify It Actually Works

    Never trust code you haven't executed. AI generates plausible-looking code that fails silently in 12-15% of cases:

    • Run the code. Don't just read it -- execute it with real inputs.
    • Test the happy path. Does the basic flow complete without errors?
    • Test edge cases. Empty inputs, nulls, boundary values, Unicode strings.
    • Check error scenarios. Kill the database connection mid-request. What happens?

    Looking correct and being correct are two different things entirely.
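The checklist above can be turned into a quick test pass before any line-by-line reading. A minimal sketch, assuming a hypothetical `slugify` helper is the code under review:

```typescript
// Hypothetical function under review: turns a title into a URL slug.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "");    // strip leading/trailing hyphens
}

// Happy path
console.log(slugify("Hello World")); // "hello-world"

// Edge cases the generator may have ignored
console.log(slugify(""));            // "" — empty input
console.log(slugify("   "));         // "" — whitespace only
console.log(slugify("Café résumé")); // accented letters are dropped, not transliterated — acceptable?
```

Five minutes of inputs like these often surfaces more than an hour of reading.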

    2. Check for Hallucinations

    This is the AI-specific threat that has no equivalent in human code review. AI will confidently reference packages, methods, and patterns that have never existed:

    • Verify imports exist. Is that npm package real? Has it been updated since 2023?
    • Check method signatures. Does response.data.flattenDeep() actually exist on that object?
    • Validate library usage. Cross-reference against the actual docs, not your memory.
    • Watch for version mismatches. AI often generates code for older API versions.

    Hallucination rates spike with niche libraries. The more obscure the dependency, the more you verify.
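One cheap defense is a runtime existence check before trusting an unfamiliar method. A sketch; `fakeClient` is a stand-in object, not the real Stripe SDK, and `archiveBatch` is the hallucinated name from the intro:

```typescript
// Returns true only if `name` is a callable method on the object.
function hasMethod(obj: unknown, name: string): boolean {
  if (obj === null || typeof obj !== "object") return false;
  return typeof (obj as Record<string, unknown>)[name] === "function";
}

// Stand-in for an SDK client — NOT the real Stripe library.
const fakeClient = {
  customers: {
    retrieve: (id: string) => ({ id }),
  },
};

console.log(hasMethod(fakeClient.customers, "retrieve"));     // true
console.log(hasMethod(fakeClient.customers, "archiveBatch")); // false — hallucinated
```

A guard like this is no substitute for reading the docs, but it catches the "method that never existed" class of bug before it reaches production.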

    3. Evaluate Security

    AI may not prioritize security without explicit prompting. Check for:

    • Input validation: Is user input sanitized?
    • SQL injection: Are queries parameterized?
    • XSS vulnerabilities: Is output properly escaped?
    • Authentication/authorization: Are access checks present?
    • Sensitive data handling: Is PII properly protected?
    • Hardcoded secrets: Any API keys or passwords in code?
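The SQL injection item is the easiest to demonstrate concretely. A sketch using the `$1` placeholder style of node-postgres; the exact placeholder syntax varies by driver:

```typescript
const email = "x'; DROP TABLE users; --"; // malicious input

// Vulnerable: user input is spliced straight into the SQL string.
const vulnerable = `SELECT * FROM users WHERE email = '${email}'`;

// Safer: the query text stays static; the driver binds values separately.
const parameterized = {
  text: "SELECT * FROM users WHERE email = $1",
  values: [email],
};

console.log(vulnerable.includes("DROP TABLE"));         // true — payload lands in the SQL
console.log(parameterized.text.includes("DROP TABLE")); // false — payload stays in values
```

In review, any query built with string interpolation deserves a comment, regardless of who or what wrote it.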

    For more on automated security scanning, see The AI Security Audit Process.

    4. Assess Performance

    AI optimizes for correctness, not always for performance:

    • Query efficiency: N+1 queries? Missing indexes?
    • Memory usage: Loading everything into memory?
    • Algorithmic complexity: O(n²) when O(n) is possible?
    • Caching opportunities: Repeated expensive operations?
    • Async handling: Blocking operations that should be async?
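The N+1 shape is worth recognizing on sight. A sketch with counting stubs instead of a real database; `fetchUser` and `fetchUsers` are hypothetical data-access helpers:

```typescript
let queryCount = 0;

// Hypothetical data-access stubs that just count how often they're called.
const fetchUser = (id: number) => { queryCount++; return { id }; };
const fetchUsers = (ids: number[]) => { queryCount++; return ids.map((id) => ({ id })); };

const orders = [{ userId: 1 }, { userId: 2 }, { userId: 3 }];

// N+1 shape: one query per order.
queryCount = 0;
for (const order of orders) fetchUser(order.userId);
console.log(queryCount); // 3

// Batched shape: one query for all orders.
queryCount = 0;
fetchUsers(orders.map((o) => o.userId));
console.log(queryCount); // 1
```

With three orders the difference is invisible; with thirty thousand it's an outage.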

    Don't optimize prematurely, but catch obvious performance issues.

    5. Verify Project Conventions

    AI learned to code from millions of repositories. Yours wasn't special to it:

    • Naming conventions: Does it match your project's camelCase or snake_case standard?
    • File structure: Is the new file in the right directory, following your module layout?
    • Architecture patterns: Does it respect your hexagonal/clean/MVC boundaries?
    • Error handling: Does it use your custom AppError class or generic Error?
    • Logging conventions: Are logs structured, tagged, and leveled like your existing code?
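The error-handling convention is the easiest of these to check mechanically. A sketch, assuming a project-specific `AppError` class like the one mentioned above:

```typescript
// Project-specific error type: carries a stable, machine-readable code.
class AppError extends Error {
  constructor(message: string, readonly code: string) {
    super(message);
    this.name = "AppError";
  }
}

// Convention-following: callers can branch on err.code.
const conventional = new AppError("Payment declined", "PAYMENT_DECLINED");

// Convention-breaking: a generic Error a reviewer should flag.
const generic = new Error("Payment declined");

console.log(conventional instanceof AppError); // true
console.log(generic instanceof AppError);      // false
```

AI will reach for `new Error(...)` by default unless your prompt or system context tells it otherwise.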

    6. Check Business Logic

    AI has zero business context. This is where you earn your salary:

    • Requirements match: Does this implement what the ticket actually requested?
    • Domain rules: A 90-day return window isn't the same as 3 months. Are the rules precise?
    • Edge cases: What happens to a subscription if payment fails on February 29th?
    • User experience: Does the behavior make sense to someone who isn't a developer?

    No AI model will ever understand your business better than you do. That's your moat.
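The 90-days-versus-3-months distinction is easy to see in code. A sketch using UTC dates to keep timezone noise out of the comparison:

```typescript
const purchase = new Date(Date.UTC(2025, 5, 1)); // 2025-06-01

// "90 days": exact day arithmetic.
const ninetyDays = new Date(purchase.getTime() + 90 * 24 * 60 * 60 * 1000);

// "3 months": calendar arithmetic.
const threeMonths = new Date(Date.UTC(2025, 5, 1));
threeMonths.setUTCMonth(threeMonths.getUTCMonth() + 3);

console.log(ninetyDays.toISOString().slice(0, 10));  // 2025-08-30
console.log(threeMonths.toISOString().slice(0, 10)); // 2025-09-01 — two days later
```

Whether a customer can return a purchase in that two-day gap is a business decision, not a coding one; the AI will silently pick whichever interpretation its training data favored.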

    7. Evaluate Maintainability

    Consider the future developer (often yourself) who will maintain this:

    • Readability: Can you understand it at a glance?
    • Complexity: Is it more complex than needed?
    • Comments: Are tricky parts explained?
    • Testing: Is it testable? Are tests provided?
    • Documentation: For public APIs, is there documentation?

    Common AI Code Patterns to Watch

    The Over-Engineered Solution

    AI loves abstraction the way a new grad loves design patterns. You asked for a function; it gave you a factory-builder-strategy-observer:

    Watch for:

  • Factory patterns for single implementations
  • 4 layers of abstraction for a CRUD endpoint
  • Configuration systems for values that will never change
  • Generic solutions for problems that only exist once

    Fix by: Asking yourself one question: "Could a junior dev understand this in 5 minutes?"
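A condensed before/after of the pattern (the names are illustrative):

```typescript
// Over-engineered: a factory and interface for exactly one implementation.
interface Greeter { greet(name: string): string; }
class DefaultGreeter implements Greeter {
  greet(name: string): string { return `Hello, ${name}`; }
}
class GreeterFactory {
  static create(): Greeter { return new DefaultGreeter(); }
}
console.log(GreeterFactory.create().greet("Ada")); // "Hello, Ada"

// What was actually asked for:
const greet = (name: string): string => `Hello, ${name}`;
console.log(greet("Ada")); // "Hello, Ada"
```

Same behavior, one-sixth the surface area to review and maintain.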

    The Generic Naming

    AI defaults to naming things like a robot that's never attended a standup:

    Common culprits:

  • data, result, item, value
  • handleSubmit, processData, doSomething
  • Utils, Helper, Manager

    Fix by: Renaming to be specific: userProfile, validationResult, orderLineItem.
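A minimal before/after; the domain types here are illustrative:

```typescript
interface OrderLineItem { unitPrice: number; quantity: number; }

// Before: generic names hide what the function actually does.
function processData(data: any[]): number {
  let result = 0;
  for (const item of data) result += item.unitPrice * item.quantity;
  return result;
}

// After: the signature alone documents the behavior.
function calculateOrderTotal(lineItems: OrderLineItem[]): number {
  return lineItems.reduce((total, line) => total + line.unitPrice * line.quantity, 0);
}

const lines = [{ unitPrice: 10, quantity: 2 }, { unitPrice: 5, quantity: 1 }];
console.log(processData(lines));         // 25
console.log(calculateOrderTotal(lines)); // 25 — same result, self-describing name
```

The rename changes nothing at runtime, but the next reader no longer has to open the body to learn what it does.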

    The Incomplete Error Handling

    AI handles the happy path thoroughly but may skimp on errors:

    Watch for:

  • Empty catch blocks
  • Generic error messages
  • Missing validation
  • Unchecked null/undefined

    Fix by: Adding specific error handling and validation.
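A typical before/after for the empty-catch pattern:

```typescript
// Red flag: the error is swallowed and the caller gets undefined.
function parseConfigSilent(raw: string): unknown {
  try { return JSON.parse(raw); } catch { return undefined; }
}

// Better: validate input and surface a specific, actionable error.
function parseConfig(raw: string): unknown {
  if (raw.trim() === "") throw new Error("Config is empty");
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new Error(`Config is not valid JSON: ${(err as Error).message}`);
  }
}

console.log(parseConfigSilent("not json")); // undefined — failure hidden from the caller
console.log(parseConfig('{"port": 8080}')); // { port: 8080 }
```

The silent version "works" in every demo and fails mysteriously in production; the explicit version fails loudly at the point of the problem.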

    The Subtle Type Error

    In TypeScript/typed languages, AI sometimes uses any or overly broad types:

    Watch for:

  • any types that should be specific
  • Missing null checks
  • Incorrect generic constraints
  • Type assertions that hide problems

    Fix by: Tightening types and adding proper null handling.
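A small example of the `any`-versus-specific tradeoff:

```typescript
// Loose: `any` compiles fine, but a missing field becomes a runtime surprise.
function getEmailLoose(user: any): string {
  return user.email.toLowerCase(); // throws if email is absent
}

// Tight: the optional field forces a null check at compile time.
interface User { email?: string; }
function getEmail(user: User): string {
  return user.email?.toLowerCase() ?? "unknown";
}

console.log(getEmail({ email: "Ada@Example.com" })); // "ada@example.com"
console.log(getEmail({}));                           // "unknown" — no crash
```

The loose version passes review if nobody asks "what if `email` is missing?"; the tight version makes the compiler ask it for you.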

    Review Workflow Recommendations

    1. Read the Prompt First

    Understanding what was asked helps you evaluate if the code meets requirements. Many issues stem from ambiguous prompts, not AI limitations.

    2. Run Before Reading

    Execute the code first. If it doesn't work, you've found the issue without wasting time on detailed review.

    3. Focus on the Unusual

    AI handles common patterns well. Spend review time on:

  • Project-specific logic
  • Security-sensitive code
  • Performance-critical paths
  • Integration points

    4. Use Automated Tools

    Let machines check what machines check well:

  • Linters for style
  • Type checkers for types
  • Security scanners for vulnerabilities
  • Test coverage for testing

    5. Document Patterns

    When you find recurring issues, document them. This improves future prompts and helps the team learn.

    When to Regenerate vs. Fix

    The 60-second rule: if you can't fix it in 60 seconds, regenerate with a better prompt.

    Regenerate when:

  • The approach is fundamentally wrong (wrong algorithm, wrong architecture)
  • Major structural changes are needed across 3+ functions
  • The code missed the main requirement entirely
  • You now know exactly what prompt would produce better output

    Fix manually when:

  • Issues are minor: a wrong variable name, a missing null check
  • The approach is right but 2-3 details need adjustment
  • Regenerating would lose 30+ lines of good work
  • The fix is obvious and takes under a minute

    Building Review Skills

    AI code review is a skill. Like any skill, it improves with deliberate practice:

    1. Start conservative: Review everything line-by-line for the first 2 weeks
    2. Track patterns: Keep a log of recurring AI mistakes -- you'll find 80% fall into 5 categories
    3. Improve prompts: Every review finding is feedback for better prompts next time
    4. Calibrate trust: After 50+ reviews, you'll know exactly where AI excels and where it lies
    5. Share learnings: A team that shares review patterns improves 3x faster than individuals

    Frequently Asked Questions

    How much time should AI code review take?

    Typically 10-30% of the time it took to generate, depending on complexity. Simple CRUD: 5 minutes. Complex business logic: 30+ minutes.

    Should junior developers review AI code?

    Yes, with appropriate support. AI code review is an excellent learning opportunity -- they see many patterns quickly. Pair them with seniors for complex reviews.

    How do we handle disagreements about AI code quality?

    Same as traditional code review: discuss the tradeoffs, consider the context, and reach consensus. The code's origin doesn't change review principles.

    What if the AI keeps making the same mistake?

    Improve the prompt. Add explicit constraints. Provide example patterns. If it's a project-specific convention, document it in your prompts or system context.


    Contact us if you'd like help building AI code review processes that catch hallucinations before your users do.
