AI Field Notes

Part II · Shipping Workflows

08

AI-Powered Code Review

Using BugBot and Codex as PR reviewers - what they catch and what they miss

Before any human looks at my PR, two tools already have. One checks the diff for type errors, missing null checks, and edge cases the tests didn't cover. The other looks at whether the implementation actually matches what the PR description says it does.

Sometimes they catch real issues. Sometimes they flag things that don't matter. Either way, by the time a human sees the code, the mechanical first pass is already done.

The two tools: Cursor BugBot for PR review in Cursor workflows, and Codex on GitHub as an automatic PR reviewer.

Cursor BugBot

BugBot reviews pull requests in Cursor workflows and posts actionable review feedback. It's particularly good at:

  • Catching logic errors the TypeScript compiler can't see (code that type-checks but does the wrong thing)
  • Identifying missing null checks and undefined access patterns
  • Spotting inconsistencies between the implementation and the surrounding code style
  • Flagging potential performance issues (unnecessary re-renders, missing memoization, N+1 queries)

The workflow: make your changes, open the PR, run BugBot review, fix what it finds, then request human review. By the time a human sees the code, the mechanical stuff is already resolved.

Codex on GitHub

Codex can act as a PR reviewer on GitHub. Request a review by mentioning @codex review in a PR comment, or enable automatic reviews to cover every PR. Review guidelines can be customized via an AGENTS.md file in the repository. It's useful for:

  • Checking that the implementation matches the PR description
  • Identifying missing test coverage for the changed code
  • Catching security issues (exposed secrets, SQL injection patterns, missing auth checks)
  • Verifying that error handling is consistent with the rest of the codebase

Key difference from BugBot: Codex has the full repository context, not just the changed files. So it catches things like "this function is now inconsistent with how similar functions work elsewhere in the codebase."
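Those AGENTS.md review guidelines are free-form instructions the reviewer reads. A minimal sketch — the specific rules below are illustrative examples, not required keys or syntax:

```markdown
## Code review guidelines

- Flag any changed function whose behavior no longer matches the PR description.
- Point out missing tests for changed branches; suggest cases, don't write them.
- Treat leftover debug logging in src/ as a blocker; style nits are comments only.
- Skip generated files and lockfiles.
```

Keep the rules short and checkable; vague guidance ("ensure code quality") produces vague review comments.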

What AI review actually catches

The most valuable catches I've seen:

  • Off-by-one errors in pagination — the kind that only show up when you're on the last page
  • Missing edge cases in input validation — empty strings, null values, negative numbers
  • Race conditions in async code — especially in React useEffect cleanup
  • Subtle breaking changes — changing a function signature in a way that's technically backward compatible but breaks callers
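The pagination off-by-one is worth seeing concretely. A hypothetical example of the bug class (function names and numbers are mine, not from a real PR): with 1-based pages, both the slice offset and the "has next page" check have a variant that only misbehaves on the last page.

```typescript
// Slice out one page of results from a 1-based page number.
function pageSlice<T>(items: T[], page: number, pageSize: number): T[] {
  // Buggy variant reviewers catch: using `page * pageSize` as the start
  // index with 1-based pages silently skips the entire first page.
  const start = (page - 1) * pageSize; // 1-based page -> 0-based offset
  return items.slice(start, start + pageSize);
}

function hasNextPage(total: number, page: number, pageSize: number): boolean {
  // Buggy variant: `page * pageSize <= total` reports a phantom empty
  // page whenever the total is an exact multiple of pageSize.
  return page * pageSize < total;
}

const items = Array.from({ length: 45 }, (_, i) => i);
console.log(pageSlice(items, 3, 20).length); // 5 – the short last page
console.log(hasNextPage(45, 3, 20)); // false – page 3 is the last page
console.log(hasNextPage(40, 2, 20)); // false – exact multiple, no phantom page 3
```

Tests rarely cover the exact-multiple and short-last-page cases, which is why this class survives human review.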

What it gets wrong

AI review produces false positives. Common ones:

  • Flagging intentional patterns as bugs (not enough context to know it's intentional)
  • Suggesting refactors that aren't in scope for the PR
  • Missing bugs that require understanding the business logic (the model doesn't know what the code is supposed to do)

Think of it as a fast, cheap first pass. Catches mechanical bugs well. Not a substitute for a human reviewer who understands the product.

Workflow Recipe

Copy-pasteable flows

Prompt

I need to build [feature]. Here’s the spec: inputs, outputs, edge cases, constraints. Produce a plan before writing any code.

Steps

1. Spec: Write a developer spec with inputs, outputs, edge cases, and constraints.

2. Plan: Ask the model to produce a plan (files to create/modify, key decisions). Review it before coding.

3. Code: Implement against the plan. One feature, one PR. Pull the model back if it goes out of scope.

4. Review: Ask the model to review its own code: "What edge cases might this miss? What would break this?"

5. Tests: Generate tests from the review's edge cases: "Write tests for the edge cases you identified."

6. PR description: Generate the PR description from the spec and the diff. Full context produces clear descriptions.

Tools

Cursor · Claude Code · GitHub Actions

Guardrails

  • One feature, one PR — keep scope tight
  • Review the plan before writing code
  • Don’t let the model touch files outside scope
  • Ask before refactoring adjacent code

Expected output

Working PR with passing CI, clear description, and tests covering the identified edge cases.

Paste this recipe into CLAUDE.md, .cursorrules, or your workflow docs.

Failure Gallery

My bruises, your benefit — real failures with exact fixes

14 failures catalogued · 4 critical · 7 high

Called functions or methods that don't exist in the version you're using.

Severity: high

next/router.prefetch() with invented arguments

Asked the model to add prefetching to a Next.js navigation component. Specified we're on Next.js 14.

What the model produced

Susceptibility: sneaky
import { useRouter } from 'next/router';

function NavLink({ href, children }) {
  const router = useRouter();

  const handleMouseEnter = () => {
    router.prefetch(href, { priority: 'high', timeout: 3000 });
  };

  return (
    <a href={href} onMouseEnter={handleMouseEnter}>
      {children}
    </a>
  );
}
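A fix sketch, assuming Next.js 14 with the App Router: useRouter comes from next/navigation there, and its prefetch accepts only the href; the priority/timeout options above were invented. (For plain navigation links, next/link's Link component prefetches automatically and is usually the better fix.)

```tsx
'use client';
import { useRouter } from 'next/navigation';
import type { ReactNode } from 'react';

function NavLink({ href, children }: { href: string; children: ReactNode }) {
  const router = useRouter();

  // App Router prefetch takes just the href – no options object.
  return (
    <a href={href} onMouseEnter={() => router.prefetch(href)}>
      {children}
    </a>
  );
}
```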
Severity: critical

Prisma cursor pagination with wrong field name

Asked for cursor-based pagination using Prisma. The model had no schema context.

What the model produced

Susceptibility: sneaky
async function getNextPage(cursor?: string) {
  const items = await prisma.post.findMany({
    take: 20,
    cursor: cursor ? { id: cursor } : undefined,
    skip: cursor ? 1 : 0,
    orderBy: { createdAt: 'desc' },
    where: { published: true },
  });

  const nextCursor = items.length === 20
    ? items[items.length - 1].id
    : null;

  return { items, nextCursor };
}
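The fix depends on the real schema, so this is a hedged sketch: Prisma's cursor must reference a unique field that actually exists in schema.prisma (the model guessed `id`), and since createdAt isn't guaranteed unique, adding id as an orderBy tiebreaker keeps the cursor from skipping or repeating rows that share a timestamp.

```typescript
// Sketch of a fix, assuming the model's unique id field really is `id` –
// verify the cursor field against schema.prisma before trusting it.
async function getNextPage(cursor?: string) {
  const items = await prisma.post.findMany({
    take: 20,
    cursor: cursor ? { id: cursor } : undefined,
    skip: cursor ? 1 : 0, // skip the cursor row itself
    // Tiebreak on the unique id so rows sharing a createdAt value
    // can't be skipped or duplicated across pages.
    orderBy: [{ createdAt: 'desc' }, { id: 'desc' }],
    where: { published: true },
  });

  const nextCursor = items.length === 20 ? items[items.length - 1].id : null;
  return { items, nextCursor };
}
```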
Severity: high

React Query v5 mutation with v4 API

Asked for a mutation hook using React Query. Didn't specify the version.

What the model produced

Susceptibility: tricky
import { useMutation, useQueryClient } from '@tanstack/react-query';

function useUpdateUser() {
  const queryClient = useQueryClient();

  return useMutation(
    (userData) => updateUserApi(userData),
    {
      onSuccess: () => {
        queryClient.invalidateQueries('users');
      },
    }
  );
}
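The fix is the v5 object form: useMutation takes a single options object with mutationFn, and invalidateQueries takes a filters object with a queryKey array. (updateUserApi and UserData are the same hypothetical helpers as in the snippet above, declared here only so the sketch stands alone.)

```typescript
import { useMutation, useQueryClient } from '@tanstack/react-query';

// Hypothetical app types/helpers, matching the snippet above.
type UserData = { id: string; name: string };
declare function updateUserApi(userData: UserData): Promise<UserData>;

function useUpdateUser() {
  const queryClient = useQueryClient();

  // v5 API: one options object, and queryKey passed as a filters object.
  return useMutation({
    mutationFn: (userData: UserData) => updateUserApi(userData),
    onSuccess: () => {
      queryClient.invalidateQueries({ queryKey: ['users'] });
    },
  });
}
```

Pinning the library version in the prompt ("React Query v5, object syntax") prevents this class of failure entirely.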

Failures are curated from real usage. Susceptibility indicators are derived from model traits in modelSpecs.ts — not hardcoded.