Part II · Shipping Workflows
08
AI-Powered Code Review
Using BugBot and Codex as PR reviewers - what they catch and what they miss
Before any human looks at my PR, two tools already have. One checks the diff for type errors, missing null checks, and edge cases the tests didn't cover. The other looks at whether the implementation actually matches what the PR description says it does.
Sometimes they catch real issues. Sometimes they flag things that don't matter. Either way, by the time a human sees the code, the mechanical first pass is already done.
The two tools: Cursor BugBot for PR review in Cursor workflows, and Codex on GitHub as an automatic PR reviewer.
Cursor BugBot
BugBot reviews pull requests in Cursor workflows and posts actionable review feedback. It's particularly good at:
- Catching logic errors the TypeScript compiler misses (code that type-checks but is still wrong; see the sketch after this list)
- Identifying missing null checks and undefined access patterns
- Spotting inconsistencies between the implementation and the surrounding code style
- Flagging potential performance issues (unnecessary re-renders, missing memoization, N+1 queries)
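The first item above, a bug that type-checks but is still wrong, tends to look like this (a hypothetical illustration, not actual BugBot output):

type User = { email: string; deactivatedAt: Date | null };

// The predicate is inverted: it keeps deactivated users and drops active
// ones. The types line up, so the compiler stays quiet.
function activeUserEmails(users: User[]): string[] {
  return users
    .filter((u) => u.deactivatedAt !== null) // should be === null
    .map((u) => u.email);
}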
The workflow: make your changes, open the PR, run BugBot review, fix what it finds, then request human review. By the time a human sees the code, the mechanical stuff is already resolved.
Codex on GitHub
Codex can act as a GitHub reviewer: enable automatic reviews and it posts review comments on every PR, or request a one-off review by mentioning @codex review in a PR comment. Review guidelines can be customized via an AGENTS.md file in the repository. It's useful for:
- Checking that the implementation matches the PR description
- Identifying missing test coverage for the changed code
- Catching security issues (exposed secrets, SQL injection patterns, missing auth checks)
- Verifying that error handling is consistent with the rest of the codebase
Key difference from BugBot: Codex has the full repository context, not just the changed files. So it catches things like "this function is now inconsistent with how similar functions work elsewhere in the codebase."
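What goes in AGENTS.md depends on the repository; an illustrative excerpt (these rules are hypothetical examples, not defaults) might look like:

## Code review guidelines
- Flag any new endpoint that bypasses the existing auth middleware.
- Call out changed business logic that has no corresponding test.
- Flag anything that looks like a committed secret, key, or token.
- Keep comments scoped to the diff; don't propose unrelated refactors.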
What AI review actually catches
The most valuable catches I've seen:
- Off-by-one errors in pagination — the kind that only show up when you're on the last page (see the sketch after this list)
- Missing edge cases in input validation — empty strings, null values, negative numbers
- Race conditions in async code — especially in React useEffect cleanup
- Subtle breaking changes — changing a function signature in a way that's technically backward compatible but breaks callers
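The pagination off-by-one usually reduces to something like this (hypothetical code, not a real diff):

// Math.floor drops the final partial page: 95 items at 20 per page
// reports 4 pages instead of 5, so the last 15 items are unreachable.
function totalPages(totalItems: number, pageSize: number): number {
  return Math.floor(totalItems / pageSize); // should be Math.ceil
}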
What it gets wrong
AI review produces false positives. Common ones:
- Flagging intentional patterns as bugs (not enough context to know it's intentional)
- Suggesting refactors that aren't in scope for the PR
- Missing bugs that require understanding the business logic (the model doesn't know what the code is supposed to do)
Think of it as a fast, cheap first pass. Catches mechanical bugs well. Not a substitute for a human reviewer who understands the product.
Workflow Recipe
A copy-pasteable flow from spec to PR
Prompt
I need to build [feature]. Here’s the spec: inputs, outputs, edge cases, constraints. Produce a plan before writing any code.
Steps
1. Spec: Write a developer spec with inputs, outputs, edge cases, and constraints.
2. Plan: Ask the model to produce a plan (files to create/modify, key decisions). Review it before coding.
3. Code: Implement against the plan. One feature, one PR. Pull the model back if it goes out of scope.
4. Review: Ask the model to review its own code: "What edge cases might this miss? What would break this?"
5. Tests: Generate tests from the review's edge cases: "Write tests for the edge cases you identified."
6. PR description: Generate the PR description from the spec and the diff. Full context produces clear descriptions.
Guardrails
- One feature, one PR — keep scope tight
- Review the plan before writing code
- Don’t let the model touch files outside scope
- Ask before refactoring adjacent code
Expected output
Working PR with passing CI, clear description, and tests covering the identified edge cases.
Paste the recipe into CLAUDE.md, .cursorrules, or your workflow docs.
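If you want the guardrails as standing rules rather than a per-PR reminder, a minimal entry might look like this (wording is illustrative):

## PR workflow guardrails
- One feature per PR; keep the scope tight.
- Produce a plan and wait for approval before writing code.
- Do not touch files outside the agreed scope.
- Ask before refactoring adjacent code.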
Failure Gallery
My bruises, your benefit — real failures with exact fixes
A recurring failure mode: the model calls functions or methods that don't exist in the version you're using.
next/router.prefetch() with invented arguments
Asked the model to add prefetching to a Next.js navigation component. Specified we're on Next.js 14.
What the model produced
import { useRouter } from 'next/router';

function NavLink({ href, children }) {
  const router = useRouter();
  const handleMouseEnter = () => {
    router.prefetch(href, { priority: 'high', timeout: 3000 });
  };
  return (
    <a href={href} onMouseEnter={handleMouseEnter}>
      {children}
    </a>
  );
}
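The options bag is the invented part. A corrected sketch, assuming the App Router in Next.js 14: useRouter comes from next/navigation, and prefetch takes only the href.

'use client';

import { useRouter } from 'next/navigation';
import type { ReactNode } from 'react';

function NavLink({ href, children }: { href: string; children: ReactNode }) {
  const router = useRouter();

  // Prefetch on hover; there is no { priority, timeout } options bag.
  const handleMouseEnter = () => {
    router.prefetch(href);
  };

  return (
    <a href={href} onMouseEnter={handleMouseEnter}>
      {children}
    </a>
  );
}

In production the App Router's next/link already prefetches in-viewport links by default, so the manual handler is often unnecessary.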
Prisma cursor pagination with wrong field name
Asked for cursor-based pagination using Prisma. The model had no schema context.
What the model produced
async function getNextPage(cursor?: string) {
  const items = await prisma.post.findMany({
    take: 20,
    cursor: cursor ? { id: cursor } : undefined,
    skip: cursor ? 1 : 0,
    orderBy: { createdAt: 'desc' },
    where: { published: true },
  });
  const nextCursor = items.length === 20
    ? items[items.length - 1].id
    : null;
  return { items, nextCursor };
}
React Query v5 mutation with v4 API
Asked for a mutation hook using React Query. Didn't specify the version.
What the model produced
import { useMutation, useQueryClient } from '@tanstack/react-query';

function useUpdateUser() {
  const queryClient = useQueryClient();
  return useMutation(
    (userData) => updateUserApi(userData),
    {
      onSuccess: () => {
        queryClient.invalidateQueries('users');
      },
    }
  );
}
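The fix is v5's object syntax. A sketch, assuming @tanstack/react-query 5.x; updateUserApi is the helper from the snippet above, declared here only to keep the example self-contained:

import { useMutation, useQueryClient } from '@tanstack/react-query';

// Assumed API helper, as in the original snippet.
declare function updateUserApi(userData: Record<string, unknown>): Promise<unknown>;

function useUpdateUser() {
  const queryClient = useQueryClient();

  return useMutation({
    // v5 drops the positional overload; mutationFn goes in a single options object
    mutationFn: (userData: Record<string, unknown>) => updateUserApi(userData),
    onSuccess: () => {
      // invalidateQueries also takes the object form with an explicit queryKey in v5
      queryClient.invalidateQueries({ queryKey: ['users'] });
    },
  });
}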