Agent Guardrails

I asked an agent to add one prop to a component. While it was there, it noticed some inconsistent naming in adjacent files. So it fixed those. Then it spotted a few other things worth cleaning up.

The diff came back with 23 modified files.

This isn't the agent being malicious. It's the agent being helpful past the scope you intended. A capable agent that decides to "improve things" while it's in the neighborhood can touch a lot of code before you realize what happened. And reviewing a 23-file diff for a one-prop change is not a good way to spend your afternoon.

Most of these guardrails I learned the hard way — watching an agent do something I didn't want and working out how to prevent it next time.

The core guardrails

Scope constraints. Most important one: tell the agent what's out of scope, not just what's in scope. "Do not modify files outside src/components." "Do not change the public API of any existing function." "Do not add new dependencies without asking first."
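A scope constraint in the prompt can be backed by a check on the diff. Here is a minimal sketch, assuming you already have the changed paths as repo-relative strings (for example from `git diff --name-only`). The helper name `findOutOfScopeFiles` and the allowed-prefix convention are my own illustration, not part of any tool.

```typescript
// Hypothetical helper: returns the changed files that fall outside the
// directories the agent was allowed to touch.
function findOutOfScopeFiles(
  changedFiles: string[],
  allowedPrefixes: string[],
): string[] {
  // Normalize prefixes to end with "/" so "src/components" does not
  // accidentally match "src/components-old/...".
  const prefixes = allowedPrefixes.map((p) => (p.endsWith("/") ? p : p + "/"));
  return changedFiles.filter(
    (file) => !prefixes.some((prefix) => file.startsWith(prefix)),
  );
}

// Example: the task was limited to src/components.
const violations = findOutOfScopeFiles(
  ["src/components/Button.tsx", "src/utils/format.ts", "package.json"],
  ["src/components"],
);
// Logs the two out-of-scope paths: src/utils/format.ts and package.json
console.log(violations);
```

If the list is non-empty, that is the cue to stop and ask the agent why it left the scope before reading the rest of the diff.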

No deletions without confirmation. Agents are good at adding code, bad at knowing when deletion is safe. "Do not delete any files or remove any exports without explicit confirmation" — this one has saved me multiple times.
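The file half of this rule is easy to verify after the fact. A rough sketch, assuming you pass in the text printed by `git diff --name-status`, where each line is a status letter, a tab, and a path. The function name `findDeletedFiles` is hypothetical, and this only catches deleted files, not removed exports.

```typescript
// Hypothetical helper: lists paths that `git diff --name-status` reports
// as deleted (status letter "D").
function findDeletedFiles(nameStatusOutput: string): string[] {
  const deleted: string[] = [];
  for (const line of nameStatusOutput.split("\n")) {
    // Each line looks like "D<TAB>path"; renames ("R100<TAB>old<TAB>new")
    // and other statuses are ignored here.
    const [status, path] = line.trim().split("\t");
    if (status === "D" && path) {
      deleted.push(path);
    }
  }
  return deleted;
}

const sampleOutput = [
  "M\tsrc/components/Button.tsx",
  "D\tsrc/components/OldButton.tsx",
  "A\tsrc/components/Icon.tsx",
].join("\n");

// Logs the single deleted path: src/components/OldButton.tsx
console.log(findDeletedFiles(sampleOutput));
```

Anything this returns that you did not explicitly approve is worth a question before merging.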

Stay in the current task. Agents notice adjacent issues and "helpfully" fix them. Usually not helpful: it makes diffs larger, harder to review, and more likely to introduce unintended changes. "If you notice an unrelated problem, mention it in your summary instead of fixing it."

Ask before refactoring. Refactoring is a separate task from implementing a feature. "If you think a refactor would improve the code, describe it and ask before doing it."

The CLAUDE.md approach

Most scalable way to apply guardrails: put them in CLAUDE.md (or .cursorrules, or AGENTS.md). Persistent context that applies to every task, not just the one you're working on.

A good CLAUDE.md includes:

Project conventions (naming, file structure, patterns to follow)
Explicit guardrails (what not to do)
Codebase context (what's important, what's legacy, what's in flux)

SKILL.md — guardrails with domain scope

CLAUDE.md is always-on and project-wide. SKILL.md files are composable and domain-specific: the agent reads them when it works in a specific area, so their rules apply automatically there.

Natural home for domain-scoped guardrails. Your API layer's SKILL.md says "never return raw database errors to the client." Your design system's says "never use arbitrary values — map everything to a semantic token." These constraints belong close to the domain knowledge they protect, not buried in a project-wide config.

The layering: SKILL.md for domain guardrails, CLAUDE.md for project-wide guardrails, task prompt for the specific request.
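To make the layering concrete, here is a toy model of how the three layers stack into one context. Agent tools do this loading themselves, so this is only an illustration. The `GuardrailLayers` type, the `composeAgentContext` function, and the ordering (project-wide first, task last) are my assumptions.

```typescript
// Hypothetical shape for the three layers described above.
interface GuardrailLayers {
  projectRules: string[]; // from CLAUDE.md: always on, project-wide
  domainRules: string[]; // from the SKILL.md for the area being edited
  task: string; // the specific request
}

// Joins the layers into a single text block, broadest rules first.
function composeAgentContext(layers: GuardrailLayers): string {
  const section = (title: string, rules: string[]): string =>
    [title, ...rules.map((rule) => `- ${rule}`)].join("\n");

  return [
    section("Project-wide guardrails (CLAUDE.md):", layers.projectRules),
    section("Domain guardrails (SKILL.md):", layers.domainRules),
    `Task:\n${layers.task}`,
  ].join("\n\n");
}

const context = composeAgentContext({
  projectRules: [
    "Do not delete any files or remove any exports without explicit confirmation.",
  ],
  domainRules: [
    "Never use arbitrary values; map everything to a semantic token.",
  ],
  task: "Add one prop to a component.",
});
console.log(context);
```

The point of the split is that the task prompt stays short: the rules it depends on already live in the two files.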

See Agents and Skills for how to write and structure skill files.

When guardrails fail

Guardrails reduce failure frequency. They don't eliminate it. What they can't prevent:

Plausible-looking wrong code. Agent stays in scope, follows conventions, produces logically incorrect code.
Subtle behavior changes. Technically within scope, but changes behavior in a non-obvious way.
Context drift. In long sessions, the agent's understanding of the task drifts from your original intent.

So: review every diff carefully, even when the agent followed all the guardrails.

Failure Gallery

My bruises, your benefit — real failures with exact fixes

14 failures catalogued: 4 critical, 7 high

Called functions or methods that don't exist in the version you're using.

high

next/router.prefetch() with invented arguments

Asked the model to add prefetching to a Next.js navigation component. Specified we're on Next.js 14.

What the model produced

sneaky

import { useRouter } from 'next/router';

function NavLink({ href, children }) {
  const router = useRouter();

  const handleMouseEnter = () => {
    router.prefetch(href, { priority: 'high', timeout: 3000 });
  };

  return (
    <a href={href} onMouseEnter={handleMouseEnter}>
      {children}
    </a>
  );
}
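One possible fix, sketched under the assumption that the `{ priority: 'high', timeout: 3000 }` object is the invented part: pass only the URL to `router.prefetch`. The `PrefetchRouter` interface and `createPrefetchHandler` helper are hypothetical names I use to keep the example self-contained. Check the `useRouter` reference for the router you actually use before relying on this.

```typescript
// Structural type covering the only router method this sketch relies on.
// The object returned by useRouter() should satisfy it (assumption).
interface PrefetchRouter {
  prefetch(href: string): unknown;
}

// Hypothetical helper: builds a mouse-enter handler that prefetches a route.
function createPrefetchHandler(
  router: PrefetchRouter,
  href: string,
): () => void {
  return () => {
    // Only the URL is passed; no options object.
    void router.prefetch(href);
  };
}
```

In the component this would be wired as `onMouseEnter={createPrefetchHandler(router, href)}`.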

critical

Prisma cursor pagination with wrong field name

Asked for cursor-based pagination using Prisma. The model had no schema context.

What the model produced

sneaky

async function getNextPage(cursor?: string) {
  const items = await prisma.post.findMany({
    take: 20,
    cursor: cursor ? { id: cursor } : undefined,
    skip: cursor ? 1 : 0,
    orderBy: { createdAt: 'desc' },
    where: { published: true },
  });

  const nextCursor = items.length === 20
    ? items[items.length - 1].id
    : null;

  return { items, nextCursor };
}

high

React Query v5 mutation with v4 API

Asked for a mutation hook using React Query. Didn't specify the version.

What the model produced

tricky

import { useMutation, useQueryClient } from '@tanstack/react-query';

function useUpdateUser() {
  const queryClient = useQueryClient();

  return useMutation(
    (userData) => updateUserApi(userData),
    {
      onSuccess: () => {
        queryClient.invalidateQueries('users');
      },
    }
  );
}
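For comparison, React Query v5 takes a single options object in `useMutation`, and `invalidateQueries` takes a filter object with a `queryKey` array. A minimal sketch of the v5 shape follows, written as an options factory so it runs without the library installed. The `QueryClientLike` interface and the `updateUserMutationOptions` name are my own stand-ins, and `updateUserApi` is passed in rather than assumed to exist.

```typescript
// Structural stand-in for the one QueryClient method used here (assumption:
// the real client from useQueryClient() satisfies this shape).
interface QueryClientLike {
  invalidateQueries(filters: { queryKey: readonly unknown[] }): unknown;
}

// Hypothetical factory: returns the single options object that v5's
// useMutation expects, instead of v4-style positional arguments.
function updateUserMutationOptions<TUser>(
  queryClient: QueryClientLike,
  updateUserApi: (userData: TUser) => Promise<unknown>,
) {
  return {
    mutationFn: updateUserApi,
    onSuccess: () => {
      // v5 filter form: an object with a queryKey array, not a bare string.
      queryClient.invalidateQueries({ queryKey: ["users"] });
    },
  };
}
```

Inside the hook this would become `useMutation(updateUserMutationOptions(queryClient, updateUserApi))`. Stating the library version in the prompt is the cheaper fix.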

Failures are curated from real usage. Susceptibility indicators are derived from model traits in modelSpecs.ts — not hardcoded.