Part II · Shipping Workflows
04
Spec to PR
The full AI-assisted development loop, from spec file to merged pull request
Have you ever skipped the planning step and spent two hours correcting code instead of reviewing a plan?
This is what I run for most non-trivial features. Not the fastest path to working code — the path that produces code I'm comfortable shipping. The extra steps (planning, review, explicit test generation) catch the bugs that are hardest to find in code review.
The full flow
Step 1: Spec. Write a clear description of what you're building. Not a user story — a developer spec. Inputs, outputs, edge cases, constraints. The more specific, the better the plan.
A weak spec: "Add a search feature to the products page."
A useful spec: "Add a search input to the products page that filters the visible product list client-side. The input should debounce at 300ms. Filtering should match against product name and SKU. Empty query shows all products. No server requests — filter the already-loaded list."
The difference isn't length — it's specificity. The second spec tells the model what data is available, what the performance expectation is, and what's out of scope (server requests). That's what prevents a wrong plan.
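To make the specificity concrete, here is a rough sketch of what an implementation matching that spec might look like. The `Product` shape and the names `filterProducts` and `debounce` are illustrative assumptions, not part of the spec itself:

```typescript
// Hypothetical sketch matching the example spec: client-side filtering,
// 300ms debounce, match on name and SKU, empty query shows everything.
interface Product {
  name: string;
  sku: string;
}

// Pure filter: empty (or whitespace-only) query returns the full list;
// otherwise match case-insensitively against product name and SKU.
function filterProducts(products: Product[], query: string): Product[] {
  const q = query.trim().toLowerCase();
  if (q === "") return products; // empty query shows all products
  return products.filter(
    (p) => p.name.toLowerCase().includes(q) || p.sku.toLowerCase().includes(q)
  );
}

// Debounce wrapper: delays the wrapped call until 300ms after the
// last invocation, per the spec's performance expectation.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms = 300) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}
```

Notice how every line of this sketch traces back to a sentence in the spec — that traceability is exactly what the vague version ("add a search feature") cannot give a model.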
Step 2: Plan. Ask the model to produce a plan before writing any code:
Given this spec, what files will you create or modify, and what are the key implementation decisions?
Review the plan. This is where you catch misunderstandings before they become code. If the plan mentions files you didn't expect, or skips something you care about, correct it here.
Step 3: Code. Implement against the plan. Keep the scope tight — one feature, one PR. If the model starts touching things outside the scope, pull it back:
Implement step 1 from the plan only. Don't touch anything outside the files you listed.
Step 4: Review. Ask the model to review its own code:
Review this implementation. What edge cases might it miss? What inputs would cause it to fail?
This sounds redundant but it works — the model often catches things in review mode that it missed in generation mode. Different prompt, different output.
Step 5: Tests. Generate tests explicitly, not as an afterthought:
Write tests for the edge cases you identified in the review.
This produces better tests than "add tests to this code" because it's targeting the specific failure modes, not just exercising the happy path.
Step 6: PR description. Ask the model to write the PR description from the spec and the diff. It produces better descriptions than most humans write because it has the full context.
Why the plan step matters
Most common failure mode: skipping the plan and going straight to code. The model produces something that works for the happy path but misses the constraints you had in mind. Then you're correcting code instead of reviewing a plan — much slower.
The plan step also surfaces ambiguities in your spec. If the model asks a clarifying question during planning, that's a question you'd have had to answer anyway — better to answer it before code is written.
One way to make this explicit: before asking for the plan, add "If anything in the spec is unclear or leaves room for multiple interpretations, ask me before planning." This turns planning into a two-pass process (clarify, then plan) and consistently produces tighter output.
Model strategy
Different steps benefit from different models. Planning and review are where reasoning models earn their cost. Code generation and test writing? Fast models are often sufficient.
A fast model catches obvious issues. A stronger model finds subtle logic bugs and gives more actionable feedback — fewer missed problems means less rework later.
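The cost gap is easy to estimate back-of-envelope. The sketch below computes a monthly cost from token counts and per-token prices; all prices here are placeholder assumptions, so substitute your provider's current rates:

```typescript
// Back-of-envelope cost model for one recurring task.
// Prices are hypothetical placeholders, not real provider rates.
interface ModelPrice {
  inPerMTok: number; // USD per million input tokens
  outPerMTok: number; // USD per million output tokens
}

function monthlyCost(
  price: ModelPrice,
  inTokens: number,
  outTokens: number,
  runsPerMonth: number
): number {
  const perRun =
    (inTokens / 1e6) * price.inPerMTok + (outTokens / 1e6) * price.outPerMTok;
  return perRun * runsPerMonth;
}

// Example: a code-review task with ~4k input / ~2k output tokens, 50 runs/month.
const fastModel: ModelPrice = { inPerMTok: 0.3, outPerMTok: 1.2 }; // assumed
const reasoningModel: ModelPrice = { inPerMTok: 15, outPerMTok: 75 }; // assumed

const cheap = monthlyCost(fastModel, 4000, 2000, 50); // ≈ $0.18/mo
const pricey = monthlyCost(reasoningModel, 4000, 2000, 50); // ≈ $10.50/mo
```

Under these assumed prices the gap is roughly 50x, which is why it pays to route only the plan and review steps to the expensive model and leave generation to the fast one.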
The full recipe
Prompt
I need to build [feature]. Here’s the spec: inputs, outputs, edge cases, constraints. Produce a plan before writing any code.
Steps
Spec
Write a developer spec with inputs, outputs, edge cases, and constraints.
Plan
Ask the model to produce a plan — files to create/modify, key decisions. Review before coding.
Code
Implement against the plan. One feature, one PR. Pull the model back if it goes out of scope.
Review
Ask the model to review its own code: "What edge cases might this miss? What would break this?"
Tests
Generate tests from the review’s edge cases: "Write tests for the edge cases you identified."
PR description
Generate the PR description from the spec and the diff. Full context produces clear descriptions.
Tools
Guardrails
- One feature, one PR — keep scope tight
- Review the plan before writing code
- Don’t let the model touch files outside scope
- Ask before refactoring adjacent code
Expected output
Working PR with passing CI, clear description, and tests covering the identified edge cases.
Paste into CLAUDE.md, .cursorrules, or your workflow docs