Part I · Understanding the Models
01
Reasoning vs Fast Models
When to choose reasoning models vs fast models for different coding tasks
Have you ever wondered why the same task costs 15× more on one model than another?
The first time I used o3 for a coding task, I was impressed — then I checked the latency and the bill. 40 seconds and 15× the cost of Sonnet, for a task Sonnet would have nailed. That moment is what started me keeping notes on when the expensive reasoning models actually earn their cost — and when they're just burning money.
I'm Filip van Harreveld, a senior creative developer at iO. I share work and notes at filipvanharreveld.com and on LinkedIn.
When reasoning models actually earn it
If the task follows a well-known pattern — "write a React form," "add a REST endpoint" — a fast model will nail it because there's nothing novel to figure out. Reasoning models earn their cost when the task requires weighing tradeoffs or holding multiple constraints in tension. "Design the data model for this feature given these constraints" or "find the bug in this async code that only fails under load" — those need the model to actually think, not just pattern-match.
- Architecture decisions with real tradeoffs. I asked Sonnet to choose between three caching strategies for a Next.js app — it picked the most common one. o3 identified that my specific access pattern made the "common" choice wrong and explained why.
- Complex refactors with many constraints. "Refactor this without changing the public API, keeping backward compatibility, and not touching the test files" — the more constraints you stack, the more reasoning models pull ahead.
- Debugging non-obvious failures. A flaky integration test that only broke in CI. Sonnet suggested the usual suspects — timing, environment variables, missing mocks. Opus traced the actual execution path, found a shared database connection that leaked state between parallel test runs, and explained why it only surfaced in CI where tests run concurrently.
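That leaked-connection bug is a pattern worth seeing concretely. Here's a minimal Python sketch (all names hypothetical, not the actual suite) of how a module-level connection lets one test's session state bleed into the next, and why a fresh connection per test fixes it:

```python
class FakeConnection:
    """Stand-in for a DB connection that carries per-session state."""
    def __init__(self):
        self.session_vars = {}

# Module-level shared connection: the anti-pattern.
SHARED_CONN = FakeConnection()

def test_a_sets_timezone(conn=SHARED_CONN):
    conn.session_vars["timezone"] = "UTC"

def test_b_assumes_clean_session(conn=SHARED_CONN):
    # True only if no other test has leaked state into the session.
    return "timezone" not in conn.session_vars

test_a_sets_timezone()
state_leaked = not test_b_assumes_clean_session()  # True: test_a's state leaked

# The fix: give every test its own connection (a fresh fixture per test).
def fresh_conn():
    return FakeConnection()

isolated = "timezone" not in fresh_conn().session_vars  # True: no leakage
```

The real suite only broke when tests ran concurrently; this serial sketch just shows the state-sharing mechanism that made the ordering matter in the first place.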
When fast models win
Fast models are often better at constrained tasks because they don't overthink them. I needed to convert 200 webhook payloads from one shape to another — straightforward field mapping, no ambiguity. Haiku did it in 2 seconds flat, correct on the first pass. o3 took 12 seconds per payload and added defensive null checks, fallback defaults, and logging I never asked for. Same output quality for the actual mapping, 6× slower, 10× more expensive.
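For a sense of scale, the webhook job was essentially this kind of pure field renaming (the field names here are invented, not the real payload shapes):

```python
# Old field name -> new field name. No defaults, no logging, no null
# checks: exactly the scope of the task, and nothing more.
MAPPING = {
    "user_email": "email",
    "created": "created_at",
    "amt": "amount_cents",
}

def remap(payload: dict) -> dict:
    """Rename fields per MAPPING, dropping anything unmapped."""
    return {new: payload[old] for old, new in MAPPING.items() if old in payload}

converted = [remap(p) for p in [
    {"user_email": "a@example.com", "created": "2024-01-01", "amt": 500},
    {"user_email": "b@example.com", "amt": 250},
]]
```

There is no ambiguity for a model to reason about here, which is why the extra thinking time bought nothing.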
Reach for Haiku or Flash for:
- Boilerplate generation. Component scaffolding, test stubs, migration files — anything where the structure is known and the model just needs to fill in the blanks.
- Structured data extraction. Pulling JSON from unstructured text, classifying support tickets, parsing log lines. Fast models follow extraction schemas reliably without "improving" them.
- Pipeline steps where latency compounds. If the model runs in a loop — processing files, transforming records, generating variants — even small per-call overhead adds up fast. A 3-second Haiku call vs. a 30-second o3 call, repeated 50 times, is the difference between 2.5 minutes and 25.
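The arithmetic in that last bullet, spelled out:

```python
# Per-call latency times loop size: small overhead compounds fast.
CALLS = 50
FAST_LATENCY_S = 3    # e.g. a Haiku-class call
SLOW_LATENCY_S = 30   # e.g. an o3-class call

fast_total_min = CALLS * FAST_LATENCY_S / 60  # 2.5 minutes
slow_total_min = CALLS * SLOW_LATENCY_S / 60  # 25.0 minutes
```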
The model mixer approach
I don't pick one model for a task — I use different models for different steps in the same task. Here's a real example: building a new API endpoint with auth, validation, and tests.
- Opus for the plan. I described the feature and constraints. Opus mapped out the data model, identified where the new endpoint fit in the existing auth middleware, and flagged a potential race condition with concurrent writes. 45 seconds, but it caught a design issue I would have hit two hours into implementation.
- Sonnet for implementation. With the plan locked, Sonnet wrote the route handler, validation logic, and database queries. Fast, correct, no scope creep — because the architecture decisions were already made.
- Composer 2 for the test suite. Gave it the implementation and said "full test coverage, including the race condition Opus flagged." It generated the tests, ran them, caught two failures from its own output, and fixed them autonomously.
- GPT-5.4 for review. Asked it to review the final diff for edge cases and security issues. It spotted a missing rate limit on the new endpoint and a permissive CORS header I'd copied from another route.
Total cost: roughly a third of running Opus end-to-end. Each model played to its strength — deep reasoning where it mattered, speed where it didn't.
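The hand-off between steps can be sketched as a small pipeline. `call_model` below is a placeholder, not a real SDK call; the model identifiers and prompts are illustrative, and in practice each step would hit your provider's API:

```python
def call_model(model: str, prompt: str) -> str:
    # Placeholder: stands in for an API call to the named model.
    return f"[{model}] {prompt.splitlines()[0]}"

def build_endpoint(feature_spec: str) -> dict:
    # Each step feeds the previous step's output forward, so the cheap
    # models work inside the constraints the expensive model set.
    plan = call_model("opus", f"Plan the data model and flag risks:\n{feature_spec}")
    code = call_model("sonnet", f"Implement exactly this plan, no scope creep:\n{plan}")
    tests = call_model("composer-2", f"Full test coverage, including flagged races:\n{code}")
    review = call_model("gpt-5.4", f"Review for edge cases and security issues:\n{code}")
    return {"plan": plan, "code": code, "tests": tests, "review": review}

result = build_endpoint("POST /widgets with auth, validation, tests")
```

The design point is the hand-off: once Opus has locked the plan, the downstream models don't need to reason about architecture at all.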
Model Mixer
Compare costs and build optimized model pipelines
See it in practice
Pick a coding scenario below and watch how different models handle it. Pay attention to where reasoning models spend their token budget — and whether that extra thinking actually improves the output.
Scenario Lab
Real tasks, real model outputs — see which model wins and why
Paste a stack trace and get a plain-English explanation with a fix suggestion.
Fast models handle this as well as expensive ones. The answer is either right or obviously wrong — no back-and-forth needed.
Fast, correct, no fluff
What went right
- Identified root cause immediately
- Gave a one-line fix
- No unnecessary explanation
Cost note: $0.0033/run in token cost. Negligible at any volume.
How the other models fared:
- Correct but slightly verbose
- Correct, but added unrequested refactor suggestions
- Thorough but massively over-engineered for this task
Bottom line
Haiku is fast, cheap, and more than capable for error explanation. Paying for Sonnet or Opus here is a waste.
Pick the right model for your task
Not sure which model fits? Describe what you're building and how complex the task is. The picker weighs tradeoffs you might not think about — latency sensitivity, number of constraints, whether the task needs creativity or compliance.
Model Picker
Top-3 recommendations with score breakdowns
What are you working on?
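Under the hood, a picker like this amounts to a scoring heuristic. This toy version (the weights and tier cutoffs are invented for illustration, not real benchmarks) captures the tradeoffs listed above:

```python
def pick_tier(latency_sensitive: bool, num_constraints: int,
              needs_creativity: bool) -> str:
    """Toy heuristic: more constraints and creative work push toward
    reasoning models; latency sensitivity pushes toward fast ones."""
    score = num_constraints * 2
    score += 3 if needs_creativity else 0
    score -= 4 if latency_sensitive else 0
    if score >= 6:
        return "reasoning"  # e.g. Opus / o3
    if score >= 2:
        return "mid"        # e.g. Sonnet
    return "fast"           # e.g. Haiku / Flash
```

For example, a constraint-heavy refactor with no latency pressure lands on a reasoning model, while a pipeline step in a hot loop lands on a fast one.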
What does it actually cost?
The price gap between model tiers is real, but it's not as simple as "expensive = better." Run your own numbers below — adjust task count, model choice, and context size to see how costs shift across a realistic workload.
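The calculator's math is simple enough to run yourself. A sketch, with placeholder per-million-token prices (substitute your provider's current rates):

```python
def monthly_cost(in_tokens: int, out_tokens: int, runs_per_month: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Token cost per run times monthly volume. Prices are per million tokens."""
    per_run = (in_tokens / 1e6) * price_in_per_m + (out_tokens / 1e6) * price_out_per_m
    return per_run * runs_per_month

# Example: the ~4k in / ~2k out review task, 100 runs a month.
# The rates here are made up to show the spread, not quoted prices.
cheap = monthly_cost(4000, 2000, 100, price_in_per_m=0.3, price_out_per_m=1.2)
pricey = monthly_cost(4000, 2000, 100, price_in_per_m=15.0, price_out_per_m=75.0)
```

Note that output tokens usually cost several times more than input tokens, so output-heavy tasks shift the comparison further than the headline input price suggests.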
Cost Calculator
Compare token costs across models for common tasks
Pick a task
Review a medium-sized file and get structured feedback on issues and improvements.
~4k in / ~2k out · How often?
DeepSeek-V3.2 is 97% cheaper than Claude Opus 4.6 for this task — $1.05 vs $34.50/mo
A fast model catches obvious issues. A stronger model finds subtle logic bugs and gives more actionable feedback — fewer missed problems means less rework later.