Part I · Understanding the Models
01
Reasoning vs Fast Models
When to choose reasoning models vs fast models for different coding tasks
Have you ever wondered why the same task costs 15× more on one model than another?
The first time I used o3 for a coding task, I was impressed — then I checked the latency and the bill. 40 seconds and 15× the cost of Sonnet, for a task Sonnet would have nailed. That moment is what started me keeping notes on when the expensive reasoning models actually earn their cost — and when they're just burning money.
I'm Filip van Harreveld, a senior creative developer at iO. I share work and notes at filipvanharreveld.com and on LinkedIn.
When reasoning models actually earn it
If the task follows a well-known pattern — "write a React form," "add a REST endpoint" — a fast model will nail it because there's nothing novel to figure out. Reasoning models earn their cost when the task requires weighing tradeoffs or holding multiple constraints in tension. "Design the data model for this feature given these constraints" or "find the bug in this async code that only fails under load" — those need the model to actually think, not just pattern-match.
- Architecture decisions with real tradeoffs. I asked Sonnet to choose between three caching strategies for a Next.js app — it picked the most common one. o3 identified that my specific access pattern made the "common" choice wrong and explained why.
- Complex refactors with many constraints. "Refactor this without changing the public API, keeping backward compatibility, and not touching the test files" — the more constraints you stack, the more reasoning models pull ahead.
- Debugging non-obvious failures. A flaky integration test that only broke in CI. Sonnet suggested the usual suspects — timing, environment variables, missing mocks. Opus traced the actual execution path, found a shared database connection that leaked state between parallel test runs, and explained why it only surfaced in CI where tests run concurrently.
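That leaked-connection bug is a pattern worth seeing concretely. Here's a minimal Python sketch (all names hypothetical, not the actual suite) of how a module-level connection lets one test's session state bleed into the next, and why a fresh connection per test fixes it:

```python
class FakeConnection:
    """Stand-in for a DB connection that carries per-session state."""
    def __init__(self):
        self.session_vars = {}

# Module-level shared connection: the anti-pattern.
SHARED_CONN = FakeConnection()

def test_a_sets_timezone(conn=SHARED_CONN):
    conn.session_vars["timezone"] = "UTC"

def test_b_assumes_clean_session(conn=SHARED_CONN):
    # True only if no other test has leaked state into the session.
    return "timezone" not in conn.session_vars

test_a_sets_timezone()
state_leaked = not test_b_assumes_clean_session()  # True: test_a's state leaked

# The fix: give every test its own connection (a fresh fixture per test).
def fresh_conn():
    return FakeConnection()

isolated = "timezone" not in fresh_conn().session_vars  # True: no leakage
```

The real suite only broke when tests ran concurrently; this serial sketch just shows the state-sharing mechanism that made the ordering matter in the first place.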
When fast models win
Fast models are often better at constrained tasks because they don't overthink them. I needed to convert 200 webhook payloads from one shape to another — straightforward field mapping, no ambiguity. Haiku did it in 2 seconds flat, correct on the first pass. o3 took 12 seconds per payload and added defensive null checks, fallback defaults, and logging I never asked for. Same output quality for the actual mapping, 6× slower, 10× more expensive.
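For a sense of scale, the webhook job was essentially this kind of pure field renaming (the field names here are invented, not the real payload shapes):

```python
# Old field name -> new field name. No defaults, no logging, no null
# checks: exactly the scope of the task, and nothing more.
MAPPING = {
    "user_email": "email",
    "created": "created_at",
    "amt": "amount_cents",
}

def remap(payload: dict) -> dict:
    """Rename fields per MAPPING, dropping anything unmapped."""
    return {new: payload[old] for old, new in MAPPING.items() if old in payload}

converted = [remap(p) for p in [
    {"user_email": "a@example.com", "created": "2024-01-01", "amt": 500},
    {"user_email": "b@example.com", "amt": 250},
]]
```

There is no ambiguity for a model to reason about here, which is why the extra thinking time bought nothing.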
Reach for Haiku or Flash for:
- Boilerplate generation. Component scaffolding, test stubs, migration files — anything where the structure is known and the model just needs to fill in the blanks.
- Structured data extraction. Pulling JSON from unstructured text, classifying support tickets, parsing log lines. Fast models follow extraction schemas reliably without "improving" them.
- Pipeline steps where latency compounds. If the model runs in a loop — processing files, transforming records, generating variants — even small per-call overhead adds up fast. A 3-second Haiku call vs. a 30-second o3 call, repeated 50 times, is the difference between 2.5 minutes and 25.
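The arithmetic in that last bullet, spelled out:

```python
# Per-call latency times loop size: small overhead compounds fast.
CALLS = 50
FAST_LATENCY_S = 3    # e.g. a Haiku-class call
SLOW_LATENCY_S = 30   # e.g. an o3-class call

fast_total_min = CALLS * FAST_LATENCY_S / 60  # 2.5 minutes
slow_total_min = CALLS * SLOW_LATENCY_S / 60  # 25.0 minutes
```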
The model mixer approach
I don't pick one model for a task — I use different models for different steps in the same task. Here's a real example: building a new API endpoint with auth, validation, and tests.
- Opus for the plan. I described the feature and constraints. Opus mapped out the data model, identified where the new endpoint fit in the existing auth middleware, and flagged a potential race condition with concurrent writes. 45 seconds, but it caught a design issue I would have hit two hours into implementation.
- Sonnet for implementation. With the plan locked, Sonnet wrote the route handler, validation logic, and database queries. Fast, correct, no scope creep — because the architecture decisions were already made.
- Composer 2 for the test suite. Gave it the implementation and said "full test coverage, including the race condition Opus flagged." It generated the tests, ran them, caught two failures from its own output, and fixed them autonomously.
- GPT-5.4 for review. Asked it to review the final diff for edge cases and security issues. It spotted a missing rate limit on the new endpoint and a permissive CORS header I'd copied from another route.
Total cost: roughly a third of running Opus end-to-end. Each model played to its strength — deep reasoning where it mattered, speed where it didn't.
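The hand-off between steps can be sketched as a small pipeline. `call_model` below is a placeholder, not a real SDK call; the model identifiers and prompts are illustrative, and in practice each step would hit your provider's API:

```python
def call_model(model: str, prompt: str) -> str:
    # Placeholder: stands in for an API call to the named model.
    return f"[{model}] {prompt.splitlines()[0]}"

def build_endpoint(feature_spec: str) -> dict:
    # Each step feeds the previous step's output forward, so the cheap
    # models work inside the constraints the expensive model set.
    plan = call_model("opus", f"Plan the data model and flag risks:\n{feature_spec}")
    code = call_model("sonnet", f"Implement exactly this plan, no scope creep:\n{plan}")
    tests = call_model("composer-2", f"Full test coverage, including flagged races:\n{code}")
    review = call_model("gpt-5.4", f"Review for edge cases and security issues:\n{code}")
    return {"plan": plan, "code": code, "tests": tests, "review": review}

result = build_endpoint("POST /widgets with auth, validation, tests")
```

The design point is the hand-off: once Opus has locked the plan, the downstream models don't need to reason about architecture at all.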
Model Mixer
Compare costs and build optimized model pipelines
See it in practice
Pick a coding scenario below and watch how different models handle it. Pay attention to where reasoning models spend their token budget — and whether that extra thinking actually improves the output.
Scenario Lab
Real tasks, real model outputs — see which model wins and why
Paste a stack trace and get a plain-English explanation with a fix suggestion.
Fast models handle this as well as expensive ones. The answer is either right or obviously wrong — no back-and-forth needed.
Fast, correct, no fluff
What went right
- Identified root cause immediately
- Gave a one-line fix
- No unnecessary explanation
Cost note: $0.0033/run in token cost. Negligible at any volume.
How the other models fared:
- Correct but slightly verbose
- Correct, but added unrequested refactor suggestions
- Thorough but massively over-engineered for this task
Bottom line
Haiku is fast, cheap, and more than capable for error explanation. Paying for Sonnet or Opus here is a waste.
Pick the right model for your task
Not sure which model fits? Describe what you're building and how complex the task is. The picker weighs tradeoffs you might not think about — latency sensitivity, number of constraints, whether the task needs creativity or compliance.
Model Picker
Top-3 recommendations with score breakdowns
What are you working on?
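Under the hood, a picker like this amounts to a scoring heuristic. This toy version (the weights and tier cutoffs are invented for illustration, not real benchmarks) captures the tradeoffs listed above:

```python
def pick_tier(latency_sensitive: bool, num_constraints: int,
              needs_creativity: bool) -> str:
    """Toy heuristic: more constraints and creative work push toward
    reasoning models; latency sensitivity pushes toward fast ones."""
    score = num_constraints * 2
    score += 3 if needs_creativity else 0
    score -= 4 if latency_sensitive else 0
    if score >= 6:
        return "reasoning"  # e.g. Opus / o3
    if score >= 2:
        return "mid"        # e.g. Sonnet
    return "fast"           # e.g. Haiku / Flash
```

For example, a constraint-heavy refactor with no latency pressure lands on a reasoning model, while a pipeline step in a hot loop lands on a fast one.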
What does it actually cost?
The price gap between model tiers is real, but it's not as simple as "expensive = better." Run your own numbers below — adjust task count, model choice, and context size to see how costs shift across a realistic workload.
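The calculator's math is simple enough to run yourself. A sketch, with placeholder per-million-token prices (substitute your provider's current rates):

```python
def monthly_cost(in_tokens: int, out_tokens: int, runs_per_month: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Token cost per run times monthly volume. Prices are per million tokens."""
    per_run = (in_tokens / 1e6) * price_in_per_m + (out_tokens / 1e6) * price_out_per_m
    return per_run * runs_per_month

# Example: the ~4k in / ~2k out review task, 100 runs a month.
# The rates here are made up to show the spread, not quoted prices.
cheap = monthly_cost(4000, 2000, 100, price_in_per_m=0.3, price_out_per_m=1.2)
pricey = monthly_cost(4000, 2000, 100, price_in_per_m=15.0, price_out_per_m=75.0)
```

Note that output tokens usually cost several times more than input tokens, so output-heavy tasks shift the comparison further than the headline input price suggests.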
Cost Calculator
Compare token costs across models for common tasks
Pick a task
Review a medium-sized file and get structured feedback on issues and improvements.
~4k in / ~2k out · How often?
DeepSeek-V3.2 is 97% cheaper than Claude Opus 4.6 for this task — $1.05 vs $34.50/mo
A fast model catches obvious issues. A stronger model finds subtle logic bugs and gives more actionable feedback — fewer missed problems means less rework later.