Claude Sonnet 5 vs Sonnet 4.6
The model-id swap is drop-in — but three breaking changes and a new tokenizer can surprise you. Here is exactly what to change in your API and relay integrations before you flip the switch.
The Verdict: Drop-In Model-ID Swap — But Test First
Claude Sonnet 5 (released 2026-06-30) is the recommended successor to Sonnet 4.6 and approaches Opus 4.8-class quality at a lower price. Swapping the model id is trivial; the surrounding request contract is where you get bitten.
Same Messages API, same endpoint, same auth. Change the model id from claude-sonnet-4-6 to claude-sonnet-5 (Bedrock: anthropic.claude-sonnet-5) and most requests just work. The context window is 1M tokens by default, with no smaller variant to pick, and max output is 128K (up to 300K on batch requests sent with the output-300k-2026-03-24 beta header).
What changes is everything around the id. Adaptive thinking is now on by default, which shifts latency and output shape. Manual extended-thinking config and any non-default temperature, top_p or top_k now return HTTP 400. A new tokenizer produces roughly 30% more tokens for the same text. Route a small percentage of traffic to Sonnet 5, run your evals, then ramp. No forced-migration deadline is pressuring you.
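One way to route "a small percentage of traffic" is a deterministic hash split, sketched below in Python. It assumes you have a stable routing key such as a user or session id; `pick_model` and the bucket scheme are illustrative and not part of any Anthropic SDK. Hashing, rather than random sampling per request, keeps each user on the same model, which makes eval comparisons cleaner.

```python
import hashlib

OLD_MODEL = "claude-sonnet-4-6"
NEW_MODEL = "claude-sonnet-5"


def pick_model(routing_key: str, sonnet5_percent: float) -> str:
    """Deterministically send a stable share of traffic to Sonnet 5.

    The same routing_key (e.g. a user or session id) always maps to the
    same bucket, so a user does not flip models between requests.
    """
    if not 0 <= sonnet5_percent <= 100:
        raise ValueError("sonnet5_percent must be between 0 and 100")
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # Buckets below the threshold go to Sonnet 5; the rest stay on 4.6.
    return NEW_MODEL if bucket < sonnet5_percent * 100 else OLD_MODEL
```

Because the threshold only grows as you raise `sonnet5_percent` from 1 to 5 to 25, keys already routed to Sonnet 5 stay there while new buckets join them.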
Spec Diff: Sonnet 4.6 vs Sonnet 5
Side-by-side of the fields that actually affect your integration.
| Spec | Claude Sonnet 4.6 | Claude Sonnet 5 |
|---|---|---|
| Model id | claude-sonnet-4-6 | claude-sonnet-5 (dateless snapshot) |
| Context window | Standard Sonnet 4.6 window | 1M tokens (default = max) |
| Max output | Sonnet 4.6 output cap | 128K (300K via batches beta header) |
| Thinking | Extended thinking configured explicitly | Adaptive thinking ON by default; effort low→max (default high) |
| Sampling params | temperature / top_p / top_k accepted | Non-default values return HTTP 400 — omit them |
| Price (input / output) | $3 / $15 per MTok | $2 / $10 intro through Aug 31 2026, then $3 / $15 |
| Lifecycle status | Active (tentative retirement no sooner than Feb 17 2027) | New recommended default |
Pricing and specs per anthropic.com/news/claude-sonnet-5 and the platform.claude.com docs. Sonnet 5's knowledge cutoff is January 2026. Intro pricing is time-limited, not permanent.
3 Breaking Changes You Must Handle
These are the request-contract differences that can turn a green deploy into a wall of HTTP 400s or unexpected behavior. Fix all three before you swap the model id in production.
Sonnet 5 reasons adaptively out of the box, so responses may include a thinking phase you never opted into on 4.6. This shifts latency, streaming shape and token usage. Control it with the effort parameter (low, medium, high, xhigh or max; default high) rather than toggling thinking on and off manually.
Because thinking is adaptive by default, sending an explicit manual or extended-thinking configuration, as you may have done on earlier models, now returns HTTP 400. Remove any explicit thinking configuration from your request builder and rely on the effort parameter instead.
As with Opus 4.7 and later, Sonnet 5 rejects non-default temperature, top_p and top_k values with HTTP 400. Strip these fields from your payloads; omitting them is equivalent to using the defaults. Also audit SDK wrappers and relay middleware that inject a default temperature automatically.
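To catch wrappers or middleware that silently add these fields back, a pre-flight audit can run in CI against the payloads your stack actually emits. The Python sketch below flags every field this guide says Sonnet 5 rejects; it flags the sampling fields whenever they are present, since omitting them is always safe. `find_rejected_fields` is a hypothetical helper, and treating the Messages API `thinking` field as the place where manual extended thinking was configured is an assumption about your request builder.

```python
SONNET5_MODEL_ID = "claude-sonnet-5"
SAMPLING_FIELDS = ("temperature", "top_p", "top_k")
# Assumption: manual extended thinking was set through the Messages API
# "thinking" field, e.g. {"type": "enabled", "budget_tokens": 2048}.
MANUAL_THINKING_FIELD = "thinking"


def find_rejected_fields(payload: dict) -> list:
    """List fields in a Sonnet 5 request body that this guide says trigger HTTP 400."""
    if payload.get("model") != SONNET5_MODEL_ID:
        return []  # Sonnet 4.6 and other models keep their old contract.
    problems = [field for field in SAMPLING_FIELDS if field in payload]
    if MANUAL_THINKING_FIELD in payload:
        problems.append(MANUAL_THINKING_FIELD)
    return problems
```

In a test suite this becomes a one-liner such as `assert find_rejected_fields(build_request(...)) == []`, where `build_request` is a placeholder for your own request builder.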
The New Tokenizer: ~30% More Tokens for the Same Text
This is the least obvious change and the one most likely to blow your budgets and truncation logic.
Sonnet 5 ships a new tokenizer. The identical input string encodes into roughly 30% more tokens than it did on Sonnet 4.6. Your text does not change; only the counting does. That reprices every request and reshapes every max_tokens budget. At $2/$10 per MTok, text that cost $3/$15 on Sonnet 4.6 now costs about $2.60/$13 (30% more tokens at the lower rate), so the introductory price is close to cost-neutral, roughly 13% cheaper in practice, not a flat 33% discount.
Because each token now covers fewer characters, a fixed max_tokens value buys less generated text. If you cap max_tokens for structured output, raise the ceiling, or responses that completed on 4.6 may truncate mid-answer.
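A rough way to carry an existing cap across is to scale it by the tokenizer inflation and clamp it to Sonnet 5's standard 128K output limit. This Python sketch assumes the guide's ~30% figure holds for your text; measure your own ratio with count_tokens before relying on it. `rescale_max_tokens` is a hypothetical helper, and it deliberately ignores the 300K batches beta.

```python
import math

# Assumption: the guide's rough estimate of ~30% more tokens for the same text.
TOKENIZER_INFLATION = 1.30
# Standard Sonnet 5 output cap; the 300K batches beta is not assumed here.
SONNET5_MAX_OUTPUT = 128_000


def rescale_max_tokens(old_max_tokens: int, inflation: float = TOKENIZER_INFLATION) -> int:
    """Scale a Sonnet 4.6 max_tokens so it covers about the same amount of text."""
    scaled = math.ceil(old_max_tokens * inflation)  # round up so text is not cut short
    return min(scaled, SONNET5_MAX_OUTPUT)  # never exceed the model's output cap
```

For example, a 4096-token cap on Sonnet 4.6 becomes 5325 under a 1.30 factor.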
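Input and output token counts both climb about 30% for the same content, so the real saving is smaller than the sticker-price delta implies. Once the intro window closes on Aug 31 2026, the same text costs about 30% more than on Sonnet 4.6 ($3/$15 applied to 30% more tokens). Re-run your cost model on real traffic rather than assuming the price drop is pure savings.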
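The comparison is easy to get wrong by hand, so here is a small Python model of it. It prices one request on Sonnet 4.6, then prices the same text on Sonnet 5 with token counts inflated by the guide's ~30% estimate. The prices come from the spec table above; `TOKENIZER_INFLATION` is an assumption you should replace with a ratio measured on your own traffic, and the function names are illustrative.

```python
from dataclasses import dataclass

# Assumption: ~30% more tokens for identical text (the guide's estimate).
TOKENIZER_INFLATION = 1.30


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


SONNET_46 = Price(3.0, 15.0)
SONNET_5_INTRO = Price(2.0, 10.0)     # through Aug 31 2026
SONNET_5_STANDARD = Price(3.0, 15.0)  # after the intro window


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the given per-MTok prices."""
    return (input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok) / 1_000_000


def sonnet5_cost_ratio(price: Price, in_46: int, out_46: int,
                       inflation: float = TOKENIZER_INFLATION) -> float:
    """Sonnet 5 cost of a request divided by its Sonnet 4.6 cost.

    in_46 and out_46 are token counts measured with the Sonnet 4.6 tokenizer;
    they are inflated to estimate the Sonnet 5 counts for the same text.
    """
    old = request_cost(SONNET_46, in_46, out_46)
    new = request_cost(price, round(in_46 * inflation), round(out_46 * inflation))
    return new / old
```

For a request of 10,000 input and 2,000 output tokens (4.6 counts), the intro price comes out about 13% cheaper than Sonnet 4.6 (ratio ≈ 0.87) and the standard $3/$15 rate about 30% more expensive (ratio 1.3).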
Do not reuse Sonnet 4.6 token estimates. Use the count_tokens endpoint against Sonnet 5 to re-measure prompts, context-window headroom and rate-limit budgets before you commit to production limits.
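Once you have Sonnet 5 counts, a simple budget check can gate prompts before they reach the API. The Python sketch below takes the token counter as a parameter so it can be tested offline. `check_headroom` is hypothetical, the 1M window comes from this guide, and requiring prompt tokens plus max_tokens to fit inside the window is a conservative assumption.

```python
from typing import Callable, Sequence

# Sonnet 5 context window as stated in this guide.
CONTEXT_WINDOW = 1_000_000


def check_headroom(
    messages: Sequence[dict],
    max_tokens: int,
    count_input_tokens: Callable[[Sequence[dict]], int],
    context_window: int = CONTEXT_WINDOW,
) -> int:
    """Return tokens left after the prompt and the full max_tokens budget.

    Conservative assumption: prompt tokens plus max_tokens must fit in the window.
    Raises ValueError when they do not.
    """
    prompt_tokens = count_input_tokens(messages)
    remaining = context_window - prompt_tokens - max_tokens
    if remaining < 0:
        raise ValueError(
            f"over the {context_window}-token window by {-remaining} tokens "
            f"(prompt={prompt_tokens}, max_tokens={max_tokens})"
        )
    return remaining
```

With the official `anthropic` Python SDK, the counter could wrap `client.messages.count_tokens(model="claude-sonnet-5", messages=...)` and return its `input_tokens`; if you also send a system prompt or tools, pass those to the count as well so the figure matches the real request.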
Migration in One Diff
In most integrations the whole migration comes down to changing the model id and deleting the sampling and manual-thinking fields. Here is a minimal before/after.
Before (Sonnet 4.6):

```jsonc
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "temperature": 0.7,
  "top_p": 0.9,
  "messages": [...]
}
```

After (Sonnet 5):

```jsonc
{
  "model": "claude-sonnet-5",
  "max_tokens": 4096,
  // omit temperature / top_p / top_k
  // adaptive thinking is on by default
  "messages": [...]
}
```
Keep the same endpoint, headers and auth token. Delete temperature, top_p and top_k, since non-default values return HTTP 400. Remove any explicit extended-thinking config and use the effort parameter instead. Re-check max_tokens against the new tokenizer so long outputs do not truncate. On a relay, make the same edits in your middleware so it does not re-inject a default temperature.
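For a relay, the body edits in the checklist above can live in one function that the middleware calls before forwarding. This Python sketch is a hypothetical `migrate_payload` that swaps the model id and drops the fields this guide lists. It assumes manual extended thinking lived in the Messages API `thinking` field, and it leaves max_tokens untouched because resizing it is a separate judgment call against the new tokenizer.

```python
import copy

MODEL_ID_MAP = {"claude-sonnet-4-6": "claude-sonnet-5"}
# Fields this guide says to omit for Sonnet 5. "thinking" is an assumption
# about where your 4.6 requests configured manual extended thinking.
FIELDS_TO_DROP = ("temperature", "top_p", "top_k", "thinking")


def migrate_payload(payload: dict) -> dict:
    """Return a Sonnet 5 copy of a Sonnet 4.6 Messages API request body.

    Endpoint, headers and auth are handled elsewhere and stay unchanged.
    The caller's dict is never mutated.
    """
    migrated = copy.deepcopy(payload)
    model = migrated.get("model")
    if model in MODEL_ID_MAP:
        migrated["model"] = MODEL_ID_MAP[model]
    if migrated.get("model") == "claude-sonnet-5":
        for field in FIELDS_TO_DROP:
            migrated.pop(field, None)  # no-op if the field is absent
    return migrated
```

Running it as the last step in the middleware chain means no later layer gets a chance to re-inject a default temperature.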
Sonnet 4.6 is NOT Retired — There Is No Forced Migration
Claude Sonnet 4.6 (claude-sonnet-4-6) remains Active. Anthropic lists a tentative retirement no sooner than February 17, 2027, and that date is subject to change. Sonnet 5 is the recommended new default, but you are under no deadline: keep 4.6 in production while you validate 5 on your own schedule, then cut over when your evals are green.
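Because 4.6 stays available, the cut-over can be an explicit gate rather than a single switch. Below is a minimal Python sketch of one such gate. The ramp percentages, the one-point tolerance and the rollback-to-0% rule are all hypothetical choices, and the pass rates stand in for whatever eval metric you already track.

```python
# Hypothetical ramp schedule: percent of traffic on Sonnet 5 at each step.
RAMP_STEPS = (1, 5, 25, 50, 100)


def next_ramp_step(current_percent: int, sonnet5_pass_rate: float,
                   sonnet46_pass_rate: float, tolerance: float = 0.01) -> int:
    """Pick the next share of traffic for Sonnet 5 from eval pass rates.

    Advance one step when Sonnet 5 is within `tolerance` of Sonnet 4.6;
    otherwise roll back to 0%, leaving all traffic on 4.6.
    """
    if sonnet5_pass_rate + tolerance < sonnet46_pass_rate:
        return 0
    for step in RAMP_STEPS:
        if step > current_percent:
            return step
    return RAMP_STEPS[-1]  # already fully ramped
```

Rolling back to 0% is cheap precisely because Sonnet 4.6 remains Active, so a regression costs you a ramp cycle rather than an outage.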
Migration FAQ
Should I upgrade from Sonnet 4.6 to Sonnet 5?
For most workloads, yes — Sonnet 5 approaches Opus 4.8-class quality at a lower sticker price and is the recommended successor. But treat it as a code change, not a config flip: the model-id swap is drop-in, yet adaptive thinking is now on by default and manual extended thinking or non-default temperature/top_p/top_k now return HTTP 400. Migrate a small percentage of traffic first, run your evals, then ramp. There is no forced-migration deadline, so you can take your time.
Why does Sonnet 5 count more tokens for the same text?
Sonnet 5 ships a new tokenizer, so the identical input string encodes into roughly 30% more tokens than it did on Sonnet 4.6. Per-request input and output token counts rise, a given max_tokens budget covers fewer characters, and the real saving is smaller than the per-token price cut suggests. That is why the $2/$10 introductory price works out close to cost-neutral against Sonnet 4.6's $3/$15 on the same text (about 13% cheaper), not a flat 33% discount.
Do my temperature, top_p and top_k parameters still work?
No. Like Opus 4.7 and later, Sonnet 5 rejects non-default temperature, top_p and top_k with HTTP 400 — simply omit those fields. Explicit or manual extended-thinking configuration also returns HTTP 400 because adaptive thinking is on by default; control reasoning with the effort levels low / medium / high / xhigh / max (default high). Audit your request builder and relay middleware for hardcoded sampling parameters before you switch the model id.
Is Sonnet 4.6 going away?
No. Claude Sonnet 4.6 remains Active, with a tentative retirement no sooner than February 17, 2027, which is subject to change. Sonnet 5 is the recommended new default, but there is no forced-migration deadline — you can keep running 4.6 in production while you validate 5.
Migrate to Claude Sonnet 5 on QCode
Run Sonnet 5 and Sonnet 4.6 side by side through one API and Claude Code endpoint — swap model ids, compare evals, and ramp when you are ready.