Why does Claude Sonnet 5 count more tokens for the same text?

Sonnet 5 ships a new tokenizer. The identical input string is encoded into roughly 30% more tokens than Sonnet 4.6 produced. That means your per-request input and output token counts rise, your max_tokens budget covers fewer characters of generated text, and your effective per-request credit cost is higher than the raw price-per-token delta suggests. Because of this, the $2/$10 introductory price is best understood as roughly cost-neutral versus Sonnet 4.6's $3/$15 on the same text — not a flat 33% discount.

Do my temperature, top_p and top_k parameters still work on Sonnet 5?

No. Like Opus 4.7 and later, Sonnet 5 rejects non-default temperature, top_p and top_k with HTTP 400 — simply omit those fields. Explicit or manual extended-thinking configuration also returns HTTP 400 because adaptive thinking is on by default; instead control reasoning with the effort levels low / medium / high / xhigh / max (default high). Audit your request builder and strip any hardcoded sampling parameters before you switch the model id.

Migration & Upgrade Guide

Claude Sonnet 5 vs Sonnet 4.6

Q: Is Claude Sonnet 4.6 being retired?

No. Claude Sonnet 4.6 (claude-sonnet-4-6) remains Active. Anthropic lists a tentative retirement no sooner than February 17, 2027, which is subject to change. Sonnet 5 is the recommended new default, but there is no forced-migration deadline — you can keep running 4.6 in production while you validate 5 on your own schedule.

The model-id swap is drop-in — but three breaking changes and a new tokenizer can surprise you. Here is exactly what to change in your API and relay integrations before you flip the switch.

#Claude Sonnet 5 #Claude Sonnet 4.6 #API Migration #Tokenizer

The Verdict: Drop-In Model-ID Swap — But Test First

Claude Sonnet 5 (released 2026-06-30) is the recommended successor to Sonnet 4.6 and approaches Opus 4.8-class quality at a lower price. Swapping the model id is trivial; the surrounding request contract is where you get bitten.

✅ What is drop-in

Same Messages API, same endpoint, same auth. Change the model id from claude-sonnet-4-6 to claude-sonnet-5 (Bedrock: anthropic.claude-sonnet-5) and most requests just work. The context window jumps to 1M tokens by default — no smaller variant to pick — and max output is 128K (up to 300K via the output-300k-2026-03-24 batches beta header).

⚠️ What to test first

Adaptive thinking is now ON by default, so latency and output shape shift; manual extended-thinking config and any non-default temperature/top_p/top_k now return HTTP 400; and a new tokenizer changes your token counts. Route a small percentage of traffic to Sonnet 5, run your evals, then ramp. There is no forced-migration deadline pressuring you.

Spec Diff: Sonnet 4.6 vs Sonnet 5

Side-by-side of the fields that actually affect your integration.

Spec	Claude Sonnet 4.6	Claude Sonnet 5
Model id	claude-sonnet-4-6	claude-sonnet-5 (dateless snapshot)
Context window	Standard Sonnet 4.6 window	1M tokens (default = max)
Max output	Sonnet 4.6 output cap	128K (300K via batches beta header)
Thinking	Extended thinking configured explicitly	Adaptive thinking ON by default; effort low→max (default high)
Sampling params	temperature / top_p / top_k accepted	Non-default values return HTTP 400 — omit them
Price (input / output)	$3 / $15 per MTok	$2 / $10 intro through Aug 31 2026, then $3 / $15
Lifecycle status	Active (tentative retirement no sooner than Feb 17 2027)	New recommended default

Pricing and specs per anthropic.com/news/claude-sonnet-5 and platform.claude.com docs. Knowledge cutoff January 2026. Intro pricing is time-limited, not permanent.

3 Breaking Changes You Must Handle

These are the request-contract differences that can turn a green deploy into a wall of HTTP 400s or unexpected behavior. Fix all three before you swap the model id in production.

1 · Adaptive thinking is ON by default

Sonnet 5 reasons adaptively out of the box, so responses may include a thinking phase you did not opt into on 4.6. This shifts latency, streaming shape and token usage. Control it with effort levels — low, medium, high, xhigh, max (default high) — rather than toggling thinking on and off manually.

2 · Manual extended thinking returns 400

Because thinking is adaptive by default, explicitly configuring manual or extended-thinking blocks the way you might have on earlier models now returns HTTP 400. Remove any explicit thinking configuration from your request builder and rely on the effort parameter instead.

3 · Non-default temperature/top_p/top_k return 400

Exactly like Opus 4.7 and later, Sonnet 5 rejects non-default temperature, top_p and top_k with HTTP 400. Strip these fields from your payloads (or leave them at their defaults by omitting them). Audit SDK wrappers and relay middleware that inject a default temperature automatically.

The New Tokenizer: ~30% More Tokens for the Same Text

This is the least obvious change and the one most likely to blow your budgets and truncation logic.

Sonnet 5 ships a new tokenizer. The identical input string encodes into roughly 30% more tokens than Sonnet 4.6 produced. Nothing about your text changes — the counting does. That reprices every request and reshapes every max_tokens budget, so the $2/$10 introductory price is best understood as roughly cost-neutral versus Sonnet 4.6's $3/$15 on the same text, not a flat 33% discount.

max_tokens covers less text

Because output tokens are denser, a fixed max_tokens value now covers fewer characters of generated text. If you cap max_tokens for structured output, raise the ceiling or your responses may truncate mid-answer where 4.6 completed.

Per-request credit cost rises

Input and output token counts both climb ~30% for the same content, so effective per-request spend is higher than the sticker price delta implies. Re-run your cost model on real traffic rather than assuming the price drop is pure savings.

Re-count, do not extrapolate

Do not reuse Sonnet 4.6 token estimates. Use the count_tokens endpoint against Sonnet 5 to re-measure prompts, context-window headroom and rate-limit budgets before you commit to production limits.

Migration in One Diff

The whole migration is usually: change the model id, and delete the sampling and manual-thinking fields. Here is a minimal before/after.

Before — Sonnet 4.6

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 4096,
  "temperature": 0.7,
  "top_p": 0.9,
  "messages": [...]
}

After — Sonnet 5

{
  "model": "claude-sonnet-5",
  "max_tokens": 4096,
  // omit temperature / top_p / top_k
  // adaptive thinking is on by default
  "messages": [...]
}

Keep the same endpoint, headers and auth token. Delete temperature, top_p and top_k (non-default values 400). Remove any explicit extended-thinking config; use the effort parameter instead. Re-check max_tokens against the new tokenizer so long outputs do not truncate. On a relay, apply the same edits in your middleware so it does not re-inject a default temperature.

Sonnet 4.6 is NOT Retired — There Is No Forced Migration

Claude Sonnet 4.6 (claude-sonnet-4-6) remains Active. Anthropic lists a tentative retirement no sooner than February 17, 2027, and that date is subject to change. Sonnet 5 is the recommended new default, but you are under no deadline: keep 4.6 in production while you validate 5 on your own schedule, then cut over when your evals are green.

Migration FAQ

Should I upgrade from Sonnet 4.6 to Sonnet 5?

For most workloads, yes — Sonnet 5 approaches Opus 4.8-class quality at a lower sticker price and is the recommended successor. But treat it as a code change, not a config flip: the model-id swap is drop-in, yet adaptive thinking is now on by default and manual extended thinking or non-default temperature/top_p/top_k now return HTTP 400. Migrate a small percentage of traffic first, run your evals, then ramp. There is no forced-migration deadline, so you can take your time.

Why does Sonnet 5 count more tokens for the same text?

Sonnet 5 ships a new tokenizer. The identical input string encodes into roughly 30% more tokens than Sonnet 4.6 produced. Your per-request input and output token counts rise, your max_tokens budget covers fewer characters, and your effective per-request credit cost is higher than the raw price-per-token delta suggests. That is why the $2/$10 introductory price is best understood as roughly cost-neutral versus Sonnet 4.6's $3/$15 on the same text — not a flat 33% discount.

Do my temperature, top_p and top_k parameters still work?

No. Like Opus 4.7 and later, Sonnet 5 rejects non-default temperature, top_p and top_k with HTTP 400 — simply omit those fields. Explicit or manual extended-thinking configuration also returns HTTP 400 because adaptive thinking is on by default; control reasoning with the effort levels low / medium / high / xhigh / max (default high). Audit your request builder and relay middleware for hardcoded sampling parameters before you switch the model id.

Is Sonnet 4.6 going away?

No. Claude Sonnet 4.6 remains Active, with a tentative retirement no sooner than February 17, 2027, which is subject to change. Sonnet 5 is the recommended new default, but there is no forced-migration deadline — you can keep running 4.6 in production while you validate 5.

Migrate to Claude Sonnet 5 on QCode

Run Sonnet 5 and Sonnet 4.6 side by side through one API and Claude Code endpoint — swap model ids, compare evals, and ramp when you are ready.

Get an API Key See Sonnet 5 Pricing

Claude Sonnet 5 vs Sonnet 4.6

The Verdict: Drop-In Model-ID Swap — But Test First

Spec Diff: Sonnet 4.6 vs Sonnet 5

3 Breaking Changes You Must Handle

The New Tokenizer: ~30% More Tokens for the Same Text

Migration in One Diff

Sonnet 4.6 is NOT Retired — There Is No Forced Migration

Migration FAQ

Migrate to Claude Sonnet 5 on QCode

Related Guides

Claude Sonnet 5 Overview

Sonnet 5 vs Opus 4.8

Claude Sonnet 5 API

Claude Sonnet 5 Pricing