New - released June 30, 2026

Claude Sonnet 5 API: quick-start in minutes

Everything you need to call claude-sonnet-5 from the Messages API - the model id, working curl and Python examples, the effort parameter, adaptive thinking, and how to re-tune max_tokens for the new tokenizer.


Quick start

Sonnet 5 is a drop-in on the standard Messages API. If you already call Opus 4.8 or Sonnet 4.6, you are three small steps away.

1. Set the model id

Use claude-sonnet-5 - a pinned, dateless snapshot with no -v1 suffix. No other change is required for a basic call.

2. Drop sampling params

Remove temperature, top_p and top_k. Sonnet 5 rejects them with HTTP 400, exactly like Opus 4.7 and later.

3. Keep thinking adaptive

Adaptive thinking is on by default. Do not send a manual extended-thinking config, which returns 400; passing thinking={"type": "adaptive"} only restates the default. Tune depth with effort instead.
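The three steps above can be folded into a small pre-flight function for code that builds request payloads as dicts. This is a hypothetical helper (migrate_to_sonnet5 is not an SDK function), sketched on the assumption that the only changes needed are the ones this guide lists:

```python
from typing import Any

# Sampling parameters this guide says claude-sonnet-5 rejects with HTTP 400.
REJECTED_SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_to_sonnet5(payload: dict[str, Any], effort: str = "high") -> dict[str, Any]:
    """Return a shallow copy of a Messages API payload adjusted for claude-sonnet-5.

    Hypothetical helper, not part of any SDK: it swaps the model id, drops the
    rejected sampling params, replaces any non-adaptive thinking config with the
    adaptive default, and sets an effort level if none is present.
    """
    migrated = {k: v for k, v in payload.items() if k not in REJECTED_SAMPLING_PARAMS}
    migrated["model"] = "claude-sonnet-5"

    thinking = migrated.get("thinking")
    if thinking is not None and thinking.get("type") != "adaptive":
        # A manual extended-thinking config returns 400, so fall back to adaptive.
        migrated["thinking"] = {"type": "adaptive"}

    migrated.setdefault("effort", effort)
    return migrated
```

The input dict is left untouched, so the same payload can still be sent to an older model for a side-by-side comparison.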

Claude API model id: claude-sonnet-5
Amazon Bedrock id: anthropic.claude-sonnet-5
OpenRouter slug: anthropic/claude-sonnet-5-20260630

curl - basic Messages request
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-5",
    "max_tokens": 4096,
    "thinking": { "type": "adaptive" },
    "effort": "high",
    "messages": [
      { "role": "user", "content": "Refactor this module and add tests." }
    ]
  }'
# NOTE: do NOT send temperature / top_p / top_k -> HTTP 400
Python - anthropic SDK
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY

resp = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=4096,
    thinking={"type": "adaptive"},  # on by default
    effort="high",                  # low | medium | high | xhigh | max
    messages=[
        {"role": "user", "content": "Explain this stack trace and propose a fix."}
    ],
    # temperature / top_p / top_k omitted -> sending them returns HTTP 400
)
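# With adaptive thinking, content can start with a thinking block, so print only text blocks
print("".join(block.text for block in resp.content if block.type == "text"))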

Both examples omit temperature, top_p and top_k on purpose: sending any of them to claude-sonnet-5 returns HTTP 400. Adaptive thinking (thinking={"type":"adaptive"}) is the default, and the effort field controls how much reasoning the model spends per request.
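With adaptive thinking on, the content array of a response can begin with a thinking block rather than a text block, so indexing content[0] is fragile. A minimal sketch for the raw JSON body returned by the curl request, assuming each block carries a type field and that text blocks hold their output in a text field (response_text is a hypothetical helper name):

```python
from typing import Any


def response_text(message: dict[str, Any]) -> str:
    """Concatenate the "text" blocks of a Messages API response body.

    Thinking blocks and any other non-text blocks are skipped, so the result is
    the same whether or not adaptive thinking produced reasoning for the request.
    """
    return "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )
```

Pass it json.loads() of the curl output; in the SDK the same filter is block.type == "text" over resp.content.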

Re-tune your max_tokens

Sonnet 5 ships a new tokenizer. The same text consumes about 30% more tokens than on Sonnet 4.6, so max_tokens limits and token budgets copied from older code can fall short.

Raise your output cap

A max_tokens value that comfortably fit a Sonnet 4.6 response may now truncate. Allow roughly 30% more headroom, up to the 128K output ceiling (300K for batch requests that send the output-300k-2026-03-24 beta header).
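The headroom rule is simple arithmetic. A minimal sketch that scales an old max_tokens by the ~30% figure from this guide and clamps it to the output ceiling; retune_max_tokens is a hypothetical helper, and integer math is used so the rounding is exact:

```python
# Figures from this guide; treat them as planning estimates, not exact conversions.
TOKENIZER_GROWTH_PERCENT = 30   # same text is ~30% more tokens than on Sonnet 4.6
OUTPUT_CEILING = 128_000        # standard max output tokens
BATCH_BETA_CEILING = 300_000    # with the output-300k-2026-03-24 batches beta header


def retune_max_tokens(old_max_tokens: int, batches_beta: bool = False) -> int:
    """Scale a max_tokens value sized for Sonnet 4.6 and clamp it to the output ceiling."""
    scaled = (old_max_tokens * (100 + TOKENIZER_GROWTH_PERCENT) + 99) // 100  # integer ceil
    ceiling = BATCH_BETA_CEILING if batches_beta else OUTPUT_CEILING
    return min(scaled, ceiling)
```

For example, retune_max_tokens(4096) returns 5325, while a 100K budget is clamped to 128K unless the batches beta applies.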

Re-check context budgets

The context window is 1M tokens - both the default and the max, with no smaller variant - but the same prompt text now takes about 30% more of it, so re-measure prompts that were sized in tokens with token counting rather than assuming old counts still hold.
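Once a prompt has been measured (recent versions of the anthropic SDK expose client.messages.count_tokens(model="claude-sonnet-5", messages=...), which returns an object with an input_tokens field), checking the budget is one subtraction. A sketch with a hypothetical helper that conservatively treats the prompt and the reserved max_tokens as sharing the window:

```python
CONTEXT_WINDOW = 1_000_000  # claude-sonnet-5 context window, per this guide


def context_headroom(input_tokens: int, max_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left after the measured prompt and the reserved output.

    Conservatively assumes prompt and max_tokens share the window; a negative
    result means the prompt should be trimmed or max_tokens lowered.
    """
    return window - (input_tokens + max_tokens)
```

A prompt measured at 900K tokens with max_tokens at the 128K ceiling comes out 28K over, a case that may have fit before the tokenizer change.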

Re-estimate cost per request

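Because the same text is ~30% more tokens, text that measured 1M tokens on Sonnet 4.6 bills as about 1.3M on Sonnet 5. At the intro price of $2/MTok in + $10/MTok out that works out to roughly $2.60 in + $13 out versus Sonnet 4.6's $3/$15, about 13% cheaper on identical text rather than a flat 33% discount. At the standard $3/$15 rates, identical text costs about 30% more than on Sonnet 4.6.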

Introductory pricing ($2/MTok input + $10/MTok output) runs through Aug 31, 2026; standard pricing is $3/MTok input + $15/MTok output from Sep 1, 2026. Cache reads are $0.20 intro / $0.30 standard, a 5-minute cache write is 1.25x base input and a 1-hour cache write is 2x base input. Always size max_tokens against the new tokenizer, not your old counts.
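To estimate spend per request, the rates above can be turned into a small calculator. This is a sketch under stated assumptions: rates_on and request_cost are hypothetical helpers, the cache-write multipliers are assumed to apply to whichever base input rate is in effect (intro or standard), the intro period is treated as ending on Aug 31, 2026 inclusive, and input_tokens means uncached input measured with the Sonnet 5 tokenizer:

```python
from datetime import date

# USD per million tokens, as listed in this guide.
INTRO_RATES = {"input": 2.00, "output": 10.00, "cache_read": 0.20}
STANDARD_RATES = {"input": 3.00, "output": 15.00, "cache_read": 0.30}
INTRO_LAST_DAY = date(2026, 8, 31)


def rates_on(day: date) -> dict[str, float]:
    """Per-MTok rates in effect on a given day, including derived cache-write rates."""
    base = INTRO_RATES if day <= INTRO_LAST_DAY else STANDARD_RATES
    return {
        **base,
        "cache_write_5m": base["input"] * 1.25,  # 5-minute cache write: 1.25x base input
        "cache_write_1h": base["input"] * 2.0,   # 1-hour cache write: 2x base input
    }


def request_cost(
    day: date,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_5m_tokens: int = 0,
    cache_write_1h_tokens: int = 0,
) -> float:
    """Estimated USD cost of one request, with token counts measured on Sonnet 5."""
    r = rates_on(day)
    total = (
        input_tokens * r["input"]
        + output_tokens * r["output"]
        + cache_read_tokens * r["cache_read"]
        + cache_write_5m_tokens * r["cache_write_5m"]
        + cache_write_1h_tokens * r["cache_write_1h"]
    )
    return total / 1_000_000
```

For text that measured 1M input tokens on Sonnet 4.6, request_cost(date(2026, 7, 15), 1_300_000, 0) gives about $2.60 against $3.00 on Sonnet 4.6; from Sep 1, 2026 the same call gives about $3.90.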

Effort levels

The effort parameter tunes how much reasoning Sonnet 5 spends per request. It accepts five levels; the default is high.

| effort | Best for | Latency & output tokens |
|---|---|---|
| low | Classification, extraction, short chat and other latency-sensitive, well-scoped calls. | Lowest latency, fewest reasoning tokens. |
| medium | Everyday assistance and lighter coding where you want a little more deliberation without full depth. | Moderate latency and token spend. |
| high (default) | Most agentic and coding work - the balanced default that pairs speed with strong reasoning. | Balanced; the recommended starting point. |
| xhigh | Hard multi-step debugging, architecture and long-horizon agent runs that reward deeper thinking. | Higher latency, more reasoning tokens. |
| max | The most demanding reasoning and judgment tasks where you want maximum depth regardless of cost. | Highest latency and token spend. |
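The table can be encoded as a lookup that falls back to the high default. Only the five effort strings come from this guide; the workload labels and the effort_for helper are hypothetical names for this sketch, not API values:

```python
# Effort strings accepted per this guide; "high" is the default.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")
DEFAULT_EFFORT = "high"

# Hypothetical workload labels, mapped to the levels the effort table suggests.
EFFORT_BY_WORKLOAD = {
    "classification": "low",
    "extraction": "low",
    "short_chat": "low",
    "everyday_assistance": "medium",
    "light_coding": "medium",
    "agentic": "high",
    "coding": "high",
    "multi_step_debugging": "xhigh",
    "architecture": "xhigh",
    "long_horizon_agent": "xhigh",
    "hardest_reasoning": "max",
}


def effort_for(workload: str) -> str:
    """Pick an effort level for a workload label, falling back to the default."""
    return EFFORT_BY_WORKLOAD.get(workload, DEFAULT_EFFORT)
```

Unknown workloads land on high, matching the advice to start there and adjust per workload.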

Start at high and only move down for cheap, latency-sensitive traffic or up for the hardest problems. Sonnet 5 approaches Opus 4.8 quality at a lower price, but Opus 4.8 still leads the hardest coding, judgment and cyber tasks - reach for it when max effort on Sonnet 5 is not enough.
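The start-at-high, escalate-when-needed advice can be sketched as a loop. attempt and good_enough are caller-supplied callables (hypothetical: for example, one that sends the Messages API request with the given effort, and one that runs your tests on the result); nothing here is part of the anthropic SDK:

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

EFFORT_LADDER = ("low", "medium", "high", "xhigh", "max")


def escalate(effort: str) -> Optional[str]:
    """Next effort level up, or None when already at max."""
    i = EFFORT_LADDER.index(effort)  # raises ValueError for unknown levels
    return EFFORT_LADDER[i + 1] if i + 1 < len(EFFORT_LADDER) else None


def solve_with_escalation(
    attempt: Callable[[str], T],
    good_enough: Callable[[T], bool],
    start: str = "high",
) -> Tuple[T, Optional[str]]:
    """Call attempt(effort) from `start` upward until good_enough(result) holds.

    Returns (result, effort_used) on success, or (last_result, None) when even
    max effort fell short - the point where this guide suggests Opus 4.8.
    """
    effort = start
    while True:
        result = attempt(effort)
        if good_enough(result):
            return result, effort
        next_effort = escalate(effort)
        if next_effort is None:
            return result, None
        effort = next_effort
```

Each retry is a separate billed request, so this pattern suits tasks with a cheap automated check (tests passing, a schema validating); for interactive traffic, picking a level up front is usually simpler.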

Using Sonnet 5 through QCode

QCode relays the same Messages API, so your Sonnet 5 code is unchanged - only the base URL and key differ.

Identical API surface

Same claude-sonnet-5 model id, same effort parameter, same adaptive thinking. Point the SDK at the relay base URL and your existing code just works.

One key, many models

A single QCode key reaches Claude, Codex and Gemini models - no juggling separate provider accounts or billing.

Stable China access

The relay provides low-latency, reliable access from mainland China, so Sonnet 5 works without wrestling with cross-border connectivity.

Drop-in for tools

Works with the anthropic SDK, Claude Code and any client that speaks the Messages API - set the base URL and go.
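Rather than hardcoding the relay key in source, the client arguments can come from the environment. A minimal sketch; QCODE_BASE_URL and QCODE_API_KEY are hypothetical variable names chosen for this example, and only the base URL is taken from this guide:

```python
import os
from typing import Mapping, Optional

DEFAULT_RELAY_URL = "https://relay.qcode.cc"  # relay base URL from this guide


def relay_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build Anthropic(...) keyword arguments from environment variables.

    QCODE_BASE_URL and QCODE_API_KEY are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("QCODE_API_KEY")
    if not api_key:
        raise RuntimeError("QCODE_API_KEY is not set")
    return {"base_url": env.get("QCODE_BASE_URL", DEFAULT_RELAY_URL), "api_key": api_key}
```

With the variables set, the client is created as Anthropic(**relay_client_kwargs()), and the rest of the Sonnet 5 code stays the same.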

Python - Sonnet 5 via the QCode relay
from anthropic import Anthropic

client = Anthropic(
    base_url="https://relay.qcode.cc",  # QCode relay
    api_key="qk-..."                     # one key for Claude / Codex / Gemini
)

resp = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    effort="high",
    messages=[{"role": "user", "content": "Ship it."}],
)

Frequently asked questions

What is the Claude Sonnet 5 model id?

The model id is claude-sonnet-5 - a pinned, dateless snapshot with no -v1 suffix. On Amazon Bedrock it is anthropic.claude-sonnet-5, and on OpenRouter the slug is anthropic/claude-sonnet-5-20260630. Pass claude-sonnet-5 in the model field of any Messages API request.

Do I need to change my code to call Sonnet 5?

Mostly no - swap the model id to claude-sonnet-5 and your existing Messages API code keeps working. Two things to check: remove any temperature, top_p or top_k fields (they return HTTP 400 on Sonnet 5, same as Opus 4.7+), and do not send a manual extended-thinking config - adaptive thinking is on by default, a manual thinking config returns 400, and thinking={"type": "adaptive"} simply restates the default. Also re-tune max_tokens, because the new tokenizer emits about 30% more tokens for the same text.

Which effort level should I use with Sonnet 5?

The default is high, which suits most agentic and coding work. Use low or medium for cheap, latency-sensitive calls like classification, extraction and short chat, and xhigh or max for the hardest multi-step reasoning where you want maximum depth. Effort trades latency and output tokens for reasoning depth, so start at high and adjust per workload.

Can I use Claude Sonnet 5 through QCode?

Yes. QCode exposes the same Messages API surface via its relay, so you keep the claude-sonnet-5 model id, the effort parameter and adaptive thinking unchanged - only the base URL and key differ. One QCode key works across Claude, Codex and Gemini models, and the relay provides stable low-latency access from mainland China.

Start building with Claude Sonnet 5

Get one key for Claude, Codex and Gemini - with stable access and the same Messages API you already use.