Three-vendor comparison · maximally hedged

Claude Sonnet 5 vs GPT-5.6 vs Gemini 3.1 Pro

Anthropic, OpenAI and Google never ran a shared benchmark — so a single 'winner' number does not exist. Here is what is genuinely comparable (price, context, availability) and what is not (leaderboard scores), with the benchmark cells left honestly blank.

The one-line verdict

There is no shared benchmark between these vendors, so compare on price, context window and availability, not a single score.

Anthropic published no exact benchmark numbers for Claude Sonnet 5; it offers only the qualitative claim that performance is 'close to Opus 4.8'. OpenAI and Google run their own harnesses on their own dates. So the useful decision axes are cost, context window, output limits and where each model is actually generally available. Claude Sonnet 5 is a mid-tier, agentic-coding model that approaches Opus 4.8 at a lower price, while Opus 4.8 still leads the hardest coding, judgment and cyber tasks. Trial all three on your own workload before you decide.

What IS and IS NOT comparable

✅ What you CAN compare

Hard, published facts line up across vendors: list price per million tokens, context-window size, maximum output length, knowledge cutoff, and — crucially — real availability (which model is GA on which platform, and which tiers are still preview). These are the numbers that actually change your bill and your integration plan, and they are the honest basis for a decision.

🚫 What you CANNOT compare

A single 'best model' score does not exist here. Anthropic released no SWE-bench, Terminal-Bench or OSWorld figure for Sonnet 5, and no cross-vendor evaluation was run on identical harnesses, prompts and snapshots. Any table that shows Sonnet 5, GPT-5.6 and Gemini side by side with exact percentages is stitching together third-party estimates from different dates — treat those numbers as directional, never authoritative.

Specs & availability at a glance

Verified specs where vendors published them; benchmark row left blank on purpose.

| Attribute | Claude Sonnet 5 | GPT-5.6 | Gemini 3.1 Pro |
| --- | --- | --- | --- |
| Vendor & positioning | Anthropic · mid-tier, best speed/intelligence blend | OpenAI · flagship line | Google · flagship line |
| Context window | 1M tokens (default = max) | Tier-dependent; see OpenAI docs | Vendor-listed; see Google Vertex docs |
| Max output | 128K (up to 300K via Batches beta) | Vendor-listed | Vendor-listed |
| Price (input / output per MTok) | $2 / $10 intro (→ $3 / $15 from Sep 1 2026) | Tier-dependent; top tiers not GA-wide | See Google pricing (region/tier) |
| Knowledge cutoff | January 2026 | Vendor-listed | Vendor-listed |
| Availability / GA status | GA: default on Free & Pro; API, Bedrock, Vertex AI, Foundry, Copilot, OpenRouter | Rolling out; top tiers (e.g. Sol Ultra) limited-preview | Varies by region & platform |
| Public benchmark (SWE-bench / Terminal-Bench) | Blank: not vendor-published | Blank: no shared harness | Blank: no shared harness |
| Best-suited for | Agentic coding at a mid-tier price; Opus-adjacent quality | Trial on your own task mix | Trial on your own task mix |

Claude Sonnet 5 specs and pricing per anthropic.com/news/claude-sonnet-5 and platform.claude.com docs (introductory $2/$10 input/output through Aug 31 2026, then $3/$15; new tokenizer uses ~30% more tokens on the same text, so the intro price is roughly cost-neutral vs Sonnet 4.6's $3/$15). GPT-5.6 and Gemini cells are directional pointers to each vendor's own docs, not QCode-verified figures. No benchmark number here is Anthropic-published.
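The 'roughly cost-neutral' remark in the note above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the ~30% tokenizer inflation applies equally to input and output tokens (the note does not say so explicitly); `effective_price` and `compare` are hypothetical helper names, not part of any vendor SDK.

```python
def effective_price(list_price_per_mtok: float, token_inflation: float = 0.30) -> float:
    """Cost of the text that used to fit in one million old-tokenizer tokens.

    token_inflation=0.30 is the ~30% figure quoted in the note above,
    assumed here to be uniform across input and output.
    """
    return list_price_per_mtok * (1.0 + token_inflation)


# List prices in USD per million tokens, taken from the note above.
SONNET_46 = {"input": 3.0, "output": 15.0}
SONNET_5_INTRO = {"input": 2.0, "output": 10.0}     # through Aug 31 2026
SONNET_5_STANDARD = {"input": 3.0, "output": 15.0}  # from Sep 1 2026


def compare(new_prices: dict, old_prices: dict, token_inflation: float = 0.30) -> dict:
    """Ratio of effective new cost to old cost for the same text, per direction."""
    return {
        direction: effective_price(new_prices[direction], token_inflation) / old_prices[direction]
        for direction in old_prices
    }


intro_ratio = compare(SONNET_5_INTRO, SONNET_46)
standard_ratio = compare(SONNET_5_STANDARD, SONNET_46)

for label, ratios in (("intro", intro_ratio), ("standard", standard_ratio)):
    for direction, ratio in ratios.items():
        print(f"{label:8s} {direction:6s} cost vs Sonnet 4.6: {ratio:.0%}")
```

Under these assumptions the introductory price works out to roughly 87% of Sonnet 4.6's cost for the same text, and the standard price to roughly 130%. Real inflation depends on your own content, so measure token counts on a sample of your prompts before budgeting.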

Two caveats that break most comparisons online

🔮 GPT-5.6 top tiers are limited-preview, not GA

The most-hyped GPT-5.6 tiers, marketed under names such as Sol Ultra, are reported to be limited-preview rather than generally available, and access varies by plan and region. Comparing a preview-only tier against a GA model is apples-to-oranges: you may not even be able to call the tier that a given benchmark used. Always confirm your actual GPT-5.6 tier access with the vendor before you plan around it.

📅 Most 'Sonnet vs Gemini' tables are Sonnet 4.6-era

Claude Sonnet 5 launched on 2026-06-30, so the vast majority of 'Sonnet vs Gemini' comparison tables circulating online were built against Claude Sonnet 4.6 and were never updated. They carry old prices, old context limits and old snapshots. Note that Sonnet 4.6 (claude-sonnet-4-6) is still Active rather than retired (tentative retirement no sooner than Feb 17 2027), with Sonnet 5 as the recommended successor and new default. If a table does not say 'Sonnet 5', assume it is stale.
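The rule of thumb above can be written down as a small check. This is only an illustrative sketch: `looks_stale` is a hypothetical helper, and the text match is a crude heuristic that assumes the table names the model as 'Sonnet 5' or 'claude-sonnet-5'.

```python
from datetime import date
from typing import Optional

SONNET_5_LAUNCH = date(2026, 6, 30)  # launch date stated in the paragraph above


def looks_stale(table_text: str, published: Optional[date] = None) -> bool:
    """Return True if a comparison table is probably Sonnet 4.6-era.

    A table published before the Sonnet 5 launch cannot cover Sonnet 5;
    otherwise fall back to checking whether it names Sonnet 5 at all.
    """
    if published is not None and published < SONNET_5_LAUNCH:
        return True
    # Normalise model IDs such as 'claude-sonnet-5' to 'claude sonnet 5'.
    normalised = table_text.lower().replace("-", " ")
    return "sonnet 5" not in normalised


print(looks_stale("Claude Sonnet 4.6 vs Gemini"))
print(looks_stale("claude-sonnet-5 vs Gemini 3.1 Pro"))
```

A table that passes this check can still be outdated in other ways, such as carrying the introductory price after it has expired, so treat a False result as "worth reading", not "verified".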

Use all three from one QCode key

Instead of picking a winner from numbers that do not exist, run your own bake-off. A single QCode API key lets you route to Claude Sonnet 5 for agentic coding and to the OpenAI-compatible line for GPT-5.x, so you can A/B the same task and judge on real output, latency and cost.

Claude Code (Claude Sonnet 5)

    export ANTHROPIC_BASE_URL="https://api.qcode.cc"
    export ANTHROPIC_AUTH_TOKEN="$QCODE_KEY"
    export ANTHROPIC_MODEL="claude-sonnet-5"
    claude

OpenAI Codex CLI (GPT-5.x)

    npm install -g @openai/codex
    # add QCode profile in ~/.codex/config.toml
    codex --profile qcode

Point Claude Code at api.qcode.cc with model claude-sonnet-5, and add a QCode profile to the Codex CLI for GPT-5.x. Then run the identical task through each and compare on your own repo — the only benchmark that actually matters for your decision. Gemini access depends on your plan and region; check the vendor for current availability.
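One way to keep such a bake-off consistent is a small harness that sends the same prompt to each model and records latency and cost. The sketch below is hedged: `run_bakeoff`, `BakeoffResult` and the stub runners are hypothetical names, the stubs return canned values instead of calling any API, and the prices for `other-model` are placeholders rather than real vendor figures. In practice each runner would wrap whichever client you actually use.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A runner takes a prompt and returns (output_text, input_tokens, output_tokens).
Runner = Callable[[str], Tuple[str, int, int]]


@dataclass
class BakeoffResult:
    model: str
    output: str
    latency_s: float
    cost_usd: float


def run_bakeoff(
    prompt: str,
    runners: Dict[str, Runner],
    prices_per_mtok: Dict[str, Tuple[float, float]],
) -> List[BakeoffResult]:
    """Run the same prompt through every runner and record latency and cost.

    prices_per_mtok maps model name to (input_price, output_price) in USD
    per million tokens, as listed by each vendor.
    """
    results = []
    for model, runner in runners.items():
        start = time.perf_counter()
        output, tokens_in, tokens_out = runner(prompt)
        latency = time.perf_counter() - start
        price_in, price_out = prices_per_mtok[model]
        cost = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
        results.append(BakeoffResult(model, output, latency, cost))
    return results


# Stub runners with made-up token counts; replace with real client calls.
def stub_sonnet(prompt: str) -> Tuple[str, int, int]:
    return ("patch A", 1200, 400)


def stub_other(prompt: str) -> Tuple[str, int, int]:
    return ("patch B", 1000, 500)


results = run_bakeoff(
    "Fix the failing test in utils.py",
    {"claude-sonnet-5": stub_sonnet, "other-model": stub_other},
    {"claude-sonnet-5": (2.0, 10.0), "other-model": (1.0, 1.0)},  # other-model: placeholder prices
)
for r in results:
    print(f"{r.model}: {r.latency_s:.3f}s, ${r.cost_usd:.4f}")
```

Comparing on dollars rather than token counts matters here, because each vendor's tokenizer counts the same text differently. Output quality still needs a human or test-suite judgment; the harness only makes the latency and cost columns comparable.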

Frequently asked questions

Which of these three is best for coding?

There is no shared, vendor-published benchmark across Anthropic, OpenAI and Google, so no single model can be crowned 'best for coding' from official numbers. Anthropic positions Claude Sonnet 5 as a mid-tier agentic-coding model whose quality approaches Opus 4.8 at a lower price; Opus 4.8 still leads the hardest coding and judgment tasks. The honest approach is to trial all three on your own repository and evaluate on your task mix, latency and cost rather than a leaderboard.

Is Claude Sonnet 5 better than GPT-5.6?

No one can say from published data. Anthropic did not release exact benchmark numbers for Sonnet 5, and there is no apples-to-apples comparison that both vendors ran. Sonnet 5 is a mid-tier model priced at $2/MTok input and $10/MTok output (introductory, through Aug 31 2026; then $3/$15). Any 'Sonnet 5 beats GPT-5.6' claim you see online is a third-party estimate, not a vendor result.

Can I compare SWE-bench scores between them?

Not reliably. Anthropic published no exact SWE-bench, Terminal-Bench or OSWorld number for Claude Sonnet 5 — only the qualitative claim that its performance is 'close to Opus 4.8'. Cross-vendor SWE-bench tables usually mix different harnesses, dates and model snapshots, so they are not directly comparable. Treat any number you see as a third-party estimate, not an Anthropic-published figure.

Is GPT-5.6 generally available?

Availability is uneven. GPT-5.6's top tiers (marketed under names such as Sol Ultra) are reported to be limited-preview rather than generally available, and access varies by plan and region. By contrast, Claude Sonnet 5 is GA: it is the default model on Free and Pro plans and is available via the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, GitHub Copilot and OpenRouter. Always confirm GPT-5.6 tier access with the vendor before committing.

Run your own bake-off on QCode

One key, real tasks, honest results — decide with your own numbers, not someone else's leaderboard.