Claude Sonnet 5 vs GPT-5.6 vs Gemini 3.1 Pro
Anthropic, OpenAI and Google never ran a shared benchmark — so a single 'winner' number does not exist. Here is what is genuinely comparable (price, context, availability) and what is not (leaderboard scores), with the benchmark cells left honestly blank.
The one-line verdict
There is no shared benchmark between these vendors — compare on price, value and availability, not a single score.
Anthropic published no exact benchmark numbers for Claude Sonnet 5; it says only, in qualitative terms, that performance is 'close to Opus 4.8'. OpenAI and Google run their own harnesses on their own dates. The useful decision axes are therefore cost, context window, output limits and where each model is actually generally available. Claude Sonnet 5 is a mid-tier agentic-coding model that approaches Opus 4.8 at a lower price, while Opus 4.8 still leads on the hardest coding, judgment and cyber tasks. Trial all three on your own workload before you decide.
What IS and IS NOT comparable
✅ What you CAN compare
Hard, published facts line up across vendors: list price per million tokens, context-window size, maximum output length, knowledge cutoff, and — crucially — real availability (which model is GA on which platform, and which tiers are still preview). These are the numbers that actually change your bill and your integration plan, and they are the honest basis for a decision.
🚫 What you CANNOT compare
A single 'best model' score does not exist here. Anthropic released no SWE-bench, Terminal-Bench or OSWorld figure for Sonnet 5, and no cross-vendor evaluation was run on identical harnesses, prompts and snapshots. Any table that shows Sonnet 5, GPT-5.6 and Gemini side by side with exact percentages is stitching together third-party estimates from different dates — treat those numbers as directional, never authoritative.
Specs & availability at a glance
Verified specs where vendors published them; benchmark row left blank on purpose.
| Attribute | Claude Sonnet 5 | GPT-5.6 | Gemini 3.1 Pro |
|---|---|---|---|
| Vendor & positioning | Anthropic · mid-tier, best speed/intelligence blend | OpenAI · flagship line | Google · flagship line |
| Context window | 1M tokens (default = max) | Tier-dependent — see OpenAI docs | Vendor-listed — see Google Vertex docs |
| Max output | 128K (up to 300K via Batches beta) | Vendor-listed | Vendor-listed |
| Price (input / output per MTok) | $2 / $10 intro (→ $3 / $15 from Sep 1 2026) | Tier-dependent; top tiers not GA-wide | See Google pricing (region/tier) |
| Knowledge cutoff | January 2026 | Vendor-listed | Vendor-listed |
| Availability / GA status | GA — default on Free & Pro; API, Bedrock, Vertex AI, Foundry, Copilot, OpenRouter | Rolling out; top tiers (e.g. Sol Ultra) limited-preview | Varies by region & platform |
| Public benchmark (SWE-bench / Terminal-Bench) | — not vendor-published | — no shared harness | — no shared harness |
| Best-suited for | Agentic coding at a mid-tier price; Opus-adjacent quality | Trial on your own task mix | Trial on your own task mix |
Claude Sonnet 5 specs and pricing are taken from anthropic.com/news/claude-sonnet-5 and the platform.claude.com docs. The introductory price of $2/$10 per MTok (input/output) runs through Aug 31 2026, then rises to $3/$15. The new tokenizer uses ~30% more tokens on the same text, so the intro price is roughly cost-neutral against Sonnet 4.6's $3/$15. GPT-5.6 and Gemini cells are directional pointers to each vendor's own docs, not QCode-verified figures. No benchmark number here is Anthropic-published.
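The pricing note above can be turned into a quick calculation. The sketch below is illustrative only: it takes the quoted list prices and the ~30% tokenizer figure at face value, assumes the token inflation applies equally to input and output, and uses hypothetical helper names (`sonnet5_list_price`, `effective_price`, `request_cost`) that do not come from any vendor SDK.

```python
from datetime import date

# Figures quoted in the note above; treat them as inputs, not verified constants.
INTRO_PRICE = (2.00, 10.00)      # USD per MTok (input, output), through Aug 31 2026
STANDARD_PRICE = (3.00, 15.00)   # USD per MTok (input, output), from Sep 1 2026
INTRO_END = date(2026, 8, 31)
TOKENIZER_INFLATION = 1.30       # assumption: ~30% more tokens for the same text


def sonnet5_list_price(on: date) -> tuple[float, float]:
    """Return the (input, output) list price per MTok in effect on a given date."""
    return INTRO_PRICE if on <= INTRO_END else STANDARD_PRICE


def effective_price(price: tuple[float, float],
                    inflation: float = TOKENIZER_INFLATION) -> tuple[float, float]:
    """Price per million Sonnet 4.6-equivalent tokens.

    The same text needs `inflation` times as many Sonnet 5 tokens, so each
    old-tokenizer MTok costs `inflation` times the new list price.
    """
    return (round(price[0] * inflation, 2), round(price[1] * inflation, 2))


def request_cost(input_tokens: int, output_tokens: int,
                 price: tuple[float, float]) -> float:
    """USD cost of one request at the given per-MTok list price."""
    return input_tokens / 1e6 * price[0] + output_tokens / 1e6 * price[1]


if __name__ == "__main__":
    for day in (date(2026, 8, 1), date(2026, 9, 1)):
        price = sonnet5_list_price(day)
        print(day, "list:", price, "4.6-equivalent:", effective_price(price))
```

Under these assumptions the introductory price works out to about $2.60 / $13 per Sonnet 4.6-equivalent MTok, and the standard price to about $3.90 / $19.50. Re-run the numbers with token counts measured on your own prompts, since the real inflation depends on your text.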
Two caveats that break most comparisons online
🔮 GPT-5.6 top tiers are limited-preview, not GA
The most-hyped GPT-5.6 tiers, marketed under names such as Sol Ultra, are reported to be limited-preview rather than generally available, and access varies by plan and region. Comparing a preview-only tier against a GA model is apples-to-oranges: you may not even be able to call the tier that a given benchmark used. Always confirm your actual GPT-5.6 tier access with the vendor before you plan around it.
📅 Most 'Sonnet vs Gemini' tables are Sonnet 4.6-era
Claude Sonnet 5 launched on 2026-06-30, so the vast majority of 'Sonnet vs Gemini' comparison tables circulating online were built against Claude Sonnet 4.6 and were never updated. They carry old prices, old context limits and old snapshots. Note that Sonnet 4.6 (claude-sonnet-4-6) is still Active rather than retired: its tentative retirement date is no sooner than Feb 17 2027, and Sonnet 5 is the recommended successor and new default. If a table does not say 'Sonnet 5', assume it is stale.
Use all three from one QCode key
Instead of picking a winner from numbers that do not exist, run your own bake-off. A single QCode API key lets you route to Claude Sonnet 5 for agentic coding and to the OpenAI-compatible line for GPT-5.x, so you can A/B the same task and judge on real output, latency and cost.
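# Claude Code: point the CLI at QCode and select Claude Sonnet 5
export ANTHROPIC_BASE_URL="https://api.qcode.cc"
export ANTHROPIC_AUTH_TOKEN="$QCODE_KEY"
export ANTHROPIC_MODEL="claude-sonnet-5"
claude

# Codex CLI: install it, then start it with a QCode profile for GPT-5.x
npm install -g @openai/codex
# add a QCode profile in ~/.codex/config.toml first
codex --profile qcode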
Point Claude Code at api.qcode.cc with model claude-sonnet-5, and add a QCode profile to the Codex CLI for GPT-5.x. Then run the identical task through each and compare on your own repo — the only benchmark that actually matters for your decision. Gemini access depends on your plan and region; check the vendor for current availability.
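A bake-off like the one described above can be scripted in a few lines. The following is a minimal sketch, not a QCode or vendor tool: the `Runner` type, the `bake_off` function and the stub runners are all hypothetical, the token counts are made up, and the placeholder price for `other-model` must be replaced with the vendor's real list price. In practice each runner would wrap whatever API client you already use and return the output text plus the token counts reported by the API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A runner takes a prompt and returns (output_text, input_tokens, output_tokens).
# Real runners would wrap your own API client; the ones below are offline stubs.
Runner = Callable[[str], Tuple[str, int, int]]


@dataclass
class Result:
    model: str
    output: str
    latency_s: float
    cost_usd: float


def bake_off(prompt: str,
             runners: Dict[str, Runner],
             prices: Dict[str, Tuple[float, float]]) -> List[Result]:
    """Run one prompt through every runner; record latency and list-price cost."""
    results = []
    for model, run in runners.items():
        start = time.perf_counter()
        output, tokens_in, tokens_out = run(prompt)
        latency = time.perf_counter() - start
        price_in, price_out = prices[model]  # USD per MTok (input, output)
        cost = tokens_in / 1e6 * price_in + tokens_out / 1e6 * price_out
        results.append(Result(model, output, latency, cost))
    # Cheapest first; output quality still has to be judged by a human or a test suite.
    return sorted(results, key=lambda r: r.cost_usd)


if __name__ == "__main__":
    def fake_runner(reply: str, tokens_in: int, tokens_out: int) -> Runner:
        """Build a stub runner with fixed, made-up token counts."""
        return lambda prompt: (reply, tokens_in, tokens_out)

    runners = {
        "claude-sonnet-5": fake_runner("patch A", 12_000, 3_000),
        "other-model": fake_runner("patch B", 10_000, 4_000),  # placeholder name
    }
    prices = {
        "claude-sonnet-5": (2.00, 10.00),  # intro list price quoted in this article
        "other-model": (1.00, 1.00),       # placeholder, not a real price
    }
    for r in bake_off("Fix the failing test in utils.py", runners, prices):
        print(f"{r.model}: ${r.cost_usd:.4f}, {r.latency_s * 1000:.2f} ms")
```

Cost and latency are the only things this sketch measures automatically. Whether the output is correct is best checked by running your repository's own tests against each result.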
Frequently asked questions
Which of these three is best for coding?
There is no shared, vendor-published benchmark across Anthropic, OpenAI and Google, so no single model can be crowned 'best for coding' from official numbers. Anthropic positions Claude Sonnet 5 as a mid-tier agentic-coding model whose quality approaches Opus 4.8 at a lower price; Opus 4.8 still leads the hardest coding and judgment tasks. The honest approach is to trial all three on your own repository and evaluate on your task mix, latency and cost rather than a leaderboard.
Is Claude Sonnet 5 better than GPT-5.6?
No one can say from published data. Anthropic did not release exact benchmark numbers for Sonnet 5, and the two vendors never ran an apples-to-apples comparison on a shared harness. Sonnet 5 is a mid-tier model priced at $2/MTok input and $10/MTok output (introductory, through Aug 31 2026; then $3/$15). Any 'Sonnet 5 beats GPT-5.6' claim you see online is a third-party estimate, not a vendor result.
Can I compare SWE-bench scores between them?
Not reliably. Anthropic published no exact SWE-bench, Terminal-Bench or OSWorld number for Claude Sonnet 5 — only the qualitative claim that its performance is 'close to Opus 4.8'. Cross-vendor SWE-bench tables usually mix different harnesses, dates and model snapshots, so they are not directly comparable. Treat any number you see as a third-party estimate, not an Anthropic-published figure.
Is GPT-5.6 generally available?
Availability is uneven. GPT-5.6's top tiers (marketed under names such as Sol Ultra) are reported to be limited-preview rather than generally available, and access varies by plan and region. By contrast, Claude Sonnet 5 is GA: it is the default model on Free and Pro plans and is available via the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, GitHub Copilot and OpenRouter. Always confirm GPT-5.6 tier access with the vendor before committing.
Run your own bake-off on QCode
One key, real tasks, honest results — decide with your own numbers, not someone else's leaderboard.