LLM Workload Pricing
(Pricing data as of December 2025)
The actual cost of inference depends not just on token prices, but on how input and output tokens are combined.
This page compares workload costs across fixed input-to-output ratios, from retrieval-heavy mixes (8:1) to reasoning-heavy mixes (1:8). For each model, costs are normalized to 1 million output tokens, making it possible to observe how pricing behavior changes across workload regimes and model tiers under consistent assumptions.
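Concretely, each derived cost is a weighted sum of the published input and output prices. A minimal sketch of the calculation, using GPT-5.2's list prices from the Frontier table below:

```python
# Derived cost (USD) to produce 1M output tokens at a fixed input:output ratio.
# For an 8:1 mix, input tokens = 8M; for a 1:8 mix, input tokens = 1M / 8 = 125K.

def derived_cost(input_price, output_price, input_tokens_m, output_tokens_m=1.0):
    """Total cost given $/1M-token prices and token counts in millions."""
    return input_price * input_tokens_m + output_price * output_tokens_m

# GPT-5.2 list prices: $1.75 input / $14.00 output per 1M tokens.
retrieval = derived_cost(1.75, 14.00, input_tokens_m=8.0)    # 8:1 mix
reasoning = derived_cost(1.75, 14.00, input_tokens_m=0.125)  # 1:8 mix

print(f"Retrieval (8:1): ${retrieval:.2f}")   # $28.00
print(f"Reasoning (1:8): ${reasoning:.2f}")   # $14.22
```

The same function reproduces every derived-cost cell in the tables below.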
Frontier Models (Frontier High)
Derived cost for 1M output tokens at different input:output mixes. Example: 8:1 = 8M input + 1M output; 1:8 = 125K input + 1M output.

| Model | Provider | Input ($/1M) | Output ($/1M) | Retrieval (8:1) | Context (4:1) | Balanced (1:1) | Generative (1:4) | Reasoning (1:8) |
|---|---|---|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | $28.00 | $21.00 | $15.75 | $14.44 | $14.22 |
| Gemini 3 Pro | Google | $2.00 | $12.00 | $28.00 | $20.00 | $14.00 | $12.50 | $12.25 |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | $39.00 | $27.00 | $18.00 | $15.75 | $15.38 |
| DeepSeek-R1 | Together.AI | $3.00 | $7.00 | $31.00 | $19.00 | $10.00 | $7.75 | $7.38 |
| Llama 3.1 405B Instruct Turbo | Together.AI | $3.50 | $3.50 | $31.50 | $17.50 | $7.00 | $4.38 | $3.94 |
Optimized Models (Frontier Value)
Derived cost for 1M output tokens at different input:output mixes. Example: 8:1 = 8M input + 1M output; 1:8 = 125K input + 1M output.

| Model | Provider | Input ($/1M) | Output ($/1M) | Retrieval (8:1) | Context (4:1) | Balanced (1:1) | Generative (1:4) | Reasoning (1:8) |
|---|---|---|---|---|---|---|---|---|
| GPT-5 mini | OpenAI | $0.25 | $2.00 | $4.00 | $3.00 | $2.25 | $2.06 | $2.03 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | $4.90 | $3.70 | $2.80 | $2.58 | $2.54 |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | $13.00 | $9.00 | $6.00 | $5.25 | $5.13 |
| DeepSeek-V3 | Together.AI | $1.25 | $1.25 | $11.25 | $6.25 | $2.50 | $1.56 | $1.41 |
| Llama 3.3 70B Instruct-Turbo | Together.AI | $0.88 | $0.88 | $7.92 | $4.40 | $1.76 | $1.10 | $0.99 |
Takeaway 1: High Input Workloads
Within Frontier models, high-input workloads (8:1 and 4:1) produce tightly clustered costs.
What To Notice:
- Retrieval-heavy (8:1) and context-heavy (4:1) scenarios fall within a narrow cost range across Frontier models.
- In high-input workloads, differences in output token pricing contribute less to total cost.
Frontier Models, Retrieval (8:1) and Context (4:1) derived costs.
| Model | Provider | Retrieval (8:1) | Context (4:1) |
|---|---|---|---|
| GPT-5.2 | OpenAI | $28.00 | $21.00 |
| Gemini 3 Pro | Google | $28.00 | $20.00 |
| Claude Sonnet 4.5 | Anthropic | $39.00 | $27.00 |
| DeepSeek-R1 | Together.AI | $31.00 | $19.00 |
| Llama 3.1 405B Instruct Turbo | Together.AI | $31.50 | $17.50 |
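The clustering is arithmetic: at 8:1, each dollar of input price moves the derived cost by $8, while each dollar of output price moves it by only $1. A quick illustration with two rows from the table above:

```python
# At an 8:1 mix, input price is weighted 8x; output price is weighted 1x.
gpt_52 = (1.75, 14.00)    # GPT-5.2 ($/1M input, $/1M output)
gemini_3 = (2.00, 12.00)  # Gemini 3 Pro

def cost(prices, r):
    """Derived cost for an r:1 mix, normalized to 1M output tokens."""
    p_in, p_out = prices
    return r * p_in + p_out

# Gemini's $0.25 input premium, weighted 8x, exactly offsets its $2.00
# lower output price -- both models land on $28.00 at 8:1:
print(cost(gpt_52, 8), cost(gemini_3, 8))                     # 28.0 28.0
# At 1:8 the input weight shrinks to 0.125, so output pricing dominates:
print(round(cost(gpt_52, 0.125), 2), cost(gemini_3, 0.125))   # 14.22 12.25
```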
Takeaway 2: Output-Heavy Workloads
Within Frontier models, high-output workloads (1:4 and 1:8) produce wide cost dispersion.
What To Notice:
- At generative (1:4) and reasoning (1:8) ratios, Frontier models separate sharply by output token pricing.
- Relative cost ordering remains largely consistent between 1:4 and 1:8 scenarios.
- In output-heavy workloads, derived costs track closely with output token pricing.
Frontier Models, Generative (1:4) and Reasoning (1:8) derived costs.
| Model | Provider | Generative (1:4) | Reasoning (1:8) |
|---|---|---|---|
| GPT-5.2 | OpenAI | $14.44 | $14.22 |
| Gemini 3 Pro | Google | $12.50 | $12.25 |
| Claude Sonnet 4.5 | Anthropic | $15.75 | $15.38 |
| DeepSeek-R1 | Together.AI | $7.75 | $7.38 |
| Llama 3.1 405B Instruct Turbo | Together.AI | $4.38 | $3.94 |
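The convergence toward output price can be made explicit: for a 1:r mix the derived cost is the output price plus a residual input term that shrinks as r grows. Using Llama 3.1 405B Instruct Turbo's list prices from the table above:

```python
# For a 1:r mix, derived cost = p_in / r + p_out, which converges to the
# output price as r grows. Llama 3.1 405B Instruct Turbo list prices:
p_in, p_out = 3.50, 3.50

for r in (1, 4, 8, 32):
    print(f"1:{r} -> ${p_in / r + p_out:.2f}")
# The shrinking residual p_in / r is why the table's 1:4 ($4.38) and
# 1:8 ($3.94) costs sit just above the $3.50 output price.
```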
Takeaway 3: Cost Variability in Optimized Models
Within Optimized models, workload-driven cost patterns persist, but costs are more dispersed across providers.
What To Notice:
- Retrieval-heavy (8:1) and output-heavy (1:8) workloads follow the same input/output cost dynamics observed in Frontier models.
- Compared to Frontier models, Optimized models exhibit less cost clustering and greater dispersion across providers.
Optimized Models, Retrieval (8:1) and Reasoning (1:8) derived costs.
| Model | Provider | Retrieval (8:1) | Reasoning (1:8) |
|---|---|---|---|
| GPT-5 mini | OpenAI | $4.00 | $2.03 |
| Gemini 2.5 Flash | Google | $4.90 | $2.54 |
| Claude Haiku 4.5 | Anthropic | $13.00 | $5.13 |
| DeepSeek-V3 | Together.AI | $11.25 | $1.41 |
| Llama 3.3 70B Instruct-Turbo | Together.AI | $7.92 | $0.99 |
Takeaway 4: Optimized Pricing Is Less Uniform
The tight cost clustering observed in Frontier models breaks down in the Optimized tier due to greater variation in token pricing.
What To Notice:
- For retrieval (8:1) and context (4:1) workloads, Frontier models cluster tightly, while Optimized models exhibit wider spreads across the same mixes.
- That breakdown is driven by wider variation in input and output token pricing across Optimized models.
Optimized Models, Input and Output pricing.
| Model | Provider | Input ($/1M) | Output ($/1M) |
|---|---|---|---|
| GPT-5 mini | OpenAI | $0.25 | $2.00 |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 |
| DeepSeek-V3 | Together.AI | $1.25 | $1.25 |
| Llama 3.3 70B Instruct-Turbo | Together.AI | $0.88 | $0.88 |
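The dispersion claims in Takeaways 1 through 4 can be quantified as the max/min spread of derived cost within each tier. A quick check using the list prices from the tables above:

```python
# Max/min spread of derived cost within each tier, per workload mix.
# Price pairs are (input $/1M, output $/1M) from the tables above.
tiers = {
    "Frontier":  [(1.75, 14.00), (2.00, 12.00), (3.00, 15.00),
                  (3.00, 7.00), (3.50, 3.50)],
    "Optimized": [(0.25, 2.00), (0.30, 2.50), (1.00, 5.00),
                  (1.25, 1.25), (0.88, 0.88)],
}

spreads = {}
for tier, prices in tiers.items():
    for label, r in (("8:1", 8), ("1:8", 0.125)):
        costs = [r * p_in + p_out for p_in, p_out in prices]
        spreads[(tier, label)] = max(costs) / min(costs)
        print(f"{tier} {label}: {spreads[(tier, label)]:.2f}x max/min spread")
```

The Optimized tier shows a wider spread than the Frontier tier at both mixes, consistent with the clustering breakdown described above.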
Methodology and Assumptions
1) Pricing data: Input and output token prices are taken from publicly available provider pricing pages as of December 2025. All values reflect list pricing only; enterprise and volume discounts are not included.
2) Derived costs: Derived costs show the total dollar cost to produce 1 million output tokens under fixed input-to-output ratios, calculated directly from published input and output prices.
3) Input-to-output ratios: The ratios shown (8:1, 4:1, 1:1, 1:4, 1:8) are used to hold workload shape constant. They are analytical lenses, not statements about how real workloads behave in practice.
4) Normalization: All costs are normalized to 1 million output tokens so pricing behavior can be compared consistently across models with different input and output price structures.
5) Scope: This analysis is limited to pricing mechanics. It does not account for model quality, reasoning depth, latency, throughput, safety, or system-level costs.
6) Model grouping: Models are grouped into Frontier and Optimized categories based on the KotaML tiering framework, which reflects pricing structure and positioning rather than capability.
7) Tier boundary note: Llama 3.3 70B Instruct-Turbo sits near the boundary between the Frontier Value and Mid tiers and is included here for pricing continuity.
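The derived-cost calculation in (2) reduces to a closed form. For an r:1 input-heavy mix and a 1:r output-heavy mix, normalized to 1M output tokens, with p_in and p_out the list prices per 1M tokens:

```latex
% Derived cost, normalized to 1M output tokens:
C_{r:1} = r \cdot p_{\text{in}} + p_{\text{out}},
\qquad
C_{1:r} = \frac{p_{\text{in}}}{r} + p_{\text{out}}
```

As r grows, C(1:r) approaches p_out, which is the convergence visible across the Generative (1:4) and Reasoning (1:8) columns.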