Model Tier Framework

A structured framework for comparing LLMs based on reasoning ability, cost efficiency, and deployment control. Models are grouped into clear tiers to simplify architectural decisions and make tradeoffs easier to evaluate.

The three axes

Each tier is scored from 1–10 on three dimensions. Use these scores to align model classes with your product’s constraints.

Reasoning Depth: How well the model handles logic, nuance, multi-step inference, and correctness.
Cost Efficiency: Effective unit cost for production workloads; higher is cheaper at scale.
Deployment Control: How much control you have over hosting, tuning, privacy, and edge deployment.
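Here is a minimal Python sketch of the "align model classes with your product's constraints" step. It copies each tier's three scores from the tier sections of this note. It keeps only the tiers that clear per-axis minimums, then ranks the remaining tiers by a weighted sum. The `Tier` class, the `candidate_tiers` function, the thresholds, and the weighted-sum ranking are all illustrative assumptions. They are not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    reasoning: int  # Reasoning Depth, 1-10
    cost: int       # Cost Efficiency, 1-10 (higher = cheaper at scale)
    control: int    # Deployment Control, 1-10


# Scores copied from the tier sections of this note.
TIERS = [
    Tier("Frontier High", 10, 1, 2),
    Tier("Frontier Value", 8, 3, 2),
    Tier("Mid High", 7, 5, 3),
    Tier("Mid Value", 5, 8, 2),
    Tier("OSS High", 7, 7, 9),
    Tier("OSS Value", 4, 10, 10),
]


def candidate_tiers(min_reasoning=1, min_cost=1, min_control=1,
                    weights=(1.0, 1.0, 1.0)):
    """Keep tiers that meet every minimum, then rank them by a weighted score sum.

    Hypothetical helper: thresholds and weights are illustrative knobs.
    """
    w_reasoning, w_cost, w_control = weights
    eligible = [
        t for t in TIERS
        if t.reasoning >= min_reasoning
        and t.cost >= min_cost
        and t.control >= min_control
    ]
    # Highest weighted score first; ties keep the order of TIERS.
    return sorted(
        eligible,
        key=lambda t: w_reasoning * t.reasoning + w_cost * t.cost + w_control * t.control,
        reverse=True,
    )
```

Take a privacy-sensitive workload that must self-host. It might call `candidate_tiers(min_control=8, weights=(2, 1, 0))`. That call keeps only OSS High and OSS Value, and it ranks OSS High first (2*7 + 7 = 21 versus 2*4 + 10 = 18). Hard minimums model non-negotiable constraints, such as data staying on your own hardware. Weights express softer preferences among the remaining axes.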

Frontier High

Reasoning: 10 / 10
Cost: 1 / 10
Control: 2 / 10

Definition:

Peak reasoning models for high-stakes, complex tasks.

AIME Score¹:

45–60%

Use Case:

Scientific Reasoning
Medical Inference
Legal Analysis
Multi-step Agentic Planning

Example Models:

GPT-5 pro, Claude Opus 4.5, Gemini 3 Pro, DeepSeek-R1

Frontier Value

Reasoning: 8 / 10
Cost: 3 / 10
Control: 2 / 10

Definition:

Frontier models optimized for lower cost and higher throughput.

AIME Score¹:

25–40%

Use Case:

Customer Support Bots
High-quality Assistants
Knowledge-retrieval Agents
Email/workflow Copilots

Example Models:

GPT-5 mini, Claude Sonnet 4.5, DeepSeek-V3, Mixtral 8x22B

Mid High

Reasoning: 7 / 10
Cost: 5 / 10
Control: 3 / 10

Definition:

Balanced hosted models for solid quality at mid cost.

AIME Score¹:

15–25%

Use Case:

Extraction Tools
Documentation Copilots
Coding Helpers
Internal Productivity Bots

Example Models:

Gemini 2.5 Flash, Llama 4 Maverick, Qwen 2.5 72B, Kimi K2 Thinking

Mid Value

Reasoning: 5 / 10
Cost: 8 / 10
Control: 2 / 10

Definition:

Fast hosted models for cheap, high-volume workloads.

AIME Score¹:

5–15%

Use Case:

High-volume Classification
Email Triage
Routing
Lightweight Summarization

Example Models:

Llama 3.1 8B Instruct Turbo, Llama 4 Scout, Mistral Small 3, Qwen2.5 7B Instruct Turbo

OSS High

Reasoning: 7 / 10
Cost: 7 / 10
Control: 9 / 10

Definition:

Large open models with strong quality and full control.

AIME Score¹:

10–20%

Use Case:

Self-hosted Research Assistants
Domain-tuned Tools
Privacy-sensitive Q&A
Knowledge Copilots

Example Models:

Llama 3 70B Instruct Reference, Mistral 7B Instruct, Qwen 3 30B, GLM-4.5 Air

OSS Value

Reasoning: 4 / 10
Cost: 10 / 10
Control: 10 / 10

Definition:

Small open models for local, very low-cost inference.

AIME Score¹:

0–8%

Use Case:

Local Chatbots
Embeddings and Search
Simple Extraction
On-device Assistants

Example Models:

gemma-3n-E4B-it, Llama 3.2 3B Instruct Turbo, gpt-oss-20b

AIME¹: American Invitational Mathematics Examination, an exam of 15 challenging multi-step competition math problems, used here as a proxy for reasoning depth.
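The benchmark is defined as 15 problems, so each tier's AIME range can be read as a rough count of problems solved. The short Python sketch below does that conversion. The ranges are copied from the tier sections, and `problems_solved` is a hypothetical helper. Fractional counts only reflect that the ranges are approximate.

```python
# AIME has 15 problems per exam, per the footnote above.
AIME_PROBLEMS = 15

# AIME score ranges in percent, copied from the tier sections of this note.
AIME_RANGES = {
    "Frontier High": (45, 60),
    "Frontier Value": (25, 40),
    "Mid High": (15, 25),
    "Mid Value": (5, 15),
    "OSS High": (10, 20),
    "OSS Value": (0, 8),
}


def problems_solved(percent: float, total: int = AIME_PROBLEMS) -> float:
    """Convert an accuracy percentage into the equivalent number of problems solved."""
    # Multiply first so the only rounding happens in the final division.
    return percent * total / 100


for tier, (low, high) in AIME_RANGES.items():
    print(f"{tier}: {problems_solved(low):.1f} to {problems_solved(high):.1f} of {AIME_PROBLEMS}")
```

For example, Frontier High's 45–60% works out to roughly 7 to 9 of the 15 problems.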