Model Tier Framework

A structured framework for comparing LLMs based on reasoning ability, cost efficiency, and deployment control. Models are grouped into clear tiers to simplify architectural decisions and make tradeoffs easier to evaluate.

The three axes

Each tier is scored from 1 to 10 on three dimensions. Use these scores to match model classes to your product’s constraints.

Reasoning Depth
How well the model handles logic, nuance, multi-step inference, and correctness.

1 = shallow, 10 = frontier-level reasoning.

Cost Efficiency
Effective unit cost for production workloads. A higher score means the model is cheaper to run at scale.

1 = extremely expensive, 10 = extremely cheap.

Deployment Control
How much control you have over hosting, tuning, privacy, and edge deployment.

1 = fully closed, 10 = fully self-hostable.
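
One way to put the three axes to work is to express a product's constraints as minimum scores and check a tier against them. The sketch below is only an illustration: the `AxisScores` class and the `meets` function are hypothetical names, and the example numbers are assumptions rather than part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisScores:
    """Scores on the three axes, each an integer from 1 to 10."""
    reasoning: int
    cost: int
    control: int

    def __post_init__(self):
        # Reject scores outside the 1-10 scale used by the framework.
        for name in ("reasoning", "cost", "control"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


def meets(scores: AxisScores, minimum: AxisScores) -> bool:
    """Return True if every axis score is at least the required minimum."""
    return (
        scores.reasoning >= minimum.reasoning
        and scores.cost >= minimum.cost
        and scores.control >= minimum.control
    )


# Hypothetical example: a tier scored 7 / 7 / 9 checked against a product
# that needs moderate reasoning and strong deployment control.
tier = AxisScores(reasoning=7, cost=7, control=9)
required = AxisScores(reasoning=6, cost=1, control=8)
print(meets(tier, required))
```

Treating the constraints as hard minimums keeps the check easy to audit. A weighted score is an alternative when the axes can be traded off against each other.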

Frontier High

Reasoning: 10 / 10
Cost: 1 / 10
Control: 2 / 10
Definition:

Peak reasoning models for high-stakes, complex tasks.

AIME Score¹:

45–60%

Use Case:
  • Scientific Reasoning
  • Medical Inference
  • Legal Analysis
  • Multi-step Agentic Planning
Example Models:

GPT-5 Pro, Claude Opus 4.5, Gemini 3 Pro, DeepSeek-R1

Frontier Value

Reasoning: 8 / 10
Cost: 3 / 10
Control: 2 / 10
Definition:

Frontier models optimized for lower cost and higher throughput.

AIME Score¹:

25–40%

Use Case:
  • Customer Support Bots
  • High-quality Assistants
  • Knowledge-retrieval Agents
  • Email/Workflow Copilots
Example Models:

GPT-5 mini, Claude Sonnet 4.5, DeepSeek-V3, Mixtral 8x22B

Mid High

Reasoning: 7 / 10
Cost: 5 / 10
Control: 3 / 10
Definition:

Balanced proprietary models for solid quality at mid cost.

AIME Score¹:

15–25%

Use Case:
  • Extraction Tools
  • Documentation Copilots
  • Coding Helpers
  • Internal Productivity Bots
Example Models:

Gemini 2.5 Flash, Llama 4 Maverick, Qwen 2.5 72B, Kimi K2 Thinking

Mid Value

Reasoning: 5 / 10
Cost: 8 / 10
Control: 2 / 10
Definition:

Fast proprietary models for cheap, high-volume workloads.

AIME Score¹:

5–15%

Use Case:
  • High-volume Classification
  • Email Triage
  • Routing
  • Lightweight Summarization
Example Models:

Llama 3.1 8B Instruct Turbo, Llama 4 Scout, Mistral Small 3, Qwen2.5 7B Instruct Turbo

OSS High

Reasoning: 7 / 10
Cost: 7 / 10
Control: 9 / 10
Definition:

Large open models with strong quality and full control.

AIME Score¹:

10–20%

Use Case:
  • Self-hosted Research Assistants
  • Domain-tuned Tools
  • Privacy-sensitive Q&A
  • Knowledge Copilots
Example Models:

Llama 3 70B Instruct Reference, Mistral 7B Instruct, Qwen 3 30B, GLM-4.5 Air

OSS Value

Reasoning: 4 / 10
Cost: 10 / 10
Control: 10 / 10
Definition:

Small open models for local, very low-cost inference.

AIME Score¹:

0–8%

Use Case:
  • Local Chatbots
  • Embeddings and Search
  • Simple Extraction
  • On-device Assistants
Example Models:

gemma-3n-E4B-it, Llama 3.2 3B Instruct Turbo, gpt-oss-20b
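
The six tier cards can be collected into a small lookup table and ranked for a given workload. The following is a minimal sketch: the scores are copied from the cards above, while the `rank_tiers` function, the weighting scheme, and the example weights are assumptions made for illustration, not part of the framework.

```python
# Scores copied from the tier cards: (reasoning, cost, control).
TIERS = {
    "Frontier High": (10, 1, 2),
    "Frontier Value": (8, 3, 2),
    "Mid High": (7, 5, 3),
    "Mid Value": (5, 8, 2),
    "OSS High": (7, 7, 9),
    "OSS Value": (4, 10, 10),
}


def rank_tiers(weights, min_reasoning=1):
    """Rank tiers by weighted score, dropping those below min_reasoning.

    weights is a (reasoning, cost, control) tuple. The weighted sum is an
    assumption of this sketch, not something the framework prescribes.
    """
    ranked = []
    for name, scores in TIERS.items():
        if scores[0] < min_reasoning:
            continue
        total = sum(w * s for w, s in zip(weights, scores))
        ranked.append((name, total))
    # Highest weighted score first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical workload: reasoning matters most, cost second, control least,
# and anything below 7 / 10 on reasoning is ruled out.
for name, total in rank_tiers(weights=(6, 3, 1), min_reasoning=7):
    print(f"{name}: {total}")
```

With these example weights the ranking is OSS High (72), Frontier High (65), Mid High (60), Frontier Value (59). Changing the weights or the reasoning floor changes the order, which is the point of scoring the tiers on separate axes.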

¹ AIME (American Invitational Mathematics Examination): a 15-problem competition math exam requiring multi-step solutions, used here as a proxy for reasoning depth.
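
Because AIME has 15 problems, a raw result converts to a percentage as solved / 15 × 100, which can then be compared with the ranges on the tier cards. The sketch below assumes the ranges are inclusive at both ends; `tiers_for_aime` is a hypothetical helper name.

```python
# AIME ranges in percent, copied from the tier cards.
AIME_RANGES = {
    "Frontier High": (45, 60),
    "Frontier Value": (25, 40),
    "Mid High": (15, 25),
    "Mid Value": (5, 15),
    "OSS High": (10, 20),
    "OSS Value": (0, 8),
}


def tiers_for_aime(solved, total=15):
    """Return the tiers whose AIME range contains solved/total as a percent."""
    percent = 100 * solved / total
    return [
        name
        for name, (low, high) in AIME_RANGES.items()
        if low <= percent <= high
    ]


print(tiers_for_aime(6))  # 6 of 15 problems is 40%
print(tiers_for_aime(2))  # 2 of 15 is about 13.3%, where two ranges overlap
```

The ranges overlap in places and leave gaps in others (for example, nothing covers 40–45%), so a score can match several tiers or none. The AIME range is best read as a rough indicator alongside the three axis scores.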