AI Model Pricing Comparison & Cost Calculator

Compare AI API pricing across 14+ providers. Use the interactive chart, sortable table, and cost calculator to find the best model for your budget. Includes capability tags, context window sizes, and direct links to official pricing pages.

14 providers tracked
175 models tracked
Updated Apr 2026
All prices in USD per 1M tokens
Input & Output Prices ($/1M tokens)

[Interactive chart: "Coding Tools" filter, showing 28 of 175 models]

Model Insights

Capabilities, performance, and metadata for every AI model

| Provider | Model | Context | Capabilities | Tier | In $/1M | Out $/1M |
|----------|-------|---------|--------------|------|---------|----------|
| OpenAI | GPT-5.4 Nano | 400K | Vision, Functions, JSON, Streaming, Reasoning | budget | $0.20 | $1.25 |
| OpenAI | GPT-5.4 Mini | 400K | Vision, Functions, JSON, Streaming, Reasoning | budget | $0.75 | $4.50 |
| OpenAI | GPT-5.4 Pro (new) | 1.05M | Vision, Functions, JSON, Streaming, Reasoning | premium | $30.00 | $180.00 |
| OpenAI | GPT-5.4 (new) | 1.05M | Vision, Functions, JSON, Streaming, Reasoning | premium | $2.50 | $15.00 |
| OpenAI | GPT-5.3 Chat (new) | 128K | Vision, Functions, JSON, Streaming, Long Context | balanced | $1.75 | $14.00 |
| OpenAI | GPT-5.3-Codex | 400K | Vision, Functions, JSON, Streaming, Reasoning | balanced | $1.75 | $14.00 |
| OpenAI | GPT-5.2-Codex | 400K | Vision, Functions, JSON, Streaming, Reasoning | balanced | $1.75 | $14.00 |
| OpenAI | GPT-5.2 Chat | 128K | Vision, Functions, JSON, Streaming, Long Context | balanced | $1.75 | $14.00 |
| OpenAI | GPT-5.2 Pro | 400K | Vision, Functions, JSON, Streaming, Reasoning | premium | $21.00 | $168.00 |

All rows last updated Apr 2026. Provider notes: GPT-5.4 Pro targets agentic coding and long-context multi-step workflows; GPT-5.3 Chat targets smooth everyday conversations with high accuracy; GPT-5.2 Pro targets agentic coding and long-context performance.
Showing 9 of 175 tracked models.
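With per-token rates in hand, per-request costs across models can be compared directly. A minimal sketch in Python, using the input/output rates listed above for four of the models; the 1,000-input / 300-output token split is an illustrative assumption, not a benchmark:

```python
# Prices ($/1M tokens) taken from the pricing table above.
MODELS = {
    "GPT-5.4 Nano": (0.20, 1.25),
    "GPT-5.4 Mini": (0.75, 4.50),
    "GPT-5.4":      (2.50, 15.00),
    "GPT-5.4 Pro":  (30.00, 180.00),
}

def cost_per_request(in_tokens: int, out_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Cost in USD for one request at the given $/1M-token rates."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Typical chat turn: 1,000 input tokens, 300 output tokens.
for name, (inp, outp) in MODELS.items():
    print(f"{name}: ${cost_per_request(1_000, 300, inp, outp):.6f}")
```

The same prompt costs roughly 150x more on the Pro tier than on Nano, which is why routing simple requests to cheaper models matters.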

Embedding Models

Compare pricing for text embedding models used in RAG and vector search

8 models compared

| Model | Provider | $/1M tokens | Dimensions | Max input tokens |
|-------|----------|-------------|------------|------------------|
| Gemini Embedding | Google | Free | 768 | 2,048 |
| text-embedding-3-small | OpenAI | $0.02 | 1,536 | 8,191 |
| Titan Embeddings V2 | Amazon | $0.02 | 1,024 | 8,192 |
| Voyage 3 | Anthropic | $0.06 | 1,024 | 32,000 |
| Embed English v3.0 | Cohere | $0.10 | 1,024 | 512 |
| Embed Multilingual v3.0 | Cohere | $0.10 | 1,024 | 512 |
| Mistral Embed | Mistral | $0.10 | 1,024 | 8,192 |
| text-embedding-3-large | OpenAI | $0.13 | 3,072 | 8,191 |
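Embedding prices are flat per input token, so corpus costs are easy to estimate. A hedged sketch using three of the rates listed above; the 10M-token corpus size is an illustrative assumption:

```python
# $/1M tokens from the embedding table above (Gemini Embedding is free).
EMBED_PRICE = {
    "text-embedding-3-small": 0.02,
    "Voyage 3": 0.06,
    "text-embedding-3-large": 0.13,
}

def corpus_embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """One-time cost in USD to embed a corpus of the given token count."""
    return total_tokens * price_per_million / 1_000_000

# Embedding a 10M-token corpus:
for model, price in EMBED_PRICE.items():
    print(f"{model}: ${corpus_embedding_cost(10_000_000, price):.2f}")
```

Even the priciest model here embeds 10M tokens for about $1.30; for RAG workloads the recurring query-time embedding cost is usually negligible next to generation costs.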

Build My Stack

Select your use cases and get model recommendations with estimated costs


Disclaimers

  • Prices may vary based on enterprise agreements and volume discounts.
  • Prices are subject to change without notice. Always check the official pricing pages of providers.
  • Context lengths and capabilities may vary for different use cases and implementations.
  • This information is provided for reference only and should not be considered financial advice.
  • Prices are per 1M tokens unless stated otherwise.
  • Last verified: March 2026

Understanding AI Pricing & Terminology

Everything you need to know about AI API costs, tokens, and how to optimize your spending

What are AI tokens?

Tokens are the basic units of text that AI models process. In English, 1 token is roughly 4 characters or ¾ of a word. A sentence like "Hello, how are you?" is about 6 tokens. Models count both input (your prompt) and output (the response) tokens separately.
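The 4-characters-per-token rule of thumb can be turned into a quick estimator. This is a heuristic only; a real tokenizer (such as tiktoken for OpenAI models) is needed for billing-accurate counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text: ~4 characters per token.

    Heuristic only -- real tokenizers split on subwords, so actual
    counts differ, especially for code and non-English text.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("Hello, how are you?"))  # → 5 (a real tokenizer gives ~6)
```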

How is AI API pricing calculated?

AI APIs charge per million tokens processed. Input tokens (your prompt) and output tokens (the model's response) are billed at different rates; output is typically 3-5x more expensive than input. For example, at $1/1M input and $5/1M output, a 500-token prompt with a 200-token reply costs $0.0015.
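The arithmetic above can be expressed as a small helper, reproducing the worked example from the text:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD for one request, given $/1M-token rates."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Worked example from the text: $1/1M input, $5/1M output,
# a 500-token prompt with a 200-token reply.
print(f"${request_cost(500, 200, 1.0, 5.0):.4f}")  # prints "$0.0015"
```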

Input tokens vs output tokens explained

Input tokens include everything in your request: the system prompt, conversation history, and your current message. Output tokens are the model's generated response. Since you control your input (shorter prompts = lower cost), but can't always control output length, output cost is often the bigger variable in real-world usage.

How to reduce AI API costs

  • Use prompt caching: many providers charge 50-90% less for cached input.
  • Choose smaller models for simple tasks; GPT-4o Mini or Claude Haiku cost 10-20x less than flagship models.
  • Compress long context by summarizing conversation history.
  • Use streaming to show partial results faster.
  • Batch non-urgent requests when batch pricing is available.
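To gauge the smaller-model saving concretely, compare monthly spend at two rate cards. A sketch using the GPT-5.4 and GPT-5.4 Nano rates from the table above; the workload numbers (100k requests/month, 800 input and 250 output tokens each) are illustrative assumptions:

```python
def monthly_cost(requests: int, in_tok: int, out_tok: int,
                 in_price: float, out_price: float) -> float:
    """Monthly spend in USD for a workload at the given $/1M-token rates."""
    return requests * (in_tok * in_price + out_tok * out_price) / 1_000_000

# 100k requests/month, 800 input + 250 output tokens each.
flagship = monthly_cost(100_000, 800, 250, 2.50, 15.00)  # GPT-5.4 rates
small    = monthly_cost(100_000, 800, 250, 0.20, 1.25)   # GPT-5.4 Nano rates
print(f"flagship ${flagship:.2f} vs small ${small:.2f}")  # flagship $575.00 vs small $47.25
```

At these rates the smaller model is about 12x cheaper, consistent with the 10-20x range above; the saving is real only if the smaller model is accurate enough for the task.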

Cached input pricing explained

Some providers (OpenAI, Anthropic, Google) offer discounts when the same prefix appears in multiple requests: cached input can cost 50-90% less. This is especially valuable for applications with a long system prompt repeated across many calls. Cache hits are billed at the discounted rate; cache misses are billed at the normal input rate.
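A sketch of the saving, assuming a 90% cached-input discount (the high end of the 50-90% range quoted above; actual discounts, minimum prefix lengths, and cache lifetimes vary by provider):

```python
def cost_with_cache(prefix_tokens: int, fresh_tokens: int, out_tokens: int,
                    in_price: float, out_price: float,
                    cache_discount: float = 0.90) -> float:
    """Per-request cost ($) when the shared prefix is served from cache.

    cache_discount=0.90 is an assumption at the high end of the
    50-90% range providers quote; set it to 0.0 for no caching.
    """
    cached_rate = in_price * (1 - cache_discount)
    return (prefix_tokens * cached_rate
            + fresh_tokens * in_price
            + out_tokens * out_price) / 1_000_000

# 5,000-token cached system prompt, 200 fresh input tokens,
# 300 output tokens, at $2.50 in / $15.00 out per 1M:
with_cache = cost_with_cache(5_000, 200, 300, 2.50, 15.00)
no_cache   = cost_with_cache(5_000, 200, 300, 2.50, 15.00, cache_discount=0.0)
print(f"${with_cache:.5f} vs ${no_cache:.5f}")  # prints "$0.00625 vs $0.01750"
```

With a long repeated prefix, most of the input bill disappears; here the cached request costs about a third of the uncached one.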

Context window and why it matters

The context window is the maximum amount of text (in tokens) a model can process in a single request - including both the input and output combined. A 200K context window can hold roughly 150,000 words, enough for an entire book. Larger contexts cost more but enable document analysis, long conversations, and multi-document reasoning.
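Because input and output share the window, a simple pre-flight check helps avoid truncated responses. A minimal sketch (the token counts below are illustrative):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """The window must hold input and reserved output combined."""
    return input_tokens + max_output_tokens <= context_window

# A 200K window with 8K reserved for the response:
print(fits_context(190_000, 8_000, 200_000))  # True
print(fits_context(195_000, 8_000, 200_000))  # False -- input must be trimmed
```

When the check fails, the usual fixes are summarizing older conversation turns or chunking the document before sending it.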

AI Pricing FAQ

Common questions about AI API costs, model selection, and pricing comparisons