AI Model Pricing Comparison & Cost Calculator
Compare AI API pricing across 14+ providers. Use the interactive chart, sortable table, and cost calculator to find the best model for your budget. Includes capability tags, context window sizes, and direct links to official pricing pages.
Input & Output Prices ($/1M tokens)
Model Insights
Capabilities, performance, and metadata for every AI model
Embedding Models
Compare pricing for text embedding models used in RAG and vector search
8 models compared
Build My Stack
Select your use cases and get model recommendations with estimated costs
Select Your Use Cases
Choose the tasks you need AI for
Recommended Stack
Select use cases to see recommendations
Disclaimers
- Prices may vary based on enterprise agreements and volume discounts.
- Prices are subject to change without notice. Always check the official pricing pages of providers.
- Context lengths and capabilities may vary for different use cases and implementations.
- This information is provided for reference only and should not be considered financial advice.
- Prices are per 1M tokens unless stated otherwise.
- Last verified: March 2026
Understanding AI Pricing & Terminology
Everything you need to know about AI API costs, tokens, and how to optimize your spending
What are AI tokens?
Tokens are the basic units of text that AI models process. In English, 1 token is roughly 4 characters or ¾ of a word. A sentence like "Hello, how are you?" is about 6 tokens. Models count both input (your prompt) and output (the response) tokens separately.
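The ~4-characters-per-token rule of thumb above can be turned into a quick estimator. This is a heuristic only, not a real tokenizer; actual counts vary by model and tokenizer, so use a library like tiktoken when accuracy matters.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    Heuristic only: real tokenizers produce different counts depending on
    the model's vocabulary.
    """
    return max(1, round(len(text) / 4))

# 19 characters -> about 5 tokens by this heuristic
print(estimate_tokens("Hello, how are you?"))  # 5
```

The heuristic lands close to, but not exactly on, real tokenizer counts (the sentence above tokenizes to roughly 6 tokens in practice); it is good enough for budgeting, not for billing.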
How is AI API pricing calculated?
AI APIs charge per million tokens processed. Input tokens (your prompt) and output tokens (the model's response) are billed at different rates; output is typically 3-5x more expensive than input. For example, at $1/1M input and $5/1M output, a 500-token prompt with a 200-token reply costs $0.0015.
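The arithmetic above can be sketched as a small helper. The prices are the example rates from the text, not any specific provider's.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request, with prices given in $ per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# The example from the text: $1/1M input, $5/1M output,
# 500-token prompt with a 200-token reply.
cost = request_cost(500, 200, input_price=1.0, output_price=5.0)
print(f"${cost:.4f}")  # $0.0015
```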
Input tokens vs output tokens explained
Input tokens include everything in your request: the system prompt, conversation history, and your current message. Output tokens are the model's generated response. Because you control your input (shorter prompts mean lower cost) but can't always control output length, output cost is often the bigger variable in real-world usage.
How to reduce AI API costs
Use prompt caching (many providers charge 50-90% less for cached input). Choose smaller models for simple tasks - GPT-4o Mini or Claude Haiku cost 10-20x less than flagship models. Compress long context by summarizing history. Use streaming to show partial results faster. Batch non-urgent requests when batch pricing is available.
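To illustrate the "smaller models for simple tasks" point, here is a sketch comparing the monthly cost of the same workload on a flagship versus a small model. The prices and the model names ("flagship", "small") are made-up placeholders, not real rates; always check the provider's pricing page.

```python
# Placeholder prices in $ per 1M tokens -- illustrative only.
MODELS = {
    "flagship": {"input": 3.00, "output": 15.00},
    "small":    {"input": 0.15, "output": 0.60},
}

def monthly_cost(model: str, requests: int, in_tok: int, out_tok: int) -> float:
    """Total monthly cost for a fixed per-request token profile."""
    p = MODELS[model]
    per_request = (in_tok * p["input"] + out_tok * p["output"]) / 1_000_000
    return per_request * requests

# 100k requests/month, 1,000 input + 300 output tokens each
for name in MODELS:
    print(f"{name}: ${monthly_cost(name, 100_000, 1_000, 300):,.2f}")
# flagship: $750.00
# small:    $33.00
```

With these placeholder rates the small model is roughly 20x cheaper, which is why routing simple tasks to cheaper models is usually the single biggest cost lever.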
Cached input pricing explained
Some providers (OpenAI, Anthropic, Google) offer discounts when the same prefix appears in multiple requests. Cached input can cost 50-90% less. This is especially valuable for applications with a long system prompt that's repeated across many calls. Cached tokens are billed at the discounted rate, while the rest of the input is billed normally.
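The caching math can be sketched as follows. This is a simplified model: it assumes only the system prompt is cacheable and ignores the cache-write surcharges some providers apply; the prices and the 90% discount are example values, not any provider's actual rates.

```python
def cost_with_cache(system_tokens: int, user_tokens: int, output_tokens: int,
                    input_price: float, output_price: float,
                    cache_discount: float, cache_hit: bool) -> float:
    """Request cost in $ when the system prompt may be served from cache.

    cache_discount is the fraction off the input price for cached tokens,
    e.g. 0.9 for a 90% discount. Prices are in $ per 1M tokens.
    """
    cached_price = input_price * (1 - cache_discount)
    system_cost = system_tokens * (cached_price if cache_hit else input_price)
    total = system_cost + user_tokens * input_price + output_tokens * output_price
    return total / 1_000_000

# 5,000-token system prompt, 200-token message, 300-token reply,
# $1/1M input, $5/1M output, 90% cache discount
miss = cost_with_cache(5_000, 200, 300, 1.0, 5.0, 0.9, cache_hit=False)
hit = cost_with_cache(5_000, 200, 300, 1.0, 5.0, 0.9, cache_hit=True)
print(f"miss: ${miss:.4f}, hit: ${hit:.4f}")  # miss: $0.0067, hit: $0.0022
```

The longer the shared prefix relative to the rest of the request, the bigger the savings, which is why caching matters most for long system prompts.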
Context window and why it matters
The context window is the maximum amount of text (in tokens) a model can process in a single request, input and output combined. A 200K context window can hold roughly 150,000 words, enough for an entire book. Larger contexts cost more but enable document analysis, long conversations, and multi-document reasoning.
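Because input and output share the window, a request only fits if their combined token count stays under the limit. A minimal check, assuming you know your input size and the maximum output you will allow:

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """True if the request fits: input and output share the window."""
    return input_tokens + max_output_tokens <= context_window

# A 190K-token input leaves no room for a 20K-token reply in a 200K window
print(fits_context(180_000, 20_000, 200_000))  # True
print(fits_context(190_000, 20_000, 200_000))  # False
```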
AI Pricing FAQ
Common questions about AI API costs, model selection, and pricing comparisons