
AI API Pricing Comparison 2026

Side-by-side pricing for OpenAI (GPT-4.1, GPT-4o, o3), Anthropic (Claude 4), Google Gemini 2.5 Pro, Groq, and Mistral — updated for May 2026.

TL;DR — Key Takeaways

  • Cheapest overall: Gemini 2.0 Flash ($0.075/1M input) or Groq Llama 4 Scout ($0.11/1M input)
  • Best quality/cost: GPT-4o or Gemini 2.5 Pro — Gemini wins on price with 1M context window
  • Prompt caching impact: Claude's 90% cache discount beats OpenAI's ~50% for long repeated contexts
  • Free tiers: Google and Groq have the most generous free limits for prototyping

AI API Pricing Table — May 2026

All prices per million tokens unless noted. Prices are for standard (non-cached) usage. Verify against each provider's pricing page before production budgeting — prices change frequently.

| Model | Provider | Tier | Input / 1M | Output / 1M | Context | Best for |
|---|---|---|---|---|---|---|
| GPT-4.1 (new) | OpenAI | Frontier | $2.00 | $8.00 | 1M tokens | Long-context tasks, coding |
| GPT-4o (popular) | OpenAI | Mid-tier | $2.50 | $10.00 | 128K tokens | General purpose, vision, agents |
| GPT-4o Mini (cheapest OpenAI) | OpenAI | Budget | $0.15 | $0.60 | 128K tokens | High-volume classification, routing |
| o3 | OpenAI | Reasoning | $10.00 | $40.00 | 200K tokens | Complex reasoning, math, science |
| Claude 4 Opus | Anthropic | Frontier | $15.00 | $75.00 | 200K tokens | Hardest reasoning tasks, research |
| Claude 4 Sonnet (popular) | Anthropic | Mid-tier | $3.00 | $15.00 | 200K tokens | Production apps, writing, analysis |
| Claude 4 Haiku | Anthropic | Budget | $0.80 | $4.00 | 200K tokens | Real-time responses, high throughput |
| Claude 3.5 Sonnet | Anthropic | Mid-tier | $3.00 | $15.00 | 200K tokens | Code, analysis, writing |
| Gemini 2.5 Pro (best-value frontier) | Google | Frontier | $1.25 | $10.00 | 1M tokens | Very long documents, multimodal |
| Gemini 2.0 Flash (cheapest frontier-grade) | Google | Budget | $0.075 | $0.30 | 1M tokens | Low-cost high-volume, fast latency |
| Llama 4 Scout (fastest inference) | Groq | Open source | $0.11 | $0.34 | 128K tokens | Real-time apps, voice AI, speed |
| Llama 3.3 70B | Groq | Open source | $0.59 | $0.79 | 128K tokens | Quality open source at low cost |
| Mistral Large | Mistral | Mid-tier | $2.00 | $6.00 | 128K tokens | European data residency, coding |
| Mistral Small | Mistral | Budget | $0.10 | $0.30 | 32K tokens | Low-cost European hosted inference |

* Prices subject to change. Last verified May 2026. Claude 4 pricing is estimated based on Anthropic's published tier structure.

Provider Pricing Deep Dives

OpenAI API Pricing

OpenAI's pricing has become more competitive in 2026 with GPT-4.1 (launched April 2026) offering longer context at lower prices than GPT-4o. The o3 reasoning model targets complex multi-step problems but is significantly more expensive.

  • Billing model: Pay-per-token. No minimums. Prepay or postpay.
  • Free credits: $5 in trial credits for new accounts (90-day expiry)
  • Prompt caching: ~50% discount on cached prefix tokens (automatic — see the sketch below)
  • Batch API: 50% discount for async batch jobs (<24h turnaround)
  • Best cost tip: Use GPT-4o Mini for classification — $0.15/1M input, strong performance on simple tasks
  • Pricing page: platform.openai.com/docs/pricing
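
Because OpenAI's caching is automatic, the main lever is prompt structure: keep the long, static part of the prompt byte-identical at the start of every request so the prefix can be cached. A minimal sketch using the openai Node SDK — the model choice and prompt text are illustrative, not a recommendation:

import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Keep the static system prompt identical across requests so OpenAI's
// automatic prefix caching can apply its ~50% discount to those tokens.
const SYSTEM_PROMPT = "You are a support assistant. [long static instructions...]";

async function answer(userMessage) {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",                         // budget tier: $0.15/1M input
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // static prefix first
      { role: "user", content: userMessage },     // variable content last
    ],
  });
  return response.choices[0].message.content;
}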

Anthropic Claude API Pricing

Claude is generally priced slightly above OpenAI at each tier, but Anthropic's prompt caching (90% discount on cache hits vs. OpenAI's ~50%) makes it significantly cheaper for applications with long repeated system prompts or document contexts.

  • Billing model: Pay-per-token. Credit-based. Auto-recharge available.
  • Free tier: None — minimum $5 deposit to start
  • Prompt caching: 90% discount on cache hits — best in class for long contexts
  • Rate limits: Tiers 1–4, based on cumulative spend; auto-upgrades
  • Best cost tip: Use prompt caching on long system prompts — 90% savings on repeated context is massive at scale (see the sketch below)
  • Pricing page: anthropic.com/pricing
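
As a sketch of that caching tip, here is how a cacheable system prompt looks with Anthropic's @anthropic-ai/sdk Node client. The model id and prompt are placeholders, and per Anthropic's documented minimum, only blocks of at least 1,024 tokens are eligible for caching:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function answer(userMessage, longSystemPrompt) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-0", // illustrative model id — check current docs
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: longSystemPrompt,                // must be >= 1,024 tokens to be cached
        cache_control: { type: "ephemeral" },  // 90% discount on subsequent cache hits
      },
    ],
    messages: [{ role: "user", content: userMessage }],
  });
  return response.content[0].text;
}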

Google Gemini API Pricing

Gemini has become the most competitively priced frontier-grade API. Gemini 2.5 Pro matches or exceeds GPT-4o quality benchmarks at a lower price, with a 1M token context window. The free tier is the most generous in the industry.

  • Billing model: Pay-per-token via Google AI Studio or Vertex AI
  • Free tier: Gemini 2.0 Flash at 15 RPM / 1,500 RPD for $0 — best free tier available
  • Context caching: 75% discount after 128K tokens (Gemini 1.5); enterprise Vertex AI caching available
  • Context window: 1M tokens for Gemini 2.5 Pro and Gemini 1.5 Pro — unmatched for long documents
  • Best cost tip: Gemini 2.0 Flash at $0.075/1M input is cheaper than any comparable API; use it for classification and summarization at scale
  • Pricing page: ai.google.dev/pricing

Groq API Pricing

Groq is the cheapest and fastest option for open-source model inference. LPU (Language Processing Unit) hardware delivers 300–700+ tokens per second — 5–10x faster than GPU-hosted equivalents. The trade-off: open-source models only (Llama, Mixtral, Gemma).

  • Billing model: Pay-per-token. Free Developer tier available.
  • Free tier: 1,000 RPD free for most models — excellent for development
  • Speed: 300–700+ tokens/second (best in class for latency-sensitive apps)
  • OpenAI compatibility: Drop-in replacement — same request format, just change the base URL (see the sketch below)
  • Best cost tip: Use Groq as the primary for real-time apps with OpenAI/Anthropic as fallback — roughly 23x cheaper than GPT-4o
  • Pricing page: console.groq.com/docs/openai#models
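
A sketch of the drop-in swap, assuming the standard openai Node SDK and Groq's documented OpenAI-compatible base URL (the model id is illustrative — check Groq's model list):

import OpenAI from "openai";

// Same SDK as for OpenAI — only the API key and base URL change.
const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY,
  baseURL: "https://api.groq.com/openai/v1",
});

const response = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile", // illustrative Groq model id
  messages: [{ role: "user", content: "Classify this support ticket: ..." }],
});
console.log(response.choices[0].message.content);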

Prompt Caching Comparison

For production apps with long system prompts, RAG documents, or repeated tool definitions, prompt caching is the single biggest cost lever. Choosing the right provider for your caching pattern can cut costs 60–90%.

| Provider | Cache discount | Minimum tokens | TTL | How to enable |
|---|---|---|---|---|
| Anthropic Claude | 90% on cache hits | 1,024 | 5 minutes (ephemeral) or up to 1 hour | cache_control: "ephemeral" on content blocks |
| OpenAI | ~50% on cached tokens | 1,024 | 5–10 minutes | Automatic — prefix tokens are cached silently |
| Google Gemini | 75% on cached tokens | 128,000 (Gemini 1.5) | Up to 1 hour (configurable) | Create a CachedContent resource, then pass its name |
| Groq | No caching yet | N/A | N/A | Cache at the application layer (e.g., Redis) |
Cost example: a RAG app with a 10K-token system prompt handles 100,000 requests/day. Without caching, at Claude 4 Sonnet's $3/1M input rate, the system prompt alone costs 10,000 tokens × 100,000 requests × $0.003/1K = $3,000/day. With Anthropic's 90% cache discount (assuming nearly every request hits the cache), that drops to about $300/day; with OpenAI's ~50% discount, to about $1,500/day. That $1,200/day gap works out to roughly $36,000/month — just from choosing the right caching strategy.
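
The same arithmetic as runnable code, using the rates and assumptions from the example above:

// Daily system-prompt cost for the RAG example above.
const systemPromptTokens = 10_000;
const requestsPerDay = 100_000;
const inputPricePerToken = 3.00 / 1_000_000; // Claude 4 Sonnet: $3/1M input

const uncached = systemPromptTokens * requestsPerDay * inputPricePerToken; // $3,000/day
const anthropicCached = uncached * 0.10; // 90% discount on cache hits -> $300/day
const openaiCached = uncached * 0.50;    // ~50% discount -> $1,500/day

console.log((openaiCached - anthropicCached) * 30); // 36000 -> ~$36,000/month gap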

How to Estimate Your AI API Costs

The formula is simple but the inputs matter. Here's how to estimate before you commit to a provider:

Step 1: Measure your typical token counts

// Use tiktoken (OpenAI) or the Anthropic token counter to measure.
// For a typical chat app turn:
// - System prompt: 500-2,000 tokens
// - User message: 100-500 tokens
// - Assistant response: 200-1,000 tokens

// Monthly cost estimate:
const dailyRequests = 10_000;
const avgInputTokens = 1_500;    // system + context + user message
const avgOutputTokens = 400;

// At GPT-4o pricing ($2.50/1M input, $10/1M output):
const inputCost = (dailyRequests * avgInputTokens / 1_000_000) * 2.50;    // $37.50/day
const outputCost = (dailyRequests * avgOutputTokens / 1_000_000) * 10.00; // $40.00/day
const monthlyCost = (inputCost + outputCost) * 30; // ~$2,325/month

// At Claude 4 Haiku pricing ($0.80/1M input, $4/1M output):
const inputCostH = (dailyRequests * avgInputTokens / 1_000_000) * 0.80;   // $12.00/day
const outputCostH = (dailyRequests * avgOutputTokens / 1_000_000) * 4.00; // $16.00/day
const monthlyCostH = (inputCostH + outputCostH) * 30; // ~$840/month
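
The same calculation generalized into a small helper you can point at any row of the pricing table above (prices are dollars per million tokens):

// Generic estimator: prices are $ per 1M tokens, from the pricing table.
function estimateMonthlyCost({ dailyRequests, inputTokens, outputTokens, inputPrice, outputPrice }) {
  const dailyInput = (dailyRequests * inputTokens / 1_000_000) * inputPrice;
  const dailyOutput = (dailyRequests * outputTokens / 1_000_000) * outputPrice;
  return (dailyInput + dailyOutput) * 30;
}

// Same workload as above, GPT-4o vs. Gemini 2.0 Flash:
const workload = { dailyRequests: 10_000, inputTokens: 1_500, outputTokens: 400 };
console.log(estimateMonthlyCost({ ...workload, inputPrice: 2.50, outputPrice: 10.00 }));  // 2325
console.log(estimateMonthlyCost({ ...workload, inputPrice: 0.075, outputPrice: 0.30 })); // 69.75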

Step 2: Factor in output-heaviness

Output tokens cost 3–5x more than input tokens at most providers. If your app generates long responses (summaries, code, essays), output costs dominate. If it's mostly classification or extraction with short outputs, input costs dominate.

| Workload | Typical input:output ratio | Recommended pick |
|---|---|---|
| Classification / routing | 10:1 | Gemini 2.0 Flash or GPT-4o Mini |
| Document summarization | 20:1 | Gemini 2.5 Pro (1M context) or Anthropic with caching |
| Code generation | 1:3 | Claude 4 Haiku or GPT-4o Mini |
| Chat / conversation | 2:1 | GPT-4o or Claude 4 Sonnet |
| Real-time / voice AI | ~1:1 | Groq (speed > cost at this scale) |
| Agentic / tool use | High input (tool definitions) | Anthropic with prompt caching on tool defs |
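
Running the estimateMonthlyCost helper from Step 1 on two of these profiles shows how the ratio flips which term dominates (GPT-4o prices; token counts are illustrative):

// Summarization-style workload, roughly 20:1 input:output:
console.log(estimateMonthlyCost({
  dailyRequests: 10_000, inputTokens: 8_000, outputTokens: 400,
  inputPrice: 2.50, outputPrice: 10.00,
})); // 7200 — $6,000 of it is input

// Code-generation-style workload, roughly 1:3 input:output:
console.log(estimateMonthlyCost({
  dailyRequests: 10_000, inputTokens: 500, outputTokens: 1_500,
  inputPrice: 2.50, outputPrice: 10.00,
})); // 4875 — $4,500 of it is output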

Frequently Asked Questions

What is the cheapest AI API for production use in 2026?

For raw cost per token, Groq (Llama 4 Scout at ~$0.11/1M input) and Google Gemini 2.0 Flash ($0.075/1M input for ≤128K context) are the cheapest production-grade APIs. Among the closed models, GPT-4o Mini ($0.15/1M input) and Claude 4 Haiku ($0.80/1M input) offer the best quality-to-cost ratio. The "cheapest" option depends on your use case — high-quality generation, classification tasks, and long-context document analysis each favor different providers.

How does OpenAI API pricing compare to Anthropic Claude pricing?

At the mid-tier (best value) models: GPT-4o costs $2.50/1M input + $10/1M output, while Claude 4 Sonnet costs $3/1M input + $15/1M output — GPT-4o is slightly cheaper at the same capability tier. For the most capable models: o3 costs $10/1M input + $40/1M output vs. Claude 4 Opus at $15/1M input + $75/1M output. For budget models: GPT-4o Mini at $0.15/1M input vs. Claude 4 Haiku at $0.80/1M — OpenAI is over 5x cheaper at the budget tier. With prompt caching, however, Anthropic offers 90% cache-hit discounts vs. OpenAI's ~50%.

Does Gemini API have a free tier?

Yes. Google Gemini API offers a free tier with generous limits: Gemini 2.0 Flash allows 15 requests per minute and 1,500 requests per day at no cost. Gemini 1.5 Pro also has a free tier with 2 RPM for low-volume use. The free tier is ideal for prototyping and low-traffic applications. Paid tier pricing starts at $0.075/1M input tokens for Gemini 2.0 Flash.

Is Groq cheaper than OpenAI?

Yes, significantly. Groq's Llama 4 Scout costs $0.11/1M input tokens vs. GPT-4o at $2.50/1M — Groq is roughly 23x cheaper per token. For Llama 3.3 70B (quality comparable to GPT-4o in many benchmarks), Groq charges $0.59/1M input vs. GPT-4o's $2.50/1M. The trade-off: Groq only offers open-source models (Llama, Mixtral), which may underperform frontier models on complex reasoning, coding, and instruction-following tasks.

What is prompt caching and how does it affect AI API costs?

Prompt caching lets you reuse repeated parts of your prompt (typically the system prompt or long document context) at dramatically reduced cost. Anthropic charges 10% of normal input price on cache hits (90% savings). OpenAI charges roughly 50% on cached prompt tokens. Google caches after 128K tokens at 75% discount. For applications with long, repeated system prompts or document analysis workflows, caching can reduce API costs by 60–90%.
