API Rate Limiting: How to Handle 429 Errors in Production
TL;DR
HTTP 429 errors signal rate limit violations. Handle them with exponential backoff, respect Retry-After headers, implement circuit breakers, and monitor rate limit metrics. Enterprise APIs often allow negotiating higher limits.
When your application hits an API rate limit, you'll receive an HTTP 429 Too Many Requests response. This seemingly simple error code can cascade into major production incidents if not handled correctly.
In this guide, we'll cover everything you need to know about API rate limiting, from understanding why it exists to implementing bulletproof retry strategies.
What Is API Rate Limiting?
API rate limiting is a traffic control mechanism that restricts how many requests a client can make to an API within a specific timeframe.
Common rate limit patterns:
- Per-second limits: 100 requests/second
- Per-minute limits: 1,000 requests/minute
- Per-hour limits: 10,000 requests/hour
- Daily quotas: 100,000 requests/day
- Concurrent request limits: Maximum 10 simultaneous requests
Why APIs Use Rate Limiting
- Cost control — Prevent runaway infrastructure bills
- Fairness — Ensure all customers get reasonable access
- Security — Mitigate DDoS attacks and abuse
- Resource protection — Prevent server overload
- Billing tiers — Enforce plan limits (free vs paid)
Understanding HTTP 429 Responses
When you exceed a rate limit, the API returns:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709726400
Content-Type: application/json

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
Key Headers to Monitor
| Header | Purpose | Example |
|---|---|---|
| `Retry-After` | Seconds until retry allowed | `60` |
| `X-RateLimit-Limit` | Total requests allowed in window | `1000` |
| `X-RateLimit-Remaining` | Requests left in current window | `0` |
| `X-RateLimit-Reset` | Unix timestamp when limit resets | `1709726400` |
⚠️ Header names vary by provider. Check your API's documentation.
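Given that variation, it helps to normalize the headers in one place. A sketch that checks a few common spellings — the alternative names here are illustrative, not a complete list:

```python
def parse_rate_limit_headers(headers):
    """Normalize common rate-limit headers into one dict.
    Header names vary by provider; add your API's names as needed."""
    def first_int(*names):
        for name in names:
            if name in headers:
                try:
                    return int(headers[name])
                except ValueError:
                    return None
        return None

    return {
        "retry_after": first_int("Retry-After", "retry-after"),
        "limit": first_int("X-RateLimit-Limit", "RateLimit-Limit"),
        "remaining": first_int("X-RateLimit-Remaining", "RateLimit-Remaining"),
        "reset": first_int("X-RateLimit-Reset", "RateLimit-Reset"),
    }

info = parse_rate_limit_headers({
    "Retry-After": "60",
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1709726400",
})
```

With `requests`, `response.headers` is already case-insensitive, so the lowercase fallbacks are mostly useful for raw dicts.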
Production-Ready Retry Strategies
1. Exponential Backoff
The gold standard for handling 429s:
```python
import time
import random

import requests

def api_call_with_retry(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Start with ~1 second, double each attempt, plus jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            # Respect Retry-After if provided
            if 'Retry-After' in response.headers:
                wait_time = int(response.headers['Retry-After'])
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
        else:
            # Non-retryable error
            response.raise_for_status()
    raise Exception(f"Failed after {max_retries} retries")
```
Why exponential backoff works:
- Gives APIs time to recover
- Prevents thundering herd problems
- Backoff sequence: 1s → 2s → 4s → 8s → 16s
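The sequence above is just `2 ** attempt`; stripped of jitter, it can be computed and capped like this (the 60-second ceiling is an assumption — tune it to your API):

```python
def backoff_schedule(max_retries=5, base=1.0, cap=60.0):
    """Deterministic part of exponential backoff: base * 2**attempt,
    clamped to `cap` so waits never grow unbounded."""
    return [min(base * (2 ** attempt), cap) for attempt in range(max_retries)]

print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Without a cap, attempt 10 would already mean a 17-minute wait, which is rarely what you want.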
2. Respect Retry-After Headers
Some APIs tell you exactly when to retry:
```javascript
async function fetchWithRetry(url, options = {}) {
  const maxRetries = 5;
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.ok) {
      return await response.json();
    }
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After');
      const waitMs = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, i) * 1000;
      console.log(`Rate limited. Retrying in ${waitMs / 1000}s...`);
      await new Promise(resolve => setTimeout(resolve, waitMs));
      continue;
    }
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  throw new Error(`Failed after ${maxRetries} retries`);
}
```
3. Implement Circuit Breakers
Prevent cascading failures when an API is persistently rate-limiting:
```python
from enum import Enum
from datetime import datetime, timedelta

class RateLimitException(Exception):
    """Raised by the wrapped function when the API returns HTTP 429."""

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, block requests
    HALF_OPEN = "half_open"  # Testing if recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout  # seconds
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.timeout):
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN. API temporarily unavailable.")
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except RateLimitException:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
breaker = CircuitBreaker(failure_threshold=5, timeout=60)
response = breaker.call(make_api_request, "/endpoint")
```
Monitoring Rate Limit Health
Track These Metrics
- Rate limit remaining — `X-RateLimit-Remaining`; alert when less than 20% of capacity remains
- 429 error rate — percentage of requests hitting limits; red flag if above 1% of total requests
- Time to rate limit reset — `X-RateLimit-Reset`; plan request batching around reset windows
- Retry success rate — are retries recovering? A low success rate means you need a higher tier or architecture changes
Example: Prometheus + Grafana Dashboard
```python
import time

import requests
from prometheus_client import Counter, Gauge, Histogram

# Metrics
api_requests_total = Counter('api_requests_total', 'Total API requests', ['status'])
rate_limit_remaining = Gauge('rate_limit_remaining', 'Requests remaining in window')
rate_limit_wait_time = Histogram('rate_limit_wait_seconds', 'Time spent waiting on rate limits')

def instrumented_api_call(url):
    response = requests.get(url)

    # Track request status
    api_requests_total.labels(status=response.status_code).inc()

    # Track rate limit state
    if 'X-RateLimit-Remaining' in response.headers:
        remaining = int(response.headers['X-RateLimit-Remaining'])
        rate_limit_remaining.set(remaining)

    # Handle 429
    if response.status_code == 429:
        wait_time = int(response.headers.get('Retry-After', 60))
        rate_limit_wait_time.observe(wait_time)
        time.sleep(wait_time)
        return instrumented_api_call(url)  # Retry

    return response.json()
```
Alert on:
- `rate_limit_remaining < 100` (running low)
- `rate_limit_wait_seconds > 10` (significant delays)
- `api_requests_total{status="429"} > 100/hour` (frequent limiting)
Strategies to Avoid Rate Limits
1. Request Batching
Instead of:
```python
# BAD: 1,000 individual requests
for user_id in user_ids:
    get_user(user_id)
```
Do:
```python
# GOOD: 10 batch requests of 100 users each
batch_size = 100
for i in range(0, len(user_ids), batch_size):
    batch = user_ids[i:i + batch_size]
    get_users_batch(batch)  # 100x fewer requests
```
2. Caching
```python
from functools import lru_cache
import time

@lru_cache(maxsize=1000)
def get_user_cached(user_id, cache_timestamp):
    return api_get_user(user_id)

# Bust the cache every 5 minutes
cache_key = int(time.time() / 300)
user = get_user_cached(user_id, cache_key)
```
3. Request Queuing
Use a job queue to smooth traffic spikes:
```python
import asyncio
from asyncio import Queue

class RateLimitedClient:
    def __init__(self, rate_limit_per_second=10):
        self.rate_limit = rate_limit_per_second
        self.queue = Queue()
        self.running = False

    async def start(self):
        self.running = True
        while self.running:
            task = await self.queue.get()
            await task()
            await asyncio.sleep(1.0 / self.rate_limit)

    async def add_request(self, coro):
        await self.queue.put(coro)

# Usage
client = RateLimitedClient(rate_limit_per_second=10)
asyncio.create_task(client.start())

# Enqueue requests — they'll execute at a controlled rate
await client.add_request(lambda: api_call("/endpoint1"))
await client.add_request(lambda: api_call("/endpoint2"))
```
4. Negotiate Higher Limits
Most enterprise APIs offer:
- Tier upgrades — Pay more, get higher limits
- Custom quotas — Negotiate based on use case
- Reserved capacity — Guaranteed throughput during peaks
When to negotiate:
- You're hitting limits frequently (>5% of requests)
- Your use case is legitimate and business-critical
- You're willing to move to a paid tier
Provider-Specific Rate Limit Guides
OpenAI API
- Free tier: 3 requests/minute
- Pay-as-you-go: 3,500 requests/minute (GPT-4)
- Rate limits by model — Cheaper models have higher limits
- Batch API — No rate limits, 50% cheaper, 24h processing time
Anthropic Claude API
- Free tier: 50 requests/minute
- Tier 1: 1,000 requests/minute (after $5 spend)
- Tier 4: 10,000 requests/minute (after $10,000 spend)
- Concurrent limits: 5 → 80 parallel requests by tier
Stripe API
- Default: 100 requests/second
- Per endpoint — Some endpoints have tighter limits
- Idempotency keys — Safely retry failed payments
- Test mode — Higher limits than live mode
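Safe retries of POSTs hinge on reusing the same idempotency key, so the server can deduplicate the duplicate attempt. A sketch of building such headers — the Bearer scheme and `api_key` value here are placeholders; check Stripe's docs for the exact auth format:

```python
import uuid

def idempotent_post_headers(api_key):
    """Build headers for a safely retryable POST. Stripe reads the
    Idempotency-Key header; the key must be reused verbatim on every
    retry of the same logical request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": str(uuid.uuid4()),
    }

headers = idempotent_post_headers("sk_test_placeholder")
# Reuse this SAME headers dict when retrying after a 429 or network error —
# generating a fresh key on retry would defeat the deduplication.
```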
GitHub API
- Unauthenticated: 60 requests/hour
- Authenticated: 5,000 requests/hour
- GraphQL: Calculated by query complexity, not request count
- Abuse detection — Can trigger secondary rate limits
Google Maps API
- Per-second limits: Varies by API
- Daily quotas: 25,000 free requests/day
- Billing required — Must enable billing even for free tier
- Premium plans — Up to 100 QPS and millions of requests/day
Real-World 429 Incident: Anthropic March 2026
On March 2-3, 2026, Anthropic experienced a major outage affecting Claude API users. While the web interface was down for 14 hours, the API remained functional — but hit rate limits under load.
What happened:
- Authentication service failure
- Increased API traffic as users shifted from web to API
- Stricter enforcement of rate limits during recovery
Lessons:
- Multi-provider strategies — Services with OpenAI + Anthropic fallback survived
- API != Web — Different infrastructure means different failure modes
- Monitor rate limit headroom — Pre-outage traffic patterns matter
Best Practices Checklist
✅ Implement exponential backoff with jitter
✅ Always respect Retry-After headers
✅ Track rate limit remaining in monitoring
✅ Alert before hitting 0 remaining
✅ Use circuit breakers for cascading failures
✅ Batch requests when possible
✅ Cache aggressively
✅ Test retry logic in staging
✅ Document your retry strategy
✅ Have a fallback API provider
FAQs
What's the difference between 429 and 503?
429 Too Many Requests = You sent too many requests
503 Service Unavailable = The API is down
429 is client-side (your fault). 503 is server-side (their fault).
Should I retry 429 errors automatically?
Yes, with exponential backoff. Like transient 5xx errors, 429s are retryable — but unlike unexpected server failures, they're a normal part of traffic control, and backing off then retrying is exactly the response the API expects.
How long should I wait before retrying?
- Check the `Retry-After` header — use that value
- If no header, use exponential backoff: 1s, 2s, 4s, 8s, 16s
- Add random jitter (±20%) to prevent thundering herds
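The ±20% jitter is a one-liner:

```python
import random

def jittered_wait(base_wait, jitter=0.2):
    """Spread retries across ±20% of the base wait so many clients
    don't all retry in the same instant."""
    return base_wait * random.uniform(1 - jitter, 1 + jitter)

wait = jittered_wait(4.0)
# wait lands somewhere in [3.2, 4.8]
```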
Can rate limits apply per endpoint?
Yes. Some APIs have:
- Global limits — Total requests across all endpoints
- Per-endpoint limits — POST /upload might have tighter limits than GET /data
- Resource-based limits — Writes are stricter than reads
Always check the API documentation.
What if I need higher rate limits?
- Upgrade tier — Most APIs offer paid plans with higher limits
- Contact support — Explain your use case, request an increase
- Optimize requests — Batch, cache, and reduce unnecessary calls
- Dedicated infrastructure — Enterprise plans often include reserved capacity
Do rate limits reset immediately?
Depends on the algorithm:
- Fixed window — Resets at exact time (e.g., top of the hour)
- Sliding window — Resets gradually based on request timestamps
- Token bucket — Refills at constant rate
Check X-RateLimit-Reset header for the next reset time.
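A token bucket, for example, can be sketched in a few lines — a simplified model for intuition, not any provider's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at
    `refill_rate` tokens/second. A request is allowed only if a full
    token is available, so bursts drain the bucket and then requests
    are throttled to the refill rate."""
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1)  # burst of 3, then 1 req/s
results = [bucket.allow() for _ in range(5)]
# The first 3 calls pass immediately; the next 2 fail until tokens refill.
```

This is why a token-bucket API lets you burst briefly but sustains only the refill rate — and why 429s often stop on their own if you simply wait.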
Can I get banned for hitting rate limits?
Rarely. Most APIs treat 429s as normal traffic control. However:
- Ignoring 429s and hammering the API = potential ban
- Respecting backoff = no penalties
- Persistent abuse = account suspension
How do I test rate limit handling?
```python
from unittest.mock import Mock, patch

# Artificially trigger 429s in tests
def test_rate_limit_retry():
    mock_responses = [
        Mock(status_code=429, headers={'Retry-After': '1'}),
        Mock(status_code=429, headers={'Retry-After': '2'}),
        Mock(status_code=200, json=lambda: {"data": "success"}),
    ]
    with patch('requests.get', side_effect=mock_responses):
        result = api_call_with_retry("https://api.example.com/data")
    assert result == {"data": "success"}
```
Or use chaos engineering tools like Toxiproxy to inject 429 responses.
Monitoring API Rate Limits with API Status Check
Track rate limit health across all your API integrations in one dashboard:
✅ Real-time rate limit remaining
✅ Alert before hitting limits
✅ 429 error rate trends
✅ Multi-provider fallback testing
✅ Historical rate limit patterns
Need help monitoring your APIs? API Status Check tracks uptime, latency, and rate limit health for 400+ popular APIs. Get alerts before your users notice issues.