API Rate Limiting: How to Handle 429 Errors in Production

by API Status Check Team

TL;DR

HTTP 429 errors signal rate limit violations. Handle them with exponential backoff, respect Retry-After headers, implement circuit breakers, and monitor rate limit metrics. Enterprise APIs often allow negotiating higher limits.

When your application hits an API rate limit, you'll receive an HTTP 429 Too Many Requests response. This seemingly simple error code can cascade into major production incidents if not handled correctly.

In this guide, we'll cover everything you need to know about API rate limiting, from understanding why it exists to implementing bulletproof retry strategies.

What Is API Rate Limiting?

API rate limiting is a traffic control mechanism that restricts how many requests a client can make to an API within a specific timeframe.

Common rate limit patterns:

  • Per-second limits: 100 requests/second
  • Per-minute limits: 1,000 requests/minute
  • Per-hour limits: 10,000 requests/hour
  • Daily quotas: 100,000 requests/day
  • Concurrent request limits: Maximum 10 simultaneous requests
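
The last pattern (concurrent request limits) can also be enforced client-side. A minimal sketch using `asyncio.Semaphore`, where `fetch_resource` is a hypothetical stand-in for your real async API call:

```python
import asyncio

# Sketch: cap simultaneous in-flight requests with a semaphore.
# `fetch_resource` is a placeholder for a real async API call.
MAX_CONCURRENT = 10

async def fetch_resource(i):
    await asyncio.sleep(0.01)  # simulate network I/O
    return i

async def limited_fetch(semaphore, i):
    # At most MAX_CONCURRENT coroutines hold the semaphore at once;
    # the rest wait here until a slot frees up.
    async with semaphore:
        return await fetch_resource(i)

async def fetch_all(n):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(limited_fetch(semaphore, i) for i in range(n)))
```

`asyncio.run(fetch_all(50))` issues 50 calls but never more than 10 at a time, keeping you under a "maximum 10 simultaneous requests" limit.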

Why APIs Use Rate Limiting

  1. Cost control — Prevent runaway infrastructure bills
  2. Fairness — Ensure all customers get reasonable access
  3. Security — Mitigate DDoS attacks and abuse
  4. Resource protection — Prevent server overload
  5. Billing tiers — Enforce plan limits (free vs paid)

Understanding HTTP 429 Responses

When you exceed a rate limit, the API returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709726400
Content-Type: application/json

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}

Key Headers to Monitor

  • Retry-After — Seconds until retry is allowed (e.g. 60)
  • X-RateLimit-Limit — Total requests allowed in the window (e.g. 1000)
  • X-RateLimit-Remaining — Requests left in the current window (e.g. 0)
  • X-RateLimit-Reset — Unix timestamp when the limit resets (e.g. 1709726400)

⚠️ Header names vary by provider. Check your API's documentation.
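
Because header names and formats vary, it helps to parse them defensively. A sketch assuming the common `X-RateLimit-*` convention shown above (substitute your provider's actual header names):

```python
# Sketch: read rate-limit state from response headers defensively.
# Assumes the X-RateLimit-* naming shown above; not every provider
# sends these headers, and some use different names.
def parse_rate_limit_headers(headers):
    def to_int(name):
        value = headers.get(name)
        # Only parse plain integer values; ignore missing or odd formats
        return int(value) if value is not None and value.isdigit() else None

    return {
        "retry_after": to_int("Retry-After"),
        "limit": to_int("X-RateLimit-Limit"),
        "remaining": to_int("X-RateLimit-Remaining"),
        "reset": to_int("X-RateLimit-Reset"),
    }
```

Missing headers come back as `None`, so downstream code can fall back to exponential backoff instead of crashing on a `KeyError`.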

Production-Ready Retry Strategies

1. Exponential Backoff

The gold standard for handling 429s:

import time
import random
import requests

def api_call_with_retry(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        
        if response.status_code == 200:
            return response.json()
        
        elif response.status_code == 429:
            # Start with 1 second, double each time
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            
            # Respect Retry-After if provided
            if 'Retry-After' in response.headers:
                wait_time = int(response.headers['Retry-After'])
            
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
        
        else:
            # Non-retryable error
            response.raise_for_status()
    
    raise Exception(f"Failed after {max_retries} retries")

Why exponential backoff works:

  • Gives APIs time to recover
  • Prevents thundering herd problems
  • Backoff sequence: 1s → 2s → 4s → 8s → 16s

2. Respect Retry-After Headers

Some APIs tell you exactly when to retry:

async function fetchWithRetry(url, options = {}) {
  const maxRetries = 5;
  
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    
    if (response.ok) {
      return await response.json();
    }
    
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After');
      const parsed = retryAfter ? parseInt(retryAfter, 10) * 1000 : NaN;
      const waitMs = Number.isFinite(parsed) ? parsed : Math.pow(2, i) * 1000;
      
      console.log(`Rate limited. Retrying in ${waitMs / 1000}s...`);
      await new Promise(resolve => setTimeout(resolve, waitMs));
      continue;
    }
    
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  
  throw new Error(`Failed after ${maxRetries} retries`);
}

3. Implement Circuit Breakers

Prevent cascading failures when an API is persistently rate-limiting:

from enum import Enum
from datetime import datetime, timedelta

class RateLimitException(Exception):
    """Raised by the API client when a request returns 429."""
    pass

class CircuitState(Enum):
    CLOSED = "closed"      # Normal operation
    OPEN = "open"          # Failing, block requests
    HALF_OPEN = "half_open"  # Testing if recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout  # seconds
        self.last_failure_time = None
        self.state = CircuitState.CLOSED
    
    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.timeout):
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN. API temporarily unavailable.")
        
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except RateLimitException:
            self.on_failure()
            raise
    
    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
    
    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
breaker = CircuitBreaker(failure_threshold=5, timeout=60)
response = breaker.call(make_api_request, "/endpoint")

Monitoring Rate Limit Health

Track These Metrics

  1. Rate limit remaining — X-RateLimit-Remaining
    Alert when < 20% capacity remaining

  2. 429 error rate — Percentage of requests hitting limits
    Red flag if > 1% of total requests

  3. Time to rate limit reset — X-RateLimit-Reset
    Plan request batching around reset windows

  4. Retry success rate — Are retries recovering?
    Low success rate = need higher tier or architecture changes

Example: Prometheus + Grafana Dashboard

import time
import requests
from prometheus_client import Counter, Gauge, Histogram

# Metrics
api_requests_total = Counter('api_requests_total', 'Total API requests', ['status'])
rate_limit_remaining = Gauge('rate_limit_remaining', 'Requests remaining in window')
rate_limit_wait_time = Histogram('rate_limit_wait_seconds', 'Time spent waiting on rate limits')

def instrumented_api_call(url):
    response = requests.get(url)
    
    # Track request status
    api_requests_total.labels(status=response.status_code).inc()
    
    # Track rate limit state
    if 'X-RateLimit-Remaining' in response.headers:
        remaining = int(response.headers['X-RateLimit-Remaining'])
        rate_limit_remaining.set(remaining)
    
    # Handle 429 (note: this retry recurses with no cap; bound it in production)
    if response.status_code == 429:
        wait_time = int(response.headers.get('Retry-After', 60))
        rate_limit_wait_time.observe(wait_time)
        time.sleep(wait_time)
        return instrumented_api_call(url)  # Retry
    
    return response.json()

Alert on:

  • rate_limit_remaining < 100 (running low)
  • rate_limit_wait_seconds > 10 (significant delays)
  • api_requests_total{status="429"} > 100/hour (frequent limiting)

Strategies to Avoid Rate Limits

1. Request Batching

Instead of:

# BAD: 1,000 individual requests
for user_id in user_ids:
    get_user(user_id)

Do:

# GOOD: 10 batch requests instead of 1,000
batch_size = 100
for i in range(0, len(user_ids), batch_size):
    batch = user_ids[i:i + batch_size]
    get_users_batch(batch)  # 100x fewer requests

2. Caching

from functools import lru_cache
import time

@lru_cache(maxsize=1000)
def get_user_cached(user_id, cache_key):
    # cache_key changes every 5 minutes, so stale entries stop matching
    return api_get_user(user_id)

# Bust cache every 5 minutes
cache_key = int(time.time() / 300)
user = get_user_cached(user_id, cache_key)

3. Request Queuing

Use a job queue to smooth traffic spikes:

import asyncio
from asyncio import Queue

class RateLimitedClient:
    def __init__(self, rate_limit_per_second=10):
        self.rate_limit = rate_limit_per_second
        self.queue = Queue()
        self.running = False
    
    async def start(self):
        self.running = True
        while self.running:
            task = await self.queue.get()
            await task()
            await asyncio.sleep(1.0 / self.rate_limit)
    
    async def add_request(self, coro):
        await self.queue.put(coro)

# Usage
client = RateLimitedClient(rate_limit_per_second=10)
asyncio.create_task(client.start())

# Enqueue requests — they'll execute at controlled rate
await client.add_request(lambda: api_call("/endpoint1"))
await client.add_request(lambda: api_call("/endpoint2"))

4. Negotiate Higher Limits

Most enterprise APIs offer:

  • Tier upgrades — Pay more, get higher limits
  • Custom quotas — Negotiate based on use case
  • Reserved capacity — Guaranteed throughput during peaks

When to negotiate:

  • You're hitting limits frequently (>5% of requests)
  • Your use case is legitimate and business-critical
  • You're willing to move to a paid tier

Provider-Specific Rate Limit Guides

OpenAI API

  • Free tier: 3 requests/minute
  • Pay-as-you-go: 3,500 requests/minute (GPT-4)
  • Rate limits by model — Cheaper models have higher limits
  • Batch API — Separate batch queue limits (not the standard per-minute limits), 50% cheaper, 24h processing time

Anthropic Claude API

  • Free tier: 50 requests/minute
  • Tier 1: 1,000 requests/minute (after $5 spend)
  • Tier 4: 10,000 requests/minute (after $10,000 spend)
  • Concurrent limits: 5 → 80 parallel requests by tier

Stripe API

  • Default: 100 requests/second
  • Per endpoint — Some endpoints have tighter limits
  • Idempotency keys — Safely retry failed payments
  • Test mode — Higher limits than live mode

GitHub API

  • Unauthenticated: 60 requests/hour
  • Authenticated: 5,000 requests/hour
  • GraphQL: Calculated by query complexity, not request count
  • Abuse detection — Can trigger secondary rate limits

Google Maps API

  • Per-second limits: Varies by API
  • Daily quotas: 25,000 free requests/day
  • Billing required — Must enable billing even for free tier
  • Premium plans — Up to 100 QPS and millions of requests/day

Real-World 429 Incident: Anthropic March 2026

On March 2-3, 2026, Anthropic experienced a major outage affecting Claude API users. While the web interface was down for 14 hours, the API remained functional — but hit rate limits under load.

What happened:

  • Authentication service failure
  • Increased API traffic as users shifted from web to API
  • Stricter enforcement of rate limits during recovery

Lessons:

  1. Multi-provider strategies — Services with OpenAI + Anthropic fallback survived
  2. API != Web — Different infrastructure means different failure modes
  3. Monitor rate limit headroom — Pre-outage traffic patterns matter

Read full analysis →

Best Practices Checklist

  • Implement exponential backoff with jitter
  • Always respect Retry-After headers
  • Track rate limit remaining in monitoring
  • Alert before hitting 0 remaining
  • Use circuit breakers to prevent cascading failures
  • Batch requests when possible
  • Cache aggressively
  • Test retry logic in staging
  • Document your retry strategy
  • Have a fallback API provider

FAQs

What's the difference between 429 and 503?

429 Too Many Requests = You sent too many requests
503 Service Unavailable = The API is down

429 is client-side (your fault). 503 is server-side (their fault).
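
In retry logic, this distinction often collapses into a single question: is the status worth retrying? A minimal sketch (the status set is a common choice, not a standard):

```python
# Sketch: classify which HTTP statuses merit an automatic retry.
# 429 is always retryable; 500/502/503/504 are often transient,
# so many clients retry those too, with backoff.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def is_retryable(status_code):
    return status_code in RETRYABLE_STATUSES
```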

Should I retry 429 errors automatically?

Yes, with exponential backoff. Unlike 5xx errors (which may signal a deeper outage), 429s are an expected part of normal traffic control, and retrying with backoff is the correct response.

How long should I wait before retrying?

  1. Check Retry-After header — use that value
  2. If no header, use exponential backoff: 1s, 2s, 4s, 8s, 16s
  3. Add random jitter (±20%) to prevent thundering herds
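
The three steps above can be condensed into one delay function; a sketch where `retry_after` is the parsed header value (`None` when absent) and `attempt` is 0-based:

```python
import random

# Sketch of the retry-delay rule: prefer Retry-After, otherwise
# exponential backoff with ±20% jitter, capped at max_delay.
def retry_delay(attempt, retry_after=None, base=1.0, max_delay=60.0):
    if retry_after is not None:
        return float(retry_after)                   # server told us exactly when
    delay = min(base * (2 ** attempt), max_delay)   # 1s, 2s, 4s, 8s, 16s...
    jitter = random.uniform(-0.2, 0.2)              # ±20% spreads out retries
    return delay * (1 + jitter)
```

Pass the result to `time.sleep()` (or `asyncio.sleep()`) between attempts.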

Can rate limits apply per endpoint?

Yes. Some APIs have:

  • Global limits — Total requests across all endpoints
  • Per-endpoint limits — POST /upload might have tighter limits than GET /data
  • Resource-based limits — Writes are stricter than reads

Always check the API documentation.

What if I need higher rate limits?

  1. Upgrade tier — Most APIs offer paid plans with higher limits
  2. Contact support — Explain your use case, request an increase
  3. Optimize requests — Batch, cache, and reduce unnecessary calls
  4. Dedicated infrastructure — Enterprise plans often include reserved capacity

Do rate limits reset immediately?

Depends on the algorithm:

  • Fixed window — Resets at exact time (e.g., top of the hour)
  • Sliding window — Resets gradually based on request timestamps
  • Token bucket — Refills at constant rate

Check X-RateLimit-Reset header for the next reset time.
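
The token bucket behavior ("refills at constant rate") is simple enough to sketch client-side, for example to throttle yourself before the server does:

```python
import time

# Minimal token-bucket sketch: `capacity` tokens, refilled at `rate`
# tokens/second; each request consumes one token and is rejected
# when the bucket is empty.
class TokenBucket:
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate                 # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with `capacity=10, rate=2` allows short bursts of 10 requests but sustains only 2 requests/second over time.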

Can I get banned for hitting rate limits?

Rarely. Most APIs treat 429s as normal traffic control. However:

  • Ignoring 429s and hammering the API = potential ban
  • Respecting backoff = no penalties
  • Persistent abuse = account suspension

How do I test rate limit handling?

# Artificially trigger 429s in tests
from unittest.mock import Mock, patch

def test_rate_limit_retry():
    mock_responses = [
        Mock(status_code=429, headers={'Retry-After': '1'}),
        Mock(status_code=429, headers={'Retry-After': '2'}),
        Mock(status_code=200, json=lambda: {"data": "success"})
    ]
    
    with patch('requests.get', side_effect=mock_responses):
        result = api_call_with_retry("https://api.example.com/data")
        assert result == {"data": "success"}

Or use chaos engineering tools like Toxiproxy to inject 429 responses.

Monitoring API Rate Limits with API Status Check

Track rate limit health across all your API integrations in one dashboard:

  • Real-time rate limit remaining
  • Alerts before hitting limits
  • 429 error rate trends
  • Multi-provider fallback testing
  • Historical rate limit patterns

Start monitoring →


Need help monitoring your APIs? API Status Check tracks uptime, latency, and rate limit health for 400+ popular APIs. Get alerts before your users notice issues.

API Status Check

Stop checking API status pages manually

Get instant email alerts when OpenAI, Stripe, AWS, and 100+ APIs go down. Know before your users do.

Get Alerts — $9/mo →

Free dashboard available · 14-day trial on paid plans · Cancel anytime

Browse Free Dashboard →