API Rate Limiting: How to Handle 429 Errors in Production
TL;DR
HTTP 429 errors signal rate limit violations. Handle them with exponential backoff, respect Retry-After headers, implement circuit breakers, and monitor rate limit metrics. Enterprise APIs often allow negotiating higher limits.
When your application hits an API rate limit, you'll receive an HTTP 429 Too Many Requests response. This seemingly simple error code can cascade into major production incidents if not handled correctly.
In this guide, we'll cover everything you need to know about API rate limiting, from understanding why it exists to implementing bulletproof retry strategies.
What Is API Rate Limiting?
API rate limiting is a traffic control mechanism that restricts how many requests a client can make to an API within a specific timeframe.
Common rate limit patterns:
- Per-second limits: 100 requests/second
- Per-minute limits: 1,000 requests/minute
- Per-hour limits: 10,000 requests/hour
- Daily quotas: 100,000 requests/day
- Concurrent request limits: Maximum 10 simultaneous requests
Why APIs Use Rate Limiting
- Cost control — Prevent runaway infrastructure bills
- Fairness — Ensure all customers get reasonable access
- Security — Mitigate DDoS attacks and abuse
- Resource protection — Prevent server overload
- Billing tiers — Enforce plan limits (free vs paid)
Understanding HTTP 429 Responses
When you exceed a rate limit, the API returns:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709726400
Content-Type: application/json

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 60 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
Key Headers to Monitor
| Header | Purpose | Example |
|---|---|---|
| `Retry-After` | Seconds until retry allowed | `60` |
| `X-RateLimit-Limit` | Total requests allowed in window | `1000` |
| `X-RateLimit-Remaining` | Requests left in current window | `0` |
| `X-RateLimit-Reset` | Unix timestamp when limit resets | `1709726400` |
⚠️ Header names vary by provider. Check your API's documentation.
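Given that variation, it helps to normalize the headers in one place. A sketch that checks a few common spellings — the alternative names here are illustrative, not a complete list:

```python
def parse_rate_limit_headers(headers):
    """Normalize common rate-limit headers into one dict.
    Header names vary by provider; add your API's names as needed."""
    def first_int(*names):
        for name in names:
            if name in headers:
                try:
                    return int(headers[name])
                except ValueError:
                    return None
        return None

    return {
        "retry_after": first_int("Retry-After", "retry-after"),
        "limit": first_int("X-RateLimit-Limit", "RateLimit-Limit"),
        "remaining": first_int("X-RateLimit-Remaining", "RateLimit-Remaining"),
        "reset": first_int("X-RateLimit-Reset", "RateLimit-Reset"),
    }

info = parse_rate_limit_headers({
    "Retry-After": "60",
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1709726400",
})
```

With `requests`, `response.headers` is already case-insensitive, so the lowercase fallbacks are mostly useful for raw dicts.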
Production-Ready Retry Strategies
1. Exponential Backoff
The gold standard for handling 429s:
```python
import time
import random

import requests

def api_call_with_retry(url, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Start with ~1 second, double each attempt, plus jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            # Respect Retry-After if provided
            if 'Retry-After' in response.headers:
                wait_time = int(response.headers['Retry-After'])
            print(f"Rate limited. Waiting {wait_time:.1f}s...")
            time.sleep(wait_time)
        else:
            # Non-retryable error
            response.raise_for_status()
    raise Exception(f"Failed after {max_retries} retries")
```
Why exponential backoff works:
- Gives APIs time to recover
- Prevents thundering herd problems
- Backoff sequence: 1s → 2s → 4s → 8s → 16s
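The sequence above is just `2 ** attempt`; stripped of jitter, it can be computed and capped like this (the 60-second ceiling is an assumption — tune it to your API):

```python
def backoff_schedule(max_retries=5, base=1.0, cap=60.0):
    """Deterministic part of exponential backoff: base * 2**attempt,
    clamped to `cap` so waits never grow unbounded."""
    return [min(base * (2 ** attempt), cap) for attempt in range(max_retries)]

print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Without a cap, attempt 10 would already mean a 17-minute wait, which is rarely what you want.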
2. Respect Retry-After Headers
Some APIs tell you exactly when to retry:
```javascript
async function fetchWithRetry(url, options = {}) {
  const maxRetries = 5;
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);
    if (response.ok) {
      return await response.json();
    }
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After');
      const waitMs = retryAfter ? parseInt(retryAfter, 10) * 1000 : Math.pow(2, i) * 1000;
      console.log(`Rate limited. Retrying in ${waitMs / 1000}s...`);
      await new Promise(resolve => setTimeout(resolve, waitMs));
      continue;
    }
    throw new Error(`HTTP ${response.status}: ${response.statusText}`);
  }
  throw new Error(`Failed after ${maxRetries} retries`);
}
```
3. Implement Circuit Breakers
Prevent cascading failures when an API is persistently rate-limiting:
```python
from enum import Enum
from datetime import datetime, timedelta

class RateLimitException(Exception):
    """Raised by the wrapped function when the API returns HTTP 429."""

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing, block requests
    HALF_OPEN = "half_open"  # Testing if recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout  # seconds
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.timeout):
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN. API temporarily unavailable.")
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except RateLimitException:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
breaker = CircuitBreaker(failure_threshold=5, timeout=60)
response = breaker.call(make_api_request, "/endpoint")
```
Monitoring Rate Limit Health
Track These Metrics
- Rate limit remaining — `X-RateLimit-Remaining`; alert when less than 20% of capacity remains
- 429 error rate — percentage of requests hitting limits; red flag if above 1% of total requests
- Time to rate limit reset — `X-RateLimit-Reset`; plan request batching around reset windows
- Retry success rate — are retries recovering? A low success rate means you need a higher tier or architecture changes
Example: Prometheus + Grafana Dashboard
```python
import time

import requests
from prometheus_client import Counter, Gauge, Histogram

# Metrics
api_requests_total = Counter('api_requests_total', 'Total API requests', ['status'])
rate_limit_remaining = Gauge('rate_limit_remaining', 'Requests remaining in window')
rate_limit_wait_time = Histogram('rate_limit_wait_seconds', 'Time spent waiting on rate limits')

def instrumented_api_call(url):
    response = requests.get(url)

    # Track request status
    api_requests_total.labels(status=response.status_code).inc()

    # Track rate limit state
    if 'X-RateLimit-Remaining' in response.headers:
        remaining = int(response.headers['X-RateLimit-Remaining'])
        rate_limit_remaining.set(remaining)

    # Handle 429
    if response.status_code == 429:
        wait_time = int(response.headers.get('Retry-After', 60))
        rate_limit_wait_time.observe(wait_time)
        time.sleep(wait_time)
        return instrumented_api_call(url)  # Retry

    return response.json()
```
Alert on:
- `rate_limit_remaining < 100` (running low)
- `rate_limit_wait_seconds > 10` (significant delays)
- `api_requests_total{status="429"} > 100/hour` (frequent limiting)
Strategies to Avoid Rate Limits
1. Request Batching
Instead of:
```python
# BAD: 1,000 individual requests
for user_id in user_ids:
    get_user(user_id)
```
Do:
```python
# GOOD: 10 batch requests of 100 users each
batch_size = 100
for i in range(0, len(user_ids), batch_size):
    batch = user_ids[i:i + batch_size]
    get_users_batch(batch)  # 100x fewer requests
```
2. Caching
```python
from functools import lru_cache
import time

@lru_cache(maxsize=1000)
def get_user_cached(user_id, cache_timestamp):
    return api_get_user(user_id)

# Bust the cache every 5 minutes
cache_key = int(time.time() / 300)
user = get_user_cached(user_id, cache_key)
```
3. Request Queuing
Use a job queue to smooth traffic spikes:
```python
import asyncio
from asyncio import Queue

class RateLimitedClient:
    def __init__(self, rate_limit_per_second=10):
        self.rate_limit = rate_limit_per_second
        self.queue = Queue()
        self.running = False

    async def start(self):
        self.running = True
        while self.running:
            task = await self.queue.get()
            await task()
            await asyncio.sleep(1.0 / self.rate_limit)

    async def add_request(self, coro):
        await self.queue.put(coro)

# Usage
client = RateLimitedClient(rate_limit_per_second=10)
asyncio.create_task(client.start())

# Enqueue requests — they'll execute at a controlled rate
await client.add_request(lambda: api_call("/endpoint1"))
await client.add_request(lambda: api_call("/endpoint2"))
```
4. Negotiate Higher Limits
Most enterprise APIs offer:
- Tier upgrades — Pay more, get higher limits
- Custom quotas — Negotiate based on use case
- Reserved capacity — Guaranteed throughput during peaks
When to negotiate:
- You're hitting limits frequently (>5% of requests)
- Your use case is legitimate and business-critical
- You're willing to move to a paid tier
Provider-Specific Rate Limit Guides
OpenAI API
- Free tier: 3 requests/minute
- Pay-as-you-go: 3,500 requests/minute (GPT-4)
- Rate limits by model — Cheaper models have higher limits
- Batch API — No rate limits, 50% cheaper, 24h processing time
Anthropic Claude API
- Free tier: 50 requests/minute
- Tier 1: 1,000 requests/minute (after $5 spend)
- Tier 4: 10,000 requests/minute (after $10,000 spend)
- Concurrent limits: 5 → 80 parallel requests by tier
Stripe API
- Default: 100 requests/second
- Per endpoint — Some endpoints have tighter limits
- Idempotency keys — Safely retry failed payments
- Test mode — Higher limits than live mode
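Safe retries of POSTs hinge on reusing the same idempotency key, so the server can deduplicate the duplicate attempt. A sketch of building such headers — the Bearer scheme and `api_key` value here are placeholders; check Stripe's docs for the exact auth format:

```python
import uuid

def idempotent_post_headers(api_key):
    """Build headers for a safely retryable POST. Stripe reads the
    Idempotency-Key header; the key must be reused verbatim on every
    retry of the same logical request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": str(uuid.uuid4()),
    }

headers = idempotent_post_headers("sk_test_placeholder")
# Reuse this SAME headers dict when retrying after a 429 or network error —
# generating a fresh key on retry would defeat the deduplication.
```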
GitHub API
- Unauthenticated: 60 requests/hour
- Authenticated: 5,000 requests/hour
- GraphQL: Calculated by query complexity, not request count
- Abuse detection — Can trigger secondary rate limits
Google Maps API
- Per-second limits: Varies by API
- Daily quotas: 25,000 free requests/day
- Billing required — Must enable billing even for free tier
- Premium plans — Up to 100 QPS and millions of requests/day
Real-World 429 Incident: Anthropic March 2026
On March 2-3, 2026, Anthropic experienced a major outage affecting Claude API users. While the web interface was down for 14 hours, the API remained functional — but hit rate limits under load.
What happened:
- Authentication service failure
- Increased API traffic as users shifted from web to API
- Stricter enforcement of rate limits during recovery
Lessons:
- Multi-provider strategies — Services with OpenAI + Anthropic fallback survived
- API != Web — Different infrastructure means different failure modes
- Monitor rate limit headroom — Pre-outage traffic patterns matter
Best Practices Checklist
✅ Implement exponential backoff with jitter
✅ Always respect Retry-After headers
✅ Track rate limit remaining in monitoring
✅ Alert before hitting 0 remaining
✅ Use circuit breakers for cascading failures
✅ Batch requests when possible
✅ Cache aggressively
✅ Test retry logic in staging
✅ Document your retry strategy
✅ Have a fallback API provider
FAQs
What's the difference between 429 and 503?
429 Too Many Requests = You sent too many requests
503 Service Unavailable = The API is down
429 is client-side (your fault). 503 is server-side (their fault).
Should I retry 429 errors automatically?
Yes, with exponential backoff. Like transient 5xx errors, 429s are retryable — but unlike unexpected server failures, they're a normal part of traffic control, and backing off then retrying is exactly the response the API expects.
How long should I wait before retrying?
- Check the `Retry-After` header — use that value
- If no header, use exponential backoff: 1s, 2s, 4s, 8s, 16s
- Add random jitter (±20%) to prevent thundering herds
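The ±20% jitter is a one-liner:

```python
import random

def jittered_wait(base_wait, jitter=0.2):
    """Spread retries across ±20% of the base wait so many clients
    don't all retry in the same instant."""
    return base_wait * random.uniform(1 - jitter, 1 + jitter)

wait = jittered_wait(4.0)
# wait lands somewhere in [3.2, 4.8]
```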
Can rate limits apply per endpoint?
Yes. Some APIs have:
- Global limits — Total requests across all endpoints
- Per-endpoint limits — POST /upload might have tighter limits than GET /data
- Resource-based limits — Writes are stricter than reads
Always check the API documentation.
What if I need higher rate limits?
- Upgrade tier — Most APIs offer paid plans with higher limits
- Contact support — Explain your use case, request an increase
- Optimize requests — Batch, cache, and reduce unnecessary calls
- Dedicated infrastructure — Enterprise plans often include reserved capacity
Do rate limits reset immediately?
Depends on the algorithm:
- Fixed window — Resets at exact time (e.g., top of the hour)
- Sliding window — Resets gradually based on request timestamps
- Token bucket — Refills at constant rate
Check X-RateLimit-Reset header for the next reset time.
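A token bucket, for example, can be sketched in a few lines — a simplified model for intuition, not any provider's actual implementation:

```python
import time

class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at
    `refill_rate` tokens/second. A request is allowed only if a full
    token is available, so bursts drain the bucket and then requests
    are throttled to the refill rate."""
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_rate=1)  # burst of 3, then 1 req/s
results = [bucket.allow() for _ in range(5)]
# The first 3 calls pass immediately; the next 2 fail until tokens refill.
```

This is why a token-bucket API lets you burst briefly but sustains only the refill rate — and why 429s often stop on their own if you simply wait.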
Can I get banned for hitting rate limits?
Rarely. Most APIs treat 429s as normal traffic control. However:
- Ignoring 429s and hammering the API = potential ban
- Respecting backoff = no penalties
- Persistent abuse = account suspension
How do I test rate limit handling?
```python
from unittest.mock import Mock, patch

# Artificially trigger 429s in tests
def test_rate_limit_retry():
    mock_responses = [
        Mock(status_code=429, headers={'Retry-After': '1'}),
        Mock(status_code=429, headers={'Retry-After': '2'}),
        Mock(status_code=200, json=lambda: {"data": "success"}),
    ]
    with patch('requests.get', side_effect=mock_responses):
        result = api_call_with_retry("https://api.example.com/data")
    assert result == {"data": "success"}
```
Or use chaos engineering tools like Toxiproxy to inject 429 responses.
Monitoring API Rate Limits with API Status Check
Track rate limit health across all your API integrations in one dashboard:
✅ Real-time rate limit remaining
✅ Alert before hitting limits
✅ 429 error rate trends
✅ Multi-provider fallback testing
✅ Historical rate limit patterns
Need help monitoring your APIs? API Status Check tracks uptime, latency, and rate limit health for 400+ popular APIs. Get alerts before your users notice issues.