How to Handle API Rate Limiting: A Complete Guide to 429 Errors
Quick Answer: API rate limiting is a traffic control mechanism that restricts the number of requests a client can make within a time window. When you exceed these limits, you receive a 429 "Too Many Requests" error. Handle rate limits by implementing exponential backoff, respecting Retry-After and X-RateLimit-* headers, caching responses, and queuing requests. Most APIs enforce limits like 100-10,000 requests per hour depending on your subscription tier.
If you've ever built an application that integrates with third-party APIs, you've likely encountered the dreaded 429 Too Many Requests error. Understanding how to properly handle API rate limiting isn't just about avoiding errors—it's about building resilient, production-ready applications that scale gracefully under load while respecting the infrastructure constraints of the services you depend on.
What is API Rate Limiting and Why Do APIs Use It?
API rate limiting is a technique used by API providers to control the amount of incoming traffic to their servers by restricting the number of requests a client can make within a specific time period. Think of it as a bouncer at a club—only so many people can enter per hour to ensure everyone inside has a good experience.
Why API Providers Implement Rate Limits
1. Infrastructure Protection
APIs handle millions of requests daily. Without rate limits, a single misconfigured client (or malicious actor) could overwhelm the entire system with requests, degrading service for all users. Rate limits ensure fair resource distribution across all clients.
2. Cost Management
Every API request consumes compute resources, database connections, bandwidth, and potentially third-party service credits. Rate limiting helps providers manage infrastructure costs and prevent abuse that could lead to unexpected expenses.
3. Service Quality Assurance
By controlling request velocity, APIs can maintain consistent response times and availability. This prevents cascade failures where overload on one component brings down the entire system.
4. Business Model Enforcement
Many APIs use tiered pricing where higher-paying customers receive higher rate limits. This creates a fair monetization model where heavy users contribute more to infrastructure costs.
5. Security and Abuse Prevention
Rate limits thwart brute force attacks, credential stuffing, data scraping, and DDoS attempts. They make it economically infeasible for bad actors to abuse the service at scale.
Real-World Impact
When Stripe processes payments, OpenAI generates text, or Twilio sends SMS messages, each request costs real money in infrastructure. A single bug in your retry logic could:
- Send 10,000 duplicate API calls in minutes
- Rack up thousands in unexpected charges
- Get your API key suspended
- Impact other customers' service quality
Rate limiting protects both the provider and the ecosystem of developers building on their platform.
Understanding 429 Too Many Requests
The HTTP 429 status code is the universal signal that you've exceeded an API's rate limit. Unlike 5xx server errors (which indicate problems on the provider's side), a 429 explicitly tells you that your client is sending too many requests.
Anatomy of a 429 Response
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1675890123
Retry-After: 60
{
"error": {
"message": "Rate limit exceeded. Retry after 60 seconds.",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
}
Key components:
- Status Code 429: The HTTP response code indicating rate limit exceeded
- X-RateLimit-Limit: Your maximum requests allowed in the current window
- X-RateLimit-Remaining: How many requests you have left (0 when rate limited)
- X-RateLimit-Reset: Unix timestamp when your limit resets
- Retry-After: Seconds to wait before retrying (or an HTTP date)
Common Causes of 429 Errors
1. Burst Traffic Spikes
Your application suddenly receives a surge of user activity (product launch, viral post, peak shopping hours) that triggers proportionally more API calls.
2. Inefficient API Usage
Making individual API calls in loops instead of using batch endpoints, or fetching data you already have cached.
3. Parallel Request Floods
Running multiple application instances without coordinated rate limiting, or implementing aggressive parallelization without throttling.
4. Retry Storm
A bug in error handling causes failed requests to retry immediately and repeatedly, creating a feedback loop that makes the problem worse.
5. Development/Testing Mistakes
Running load tests against production APIs, infinite loops in development, or automated scripts without rate limiting logic.
6. Plan Limits
You've legitimately outgrown your current API plan's rate limit based on organic growth.
Common Rate Limit Patterns
API providers implement rate limiting using various algorithms, each with different characteristics. Understanding these patterns helps you design better integration strategies.
Per-Second/Minute/Hour Limits
The simplest approach: a fixed number of requests allowed per time window.
Examples:
- Twitter API: 300 requests per 15-minute window
- GitHub API: 5,000 requests per hour (authenticated)
- Stripe API: 100 requests per second per account
How it works:
Time Window: 60 seconds
Limit: 100 requests
Request 1-100: ✓ Allowed
Request 101+: ✗ 429 until window resets
Pros: Simple to understand and implement
Cons: Allows burst traffic that could still overwhelm systems
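For intuition, a fixed-window counter fits in a few lines. This is an illustrative single-process Python sketch (the class and parameter names are my own, not from any library):

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per fixed window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.count = 0

    def allow(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Start a fresh window once the current one has expired
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # A server would respond with 429 here
```

Note how nothing stops a client from spending the whole budget in the first second of the window; that burst problem is what the token bucket and sliding window approaches address.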
Token Bucket Algorithm
Imagine a bucket that holds tokens. Each API request consumes one token. Tokens refill at a steady rate. When the bucket is empty, requests are denied until tokens regenerate.
Example configuration:
- Bucket capacity: 100 tokens
- Refill rate: 10 tokens per second
- Each request costs: 1 token
Behavior:
// Bucket starts with 100 tokens
bucket.tokens = 100;
bucket.refillRate = 10; // per second
// Burst: 100 requests instantly succeeds (drains bucket)
for (let i = 0; i < 100; i++) {
await api.call(); // ✓ All succeed
}
// Request 101 immediately: ✗ 429 (bucket empty)
// After 1 second: 10 tokens regenerated
// Requests 101-110: ✓ Succeed
// After 10 seconds: Bucket fully refilled (100 tokens)
Pros: Allows controlled bursts while preventing sustained overload
Cons: More complex to implement and reason about
Popular with: AWS API Gateway, Stripe, GitHub
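The refill logic is the part that's easy to get wrong, so here is a minimal single-process Python sketch of a token bucket (illustrative, not any provider's actual implementation):

```python
import time

class TokenBucket:
    """Bucket holding up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.time()

    def allow(self, cost: float = 1.0, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily on each call, rather than on a timer, is what keeps the implementation to a few lines.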
Sliding Window Algorithm
Instead of fixed time buckets, the rate limit is calculated based on the past N seconds/minutes from the current moment.
Fixed window problem:
Fixed Window (60 seconds):
Time 0-59s: 1000 requests ✓
Time 60-119s: 1000 requests ✓
Problem: 1000 requests at 11:00:59 + 1000 at 11:01:00
= 2000 requests in 1 second!
Sliding window solution:
Limit: 1000 requests per 60 seconds
At time 11:01:30, check: How many requests in past 60 seconds?
- Counts requests from 11:00:30 to 11:01:30
- More accurate traffic control
Pros: Prevents burst exploits at window boundaries
Cons: Requires storing timestamps for each request
Popular with: Cloudflare, Redis rate limiting, modern APIs
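The storage cost mentioned above is visible in a sliding-window log implementation: one timestamp per accepted request. An illustrative Python sketch:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # one entry per accepted request

    def allow(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Evict timestamps that have slid out of the trailing window
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```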
Tiered Limits by Plan
Most commercial APIs implement different rate limits based on subscription tiers, creating a natural upgrade path as usage grows.
Typical tier structure:
| Plan | Rate Limit | Price |
|---|---|---|
| Free | 100 requests/hour | $0 |
| Starter | 1,000 requests/hour | $29/month |
| Professional | 10,000 requests/hour | $99/month |
| Enterprise | 100,000+ requests/hour | Custom |
OpenAI Example:
- Free tier: 3 requests per minute (RPM)
- Pay-as-you-go: 3,500 RPM for GPT-4
- Tier 5: 10,000 RPM after $1,000+ monthly spend
Resource-based limits: Some APIs rate-limit by resource consumption rather than request count:
Anthropic Claude API:
- Rate limit: 50,000 tokens per minute
- Small request (100 tokens): Uses 0.2% of limit
- Large request (5,000 tokens): Uses 10% of limit
This is fairer for APIs with variable request sizes.
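To make the arithmetic concrete, a client-side token-budget tracker could look like the sketch below (the class is illustrative; the 50,000 tokens/minute figure mirrors the example above):

```python
import time
from collections import deque

class TokenBudget:
    """Client-side tracker for a tokens-per-minute budget (illustrative sketch)."""

    def __init__(self, tokens_per_minute: int = 50_000):
        self.limit = tokens_per_minute
        self.spend = deque()  # (timestamp, tokens) pairs

    def can_spend(self, tokens: int, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Forget spend older than one minute
        while self.spend and self.spend[0][0] <= now - 60:
            self.spend.popleft()
        used = sum(t for _, t in self.spend)
        if used + tokens <= self.limit:
            self.spend.append((now, tokens))
            return True
        return False
```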
Concurrent Request Limits
Some APIs also limit concurrent (simultaneous) requests, not just total requests per time window.
Example (Twilio):
- 1,000 requests per second (rate limit)
- Maximum 100 concurrent requests per account
Why this matters:
// This could hit concurrent limit even within rate limit:
const promises = [];
for (let i = 0; i < 500; i++) {
promises.push(api.call()); // 500 simultaneous requests
}
await Promise.all(promises); // ✗ May fail with 429
Solution: Use a concurrency limiter:
const pLimit = require('p-limit');
const limit = pLimit(50); // Max 50 concurrent
const promises = items.map(item =>
limit(() => api.call(item)) // Queues excess requests
);
await Promise.all(promises); // ✓ Respects concurrent limit
How to Detect You're Being Rate Limited
Before implementing rate limit handling, you need reliable detection. Here's how to identify rate limiting in your API integration.
Standard Rate Limit Headers
Most modern APIs include standardized headers in every response to help you track your rate limit status:
const response = await fetch('https://api.github.com/user', {
headers: { 'Authorization': `token ${GITHUB_TOKEN}` }
});
console.log(response.headers.get('X-RateLimit-Limit')); // "5000"
console.log(response.headers.get('X-RateLimit-Remaining')); // "4999"
console.log(response.headers.get('X-RateLimit-Reset')); // "1675890123"
Header Variations by Provider
Unfortunately, header names aren't fully standardized:
| Provider | Limit Header | Remaining Header | Reset Header |
|---|---|---|---|
| GitHub | X-RateLimit-Limit | X-RateLimit-Remaining | X-RateLimit-Reset |
| Twitter (X) | x-rate-limit-limit | x-rate-limit-remaining | x-rate-limit-reset |
| Stripe | X-Stripe-Limit | X-Stripe-Remaining | X-Stripe-Reset |
| OpenAI | x-ratelimit-limit-requests | x-ratelimit-remaining-requests | x-ratelimit-reset-requests |
Parsing example:
function parseRateLimitHeaders(headers) {
// Try common header variations
const limit = parseInt(
headers.get('X-RateLimit-Limit') ||
headers.get('x-rate-limit-limit') ||
headers.get('X-Stripe-Limit') || '0'
);
const remaining = parseInt(
headers.get('X-RateLimit-Remaining') ||
headers.get('x-rate-limit-remaining') ||
headers.get('X-Stripe-Remaining') || '0'
);
const reset = parseInt(
headers.get('X-RateLimit-Reset') ||
headers.get('x-rate-limit-reset') ||
headers.get('X-Stripe-Reset') || '0'
);
return { limit, remaining, reset };
}
The Retry-After Header
When you receive a 429 response, the Retry-After header tells you exactly how long to wait:
async function fetchWithRetryAfter(url, options = {}) {
  const response = await fetch(url, options);
  if (response.status === 429) {
    const retryAfter = response.headers.get('Retry-After');
    if (retryAfter) {
      // Can be seconds (number) or an HTTP date (string)
      const waitSeconds = parseInt(retryAfter, 10) ||
        (Date.parse(retryAfter) - Date.now()) / 1000;
      console.log(`Rate limited. Waiting ${waitSeconds} seconds...`);
      await new Promise(r => setTimeout(r, waitSeconds * 1000));
      // Retry the original request with the same URL and options
      return fetchWithRetryAfter(url, options);
    }
  }
  return response;
}
Proactive Monitoring
Don't wait for 429 errors. Monitor your rate limit consumption proactively:
class RateLimitTracker {
constructor() {
this.limits = {};
}
track(apiName, headers) {
const { limit, remaining, reset } = parseRateLimitHeaders(headers);
this.limits[apiName] = { limit, remaining, reset };
// Alert if getting close to limit
const percentUsed = ((limit - remaining) / limit) * 100;
if (percentUsed > 80) {
console.warn(
`[${apiName}] Rate limit warning: ${percentUsed.toFixed(1)}% used ` +
`(${remaining}/${limit} remaining)`
);
}
if (percentUsed > 95) {
console.error(
`[${apiName}] CRITICAL: Rate limit nearly exhausted! ` +
`${remaining} requests remaining until ${new Date(reset * 1000)}`
);
}
}
getStatus(apiName) {
return this.limits[apiName] || null;
}
}
const tracker = new RateLimitTracker();
// Use with every API call
const response = await fetch('https://api.example.com/data');
tracker.track('example-api', response.headers);
Detecting Rate Limits Without Headers
Some older APIs don't provide rate limit headers. In these cases, detect 429s by status code:
async function makeRequest(url, retryCount = 0, maxRetries = 5) {
  const response = await fetch(url);
  // Check for rate limiting
  if (response.status === 429 && retryCount < maxRetries) {
    // Fallback: exponential backoff without Retry-After
    const waitTime = Math.pow(2, retryCount) * 1000;
    console.log(`Rate limited. Backing off ${waitTime}ms`);
    await new Promise(r => setTimeout(r, waitTime));
    return makeRequest(url, retryCount + 1, maxRetries); // Retry
  }
  return response;
}
Rate Limit Handling Strategies with Code Examples
Now that you can detect rate limiting, let's implement robust handling strategies. These patterns work across languages and API providers.
1. Exponential Backoff
The gold standard for retry logic. When a request fails, wait progressively longer between retries.
JavaScript/Node.js Implementation:
async function exponentialBackoff(
fn,
maxRetries = 5,
baseDelay = 1000,
maxDelay = 32000
) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
// Only retry on rate limits or transient errors
if (
error.status !== 429 &&
error.status !== 503 &&
error.status !== 502
) {
throw error; // Don't retry 4xx client errors
}
if (attempt === maxRetries - 1) {
throw error; // Exhausted retries
}
// Calculate delay: baseDelay * 2^attempt + jitter
const exponentialDelay = Math.min(
baseDelay * Math.pow(2, attempt),
maxDelay
);
// Add jitter to prevent thundering herd
const jitter = Math.random() * 1000;
const totalDelay = exponentialDelay + jitter;
console.log(
`Attempt ${attempt + 1} failed. ` +
`Retrying in ${(totalDelay / 1000).toFixed(2)}s...`
);
await new Promise(resolve => setTimeout(resolve, totalDelay));
}
}
}
// Usage
const data = await exponentialBackoff(async () => {
const response = await fetch('https://api.stripe.com/v1/charges', {
method: 'POST',
headers: {
'Authorization': `Bearer ${STRIPE_SECRET_KEY}`,
'Content-Type': 'application/x-www-form-urlencoded'
},
body: 'amount=2000&currency=usd&source=tok_visa'
});
if (!response.ok) {
const error = new Error('API request failed');
error.status = response.status;
throw error;
}
return response.json();
});
Python Implementation:
import time
import random
from typing import Callable, Any
def exponential_backoff(
fn: Callable[[], Any],
max_retries: int = 5,
base_delay: float = 1.0,
max_delay: float = 32.0
) -> Any:
"""Execute function with exponential backoff retry logic."""
for attempt in range(max_retries):
try:
return fn()
except Exception as error:
# Only retry on rate limits or transient errors
status = getattr(error, 'status', None)
if status not in [429, 502, 503]:
raise error # Don't retry client errors
if attempt == max_retries - 1:
raise error # Exhausted retries
# Calculate delay: base_delay * 2^attempt + jitter
exponential_delay = min(
base_delay * (2 ** attempt),
max_delay
)
# Add jitter to prevent thundering herd
jitter = random.uniform(0, 1)
total_delay = exponential_delay + jitter
print(
f"Attempt {attempt + 1} failed. "
f"Retrying in {total_delay:.2f}s..."
)
time.sleep(total_delay)
# Usage
import stripe
stripe.api_key = "sk_test_..."
data = exponential_backoff(
lambda: stripe.Charge.create(
amount=2000,
currency="usd",
source="tok_visa"
)
)
Why jitter matters: Without jitter, all clients hitting rate limits retry at exactly the same time, creating synchronized thundering herd problems that make recovery harder.
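The examples above add jitter on top of the exponential delay. An alternative, often called "full jitter", draws the entire sleep uniformly between zero and the exponential delay; the function below is an illustrative Python sketch of that variant:

```python
import random

def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 32.0) -> float:
    """Sleep time for a retry: uniform over [0, min(cap, base * 2^attempt))."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Because each client draws a different delay, synchronized retries spread out instead of arriving in lockstep.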
2. Request Queuing
Instead of firing requests immediately, queue them and process at a controlled rate.
JavaScript Implementation with bottleneck:
const Bottleneck = require('bottleneck');
// Configure limiter: max 100 requests per minute
const limiter = new Bottleneck({
maxConcurrent: 10, // Max 10 simultaneous requests
minTime: 600, // Minimum 600ms between requests (= 100/min)
reservoir: 100, // Start with 100 requests available
reservoirRefreshAmount: 100, // Add 100 requests
reservoirRefreshInterval: 60 * 1000 // Every 60 seconds
});
// Handle 429 responses
limiter.on('failed', async (error, jobInfo) => {
if (error.status === 429) {
const retryAfter = error.retryAfter || 60;
console.log(`Rate limited. Retrying after ${retryAfter}s...`);
return retryAfter * 1000; // Tell bottleneck when to retry
}
});
// Wrap your API calls
const fetchUser = limiter.wrap(async (userId) => {
const response = await fetch(`https://api.example.com/users/${userId}`);
if (response.status === 429) {
const error = new Error('Rate limited');
error.status = 429;
error.retryAfter = parseInt(response.headers.get('Retry-After')) || 60;
throw error;
}
return response.json();
});
// Use like normal async function
const user = await fetchUser('user_123');
// Process many items - automatically queued and rate-limited
const users = await Promise.all(
userIds.map(id => fetchUser(id))
);
Python Implementation with ratelimit:
from ratelimit import limits, sleep_and_retry
import requests
import time
# Max 100 calls per minute
@sleep_and_retry
@limits(calls=100, period=60)
def fetch_user(user_id: str) -> dict:
"""Fetch user data with automatic rate limiting."""
response = requests.get(
f"https://api.example.com/users/{user_id}",
headers={"Authorization": f"Bearer {API_KEY}"}
)
if response.status_code == 429:
retry_after = int(response.headers.get('Retry-After', 60))
print(f"Rate limited. Sleeping {retry_after}s...")
time.sleep(retry_after)
return fetch_user(user_id) # Retry
response.raise_for_status()
return response.json()
# Usage - automatically rate-limited
users = [fetch_user(user_id) for user_id in user_ids]
3. Caching Responses
The best API call is one you don't make. Aggressive caching reduces rate limit pressure.
JavaScript Implementation:
const NodeCache = require('node-cache');
class CachedAPIClient {
constructor(ttl = 300) { // Default 5-minute TTL
this.cache = new NodeCache({
stdTTL: ttl,
checkperiod: 60
});
}
async get(endpoint, options = {}) {
const cacheKey = this._getCacheKey(endpoint, options);
// Check cache first
const cached = this.cache.get(cacheKey);
if (cached) {
console.log(`[CACHE HIT] ${endpoint}`);
return cached;
}
console.log(`[CACHE MISS] ${endpoint}`);
// Fetch from API
const response = await fetch(endpoint, options);
if (!response.ok) {
throw new Error(`API error: ${response.status}`);
}
const data = await response.json();
// Store in cache
this.cache.set(cacheKey, data);
return data;
}
_getCacheKey(endpoint, options) {
return `${endpoint}:${JSON.stringify(options)}`;
}
invalidate(endpoint) {
const keys = this.cache.keys();
keys.forEach(key => {
if (key.startsWith(endpoint)) {
this.cache.del(key);
}
});
}
}
// Usage
const api = new CachedAPIClient(300); // 5-minute cache
// First call hits API
const user1 = await api.get('https://api.github.com/users/octocat');
// Second call within 5 minutes uses cache
const user2 = await api.get('https://api.github.com/users/octocat');
// Invalidate when data changes
api.invalidate('https://api.github.com/users/octocat');
Python Implementation:
import time
import requests
import hashlib
import json
class CachedAPIClient:
def __init__(self, ttl: int = 300):
self.ttl = ttl
self._cache = {}
def get(self, endpoint: str, params: dict = None) -> dict:
"""Fetch data with caching."""
cache_key = self._get_cache_key(endpoint, params)
# Check cache
if cache_key in self._cache:
cached_data, timestamp = self._cache[cache_key]
if time.time() - timestamp < self.ttl:
print(f"[CACHE HIT] {endpoint}")
return cached_data
print(f"[CACHE MISS] {endpoint}")
# Fetch from API
response = requests.get(endpoint, params=params)
response.raise_for_status()
data = response.json()
# Store in cache
self._cache[cache_key] = (data, time.time())
return data
def _get_cache_key(self, endpoint: str, params: dict) -> str:
key_str = f"{endpoint}:{json.dumps(params, sort_keys=True)}"
return hashlib.md5(key_str.encode()).hexdigest()
# Usage
api = CachedAPIClient(ttl=300)
# First call hits API
user1 = api.get('https://api.github.com/users/octocat')
# Second call uses cache
user2 = api.get('https://api.github.com/users/octocat')
Cache invalidation strategies:
- Time-based (TTL): Expire after N seconds/minutes
- Event-based: Invalidate when source data changes
- LRU (Least Recently Used): Evict oldest entries when cache is full
- Conditional requests: Use `ETag`/`If-None-Match` headers to revalidate
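The conditional-request strategy deserves a concrete sketch. This hypothetical helper stores ETags alongside cached bodies and revalidates with If-None-Match; GitHub, for example, does not count 304 Not Modified responses against the core rate limit:

```python
import requests

# Hypothetical in-memory store: URL -> (etag, cached_body)
etag_cache = {}

def get_with_etag(url: str) -> dict:
    """GET with ETag revalidation: a 304 reuses the cached body."""
    headers = {}
    if url in etag_cache:
        headers['If-None-Match'] = etag_cache[url][0]
    response = requests.get(url, headers=headers)
    if response.status_code == 304:  # Not Modified: cache is still fresh
        return etag_cache[url][1]
    response.raise_for_status()
    data = response.json()
    if 'ETag' in response.headers:
        etag_cache[url] = (response.headers['ETag'], data)
    return data
```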
4. Circuit Breaker Pattern
When an API is consistently failing, stop making requests temporarily to avoid wasting rate limit quota and creating cascading failures.
JavaScript Implementation:
class CircuitBreaker {
constructor(options = {}) {
this.failureThreshold = options.failureThreshold || 5;
this.resetTimeout = options.resetTimeout || 60000; // 1 minute
this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
this.failureCount = 0;
this.nextAttempt = Date.now();
}
async execute(fn) {
if (this.state === 'OPEN') {
if (Date.now() < this.nextAttempt) {
throw new Error(
'Circuit breaker is OPEN. ' +
`Retry after ${new Date(this.nextAttempt)}`
);
}
// Transition to HALF_OPEN to test if service recovered
this.state = 'HALF_OPEN';
console.log('Circuit breaker transitioning to HALF_OPEN');
}
try {
const result = await fn();
this._onSuccess();
return result;
} catch (error) {
this._onFailure();
throw error;
}
}
_onSuccess() {
this.failureCount = 0;
if (this.state === 'HALF_OPEN') {
console.log('Circuit breaker closing - service recovered');
this.state = 'CLOSED';
}
}
_onFailure() {
this.failureCount++;
if (this.failureCount >= this.failureThreshold) {
console.error(
`Circuit breaker opening after ${this.failureCount} failures`
);
this.state = 'OPEN';
this.nextAttempt = Date.now() + this.resetTimeout;
}
}
getState() {
return {
state: this.state,
failureCount: this.failureCount,
nextAttempt: this.nextAttempt
};
}
}
// Usage
const breaker = new CircuitBreaker({
failureThreshold: 5, // Open after 5 failures
resetTimeout: 60000 // Try again after 1 minute
});
async function fetchData() {
return breaker.execute(async () => {
const response = await fetch('https://api.example.com/data');
if (response.status === 429 || response.status >= 500) {
throw new Error(`API error: ${response.status}`);
}
return response.json();
});
}
// Automatically stops calling failing API
try {
const data = await fetchData();
} catch (error) {
console.log('Request failed:', error.message);
console.log('Circuit breaker state:', breaker.getState());
}
Python Implementation:
import time
import requests
from enum import Enum
from typing import Callable, Any
class CircuitState(Enum):
CLOSED = "CLOSED"
OPEN = "OPEN"
HALF_OPEN = "HALF_OPEN"
class CircuitBreaker:
def __init__(
self,
failure_threshold: int = 5,
reset_timeout: int = 60
):
self.failure_threshold = failure_threshold
self.reset_timeout = reset_timeout
self.state = CircuitState.CLOSED
self.failure_count = 0
self.next_attempt = time.time()
def execute(self, fn: Callable[[], Any]) -> Any:
"""Execute function with circuit breaker protection."""
if self.state == CircuitState.OPEN:
if time.time() < self.next_attempt:
raise Exception(
f"Circuit breaker is OPEN. "
f"Retry after {self.next_attempt - time.time():.0f}s"
)
# Transition to HALF_OPEN
self.state = CircuitState.HALF_OPEN
print("Circuit breaker transitioning to HALF_OPEN")
try:
result = fn()
self._on_success()
return result
except Exception as error:
self._on_failure()
raise error
def _on_success(self):
self.failure_count = 0
if self.state == CircuitState.HALF_OPEN:
print("Circuit breaker closing - service recovered")
self.state = CircuitState.CLOSED
def _on_failure(self):
self.failure_count += 1
if self.failure_count >= self.failure_threshold:
print(
f"Circuit breaker opening after "
f"{self.failure_count} failures"
)
self.state = CircuitState.OPEN
self.next_attempt = time.time() + self.reset_timeout
# Usage
breaker = CircuitBreaker(failure_threshold=5, reset_timeout=60)
def fetch_data():
return breaker.execute(lambda: requests.get('https://api.example.com/data'))
When to use circuit breakers:
- API is experiencing an outage (returns 5xx errors)
- Consistent rate limiting despite proper backoff
- Network connectivity issues
- Dependency failures that won't resolve immediately
Rate Limits by Popular APIs
Understanding specific rate limits for the APIs you use helps you design within constraints.
Stripe
Rate limits:
- Default: 100 requests per second per account
- Read operations: Higher limits (list endpoints)
- Write operations: Lower limits (charge creation)
Best practices:
- Use idempotency keys to safely retry
- Implement webhooks instead of polling
- Use the `expand` parameter to fetch related objects in one request
Monitoring: Check Stripe API status
// Stripe-specific rate limit handling
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY);
async function createChargeWithRetry(amount, source) {
return exponentialBackoff(async () => {
try {
return await stripe.charges.create({
amount,
currency: 'usd',
source
}, {
idempotencyKey: `charge_${Date.now()}_${Math.random()}`
});
} catch (error) {
if (error.type === 'StripeRateLimitError') {
const rateLimitError = new Error('Rate limited');
rateLimitError.status = 429;
throw rateLimitError;
}
throw error;
}
});
}
OpenAI
Rate limits (GPT-4):
- Free tier: 3 RPM (requests per minute)
- Tier 1: 500 RPM, 30,000 TPM (tokens per minute)
- Tier 5: 10,000 RPM, 300,000 TPM
Headers:
- `x-ratelimit-limit-requests`
- `x-ratelimit-remaining-requests`
- `x-ratelimit-limit-tokens`
- `x-ratelimit-remaining-tokens`
Best practices:
- Track token consumption, not just request count
- Use smaller models (GPT-3.5) for less critical tasks
- Implement request batching where possible
Monitoring: Check OpenAI API status
import openai
import time
def chat_with_retry(messages, max_retries=3):
"""OpenAI chat completion with rate limit handling."""
for attempt in range(max_retries):
try:
response = openai.ChatCompletion.create(
model="gpt-4",
messages=messages
)
return response
except openai.error.RateLimitError as e:
if attempt == max_retries - 1:
raise
# Extract wait time from error message if available
wait_time = 20 * (attempt + 1) # Progressive backoff
print(f"Rate limited. Waiting {wait_time}s...")
time.sleep(wait_time)
GitHub
Rate limits:
- Authenticated: 5,000 requests per hour
- Unauthenticated: 60 requests per hour
- Search API: 30 requests per minute
- GraphQL API: 5,000 points per hour
Headers:
- `X-RateLimit-Limit`
- `X-RateLimit-Remaining`
- `X-RateLimit-Reset`
- `X-RateLimit-Used`
Best practices:
- Always authenticate to get 5,000 requests/hour
- Use GraphQL to fetch exactly the data you need
- Use conditional requests with `ETag` headers
Monitoring: Check GitHub API status
// GitHub API rate limit checking
async function checkGitHubRateLimit() {
const response = await fetch('https://api.github.com/rate_limit', {
headers: {
'Authorization': `token ${GITHUB_TOKEN}`
}
});
const data = await response.json();
console.log('Core API:', data.resources.core);
console.log('Search API:', data.resources.search);
console.log('GraphQL API:', data.resources.graphql);
return data;
}
Twilio
Rate limits:
- Default: 1,000 requests per second (burst)
- Sustained: Varies by account age/usage
- Concurrent requests: 100 per account
Rate limit codes:
- `20429` - Too Many Requests
Best practices:
- Implement queuing for SMS campaigns
- Use message batching via Messaging Services
- Monitor the `X-Shenanigans-Detected` header
from twilio.rest import Client
from twilio.base.exceptions import TwilioRestException
import time
client = Client(account_sid, auth_token)
def send_sms_with_rate_limit(to, body):
"""Send SMS with rate limit handling."""
max_retries = 3
for attempt in range(max_retries):
try:
message = client.messages.create(
to=to,
from_=twilio_number,
body=body
)
return message.sid
except TwilioRestException as e:
if e.code == 20429: # Rate limited
wait_time = 5 * (attempt + 1)
print(f"Rate limited. Retrying in {wait_time}s...")
time.sleep(wait_time)
else:
raise
Twitter (X) API
Rate limits (v2):
- Tweet lookup: 300 requests per 15-minute window
- User lookup: 300 requests per 15-minute window
- Search tweets: 450 requests per 15-minute window
Headers:
- `x-rate-limit-limit`
- `x-rate-limit-remaining`
- `x-rate-limit-reset`
Best practices:
- Use tweet IDs instead of searching repeatedly
- Cache user data aggressively
- Implement 15-minute window tracking
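A simple client-side tracker for these 15-minute windows might look like the following (an illustrative, single-process Python sketch):

```python
import time
from collections import deque

class WindowTracker:
    """Track calls against a 300-requests-per-15-minute window (client side)."""

    def __init__(self, limit: int = 300, window_seconds: int = 900):
        self.limit = limit
        self.window_seconds = window_seconds
        self.calls = deque()

    def record(self, now: float = None) -> None:
        self.calls.append(time.time() if now is None else now)

    def remaining(self, now: float = None) -> int:
        now = time.time() if now is None else now
        # Drop calls older than the window
        while self.calls and self.calls[0] <= now - self.window_seconds:
            self.calls.popleft()
        return self.limit - len(self.calls)
```

Check `remaining()` before each call, and when it hits zero, sleep until the oldest timestamp ages out of the window.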
Building Rate-Limit-Aware Applications
Rate limit handling should be built into your application architecture from day one, not bolted on after hitting limits.
Architecture Patterns
1. Centralized Rate Limiter Service
For distributed applications, implement a shared rate limiter using Redis:
const Redis = require('ioredis');
const redis = new Redis();
class DistributedRateLimiter {
async checkLimit(apiName, limit, windowSeconds) {
const key = `ratelimit:${apiName}`;
const now = Date.now();
const windowStart = now - (windowSeconds * 1000);
// Remove old entries
await redis.zremrangebyscore(key, 0, windowStart);
// Count requests in current window
const count = await redis.zcard(key);
if (count >= limit) {
const oldestEntry = await redis.zrange(key, 0, 0, 'WITHSCORES');
const resetTime = parseInt(oldestEntry[1]) + (windowSeconds * 1000);
return {
allowed: false,
resetAt: resetTime
};
}
// Add current request
await redis.zadd(key, now, `${now}:${Math.random()}`);
await redis.expire(key, windowSeconds);
return {
allowed: true,
remaining: limit - count - 1
};
}
}
// Usage across multiple servers
const limiter = new DistributedRateLimiter();
app.post('/api/action', async (req, res) => {
const result = await limiter.checkLimit('openai-api', 100, 60);
if (!result.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
resetAt: result.resetAt
});
}
// Process request
const data = await callOpenAI(req.body);
res.json(data);
});
2. Rate Limit Middleware
Wrap API clients with middleware that automatically handles rate limiting:
class RateLimitedAPIClient {
constructor(apiClient, rateLimit) {
this.client = apiClient;
this.limiter = new Bottleneck({
maxConcurrent: rateLimit.concurrent || 10,
minTime: (60 * 1000) / rateLimit.requestsPerMinute
});
}
async request(method, endpoint, data) {
return this.limiter.schedule(async () => {
return exponentialBackoff(async () => {
const response = await this.client.request(method, endpoint, data);
if (response.status === 429) {
const error = new Error('Rate limited');
error.status = 429;
error.retryAfter = parseInt(
response.headers.get('Retry-After') || 60
);
throw error;
}
return response;
});
});
}
}
// Usage
const openai = new RateLimitedAPIClient(openaiClient, {
requestsPerMinute: 50,
concurrent: 5
});
// All requests automatically rate-limited
const completion = await openai.request('POST', '/v1/chat/completions', {
model: 'gpt-4',
messages: [{role: 'user', content: 'Hello!'}]
});
3. Background Job Queues
For non-time-sensitive operations, use job queues to smooth traffic:
const Bull = require('bull');
// Create queue with a built-in rate limiter: at most 1 job per second (3,600/hour)
const apiQueue = new Bull('api-requests', {
  redis: { host: 'localhost', port: 6379 },
  limiter: { max: 1, duration: 1000 }
});
// Process queue with limited concurrency
apiQueue.process(5, async (job) => { // Max 5 concurrent
  const { endpoint, data } = job.data;
  return await exponentialBackoff(async () => {
    return await fetch(endpoint, {
      method: 'POST',
      body: JSON.stringify(data)
    });
  });
});
// Add jobs to queue
await apiQueue.add({
endpoint: 'https://api.example.com/data',
data: { message: 'Hello' }
});
User-Facing Rate Limit Communication
When your application hits rate limits, communicate clearly with users:
Good error message:
{
"error": "rate_limit_exceeded",
"message": "You've made too many requests. Please try again in 2 minutes.",
"retry_after": 120,
"limit": 100,
"period": "hour",
"docs_url": "https://docs.example.com/rate-limits"
}
Bad error message:
{
"error": "Too many requests"
}
UI considerations:
- Show progress bars for bulk operations
- Display "X requests remaining this hour"
- Gracefully queue actions when near limits
- Offer upgrade prompts when consistently hitting free tier limits
Monitoring and Alerting
Set up comprehensive monitoring:
// Track rate limit metrics
class RateLimitMetrics {
  constructor(metricsClient) {
    this.metrics = metricsClient;
  }

  recordAPICall(apiName, headers) {
    const { limit, remaining } = parseRateLimitHeaders(headers);
    const percentUsed = ((limit - remaining) / limit) * 100;

    // Send to monitoring service (DataDog, CloudWatch, etc.)
    this.metrics.gauge(`api.rate_limit.${apiName}.remaining`, remaining);
    this.metrics.gauge(`api.rate_limit.${apiName}.percent_used`, percentUsed);

    // Alert if consistently above 80%
    if (percentUsed > 80) {
      this.metrics.event({
        title: `High rate limit usage: ${apiName}`,
        text: `${percentUsed.toFixed(1)}% of rate limit used`,
        alert_type: 'warning',
        tags: [`api:${apiName}`]
      });
    }
  }

  recordRateLimitError(apiName, retryAfter) {
    this.metrics.increment(`api.rate_limit.${apiName}.errors`);
    this.metrics.gauge(`api.rate_limit.${apiName}.retry_after`, retryAfter);
  }
}
When Rate Limits Indicate an Outage
Sometimes what appears to be rate limiting is actually an API outage in disguise.
Distinguishing Rate Limits from Outages
Normal rate limiting:
- ✓ Predictable based on your usage patterns
- ✓ Affects only your account
- ✓ Resolves after waiting the specified time
- ✓ Headers present and accurate
- ✓ Error messages are clear
Possible outage:
- ✗ Sudden 429s when well within normal limits
- ✗ No Retry-After header, or an incorrect value
- ✗ Affects all endpoints simultaneously
- ✗ Accompanied by 5xx errors
- ✗ Multiple users reporting issues on social media
Cross-Reference with Status Monitoring
Before assuming you're being rate-limited unfairly, check if the API is experiencing problems:
Quick status check workflow:
1. Check API Status Check - real-time monitoring for 100+ APIs
2. Check official status pages:
- Stripe: status.stripe.com
- OpenAI: status.openai.com
- GitHub: githubstatus.com
3. Search social media:
- Twitter/X: Search "Stripe down" or "@stripeapi"
- Hacker News, Reddit r/webdev
4. Test from different locations:
- Regional outages may only affect certain data centers
- Use a VPN or cloud function in another region
Automated Outage Detection
Build outage detection into your monitoring:
async function detectPossibleOutage(apiName, errorRate) {
  // If error rate suddenly spikes above normal
  if (errorRate > 0.5) { // 50% of requests failing
    // Check API Status Check
    const status = await fetch(
      `https://apistatuscheck.com/api/${apiName}/status`
    ).then(r => r.json());

    if (status.operational === false) {
      // Confirmed outage
      await notifyTeam({
        title: `${apiName} Outage Detected`,
        message: `${apiName} is experiencing issues. See: https://apistatuscheck.com/api/${apiName}`,
        severity: 'critical'
      });

      // Switch to degraded mode
      await enableFallbackMode(apiName);
      return true;
    }
  }
  return false;
}
Benefits of status monitoring:
- Faster incident response - Know within seconds
- Better customer communication - "We're aware of issues with [Provider]"
- Avoid wasted debugging time - Don't troubleshoot when provider is down
- Historical data - Track provider reliability over time
Set up alerts for your critical APIs →
Frequently Asked Questions
What's the difference between rate limiting and throttling?
Rate limiting sets a hard cap on requests per time period—exceed it and you get a 429 error. Throttling slows down requests gracefully by adding artificial delays or queuing, keeping you under the limit. Rate limiting is reactive (API rejects excess), throttling is proactive (client self-regulates).
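The distinction shows up clearly in code: a throttle inserts delay *before* a request instead of reacting to a 429 afterward. Here's a minimal sketch of a proactive throttle that enforces a minimum gap between calls (the `createThrottle` helper is illustrative, not from any library):

```javascript
// Proactive throttle: guarantee at least `minIntervalMs` between calls.
function createThrottle(minIntervalMs) {
  let last = 0;
  return async function throttled(fn) {
    const wait = Math.max(0, last + minIntervalMs - Date.now());
    if (wait > 0) {
      await new Promise(resolve => setTimeout(resolve, wait));
    }
    last = Date.now();
    return fn(); // The actual request runs only after the enforced gap
  };
}

// Usage: 600ms gap keeps you under a 100 requests/minute limit
const throttled = createThrottle(600);
// await throttled(() => fetch('https://api.example.com/data'));
```

No 429 ever happens here; the client self-regulates, which is exactly what throttling means.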
Should I implement client-side rate limiting even if the API has limits?
Yes! Client-side rate limiting ("throttling") is best practice because:
- You avoid hitting limits and disrupting service
- No wasted API calls that return 429s
- More predictable application behavior
- Better resource utilization (no retry storms)
- Can coordinate limits across multiple services/instances
Think of API rate limits as a guardrail, not a target to hit.
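A common way to self-regulate is a token bucket: tokens refill at a steady rate up to a capacity, and each request spends one. A minimal in-process sketch (class name and parameters are illustrative):

```javascript
// Token bucket: refills `ratePerSec` tokens per second, up to `capacity`.
class TokenBucket {
  constructor(capacity, ratePerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.ratePerSec = ratePerSec;
    this.lastRefill = Date.now();
  }

  tryRemove() {
    // Refill based on time elapsed since the last check
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // Caller may send the request
    }
    return false;   // Caller should wait or queue
  }
}
```

The capacity acts as a burst allowance, while the refill rate keeps you safely under the API's sustained limit.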
How do I handle rate limits in serverless functions?
Serverless adds complexity because instances don't share state. Solutions:
- Use external rate limiting (Redis, DynamoDB) to track limits across invocations
- Pre-allocate request quotas - Each lambda gets 1/N of hourly limit
- Implement queuing - SQS/SNS to serialize requests
- Use AWS API Gateway throttling - Built-in per-key limits
Avoid storing rate limit state in memory, since Lambda instances are ephemeral and don't share state.
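The external-store approach boils down to a fixed-window counter keyed by user and window. The store interface below is hypothetical: in production you'd back it with Redis (INCR plus EXPIRE) or DynamoDB so all instances share counts; the in-memory Map exists only to illustrate the logic:

```javascript
// Fixed-window rate check over a shared store.
// In production, `store.incr` would be a Redis INCR (with EXPIRE set to
// windowSec) so every serverless instance sees the same counts.
async function allowRequest(store, key, limit, windowSec) {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const field = `${key}:${window}`;       // New counter each window
  const count = await store.incr(field);  // Atomic increment in Redis
  return count <= limit;
}

// In-memory stand-in for the shared store (illustration/testing only;
// this defeats the purpose in a real serverless deployment)
const memoryStore = {
  counts: new Map(),
  async incr(k) {
    const n = (this.counts.get(k) || 0) + 1;
    this.counts.set(k, n);
    return n;
  }
};
```

Fixed windows are the simplest shared-state scheme; sliding windows smooth out the burst at window boundaries at the cost of more store operations.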
Can I negotiate higher rate limits with API providers?
Yes! Options include:
- Upgrade to paid tier - Usually instant limit increase
- Contact enterprise sales - Custom limits for high-volume users
- Show business case - Explain why you need higher limits
- Demonstrate good citizenship - Efficient API usage, proper error handling
Providers want to support legitimate high-volume users—don't hesitate to ask.
What's a good rate limit for my own API?
Start with conservative defaults and adjust based on monitoring:
- Public free tier: 100-1,000 requests/hour
- Authenticated users: 5,000-10,000 requests/hour
- Paid tiers: 10,000-100,000+ requests/hour
Consider:
- Your infrastructure capacity (don't promise what you can't deliver)
- Cost per request (databases, compute, third-party APIs)
- Typical use cases (background sync needs more than interactive apps)
- Competitive landscape (what do similar APIs offer?)
Implement gradually: Start restrictive, loosen as you scale.
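Enforcing those tiers can be as simple as a per-key middleware. In production a library like express-rate-limit covers this; the hand-rolled Express-style sketch below just shows the mechanics (the function name and the X-API-Key convention are assumptions):

```javascript
// Sliding-window limiter middleware, keyed by API key (or IP as fallback).
function createLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> array of request timestamps

  return function limiter(req, res, next) {
    const key = req.headers['x-api-key'] || req.ip;
    const now = Date.now();

    // Keep only timestamps inside the current window
    const recent = (hits.get(key) || []).filter(t => t > now - windowMs);

    if (recent.length >= max) {
      res.statusCode = 429;
      res.setHeader('Retry-After', Math.ceil(windowMs / 1000));
      return res.end(JSON.stringify({ error: 'rate_limit_exceeded' }));
    }

    recent.push(now);
    hits.set(key, recent);
    next();
  };
}

// Usage: app.use(createLimiter({ windowMs: 60 * 60 * 1000, max: 1000 }));
```

Note the in-memory Map only works for a single process; multi-instance deployments need the shared-store approach discussed in the serverless question above.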
How do I test rate limit handling in development?
Mock rate limits:
// Mock API client that simulates rate limiting
class MockAPIClient {
  constructor(requestsPerMinute) {
    this.limit = requestsPerMinute;
    this.requests = [];
  }

  async request(endpoint) {
    const now = Date.now();
    const oneMinuteAgo = now - 60000;

    // Remove old requests
    this.requests = this.requests.filter(t => t > oneMinuteAgo);

    if (this.requests.length >= this.limit) {
      const error = new Error('Rate limited');
      error.status = 429;
      throw error;
    }

    this.requests.push(now);
    return { data: 'success' };
  }
}

// Test your retry logic
const mockAPI = new MockAPIClient(10); // 10 requests/minute

// This should trigger rate limiting and retry
for (let i = 0; i < 20; i++) {
  await fetchWithRetry(() => mockAPI.request('/test'));
}
Use dedicated test environments with lower limits if providers offer them.
What happens if I ignore rate limits?
Consequences escalate:
- 429 errors - Requests rejected, application breaks
- Longer rate limits - Punishment for abuse (hours instead of minutes)
- API key suspension - Temporary or permanent ban
- Account termination - Lose access entirely
- IP blocking - Affects all applications from your infrastructure
- Legal action - Violations of Terms of Service
Don't risk it. Respect rate limits.
Are rate limit headers standardized?
Unfortunately no. While many APIs follow the X-RateLimit-* pattern popularized by Twitter and GitHub, there's no official standard. Some providers use:
- RateLimit-* (draft IETF standard)
- X-Rate-Limit-*
- X-RateLimit-*
- Provider-specific headers (X-Shopify-Shop-Api-Call-Limit)
Always check the API documentation and write defensive parsing code that handles variations.
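A defensive parser along those lines checks the common variants in order. The helper below is a sketch (the variant list is illustrative, not exhaustive) compatible with the `parseRateLimitHeaders` call used in the monitoring example earlier:

```javascript
// Defensive rate-limit header parser: tries common naming variants.
// Extend the variant lists for providers with bespoke header names.
function parseRateLimitHeaders(headers) {
  const get = (...names) => {
    for (const name of names) {
      // Header objects may be case-sensitive; try the lowercase form too
      const value = headers[name] ?? headers[name.toLowerCase()];
      if (value !== undefined) return parseInt(value, 10);
    }
    return null; // Absent header: signal "unknown", don't guess
  };

  return {
    limit: get('RateLimit-Limit', 'X-RateLimit-Limit', 'X-Rate-Limit-Limit'),
    remaining: get('RateLimit-Remaining', 'X-RateLimit-Remaining', 'X-Rate-Limit-Remaining'),
    reset: get('RateLimit-Reset', 'X-RateLimit-Reset', 'X-Rate-Limit-Reset')
  };
}
```

Returning `null` for missing fields forces calling code to handle the "unknown" case explicitly rather than treating a missing limit as zero.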
Can rate limits differ by region or time of day?
Yes, some APIs implement dynamic rate limiting:
- Regional limits - EU data centers may have different caps
- Time-based limits - Higher limits during off-peak hours
- Burst allowances - Temporary increases for legitimate spikes
- Adaptive limits - Machine learning adjusts based on system load
Check documentation and monitor actual limits via response headers rather than assuming fixed values.
How do I handle rate limits in webhooks?
Webhooks create unique challenges since you don't control the request rate:
Strategies:
- Queue webhook events - Don't process synchronously
- Acknowledge quickly - Return 200 immediately, process async
- Batch processing - Group related events
- Use webhook replay - If you miss events due to limits, replay from provider
Example:
app.post('/webhook', async (req, res) => {
  // Acknowledge immediately
  res.status(200).send('OK');

  // Queue for processing
  await webhookQueue.add({
    event: req.body,
    signature: req.headers['stripe-signature']
  });
});

// Process queue with rate limiting
webhookQueue.process(async (job) => {
  const { event } = job.data;
  // Make API calls with rate limit handling
  await processWebhookEvent(event);
});
Take Control of Your API Infrastructure
API rate limiting doesn't have to be a source of anxiety and downtime. With the strategies and code examples in this guide, you can build resilient applications that gracefully handle rate limits and deliver reliable experiences to your users.
Key takeaways:
✓ Monitor proactively - Track rate limit headers before hitting limits
✓ Implement exponential backoff - Retry intelligently, not aggressively
✓ Cache aggressively - The best API call is one you don't make
✓ Queue non-urgent requests - Smooth out traffic spikes
✓ Use circuit breakers - Fail fast when APIs are struggling
✓ Know when to upgrade - Don't outgrow your API tier
✓ Distinguish outages from rate limits - Check status monitoring
Monitor API Health in Real-Time
Don't wait for user complaints to discover API problems. API Status Check monitors 100+ APIs 24/7 and alerts you instantly when issues arise—including unexpected rate limiting that may indicate an outage.
Get alerted via:
- 💬 Slack
- 🔔 Discord
- 🪝 Custom webhooks
Start monitoring your APIs for free →
Last updated: February 4, 2026. Rate limit information is based on public API documentation and is subject to change. Always refer to official provider documentation for the most current limits.
Monitor Your APIs
Check the real-time status of 100+ popular APIs used by developers.
View API Status →