How to Handle API Rate Limiting: A Complete Guide to 429 Errors
Quick Answer: API rate limiting is a traffic control mechanism that restricts the number of requests a client can make within a time window. When you exceed these limits, you receive a 429 "Too Many Requests" error. Handle rate limits by implementing exponential backoff, respecting Retry-After and X-RateLimit-* headers, caching responses, and queuing requests. Most APIs enforce limits like 100-10,000 requests per hour depending on your subscription tier.
If you've ever built an application that integrates with third-party APIs, you've likely encountered the dreaded 429 Too Many Requests error. Understanding how to properly handle API rate limiting isn't just about avoiding errors—it's about building resilient, production-ready applications that scale gracefully under load while respecting the infrastructure constraints of the services you depend on.
What is API Rate Limiting and Why Do APIs Use It?
API rate limiting is a technique used by API providers to control the amount of incoming traffic to their servers by restricting the number of requests a client can make within a specific time period. Think of it as a bouncer at a club—only so many people can enter per hour to ensure everyone inside has a good experience.
Why API Providers Implement Rate Limits
1. Infrastructure Protection
APIs handle millions of requests daily. Without rate limits, a single misconfigured client (or malicious actor) could overwhelm the entire system with requests, degrading service for all users. Rate limits ensure fair resource distribution across all clients.
2. Cost Management
Every API request consumes compute resources, database connections, bandwidth, and potentially third-party service credits. Rate limiting helps providers manage infrastructure costs and prevent abuse that could lead to unexpected expenses.
3. Service Quality Assurance
By controlling request velocity, APIs can maintain consistent response times and availability. This prevents cascade failures where overload on one component brings down the entire system.
4. Business Model Enforcement
Many APIs use tiered pricing where higher-paying customers receive higher rate limits. This creates a fair monetization model where heavy users contribute more to infrastructure costs.
5. Security and Abuse Prevention
Rate limits thwart brute force attacks, credential stuffing, data scraping, and DDoS attempts. They make it economically infeasible for bad actors to abuse the service at scale.
Real-World Impact
When Stripe processes payments, OpenAI generates text, or Twilio sends SMS messages, each request costs real money in infrastructure. A single bug in your retry logic could:
- Send 10,000 duplicate API calls in minutes
- Rack up thousands in unexpected charges
- Get your API key suspended
- Impact other customers' service quality
Rate limiting protects both the provider and the ecosystem of developers building on their platform.
Understanding 429 Too Many Requests
The HTTP 429 status code is the universal signal that you've exceeded an API's rate limit. Unlike 5xx server errors (which indicate problems on the provider's side), a 429 explicitly tells you that your client is sending too many requests.
Anatomy of a 429 Response
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1675890123
Retry-After: 60
{
"error": {
"message": "Rate limit exceeded. Retry after 60 seconds.",
"type": "rate_limit_error",
"code": "rate_limit_exceeded"
}
}
Key components:
- Status Code 429: The HTTP response code indicating rate limit exceeded
- X-RateLimit-Limit: Your maximum requests allowed in the current window
- X-RateLimit-Remaining: How many requests you have left (0 when rate limited)
- X-RateLimit-Reset: Unix timestamp when your limit resets
- Retry-After: Seconds to wait before retrying (or an HTTP date)
Common Causes of 429 Errors
1. Burst Traffic Spikes
Your application suddenly receives a surge of user activity (product launch, viral post, peak shopping hours) that triggers proportionally more API calls.
2. Inefficient API Usage
Making individual API calls in loops instead of using batch endpoints, or fetching data you already have cached.
3. Parallel Request Floods
Running multiple application instances without coordinated rate limiting, or implementing aggressive parallelization without throttling.
4. Retry Storm
A bug in error handling causes failed requests to retry immediately and repeatedly, creating a feedback loop that makes the problem worse.
5. Development/Testing Mistakes
Running load tests against production APIs, infinite loops in development, or automated scripts without rate limiting logic.
6. Plan Limits
You've legitimately outgrown your current API plan's rate limit based on organic growth.
Common Rate Limit Patterns
API providers implement rate limiting using various algorithms, each with different characteristics. Understanding these patterns helps you design better integration strategies.
Per-Second/Minute/Hour Limits
The simplest approach: a fixed number of requests allowed per time window.
Examples:
- Twitter API: 300 requests per 15-minute window
- GitHub API: 5,000 requests per hour (authenticated)
- Stripe API: 100 requests per second per account
How it works:
Time Window: 60 seconds
Limit: 100 requests
Request 1-100: ✓ Allowed
Request 101+: ✗ 429 until window resets
Pros: Simple to understand and implement
Cons: Allows burst traffic that could still overwhelm systems
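For intuition, a fixed-window counter fits in a few lines. This is an illustrative single-process Python sketch (the class and parameter names are my own, not from any library):

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per fixed window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.count = 0

    def allow(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Start a fresh window once the current one has expired
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # A server would respond with 429 here
```

Note how nothing stops a client from spending the whole budget in the first second of the window; that burst problem is what the token bucket and sliding window approaches address.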
Token Bucket Algorithm
Imagine a bucket that holds tokens. Each API request consumes one token. Tokens refill at a steady rate. When the bucket is empty, requests are denied until tokens regenerate.
Example configuration:
- Bucket capacity: 100 tokens
- Refill rate: 10 tokens per second
- Each request costs: 1 token
Behavior:
// Bucket starts with 100 tokens
bucket.tokens = 100;
bucket.refillRate = 10; // per second
// Burst: 100 requests instantly succeeds (drains bucket)
for (let i = 0; i < 100; i++) {
await api.call(); // ✓ All succeed
}
// Request 101 immediately: ✗ 429 (bucket empty)
// After 1 second: 10 tokens regenerated
// Requests 101-110: ✓ Succeed
// After 10 seconds: Bucket fully refilled (100 tokens)
Pros: Allows controlled bursts while preventing sustained overload
Cons: More complex to implement and reason about
Popular with: AWS API Gateway, Stripe, GitHub
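The refill logic is the part that's easy to get wrong, so here is a minimal single-process Python sketch of a token bucket (illustrative, not any provider's actual implementation):

```python
import time

class TokenBucket:
    """Bucket holding up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.time()

    def allow(self, cost: float = 1.0, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily on each call, rather than on a timer, is what keeps the implementation to a few lines.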
Sliding Window Algorithm
Instead of fixed time buckets, the rate limit is calculated based on the past N seconds/minutes from the current moment.
Fixed window problem:
Fixed Window (60 seconds):
Time 0-59s: 1000 requests ✓
Time 60-119s: 1000 requests ✓
Problem: 1000 requests at 11:00:59 + 1000 at 11:01:00
= 2000 requests in 1 second!
Sliding window solution:
Limit: 1000 requests per 60 seconds
At time 11:01:30, check: How many requests in past 60 seconds?
- Counts requests from 11:00:30 to 11:01:30
- More accurate traffic control
Pros: Prevents burst exploits at window boundaries
Cons: Requires storing timestamps for each request
Popular with: Cloudflare, Redis rate limiting, modern APIs
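The storage cost mentioned above is visible in a sliding-window log implementation: one timestamp per accepted request. An illustrative Python sketch:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # one entry per accepted request

    def allow(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Evict timestamps that have slid out of the trailing window
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```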
Tiered Limits by Plan
Most commercial APIs implement different rate limits based on subscription tiers, creating a natural upgrade path as usage grows.
Typical tier structure:
| Plan | Rate Limit | Price |
|---|---|---|
| Free | 100 requests/hour | $0 |
| Starter | 1,000 requests/hour | $29/month |
| Professional | 10,000 requests/hour | $99/month |
| Enterprise | 100,000+ requests/hour | Custom |
OpenAI Example:
- Free tier: 3 requests per minute (RPM)
- Pay-as-you-go: 3,500 RPM for GPT-4
- Tier 5: 10,000 RPM after $1,000+ monthly spend
Resource-based limits: Some APIs rate-limit by resource consumption rather than request count:
Anthropic Claude API:
- Rate limit: 50,000 tokens per minute
- Small request (100 tokens): Uses 0.2% of limit
- Large request (5,000 tokens): Uses 10% of limit
This is fairer for APIs with variable request sizes.
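To make the arithmetic concrete, a client-side token-budget tracker could look like the sketch below (the class is illustrative; the 50,000 tokens/minute figure mirrors the example above):

```python
import time
from collections import deque

class TokenBudget:
    """Client-side tracker for a tokens-per-minute budget (illustrative sketch)."""

    def __init__(self, tokens_per_minute: int = 50_000):
        self.limit = tokens_per_minute
        self.spend = deque()  # (timestamp, tokens) pairs

    def can_spend(self, tokens: int, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Forget spend older than one minute
        while self.spend and self.spend[0][0] <= now - 60:
            self.spend.popleft()
        used = sum(t for _, t in self.spend)
        if used + tokens <= self.limit:
            self.spend.append((now, tokens))
            return True
        return False
```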
Concurrent Request Limits
Some APIs also limit concurrent (simultaneous) requests, not just total requests per time window.
Example (Twilio):
- 1,000 requests per second (rate limit)
- Maximum 100 concurrent requests per account
Why this matters:
// This could hit concurrent limit even within rate limit:
const promises = [];
for (let i = 0; i < 500; i++) {
promises.push(api.call()); // 500 simultaneous requests
}
await Promise.all(promises); // ✗ May fail with 429
Solution: Use a concurrency limiter:
const pLimit = require('p-limit');
const limit = pLimit(50); // Max 50 concurrent
const promises = items.map(item =>
limit(() => api.call(item)) // Queues excess requests
);
await Promise.all(promises); // ✓ Respects concurrent limit
How to Detect You're Being Rate Limited
Before implementing rate limit handling, you need reliable detection. Here's how to identify rate limiting in your API integration.
Standard Rate Limit Headers
Most modern APIs include standardized headers in every response to help you track your rate limit status:
const response = await fetch('https://api.github.com/user', {
headers: { 'Authorization': `token ${GITHUB_TOKEN}` }
});
console.log(response.headers.get('X-RateLimit-Limit')); // "5000"
console.log(response.headers.get('X-RateLimit-Remaining')); // "4999"
console.log(response.headers.get('X-RateLimit-Reset')); // "1675890123"
Header Variations by Provider
Unfortunately, header names aren't fully standardized:
| Provider | Limit Header | Remaining Header | Reset Header |
|---|---|---|---|
| GitHub | X-RateLimit-Limit | X-RateLimit-Remaining | X-RateLimit-Reset |
| Twitter (X) | x-rate-limit-limit | x-rate-limit-remaining | x-rate-limit-reset |
| Stripe | X-Stripe-Limit | X-Stripe-Remaining | X-Stripe-Reset |
| OpenAI | x-ratelimit-limit-requests | x-ratelimit-remaining-requests | x-ratelimit-reset-requests |
Parsing example:
function parseRateLimitHeaders(headers) {
// Try common header variations
const limit = parseInt(
headers.get('X-RateLimit-Limit') ||
headers.get('x-rate-limit-limit') ||
headers.get('X-Stripe-Limit') || '0'
);
const remaining = parseInt(
headers.get('X-RateLimit-Remaining') ||
headers.get('x-rate-limit-remaining') ||
headers.get('X-Stripe-Remaining') || '0'
);
const reset = parseInt(
headers.get('X-RateLimit-Reset') ||
headers.get('x-rate-limit-reset') ||
headers.get('X-Stripe-Reset') || '0'
);
return { limit, remaining, reset };
}
The Retry-After Header
When you receive a 429 response, the Retry-After header tells you exactly how long to wait:
async function fetchWithRetryAfter(url, options = {}) {
  const response = await fetch(url, options);
  if (response.status === 429) {
    const retryAfter = response.headers.get('Retry-After');
    if (retryAfter) {
      // Can be seconds (number) or an HTTP date (string)
      const waitSeconds = parseInt(retryAfter, 10) ||
        (Date.parse(retryAfter) - Date.now()) / 1000;
      console.log(`Rate limited. Waiting ${waitSeconds} seconds...`);
      await new Promise(r => setTimeout(r, waitSeconds * 1000));
      // Retry the original request with the same URL and options
      return fetchWithRetryAfter(url, options);
    }
  }
  return response;
}
Proactive Monitoring
Don't wait for 429 errors. Monitor your rate limit consumption proactively:
class RateLimitTracker {
constructor() {
this.limits = {};
}
track(apiName, headers) {
const { limit, remaining, reset } = parseRateLimitHeaders(headers);
this.limits[apiName] = { limit, remaining, reset };
// Alert if getting close to limit
const percentUsed = ((limit - remaining) / limit) * 100;
if (percentUsed > 80) {
console.warn(
`[${apiName}] Rate limit warning: ${percentUsed.toFixed(1)}% used ` +
`(${remaining}/${limit} remaining)`
);
}
if (percentUsed > 95) {
console.error(
`[${apiName}] CRITICAL: Rate limit nearly exhausted! ` +
`${remaining} requests remaining until ${new Date(reset * 1000)}`
);
}
}
getStatus(apiName) {
return this.limits[apiName] || null;
}
}
const tracker = new RateLimitTracker();
// Use with every API call
const response = await fetch('https://api.example.com/data');
tracker.track('example-api', response.headers);
Detecting Rate Limits Without Headers
Some older APIs don't provide rate limit headers. In these cases, detect 429s by status code:
async function makeRequest(url, retryCount = 0, maxRetries = 5) {
  const response = await fetch(url);
  // Check for rate limiting
  if (response.status === 429 && retryCount < maxRetries) {
    // Fallback: exponential backoff without Retry-After
    const waitTime = Math.pow(2, retryCount) * 1000;
    console.log(`Rate limited. Backing off ${waitTime}ms`);
    await new Promise(r => setTimeout(r, waitTime));
    return makeRequest(url, retryCount + 1, maxRetries); // Retry
  }
  return response;
}
Rate Limit Handling Strategies with Code Examples
Now that you can detect rate limiting, let's implement robust handling strategies. These patterns work across languages and API providers.
1. Exponential Backoff
The gold standard for retry logic. When a request fails, wait progressively longer between retries.
JavaScript/Node.js Implementation:
async function exponentialBackoff(
fn,
maxRetries = 5,
baseDelay = 1000,
maxDelay = 32000
) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
// Only retry on rate limits or transient errors
if (
error.status !== 429 &&
error.status !== 503 &&
error.status !== 502
) {
throw error; // Don't retry 4xx client errors
}
if (attempt === maxRetries - 1) {
throw error; // Exhausted retries
}
// Calculate delay: baseDelay * 2^attempt + jitter
const exponentialDelay = Math.min(
baseDelay * Math.pow(2, attempt),
maxDelay
);
// Add jitter to prevent thundering herd
const jitter = Math.random() * 1000;
const totalDelay = exponentialDelay + jitter;
console.log(
`Attempt ${attempt + 1} failed. ` +
`Retrying in ${(totalDelay / 1000).toFixed(2)}s...`
);
await new Promise(resolve => setTimeout(resolve, totalDelay));
}
}
}
// Usage
const data = await exponentialBackoff(async () => {
const response = await fetch('https://api.stripe.com/v1/charges', {
method: 'POST',
headers: {
'Authorization': `Bearer ${STRIPE_SECRET_KEY}`,
'Content-Type': 'application/x-www-form-urlencoded'
},
body: 'amount=2000&currency=usd&source=tok_visa'
});
if (!response.ok) {
const error = new Error('API request failed');
error.status = response.status;
throw error;
}
return response.json();
});
Python Implementation:
import time
import random
from typing import Callable, Any
def exponential_backoff(
fn: Callable[[], Any],
max_retries: int = 5,
base_delay: float = 1.0,
max_delay: float = 32.0
) -> Any:
"""Execute function with exponential backoff retry logic."""
for attempt in range(max_retries):
try:
return fn()
except Exception as error:
# Only retry on rate limits or transient errors
status = getattr(error, 'status', None)
if status not in [429, 502, 503]:
raise error # Don't retry client errors
if attempt == max_retries - 1:
raise error # Exhausted retries
# Calculate delay: base_delay * 2^attempt + jitter
exponential_delay = min(
base_delay * (2 ** attempt),
max_delay
)
# Add jitter to prevent thundering herd
jitter = random.uniform(0, 1)
total_delay = exponential_delay + jitter
print(
f"Attempt {attempt + 1} failed. "
f"Retrying in {total_delay:.2f}s..."
)
time.sleep(total_delay)
# Usage
import stripe
stripe.api_key = "sk_test_..."
data = exponential_backoff(
lambda: stripe.Charge.create(
amount=2000,
currency="usd",
source="tok_visa"
)
)
Why jitter matters: Without jitter, all clients hitting rate limits retry at exactly the same time, creating synchronized thundering herd problems that make recovery harder.
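The examples above add jitter on top of the exponential delay. An alternative, often called "full jitter", draws the entire sleep uniformly between zero and the exponential delay; the function below is an illustrative Python sketch of that variant:

```python
import random

def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 32.0) -> float:
    """Sleep time for a retry: uniform over [0, min(cap, base * 2^attempt))."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Because each client draws a different delay, synchronized retries spread out instead of arriving in lockstep.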
2. Request Queuing
Instead of firing requests immediately, queue them and process at a controlled rate.
JavaScript Implementation with bottleneck:
const Bottleneck = require('bottleneck');
// Configure limiter: max 100 requests per minute
const limiter = new Bottleneck({
maxConcurrent: 10, // Max 10 simultaneous requests
minTime: 600, // Minimum 600ms between requests (= 100/min)
reservoir: 100, // Start with 100 requests available
reservoirRefreshAmount: 100, // Add 100 requests
reservoirRefreshInterval: 60 * 1000 // Every 60 seconds
});
// Handle 429 responses
limiter.on('failed', async (error, jobInfo) => {
if (error.status === 429) {
const retryAfter = error.retryAfter || 60;
console.log(`Rate limited. Retrying after ${retryAfter}s...`);
return retryAfter * 1000; // Tell bottleneck when to retry
}
});
// Wrap your API calls
const fetchUser = limiter.wrap(async (userId) => {
const response = await fetch(`https://api.example.com/users/${userId}`);
if (response.status === 429) {
const error = new Error('Rate limited');
error.status = 429;
error.retryAfter = parseInt(response.headers.get('Retry-After')) || 60;
throw error;
}
return response.json();
});
// Use like normal async function
const user = await fetchUser('user_123');
// Process many items - automatically queued and rate-limited
const users = await Promise.all(
userIds.map(id => fetchUser(id))
);
Python Implementation with ratelimit:
from ratelimit import limits, sleep_and_retry
import requests
import time
# Max 100 calls per minute
@sleep_and_retry
@limits(calls=100, period=60)
def fetch_user(user_id: str) -> dict:
"""Fetch user data with automatic rate limiting."""
response = requests.get(
f"https://api.example.com/users/{user_id}",
headers={"Authorization": f"Bearer {API_KEY}"}
)
if response.status_code == 429:
retry_after = int(response.headers.get('Retry-After', 60))
print(f"Rate limited. Sleeping {retry_after}s...")
time.sleep(retry_after)
return fetch_user(user_id) # Retry
response.raise_for_status()
return response.json()
# Usage - automatically rate-limited
users = [fetch_user(user_id) for user_id in user_ids]
3. Caching Responses
The best API call is one you don't make. Aggressive caching reduces rate limit pressure.
JavaScript Implementation:
const NodeCache = require('node-cache');
class CachedAPIClient {
constructor(ttl = 300) { // Default 5-minute TTL
this.cache = new NodeCache({
stdTTL: ttl,
checkperiod: 60
});
}
async get(endpoint, options = {}) {
const cacheKey = this._getCacheKey(endpoint, options);
// Check cache first
const cached = this.cache.get(cacheKey);
if (cached) {
console.log(`[CACHE HIT] ${endpoint}`);
return cached;
}
console.log(`[CACHE MISS] ${endpoint}`);
// Fetch from API
const response = await fetch(endpoint, options);
if (!response.ok) {
throw new Error(`API error: ${response.status}`);
}
const data = await response.json();
// Store in cache
this.cache.set(cacheKey, data);
return data;
}
_getCacheKey(endpoint, options) {
return `${endpoint}:${JSON.stringify(options)}`;
}
invalidate(endpoint) {
const keys = this.cache.keys();
keys.forEach(key => {
if (key.startsWith(endpoint)) {
this.cache.del(key);
}
});
}
}
// Usage
const api = new CachedAPIClient(300); // 5-minute cache
// First call hits API
const user1 = await api.get('https://api.github.com/users/octocat');
// Second call within 5 minutes uses cache
const user2 = await api.get('https://api.github.com/users/octocat');
// Invalidate when data changes
api.invalidate('https://api.github.com/users/octocat');
Python Implementation:
import time
import requests
import hashlib
import json
class CachedAPIClient:
def __init__(self, ttl: int = 300):
self.ttl = ttl
self._cache = {}
def get(self, endpoint: str, params: dict = None) -> dict:
"""Fetch data with caching."""
cache_key = self._get_cache_key(endpoint, params)
# Check cache
if cache_key in self._cache:
cached_data, timestamp = self._cache[cache_key]
if time.time() - timestamp < self.ttl:
print(f"[CACHE HIT] {endpoint}")
return cached_data
print(f"[CACHE MISS] {endpoint}")
# Fetch from API
response = requests.get(endpoint, params=params)
response.raise_for_status()
data = response.json()
# Store in cache
self._cache[cache_key] = (data, time.time())
return data
def _get_cache_key(self, endpoint: str, params: dict) -> str:
key_str = f"{endpoint}:{json.dumps(params, sort_keys=True)}"
return hashlib.md5(key_str.encode()).hexdigest()
# Usage
api = CachedAPIClient(ttl=300)
# First call hits API
user1 = api.get('https://api.github.com/users/octocat')
# Second call uses cache
user2 = api.get('https://api.github.com/users/octocat')
Cache invalidation strategies:
- Time-based (TTL): Expire after N seconds/minutes
- Event-based: Invalidate when source data changes
- LRU (Least Recently Used): Evict oldest entries when cache is full
- Conditional requests: Use `ETag`/`If-None-Match` headers to revalidate
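The conditional-request strategy deserves a concrete sketch. This hypothetical helper stores ETags alongside cached bodies and revalidates with If-None-Match; GitHub, for example, does not count 304 Not Modified responses against the core rate limit:

```python
import requests

# Hypothetical in-memory store: URL -> (etag, cached_body)
etag_cache = {}

def get_with_etag(url: str) -> dict:
    """GET with ETag revalidation: a 304 reuses the cached body."""
    headers = {}
    if url in etag_cache:
        headers['If-None-Match'] = etag_cache[url][0]
    response = requests.get(url, headers=headers)
    if response.status_code == 304:  # Not Modified: cache is still fresh
        return etag_cache[url][1]
    response.raise_for_status()
    data = response.json()
    if 'ETag' in response.headers:
        etag_cache[url] = (response.headers['ETag'], data)
    return data
```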
4. Circuit Breaker Pattern
When an API is consistently failing, stop making requests temporarily to avoid wasting rate limit quota and creating cascading failures.
JavaScript Implementation:
class CircuitBreaker {
constructor(options = {}) {
this.failureThreshold = options.failureThreshold || 5;
this.resetTimeout = options.resetTimeout || 60000; // 1 minute
this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
this.failureCount = 0;
this.nextAttempt = Date.now();
}
async execute(fn) {
if (this.state === 'OPEN') {
if (Date.now() < this.nextAttempt) {
throw new Error(
'Circuit breaker is OPEN. ' +
`Retry after ${new Date(this.nextAttempt)}`
);
}
// Transition to HALF_OPEN to test if service recovered
this.state = 'HALF_OPEN';
console.log('Circuit breaker transitioning to HALF_OPEN');
}
try {
const result = await fn();
this._onSuccess();
return result;
} catch (error) {
this._onFailure();
throw error;
}
}
_onSuccess() {
this.failureCount = 0;
if (this.state === 'HALF_OPEN') {
console.log('Circuit breaker closing - service recovered');
this.state = 'CLOSED';
}
}
_onFailure() {
this.failureCount++;
if (this.failureCount >= this.failureThreshold) {
console.error(
`Circuit breaker opening after ${this.failureCount} failures`
);
this.state = 'OPEN';
this.nextAttempt = Date.now() + this.resetTimeout;
}
}
getState() {
return {
state: this.state,
failureCount: this.failureCount,
nextAttempt: this.nextAttempt
};
}
}
// Usage
const breaker = new CircuitBreaker({
failureThreshold: 5, // Open after 5 failures
resetTimeout: 60000 // Try again after 1 minute
});
async function fetchData() {
return breaker.execute(async () => {
const response = await fetch('https://api.example.com/data');
if (response.status === 429 || response.status >= 500) {
throw new Error(`API error: ${response.status}`);
}
return response.json();
});
}
// Automatically stops calling failing API
try {
const data = await fetchData();
} catch (error) {
console.log('Request failed:', error.message);
console.log('Circuit breaker state:', breaker.getState());
}
Python Implementation:
import time
import requests
from enum import Enum
from typing import Callable, Any
class CircuitState(Enum):
CLOSED = "CLOSED"
OPEN = "OPEN"
HALF_OPEN = "HALF_OPEN"
class CircuitBreaker:
def __init__(
self,
failure_threshold: int = 5,
reset_timeout: int = 60
):
self.failure_threshold = failure_threshold
self.reset_timeout = reset_timeout
self.state = CircuitState.CLOSED
self.failure_count = 0
self.next_attempt = time.time()
def execute(self, fn: Callable[[], Any]) -> Any:
"""Execute function with circuit breaker protection."""
if self.state == CircuitState.OPEN:
if time.time() < self.next_attempt:
raise Exception(
f"Circuit breaker is OPEN. "
f"Retry after {self.next_attempt - time.time():.0f}s"
)
# Transition to HALF_OPEN
self.state = CircuitState.HALF_OPEN
print("Circuit breaker transitioning to HALF_OPEN")
try:
result = fn()
self._on_success()
return result
except Exception as error:
self._on_failure()
raise error
def _on_success(self):
self.failure_count = 0
if self.state == CircuitState.HALF_OPEN:
print("Circuit breaker closing - service recovered")
self.state = CircuitState.CLOSED
def _on_failure(self):
self.failure_count += 1
if self.failure_count >= self.failure_threshold:
print(
f"Circuit breaker opening after "
f"{self.failure_count} failures"
)
self.state = CircuitState.OPEN
self.next_attempt = time.time() + self.reset_timeout
# Usage
breaker = CircuitBreaker(failure_threshold=5, reset_timeout=60)
def fetch_data():
return breaker.execute(lambda: requests.get('https://api.example.com/data'))
When to use circuit breakers:
- API is experiencing an outage (returns 5xx errors)
- Consistent rate limiting despite proper backoff
- Network connectivity issues
- Dependency failures that won't resolve immediately
Rate Limits by Popular APIs
Understanding specific rate limits for the APIs you use helps you design within constraints.
Stripe
Rate limits:
- Default: 100 requests per second per account
- Read operations: Higher limits (list endpoints)
- Write operations: Lower limits (charge creation)
Best practices:
- Use idempotency keys to safely retry
- Implement webhooks instead of polling
- Use the `expand` parameter to fetch related objects in one request
Monitoring: Check Stripe API status
// Stripe-specific rate limit handling
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY);
async function createChargeWithRetry(amount, source) {
return exponentialBackoff(async () => {
try {
return await stripe.charges.create({
amount,
currency: 'usd',
source
}, {
idempotencyKey: `charge_${Date.now()}_${Math.random()}`
});
} catch (error) {
if (error.type === 'StripeRateLimitError') {
const rateLimitError = new Error('Rate limited');
rateLimitError.status = 429;
throw rateLimitError;
}
throw error;
}
});
}
OpenAI
Rate limits (GPT-4):
- Free tier: 3 RPM (requests per minute)
- Tier 1: 500 RPM, 30,000 TPM (tokens per minute)
- Tier 5: 10,000 RPM, 300,000 TPM
Headers:
- `x-ratelimit-limit-requests`
- `x-ratelimit-remaining-requests`
- `x-ratelimit-limit-tokens`
- `x-ratelimit-remaining-tokens`
Best practices:
- Track token consumption, not just request count
- Use smaller models (GPT-3.5) for less critical tasks
- Implement request batching where possible
Monitoring: Check OpenAI API status
import openai
import time
def chat_with_retry(messages, max_retries=3):
"""OpenAI chat completion with rate limit handling."""
for attempt in range(max_retries):
try:
response = openai.ChatCompletion.create(
model="gpt-4",
messages=messages
)
return response
except openai.error.RateLimitError as e:
if attempt == max_retries - 1:
raise
# Extract wait time from error message if available
wait_time = 20 * (attempt + 1) # Progressive backoff
print(f"Rate limited. Waiting {wait_time}s...")
time.sleep(wait_time)
GitHub
Rate limits:
- Authenticated: 5,000 requests per hour
- Unauthenticated: 60 requests per hour
- Search API: 30 requests per minute
- GraphQL API: 5,000 points per hour
Headers:
- `X-RateLimit-Limit`
- `X-RateLimit-Remaining`
- `X-RateLimit-Reset`
- `X-RateLimit-Used`
Best practices:
- Always authenticate to get 5,000 requests/hour
- Use GraphQL to fetch exactly the data you need
- Use conditional requests with `ETag` headers
Monitoring: Check GitHub API status
// GitHub API rate limit checking
async function checkGitHubRateLimit() {
const response = await fetch('https://api.github.com/rate_limit', {
headers: {
'Authorization': `token ${GITHUB_TOKEN}`
}
});
const data = await response.json();
console.log('Core API:', data.resources.core);
console.log('Search API:', data.resources.search);
console.log('GraphQL API:', data.resources.graphql);
return data;
}
Twilio
Rate limits:
- Default: 1,000 requests per second (burst)
- Sustained: Varies by account age/usage
- Concurrent requests: 100 per account
Rate limit codes:
- `20429` - Too Many Requests
Best practices:
- Implement queuing for SMS campaigns
- Use message batching via Messaging Services
- Monitor the `X-Shenanigans-Detected` header
from twilio.rest import Client
from twilio.base.exceptions import TwilioRestException
import time
client = Client(account_sid, auth_token)
def send_sms_with_rate_limit(to, body):
"""Send SMS with rate limit handling."""
max_retries = 3
for attempt in range(max_retries):
try:
message = client.messages.create(
to=to,
from_=twilio_number,
body=body
)
return message.sid
except TwilioRestException as e:
if e.code == 20429: # Rate limited
wait_time = 5 * (attempt + 1)
print(f"Rate limited. Retrying in {wait_time}s...")
time.sleep(wait_time)
else:
raise
Twitter (X) API
Rate limits (v2):
- Tweet lookup: 300 requests per 15-minute window
- User lookup: 300 requests per 15-minute window
- Search tweets: 450 requests per 15-minute window
Headers:
- `x-rate-limit-limit`
- `x-rate-limit-remaining`
- `x-rate-limit-reset`
Best practices:
- Use tweet IDs instead of searching repeatedly
- Cache user data aggressively
- Implement 15-minute window tracking
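A simple client-side tracker for these 15-minute windows might look like the following (an illustrative, single-process Python sketch):

```python
import time
from collections import deque

class WindowTracker:
    """Track calls against a 300-requests-per-15-minute window (client side)."""

    def __init__(self, limit: int = 300, window_seconds: int = 900):
        self.limit = limit
        self.window_seconds = window_seconds
        self.calls = deque()

    def record(self, now: float = None) -> None:
        self.calls.append(time.time() if now is None else now)

    def remaining(self, now: float = None) -> int:
        now = time.time() if now is None else now
        # Drop calls older than the window
        while self.calls and self.calls[0] <= now - self.window_seconds:
            self.calls.popleft()
        return self.limit - len(self.calls)
```

Check `remaining()` before each call, and when it hits zero, sleep until the oldest timestamp ages out of the window.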
Building Rate-Limit-Aware Applications
Rate limit handling should be built into your application architecture from day one, not bolted on after hitting limits.
Architecture Patterns
1. Centralized Rate Limiter Service
For distributed applications, implement a shared rate limiter using Redis:
const Redis = require('ioredis');
const redis = new Redis();
class DistributedRateLimiter {
async checkLimit(apiName, limit, windowSeconds) {
const key = `ratelimit:${apiName}`;
const now = Date.now();
const windowStart = now - (windowSeconds * 1000);
// Remove old entries
await redis.zremrangebyscore(key, 0, windowStart);
// Count requests in current window
const count = await redis.zcard(key);
if (count >= limit) {
const oldestEntry = await redis.zrange(key, 0, 0, 'WITHSCORES');
const resetTime = parseInt(oldestEntry[1]) + (windowSeconds * 1000);
return {
allowed: false,
resetAt: resetTime
};
}
// Add current request
await redis.zadd(key, now, `${now}:${Math.random()}`);
await redis.expire(key, windowSeconds);
return {
allowed: true,
remaining: limit - count - 1
};
}
}
// Usage across multiple servers
const limiter = new DistributedRateLimiter();
app.post('/api/action', async (req, res) => {
const result = await limiter.checkLimit('openai-api', 100, 60);
if (!result.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
resetAt: result.resetAt
});
}
// Process request
const data = await callOpenAI(req.body);
res.json(data);
});
2. Rate Limit Middleware
Wrap API clients with middleware that automatically handles rate limiting:
class RateLimitedAPIClient {
constructor(apiClient, rateLimit) {
this.client = apiClient;
this.limiter = new Bottleneck({
maxConcurrent: rateLimit.concurrent || 10,
minTime: (60 * 1000) / rateLimit.requestsPerMinute
});
}
async request(method, endpoint, data) {
return this.limiter.schedule(async () => {
return exponentialBackoff(async () => {
const response = await this.client.request(method, endpoint, data);
if (response.status === 429) {
const error = new Error('Rate limited');
error.status = 429;
error.retryAfter = parseInt(
response.headers.get('Retry-After') || 60
);
throw error;
}
return response;
});
});
}
}
// Usage
const openai = new RateLimitedAPIClient(openaiClient, {
requestsPerMinute: 50,
concurrent: 5
});
// All requests automatically rate-limited
const completion = await openai.request('POST', '/v1/chat/completions', {
model: 'gpt-4',
messages: [{role: 'user', content: 'Hello!'}]
});
3. Background Job Queues
For non-time-sensitive operations, use job queues to smooth traffic:
const Bull = require('bull');
// Create queue with a built-in rate limiter: at most 1 job per second (3,600/hour)
const apiQueue = new Bull('api-requests', {
  redis: { host: 'localhost', port: 6379 },
  limiter: { max: 1, duration: 1000 }
});
// Process queue with limited concurrency
apiQueue.process(5, async (job) => { // Max 5 concurrent
  const { endpoint, data } = job.data;
  return await exponentialBackoff(async () => {
    return await fetch(endpoint, {
      method: 'POST',
      body: JSON.stringify(data)
    });
  });
});
// Add jobs to queue
await apiQueue.add({
endpoint: 'https://api.example.com/data',
data: { message: 'Hello' }
});
User-Facing Rate Limit Communication
When your application hits rate limits, communicate clearly with users:
Good error message:
{
"error": "rate_limit_exceeded",
"message": "You've made too many requests. Please try again in 2 minutes.",
"retry_after": 120,
"limit": 100,
"period": "hour",
"docs_url": "https://docs.example.com/rate-limits"
}
Bad error message:
{
"error": "Too many requests"
}
UI considerations:
- Show progress bars for bulk operations
- Display "X requests remaining this hour"
- Gracefully queue actions when near limits
- Offer upgrade prompts when consistently hitting free tier limits
Monitoring and Alerting
Set up comprehensive monitoring:
// Track rate limit metrics
class RateLimitMetrics {
  constructor(metricsClient) {
    this.metrics = metricsClient;
  }

  recordAPICall(apiName, headers) {
    const { limit, remaining } = parseRateLimitHeaders(headers);
    const percentUsed = ((limit - remaining) / limit) * 100;

    // Send to monitoring service (DataDog, CloudWatch, etc.)
    this.metrics.gauge(`api.rate_limit.${apiName}.remaining`, remaining);
    this.metrics.gauge(`api.rate_limit.${apiName}.percent_used`, percentUsed);

    // Alert if consistently above 80%
    if (percentUsed > 80) {
      this.metrics.event({
        title: `High rate limit usage: ${apiName}`,
        text: `${percentUsed.toFixed(1)}% of rate limit used`,
        alert_type: 'warning',
        tags: [`api:${apiName}`]
      });
    }
  }

  recordRateLimitError(apiName, retryAfter) {
    this.metrics.increment(`api.rate_limit.${apiName}.errors`);
    this.metrics.gauge(`api.rate_limit.${apiName}.retry_after`, retryAfter);
  }
}
When Rate Limits Indicate an Outage
Sometimes what appears to be rate limiting is actually an API outage in disguise.
Distinguishing Rate Limits from Outages
Normal rate limiting:
- ✓ Predictable based on your usage patterns
- ✓ Affects only your account
- ✓ Resolves after waiting the specified time
- ✓ Headers present and accurate
- ✓ Error messages are clear
Possible outage:
- ✗ Sudden 429s when well within normal limits
- ✗ No Retry-After header, or an incorrect value
- ✗ Affects all endpoints simultaneously
- ✗ Accompanied by 5xx errors
- ✗ Multiple users reporting issues on social media
Cross-Reference with Status Monitoring
Before assuming you're being rate-limited unfairly, check if the API is experiencing problems:
Quick status check workflow:
1. Check API Status Check - real-time monitoring for 100+ APIs
2. Check official status pages:
- Stripe: status.stripe.com
- OpenAI: status.openai.com
- GitHub: githubstatus.com
3. Search social media:
- Twitter/X: Search "Stripe down" or "@stripeapi"
- Hacker News, Reddit r/webdev
4. Test from different locations:
- Regional outages may only affect certain data centers
- Use a VPN or cloud function in another region
Automated Outage Detection
Build outage detection into your monitoring:
async function detectPossibleOutage(apiName, errorRate) {
  // If error rate suddenly spikes above normal
  if (errorRate > 0.5) { // 50% of requests failing
    // Check API Status Check
    const status = await fetch(
      `https://apistatuscheck.com/api/${apiName}/status`
    ).then(r => r.json());

    if (status.operational === false) {
      // Confirmed outage
      await notifyTeam({
        title: `${apiName} Outage Detected`,
        message: `${apiName} is experiencing issues. See: https://apistatuscheck.com/api/${apiName}`,
        severity: 'critical'
      });

      // Switch to degraded mode
      await enableFallbackMode(apiName);
      return true;
    }
  }
  return false;
}
Benefits of status monitoring:
- Faster incident response - Know within seconds
- Better customer communication - "We're aware of issues with [Provider]"
- Avoid wasted debugging time - Don't troubleshoot when provider is down
- Historical data - Track provider reliability over time
Set up alerts for your critical APIs →
Frequently Asked Questions
What's the difference between rate limiting and throttling?
Rate limiting sets a hard cap on requests per time period—exceed it and you get a 429 error. Throttling slows down requests gracefully by adding artificial delays or queuing, keeping you under the limit. Rate limiting is reactive (API rejects excess), throttling is proactive (client self-regulates).
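The distinction shows up clearly in code: a throttle inserts delay *before* a request instead of reacting to a 429 afterward. Here's a minimal sketch of a proactive throttle that enforces a minimum gap between calls (the `createThrottle` helper is illustrative, not from any library):

```javascript
// Proactive throttle: guarantee at least `minIntervalMs` between calls.
function createThrottle(minIntervalMs) {
  let last = 0;
  return async function throttled(fn) {
    const wait = Math.max(0, last + minIntervalMs - Date.now());
    if (wait > 0) {
      await new Promise(resolve => setTimeout(resolve, wait));
    }
    last = Date.now();
    return fn(); // The actual request runs only after the enforced gap
  };
}

// Usage: 600ms gap keeps you under a 100 requests/minute limit
const throttled = createThrottle(600);
// await throttled(() => fetch('https://api.example.com/data'));
```

No 429 ever happens here; the client self-regulates, which is exactly what throttling means.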
Should I implement client-side rate limiting even if the API has limits?
Yes! Client-side rate limiting ("throttling") is best practice because:
- You avoid hitting limits and disrupting service
- No wasted API calls that return 429s
- More predictable application behavior
- Better resource utilization (no retry storms)
- Can coordinate limits across multiple services/instances
Think of API rate limits as a guardrail, not a target to hit.
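A common way to self-regulate is a token bucket: tokens refill at a steady rate up to a capacity, and each request spends one. A minimal in-process sketch (class name and parameters are illustrative):

```javascript
// Token bucket: refills `ratePerSec` tokens per second, up to `capacity`.
class TokenBucket {
  constructor(capacity, ratePerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.ratePerSec = ratePerSec;
    this.lastRefill = Date.now();
  }

  tryRemove() {
    // Refill based on time elapsed since the last check
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // Caller may send the request
    }
    return false;   // Caller should wait or queue
  }
}
```

The capacity acts as a burst allowance, while the refill rate keeps you safely under the API's sustained limit.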
How do I handle rate limits in serverless functions?
Serverless adds complexity because instances don't share state. Solutions:
- Use external rate limiting (Redis, DynamoDB) to track limits across invocations
- Pre-allocate request quotas - Each lambda gets 1/N of hourly limit
- Implement queuing - SQS/SNS to serialize requests
- Use AWS API Gateway throttling - Built-in per-key limits
Avoid storing rate limit state in memory, since Lambda instances are ephemeral and don't share state.
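The external-store approach boils down to a fixed-window counter keyed by user and window. The store interface below is hypothetical: in production you'd back it with Redis (INCR plus EXPIRE) or DynamoDB so all instances share counts; the in-memory Map exists only to illustrate the logic:

```javascript
// Fixed-window rate check over a shared store.
// In production, `store.incr` would be a Redis INCR (with EXPIRE set to
// windowSec) so every serverless instance sees the same counts.
async function allowRequest(store, key, limit, windowSec) {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const field = `${key}:${window}`;       // New counter each window
  const count = await store.incr(field);  // Atomic increment in Redis
  return count <= limit;
}

// In-memory stand-in for the shared store (illustration/testing only;
// this defeats the purpose in a real serverless deployment)
const memoryStore = {
  counts: new Map(),
  async incr(k) {
    const n = (this.counts.get(k) || 0) + 1;
    this.counts.set(k, n);
    return n;
  }
};
```

Fixed windows are the simplest shared-state scheme; sliding windows smooth out the burst at window boundaries at the cost of more store operations.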
Can I negotiate higher rate limits with API providers?
Yes! Options include:
- Upgrade to paid tier - Usually instant limit increase
- Contact enterprise sales - Custom limits for high-volume users
- Show business case - Explain why you need higher limits
- Demonstrate good citizenship - Efficient API usage, proper error handling
Providers want to support legitimate high-volume users—don't hesitate to ask.
What's a good rate limit for my own API?
Start with conservative defaults and adjust based on monitoring:
- Public free tier: 100-1,000 requests/hour
- Authenticated users: 5,000-10,000 requests/hour
- Paid tiers: 10,000-100,000+ requests/hour
Consider:
- Your infrastructure capacity (don't promise what you can't deliver)
- Cost per request (databases, compute, third-party APIs)
- Typical use cases (background sync needs more than interactive apps)
- Competitive landscape (what do similar APIs offer?)
Implement gradually: Start restrictive, loosen as you scale.
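Enforcing those tiers can be as simple as a per-key middleware. In production a library like express-rate-limit covers this; the hand-rolled Express-style sketch below just shows the mechanics (the function name and the X-API-Key convention are assumptions):

```javascript
// Sliding-window limiter middleware, keyed by API key (or IP as fallback).
function createLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> array of request timestamps

  return function limiter(req, res, next) {
    const key = req.headers['x-api-key'] || req.ip;
    const now = Date.now();

    // Keep only timestamps inside the current window
    const recent = (hits.get(key) || []).filter(t => t > now - windowMs);

    if (recent.length >= max) {
      res.statusCode = 429;
      res.setHeader('Retry-After', Math.ceil(windowMs / 1000));
      return res.end(JSON.stringify({ error: 'rate_limit_exceeded' }));
    }

    recent.push(now);
    hits.set(key, recent);
    next();
  };
}

// Usage: app.use(createLimiter({ windowMs: 60 * 60 * 1000, max: 1000 }));
```

Note the in-memory Map only works for a single process; multi-instance deployments need the shared-store approach discussed in the serverless question above.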
How do I test rate limit handling in development?
Mock rate limits:
// Mock API client that simulates rate limiting
class MockAPIClient {
  constructor(requestsPerMinute) {
    this.limit = requestsPerMinute;
    this.requests = [];
  }

  async request(endpoint) {
    const now = Date.now();
    const oneMinuteAgo = now - 60000;

    // Remove old requests
    this.requests = this.requests.filter(t => t > oneMinuteAgo);

    if (this.requests.length >= this.limit) {
      const error = new Error('Rate limited');
      error.status = 429;
      throw error;
    }

    this.requests.push(now);
    return { data: 'success' };
  }
}

// Test your retry logic
const mockAPI = new MockAPIClient(10); // 10 requests/minute

// This should trigger rate limiting and retry
for (let i = 0; i < 20; i++) {
  await fetchWithRetry(() => mockAPI.request('/test'));
}
Use dedicated test environments with lower limits if providers offer them.
What happens if I ignore rate limits?
Consequences escalate:
- 429 errors - Requests rejected, application breaks
- Longer rate limits - Punishment for abuse (hours instead of minutes)
- API key suspension - Temporary or permanent ban
- Account termination - Lose access entirely
- IP blocking - Affects all applications from your infrastructure
- Legal action - Violations of Terms of Service
Don't risk it. Respect rate limits.
Are rate limit headers standardized?
Unfortunately no. While many APIs follow the X-RateLimit-* pattern popularized by Twitter and GitHub, there's no official standard. Some providers use:
- RateLimit-* (draft IETF standard)
- X-Rate-Limit-*
- X-RateLimit-*
- Provider-specific headers (X-Shopify-Shop-Api-Call-Limit)
Always check the API documentation and write defensive parsing code that handles variations.
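A defensive parser along those lines checks the common variants in order. The helper below is a sketch (the variant list is illustrative, not exhaustive) compatible with the `parseRateLimitHeaders` call used in the monitoring example earlier:

```javascript
// Defensive rate-limit header parser: tries common naming variants.
// Extend the variant lists for providers with bespoke header names.
function parseRateLimitHeaders(headers) {
  const get = (...names) => {
    for (const name of names) {
      // Header objects may be case-sensitive; try the lowercase form too
      const value = headers[name] ?? headers[name.toLowerCase()];
      if (value !== undefined) return parseInt(value, 10);
    }
    return null; // Absent header: signal "unknown", don't guess
  };

  return {
    limit: get('RateLimit-Limit', 'X-RateLimit-Limit', 'X-Rate-Limit-Limit'),
    remaining: get('RateLimit-Remaining', 'X-RateLimit-Remaining', 'X-Rate-Limit-Remaining'),
    reset: get('RateLimit-Reset', 'X-RateLimit-Reset', 'X-Rate-Limit-Reset')
  };
}
```

Returning `null` for missing fields forces calling code to handle the "unknown" case explicitly rather than treating a missing limit as zero.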
Can rate limits differ by region or time of day?
Yes, some APIs implement dynamic rate limiting:
- Regional limits - EU data centers may have different caps
- Time-based limits - Higher limits during off-peak hours
- Burst allowances - Temporary increases for legitimate spikes
- Adaptive limits - Machine learning adjusts based on system load
Check documentation and monitor actual limits via response headers rather than assuming fixed values.
How do I handle rate limits in webhooks?
Webhooks create unique challenges since you don't control the request rate:
Strategies:
- Queue webhook events - Don't process synchronously
- Acknowledge quickly - Return 200 immediately, process async
- Batch processing - Group related events
- Use webhook replay - If you miss events due to limits, replay from provider
Example:
app.post('/webhook', async (req, res) => {
  // Acknowledge immediately
  res.status(200).send('OK');

  // Queue for processing
  await webhookQueue.add({
    event: req.body,
    signature: req.headers['stripe-signature']
  });
});

// Process queue with rate limiting
webhookQueue.process(async (job) => {
  const { event } = job.data;
  // Make API calls with rate limit handling
  await processWebhookEvent(event);
});
Take Control of Your API Infrastructure
API rate limiting doesn't have to be a source of anxiety and downtime. With the strategies and code examples in this guide, you can build resilient applications that gracefully handle rate limits and deliver reliable experiences to your users.
Key takeaways:
✓ Monitor proactively - Track rate limit headers before hitting limits
✓ Implement exponential backoff - Retry intelligently, not aggressively
✓ Cache aggressively - The best API call is one you don't make
✓ Queue non-urgent requests - Smooth out traffic spikes
✓ Use circuit breakers - Fail fast when APIs are struggling
✓ Know when to upgrade - Don't outgrow your API tier
✓ Distinguish outages from rate limits - Check status monitoring
Monitor API Health in Real-Time
Don't wait for user complaints to discover API problems. API Status Check monitors 100+ APIs 24/7 and alerts you instantly when issues arise—including unexpected rate limiting that may indicate an outage.
Get alerted via:
- 💬 Slack
- 🔔 Discord
- 🪝 Custom webhooks
Start monitoring your APIs for free →
Last updated: February 4, 2026. Rate limit information is based on public API documentation and is subject to change. Always refer to official provider documentation for the most current limits.
Monitor Your APIs
Check the real-time status of 100+ popular APIs used by developers.
View API Status →