API Rate Limiting Cheat Sheet: Headers, Patterns & Best Practices
Quick Answer: API rate limiting controls how many requests a client can make to an API within a specific time window (e.g., 100 requests per minute). Essential response headers include X-RateLimit-Limit (max requests), X-RateLimit-Remaining (requests left), and Retry-After (seconds until reset). Implement exponential backoff when hitting 429 errors and monitor rate limit consumption to avoid service disruption.
Rate limiting is the invisible traffic cop of the API world—when implemented correctly, you'll never notice it's there. When misunderstood or ignored, it can bring your application to a grinding halt at the worst possible moment. Whether you're integrating with Stripe's payment APIs, GitHub's webhooks, or OpenAI's GPT models, understanding rate limits is non-negotiable for production applications.
This comprehensive reference guide covers everything developers need to know about API rate limiting: how it works, common implementation patterns, rate limits for major APIs, code examples for handling limits gracefully, and monitoring strategies to prevent surprises.
What is API Rate Limiting?
API rate limiting is a technique used by API providers to control the number of requests a client (user, application, or IP address) can make within a specified time window. It serves multiple critical purposes:
Performance Protection: Prevents any single client from monopolizing server resources and degrading performance for other users. Without rate limits, a buggy integration making thousands of requests per second could bring down an entire service.
Cost Control: API calls consume computational resources (database queries, processing time, bandwidth). Rate limiting helps providers manage infrastructure costs and prevent abuse that could result in unexpected billing spikes for serverless architectures.
Security Defense: Acts as a first line of defense against denial-of-service (DoS) attacks, credential stuffing attempts, and malicious bots attempting to scrape data or exploit vulnerabilities.
Fair Usage Enforcement: Ensures equitable access to shared resources across all customers. Enterprise plans typically get higher limits than free tiers, creating a natural upgrade path.
Data Integrity: For APIs that modify data (POST, PUT, DELETE), rate limiting prevents accidental duplicate operations caused by retry storms or misconfigured automation.
Common Rate Limit HTTP Headers
When you make an API request, most providers include standardized headers in the response that tell you about your current rate limit status. Understanding these headers is crucial for implementing proactive rate limit handling.
Standard Rate Limit Headers
| Header Name | Description | Example Value |
|---|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window | 5000 |
| `X-RateLimit-Remaining` | Number of requests remaining in the current window | 4273 |
| `X-RateLimit-Reset` | Unix timestamp when the limit resets | 1738627200 |
| `X-RateLimit-Used` | Number of requests consumed in the current window | 727 |
| `Retry-After` | Seconds to wait before retrying (sent with 429 errors) | 45 |
| `RateLimit-Policy` | Describes the rate limit policy in structured form | 100;w=60 |
Parsing Rate Limit Headers in Code
JavaScript/Node.js:

```javascript
class RateLimitError extends Error {
  constructor(message, retryAfter) {
    super(message);
    this.name = 'RateLimitError';
    this.retryAfter = retryAfter;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function makeAPIRequest(url, options) {
  const response = await fetch(url, options);

  const limit = parseInt(response.headers.get('X-RateLimit-Limit'), 10);
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
  const reset = parseInt(response.headers.get('X-RateLimit-Reset'), 10);

  console.log(`Rate limit: ${remaining}/${limit} remaining, resets at ${new Date(reset * 1000)}`);

  // Proactive throttling when approaching the limit
  if (remaining < limit * 0.1) {
    console.warn('⚠️ Approaching rate limit, implementing backoff');
    await sleep(1000);
  }

  if (response.status === 429) {
    const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
    throw new RateLimitError(`Rate limited. Retry after ${retryAfter}s`, retryAfter);
  }

  return response.json();
}
```
Python:

```python
import time
from datetime import datetime

import requests

class RateLimitException(Exception):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

def make_api_request(url, headers=None):
    response = requests.get(url, headers=headers)

    # Extract rate limit info
    limit = int(response.headers.get('X-RateLimit-Limit', 0))
    remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
    reset = int(response.headers.get('X-RateLimit-Reset', 0))

    reset_time = datetime.fromtimestamp(reset)
    print(f"Rate limit: {remaining}/{limit} remaining, resets at {reset_time}")

    # Proactive throttling
    if remaining < limit * 0.1:
        print("⚠️ Approaching rate limit, slowing down")
        time.sleep(1)

    if response.status_code == 429:
        retry_after = int(response.headers.get('Retry-After', 60))
        raise RateLimitException(f"Rate limited. Retry after {retry_after}s", retry_after)

    return response.json()
```
The 429 Status Code
When you exceed a rate limit, APIs return a 429 Too Many Requests status code. This is your signal to stop making requests and wait. Never ignore 429 responses—continued requests will often result in temporary or permanent IP bans.
Proper 429 handling:
- Immediately stop making requests
- Check the `Retry-After` header for the wait duration
- Implement exponential backoff if `Retry-After` is not provided
- Log the incident for monitoring and analysis
- Alert your team if 429s become frequent
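The wait-time decision in those steps can be sketched as a small Python helper (the function name and defaults are illustrative, not from any particular SDK): honor the server's `Retry-After` value when present, otherwise fall back to capped exponential backoff with jitter.

```python
import random

def compute_wait_seconds(retry_after_header, attempt, base=1.0, cap=32.0):
    """Pick a wait time after a 429: prefer the server's Retry-After
    value; otherwise use capped exponential backoff plus jitter."""
    if retry_after_header is not None:
        return float(retry_after_header)
    backoff = min(base * (2 ** attempt), cap)
    return backoff + random.uniform(0, 1)  # jitter spreads out retries
```

For example, `compute_wait_seconds("45", 0)` returns 45.0, while `compute_wait_seconds(None, 2)` returns roughly 4 to 5 seconds.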
Rate Limits for Popular APIs
Understanding the specific rate limits for APIs you depend on is critical for capacity planning and avoiding production incidents. Here's a comprehensive reference for major API providers:
Payment & Financial APIs
| API | Free/Basic Tier | Standard Tier | Enterprise |
|---|---|---|---|
| Stripe | 100 req/sec (test), 25 req/sec (live) | 100 req/sec (live) | Custom (1000+ req/sec) |
| PayPal | 50 req/sec | 100 req/sec | Custom |
| Plaid | 10 req/sec, 1000 req/hour | 50 req/sec | Custom |
| Square | 10 req/sec | 40 req/sec | Custom |
Stripe specifics:
- Rate limits apply per API key (test vs live keys have separate limits)
- Burst allowance: short bursts up to 200 req/sec tolerated
- Search API: Lower limits (20 req/sec)
- Connected accounts: Separate limit pools
Note: Always check Is Stripe Down if you're experiencing consistent API errors beyond rate limiting.
Communication & Messaging APIs
| API | Free/Basic Tier | Standard Tier | Notes |
|---|---|---|---|
| Twilio | 1 msg/sec (trial) | 100 msg/sec | Per account SID |
| SendGrid | 100 emails/day (free) | 100 req/sec | Based on plan tier |
| Slack | Tier 1: 1 req/min, Tier 2: 20 req/min, Tier 3: 50 req/min | Tier 4: 100 req/min | Method-specific tiers |
| Discord | 50 req/sec | Same for all | Global and per-route limits |
SendGrid specifics:
- API calls and email sends have separate limits
- Marketing campaigns: 2,000 req/hour
- Email validation: 500 req/hour
Slack specifics:
- Different methods fall into different tiers
- `chat.postMessage`: Tier 3 (50/min)
- `users.list`: Tier 2 (20/min)
- Workspace token limits apply to the entire workspace
Check Is SendGrid Down or Is Slack Down for real-time status.
Developer Platform APIs
| API | Authenticated | Unauthenticated | Special Limits |
|---|---|---|---|
| GitHub | 5,000 req/hour | 60 req/hour | Search: 30 req/min, GraphQL: 5,000 points/hour |
| GitLab | 2,000 req/min | 10 req/min | Depends on plan tier |
| Bitbucket | 1,000 req/hour | 60 req/hour | Per OAuth consumer |
GitHub specifics:
- GraphQL API uses point system (each field costs points)
- Secondary rate limits for content creation (80 POST/PUT/DELETE per minute)
- Conditional requests (304 Not Modified) don't count against limit
- Enterprise Cloud: Higher limits available
AI & Machine Learning APIs
| API | Model | Requests Per Minute (RPM) | Tokens Per Minute (TPM) |
|---|---|---|---|
| OpenAI | GPT-4 (Free tier) | 3 RPM | 40,000 TPM |
| OpenAI | GPT-4 (Tier 1) | 500 RPM | 30,000 TPM |
| OpenAI | GPT-4 (Tier 5) | 10,000 RPM | 300,000,000 TPM |
| OpenAI | GPT-3.5-Turbo (Tier 1) | 3,500 RPM | 60,000 TPM |
| Anthropic | Claude (Free) | 5 RPM | 40,000 TPM |
| Anthropic | Claude (Pro) | 1,000 RPM | Varies by model |
OpenAI specifics:
- Tiered limits based on usage history and payment
- Both RPM (requests) and TPM (tokens) limits apply
- Batch API: Higher throughput, lower priority
- Different limits for different models
- Image generation: Separate limits (50 images/min for DALL-E 3)
E-commerce & Marketplace APIs
| API | Standard Limit | Notes |
|---|---|---|
| Shopify | 2 req/sec (REST), 1000 cost points/sec (GraphQL) | Leaky bucket algorithm |
| WooCommerce | No official limit | Self-hosted, server-dependent |
| Amazon SP-API | Varies by endpoint | 1-200 req/sec depending on operation |
| eBay | 5,000 req/day (free) | Varies by API and tier |
Shopify specifics:
- Shopify Plus: 4 req/sec
- GraphQL uses cost calculation (each query has points)
- Bulk operations: Separate limits
- REST Admin API: 2 calls/sec sustained, bursts allowed
Monitor Is Shopify Down for platform-wide issues beyond rate limiting.
Cloud Infrastructure APIs
| Provider | Service | Rate Limit |
|---|---|---|
| AWS | API Gateway | 10,000 req/sec (default) |
| AWS | Lambda | 1,000 concurrent executions |
| AWS | DynamoDB | 40,000 RCU / 40,000 WCU per table |
| Google Cloud | Cloud Functions | 1,000 req/sec per function |
| Google Cloud | Firestore | 10,000 writes/sec per database |
| Azure | Functions | 200 concurrent instances (Consumption) |
Important: Cloud provider limits are often per-region and per-service. Always check specific service documentation and request limit increases through support tickets if needed.
Rate Limiting Implementation Patterns
API providers use various algorithms to implement rate limiting, each with different characteristics and use cases. Understanding these patterns helps you predict behavior and optimize your integration strategy.
1. Fixed Window
How it works: Divides time into fixed intervals (e.g., 1-minute windows). You get a fixed quota at the start of each window.
Example: 100 requests per minute, window resets at :00 seconds
Minute 1 (00:00-00:59): 100 requests available
Minute 2 (01:00-01:59): 100 requests available (resets at 01:00)
Pros:
- Simple to implement and understand
- Predictable reset times
- Low memory footprint
Cons:
- Burst vulnerability: User can make 200 requests in 2 seconds (100 at 00:59, 100 at 01:00)
- Cliff effect: once the quota is exhausted, users must wait until the window resets
- Uneven traffic distribution
Used by: GitHub (hourly window), many simple APIs
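A minimal in-memory sketch of the fixed-window counter in Python (class and parameter names are illustrative; a production implementation would also evict old window counters and share state across servers):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed-window counter: each client gets a fresh quota at the
    start of every window."""

    def __init__(self, limit, window_sec, clock=time.time):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock  # injectable clock for testing
        self.counts = defaultdict(int)  # (client, window index) -> count

    def allow(self, client_id):
        window = int(self.clock() // self.window_sec)
        key = (client_id, window)
        if self.counts[key] >= self.limit:
            return False  # quota exhausted until the window rolls over
        self.counts[key] += 1
        return True
```

Note how the integer division `clock() // window_sec` is exactly what creates the cliff effect and the burst vulnerability described above: the counter forgets everything the instant the window index changes.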
2. Sliding Window
How it works: Considers requests made in the past N time units from the current moment, providing smoother rate limiting.
Example: 100 requests per 60-second sliding window
At 12:30:45, checks all requests since 12:29:45
At 12:30:46, checks all requests since 12:29:46
Pros:
- Prevents burst exploitation
- Smoother rate limiting experience
- More accurate representation of "requests per time unit"
Cons:
- More complex implementation
- Higher memory usage (must track timestamps)
- Computationally more expensive
Used by: Stripe, Redis-based rate limiters
Implementation (Redis + Node.js):

```javascript
const Redis = require('ioredis');
const redis = new Redis();

async function checkRateLimit(userId, limit = 100, windowSec = 60) {
  const now = Date.now();
  const windowStart = now - windowSec * 1000;
  const key = `ratelimit:${userId}`;

  // Record the current request with a unique member so concurrent
  // requests in the same millisecond aren't collapsed into one entry
  await redis.zadd(key, now, `${now}-${Math.random()}`);

  // Drop requests that have aged out of the window
  await redis.zremrangebyscore(key, '-inf', windowStart);

  // Count requests remaining in the window
  const count = await redis.zcard(key);

  // Expire the key so idle users don't leak memory
  await redis.expire(key, windowSec);

  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: now + windowSec * 1000,
  };
}
```
3. Token Bucket
How it works: A bucket holds tokens (representing requests). Tokens are added at a fixed rate. Each request consumes one token. If bucket is empty, request is denied.
Example: Bucket capacity: 100 tokens, refill rate: 10 tokens/second
Initial: 100 tokens available
Make 20 requests: 80 tokens remain
Wait 5 seconds: 80 + (5 * 10) = 130, capped to 100 (bucket size)
Pros:
- Allows controlled bursts (up to bucket capacity)
- Smooth refill behavior
- Works well for varying traffic patterns
- Easy to reason about
Cons:
- Requires tracking state (bucket level, last refill time)
- Can be exploited with careful timing
- Bucket size tuning requires experimentation
Used by: AWS API Gateway, Shopify (leaky bucket variant), many enterprise APIs
Implementation (Python):

```python
import time
import threading

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def consume(self, tokens=1):
        with self.lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        tokens_to_add = elapsed * self.refill_rate
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now

    def get_status(self):
        with self.lock:
            self._refill()
            return {
                'tokens': self.tokens,
                'capacity': self.capacity,
                'remaining_percent': (self.tokens / self.capacity) * 100
            }

# Usage
bucket = TokenBucket(capacity=100, refill_rate=10)

if bucket.consume(1):
    print("Request allowed")
else:
    print("Rate limited, please wait")

status = bucket.get_status()
print(f"Tokens remaining: {status['tokens']}/{status['capacity']}")
```
4. Leaky Bucket
How it works: Similar to token bucket, but enforces a constant output rate. Requests enter a queue (bucket) and are processed at a fixed rate. If bucket overflows, requests are rejected.
Example: Queue capacity: 100, processing rate: 10 requests/second
Burst of 50 requests arrives: All queued
Processing: 10 requests/sec drain from queue
Another 60 requests arrive immediately: 50 queued (queue now full at 100), 10 rejected (overflow)
Pros:
- Smooths traffic spikes
- Protects downstream systems from bursts
- Predictable output rate
Cons:
- Can increase latency (queuing delay)
- Rejected requests during overflow
- Complex to implement correctly
Used by: Shopify (GraphQL cost calculation), network traffic shaping
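As a minimal Python sketch, here is the "meter" variant of the leaky bucket, which decides admission rather than literally queuing work as described above (class and parameter names are illustrative): each request adds one unit of water, the bucket drains at a constant rate, and a full bucket means rejection.

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: requests add water, the bucket drains
    at a constant rate, and overflow means rejection."""

    def __init__(self, capacity, leak_rate, clock=time.time):
        self.capacity = capacity
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0
        self.clock = clock  # injectable clock for testing
        self.last_leak = clock()

    def allow(self):
        now = self.clock()
        # Drain continuously at the fixed rate since the last check
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False  # overflow: reject
```

The queue-based variant behaves the same way at the admission boundary; the difference is that queued requests are eventually processed at the fixed drain rate instead of being handled immediately.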
Handling Rate Limits in Your Code
When you inevitably hit rate limits, how you respond determines whether you experience minor delays or complete service disruption. Here are battle-tested patterns for graceful rate limit handling.
Exponential Backoff with Jitter
The gold standard for retry logic. Wait progressively longer between attempts, with randomization to prevent thundering herd problems.
JavaScript/Node.js Implementation:

```javascript
class RateLimitError extends Error {
  constructor(message, retryAfter) {
    super(message);
    this.retryAfter = retryAfter;
    this.name = 'RateLimitError';
  }
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function exponentialBackoff(fn, maxRetries = 5) {
  let retries = 0;
  while (retries < maxRetries) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 || error.name === 'RateLimitError') {
        retries++;
        if (retries >= maxRetries) {
          throw new Error(`Max retries (${maxRetries}) exceeded`);
        }
        // Backoff: min(1000 * 2^retries, 32000) plus random jitter
        const baseDelay = Math.min(1000 * Math.pow(2, retries), 32000);
        const jitter = Math.random() * 1000;
        const delay = baseDelay + jitter;
        console.log(`Rate limited. Retry ${retries}/${maxRetries} after ${delay.toFixed(0)}ms`);
        // Respect the Retry-After header if provided
        const retryAfter = error.retryAfter ? error.retryAfter * 1000 : delay;
        await sleep(retryAfter);
      } else {
        // Non-rate-limit error: throw immediately
        throw error;
      }
    }
  }
}

// Usage
async function fetchUserData(userId) {
  return exponentialBackoff(async () => {
    const response = await fetch(`https://api.example.com/users/${userId}`);
    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') || '0', 10);
      throw new RateLimitError('Rate limit exceeded', retryAfter);
    }
    if (!response.ok) {
      throw new Error(`API error: ${response.status}`);
    }
    return response.json();
  });
}
```
Python Implementation with Decorators:

```python
import time
import random
from functools import wraps

import requests

class RateLimitException(Exception):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

def exponential_backoff(max_retries=5, base_delay=1.0):
    """Decorator for exponential backoff retry logic"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            while retries < max_retries:
                try:
                    return func(*args, **kwargs)
                except RateLimitException as e:
                    retries += 1
                    if retries >= max_retries:
                        raise Exception(f"Max retries ({max_retries}) exceeded")
                    # Calculate backoff with jitter
                    backoff = min(base_delay * (2 ** retries), 32)
                    jitter = random.uniform(0, 1)
                    delay = backoff + jitter
                    # Use Retry-After if provided
                    if e.retry_after:
                        delay = e.retry_after
                    print(f"Rate limited. Retry {retries}/{max_retries} after {delay:.2f}s")
                    time.sleep(delay)
            raise Exception("Retry loop completed without success")
        return wrapper
    return decorator

# Usage
@exponential_backoff(max_retries=5)
def fetch_user_data(user_id):
    response = requests.get(f'https://api.example.com/users/{user_id}')
    if response.status_code == 429:
        retry_after = int(response.headers.get('Retry-After', 0))
        raise RateLimitException('Rate limit exceeded', retry_after)
    response.raise_for_status()
    return response.json()

# Make the call
try:
    user = fetch_user_data(12345)
    print(f"User: {user}")
except Exception as e:
    print(f"Failed to fetch user: {e}")
```
Request Queuing and Rate Smoothing
For applications making many API calls, implement a queue that smooths requests to stay within limits proactively.
JavaScript Queue Manager:

```javascript
class RateLimitedQueue {
  constructor(requestsPerSecond) {
    this.queue = [];
    this.processing = false;
    this.requestsPerSecond = requestsPerSecond;
    this.delayBetweenRequests = 1000 / requestsPerSecond;
  }

  async enqueue(fn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      this.processQueue();
    });
  }

  async processQueue() {
    if (this.processing || this.queue.length === 0) {
      return;
    }
    this.processing = true;
    while (this.queue.length > 0) {
      const { fn, resolve, reject } = this.queue.shift();
      const startTime = Date.now();
      try {
        const result = await fn();
        resolve(result);
      } catch (error) {
        reject(error);
      }
      // Enforce the rate limit delay between requests
      const elapsed = Date.now() - startTime;
      const remainingDelay = Math.max(0, this.delayBetweenRequests - elapsed);
      if (this.queue.length > 0 && remainingDelay > 0) {
        await new Promise((r) => setTimeout(r, remainingDelay));
      }
    }
    this.processing = false;
  }

  getQueueSize() {
    return this.queue.length;
  }
}

// Usage
const apiQueue = new RateLimitedQueue(10); // 10 requests per second

// Make 100 requests that will be automatically rate-limited
const promises = [];
for (let i = 0; i < 100; i++) {
  const promise = apiQueue.enqueue(async () => {
    const response = await fetch(`https://api.example.com/items/${i}`);
    return response.json();
  });
  promises.push(promise);
}

console.log(`Queue size: ${apiQueue.getQueueSize()}`);
const results = await Promise.all(promises);
console.log(`Completed ${results.length} requests without hitting rate limits`);
```
Circuit Breaker Pattern
When an API consistently returns 429 errors, implement a circuit breaker to avoid wasting resources on requests that will fail.
```javascript
class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failureThreshold = threshold;
    this.timeout = timeout;
    this.failureCount = 0;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.nextAttempt = Date.now();
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN. Service temporarily unavailable.');
      }
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
      console.error(`Circuit breaker opened. Will retry after ${this.timeout}ms`);
    }
  }

  getState() {
    return {
      state: this.state,
      failures: this.failureCount,
      nextAttempt: new Date(this.nextAttempt),
    };
  }
}

// Usage
const breaker = new CircuitBreaker(5, 60000);

async function makeAPICall() {
  return breaker.execute(async () => {
    const response = await fetch('https://api.example.com/data');
    if (response.status === 429) {
      throw new Error('Rate limited');
    }
    return response.json();
  });
}
```
Monitoring Rate Limits with API Status Check
Proactive monitoring prevents rate limit surprises that can disrupt your production services. While implementing proper retry logic is essential, knowing before you hit limits allows you to scale gracefully.
Why Monitor Rate Limit Consumption?
Capacity Planning: Understanding your rate limit usage patterns helps you predict when you'll need to upgrade to a higher tier or implement request optimization.
Early Warning: Getting alerts when you reach 80% of your rate limit gives you time to investigate and optimize before hitting hard limits.
Incident Response: When troubleshooting API errors, knowing whether rate limits are involved saves hours of debugging time. Check status dashboards for Stripe, Twilio, Slack, and other critical APIs.
Cost Optimization: High rate limit consumption might indicate inefficient API usage (unnecessary requests, missing caching, duplicate calls) that's costing you money and performance.
Setting Up Rate Limit Monitoring
1. Log rate limit headers from every request:
```javascript
const winston = require('winston');

const logger = winston.createLogger({
  transports: [new winston.transports.File({ filename: 'api-metrics.log' })]
});

async function monitoredAPICall(url) {
  const response = await fetch(url);

  // Parse header values before doing arithmetic on them
  const limit = parseInt(response.headers.get('X-RateLimit-Limit'), 10);
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);

  logger.info('API Request', {
    url,
    status: response.status,
    rateLimit: {
      limit,
      remaining,
      reset: response.headers.get('X-RateLimit-Reset'),
      percentUsed: ((1 - remaining / limit) * 100).toFixed(2)
    }
  });

  return response;
}
```
2. Track 429 errors with alerting:
```javascript
const Sentry = require('@sentry/node');

if (response.status === 429) {
  Sentry.captureException(new Error('Rate Limit Exceeded'), {
    extra: {
      api: 'stripe',
      endpoint: url,
      retryAfter: response.headers.get('Retry-After'),
      rateLimitRemaining: response.headers.get('X-RateLimit-Remaining')
    },
    level: 'warning'
  });
}
```
3. Use API Status Check for multi-API monitoring:
API Status Check monitors response times, error rates, and availability for 100+ popular APIs including:
- Stripe status - Payment processing
- SendGrid status - Email delivery
- Twilio status - SMS and voice
- Slack status - Team communication
- Shopify status - E-commerce platform
- Notion status - Workspace tools
- Heroku status - Cloud hosting
Get instant alerts when:
- API response times spike (potential rate limit throttling)
- Error rates increase (429s or 5xx errors)
- Complete outages occur (not just rate limits)
Start monitoring your critical APIs →
Rate Limit Dashboards
Build internal dashboards to visualize rate limit consumption across all your API integrations:
Metrics to track:
- Requests per minute/hour/day
- Percentage of rate limit consumed
- Time until rate limit reset
- Number of 429 errors
- Retry attempt counts
- Circuit breaker state changes
Popular monitoring tools:
- Datadog: Built-in API monitoring with rate limit tracking
- Grafana: Custom dashboards for rate limit metrics
- New Relic: APM with API performance monitoring
- API Status Check: Multi-API monitoring with instant alerts
Frequently Asked Questions
What's the difference between rate limiting and throttling?
Rate limiting enforces hard limits—once you hit the cap, requests are rejected with 429 errors until the limit resets. Throttling slows down requests gradually as you approach limits, introducing artificial delays but still processing them. Rate limiting is binary (allowed/denied), while throttling is progressive (fast/slow/slower). Most APIs use rate limiting, though some (like AWS) implement both.
How do I request a rate limit increase?
Most API providers allow enterprise customers to request higher rate limits:
- Document your use case: Explain why you need higher limits with specific metrics (current usage, projected growth)
- Contact support or sales: Free tier users may need to upgrade first
- Propose optimization: Show you've already optimized (caching, batch requests)
- Negotiate pricing: Higher limits often come with higher costs
- Start small: Request a modest increase (2x-5x) rather than 100x
Providers are more likely to approve increases for customers with payment history and legitimate business use cases.
Should I implement rate limiting on my own API?
Yes, absolutely. Even internal APIs benefit from rate limiting to prevent abuse, bugs, and resource exhaustion. Implement rate limiting when:
- Your API is publicly accessible or serves multiple clients
- Backend resources are expensive (database queries, third-party API calls)
- You need to enforce fair usage across customers or teams
- Security is a concern (preventing brute force attacks)
Use established libraries (express-rate-limit, Django Ratelimit, Kong) rather than building from scratch.
What happens to webhooks during rate limits?
Webhook delivery usually has separate rate limits from API calls. During rate limit issues:
If your webhook endpoint is rate limited: The API provider will retry delivery using exponential backoff. Eventually retries exhaust and events are lost (check retention policies).
If the API provider is rate limited: Webhook delivery may be delayed but typically continues. Providers prioritize webhook delivery over synchronous API responses.
Best practice: Implement idempotent webhook processing and track processed event IDs to handle duplicate deliveries gracefully.
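That idempotency pattern can be sketched in a few lines of Python (function names are illustrative; `seen_ids` stands in for a durable store such as a database table or Redis set in real deployments):

```python
def make_webhook_handler(process_event, seen_ids=None):
    """Wrap an event processor so duplicate webhook deliveries are
    acknowledged but processed only once."""
    seen = seen_ids if seen_ids is not None else set()

    def handle(event):
        event_id = event["id"]
        if event_id in seen:
            return "duplicate"  # already handled: ack and skip
        process_event(event)
        seen.add(event_id)  # record only after successful processing
        return "processed"

    return handle
```

Recording the event ID only after successful processing means a crash mid-handler causes a retry rather than a silently dropped event.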
Can I be banned for hitting rate limits too often?
Yes, potentially. While occasional 429 errors are expected, persistent or aggressive violations may result in:
- Temporary IP bans (minutes to hours)
- API key suspension (requires support contact to restore)
- Account review or termination (for clear abuse)
Implement proper backoff logic and respect Retry-After headers. If you're consistently hitting limits, optimize your integration or upgrade your plan—don't try to circumvent limits.
How do concurrent request limits differ from rate limits?
Rate limits control requests per time unit (100 req/min). Concurrent limits control simultaneous in-flight requests (10 concurrent connections). You can hit concurrent limits even with low request rates if requests take a long time to complete. Solutions:
- Connection pooling (reuse connections instead of creating new ones)
- Request queuing (wait for slots before making new requests)
- Async processing (don't block while waiting for responses)
AWS Lambda, for example, has both invocation rate limits AND concurrent execution limits.
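The request-queuing solution for concurrent limits can be sketched with an asyncio semaphore in Python (function names and the cap of 10 are illustrative): tasks wait for a free slot before starting, so no more than `max_concurrent` requests are ever in flight at once, regardless of the overall request rate.

```python
import asyncio

async def fetch_all(urls, fetch, max_concurrent=10):
    """Run fetch(url) for every URL while capping simultaneous
    in-flight requests (a concurrent limit, distinct from a
    requests-per-second rate limit)."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with sem:  # wait for a free slot before starting
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```

Pairing this with a rate-smoothing queue covers both limit types: the semaphore bounds concurrency while the queue bounds requests per second.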
What's the best rate limiting algorithm to implement?
For most applications: Token bucket provides the best balance of burst handling and simplicity. It allows legitimate traffic spikes while preventing sustained abuse.
For strict fairness: Sliding window prevents burst exploitation but requires more memory and computation.
For simplicity: Fixed window is easiest to implement but vulnerable to burst attacks.
For traffic smoothing: Leaky bucket when you need to protect downstream systems from spikes.
Choose based on your specific requirements. If unsure, start with token bucket—it's what AWS, Shopify, and most major platforms use.
How do I test my rate limit handling code?
1. Use test endpoints: Many APIs provide dedicated test endpoints that allow higher request rates or faster limit resets.
2. Mock API responses: Simulate 429 errors in your test suite:
```javascript
// Jest mock example
global.fetch = jest.fn(() =>
  Promise.resolve({
    status: 429,
    headers: {
      get: (name) => (name === 'Retry-After' ? '5' : null)
    }
  })
);
```
3. Use rate limit testing services: Tools like Artillery and k6 can generate controlled load to test rate limit behavior.
4. Implement chaos engineering: Randomly inject 429 responses in staging environments to verify resilience.
5. Monitor production carefully: Use gradual rollouts and robust observability when deploying rate limit handling changes.
Stay Ahead of API Issues
Don't wait for rate limit errors or outages to disrupt your production services. Proactive monitoring helps you catch issues before they impact users.
Monitor your critical APIs with API Status Check:
- Real-time health checks every 60 seconds
- Instant alerts via email, Slack, Discord, or webhook
- Track response times and error rates across 100+ APIs
- Historical uptime data and incident reports
- No configuration required—start monitoring in 30 seconds
Popular APIs to monitor:
- Is Stripe Down? - Payment processing
- Is SendGrid Down? - Email delivery
- Is Twilio Down? - SMS and communications
- Is Slack Down? - Team collaboration
- Is Shopify Down? - E-commerce platform
- Is Supabase Down? - Backend services
- Is Heroku Down? - Cloud hosting
- Is Notion Down? - Productivity workspace
Last updated: February 4, 2026. Rate limit information is based on publicly documented policies and subject to change. Always refer to official API documentation for the most current limits.
Monitor Your APIs
Check the real-time status of 100+ popular APIs used by developers.
View API Status →