API Rate Limiting: Complete Implementation Guide for Developers
Rate limiting is the unsung hero of API stability. Too lenient, and a single misbehaving client can take down your entire service. Too strict, and legitimate users suffer. Getting it right is critical for both API providers and consumers.
This guide covers everything you need to implement effective rate limiting, from choosing the right algorithm to handling limits gracefully in production.
What is API Rate Limiting?
Rate limiting controls how many requests a client can make to an API within a specific time window. It prevents abuse, ensures fair resource allocation, and protects infrastructure from overload.
Common rate limit patterns:
- Per-user limits: 1,000 requests/hour per API key
- Endpoint-specific limits: 100 writes/min, 10,000 reads/min
- Global limits: 100,000 requests/hour across all users
- Burst allowances: 100 requests in 10 seconds, then throttle
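These patterns are usually combined into a single policy per API key. As a sketch (the shape and names here are illustrative, not from any particular library), such a policy might look like:

```typescript
// Hypothetical policy object combining the four patterns above.
interface RateLimitPolicy {
  perUser: { requests: number; windowMs: number };                  // per API key
  perEndpoint: Record<string, { requests: number; windowMs: number }>;
  globalPerHour: number;                                            // across all users
  burst: { requests: number; windowMs: number };                    // short-window allowance
}

const defaultPolicy: RateLimitPolicy = {
  perUser: { requests: 1_000, windowMs: 3_600_000 },     // 1,000/hour
  perEndpoint: {
    "POST /write": { requests: 100, windowMs: 60_000 },  // 100 writes/min
    "GET /read": { requests: 10_000, windowMs: 60_000 }  // 10,000 reads/min
  },
  globalPerHour: 100_000,
  burst: { requests: 100, windowMs: 10_000 }             // 100 requests in 10s
};
```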
Why Rate Limiting Matters
For API providers:
- Prevents denial-of-service attacks (accidental or malicious)
- Ensures fair resource allocation across clients
- Protects infrastructure costs from runaway scripts
- Enables sustainable free tiers
For API consumers:
- Forces efficient request patterns (batching, caching)
- Predictable costs and performance
- Clear feedback when hitting limits
Real-world impact: In 2023, a misconfigured script at a Fortune 500 company made 47 million requests to Stripe's API in 6 hours. Rate limiting prevented service degradation for other customers and automatically throttled the runaway client.
Rate Limiting Algorithms Explained
1. Fixed Window Counter
How it works: Allows N requests per fixed time window (e.g., 1000 requests per hour starting at each clock hour).
Pros:
- Simple to implement (single counter + reset timestamp)
- Low memory footprint
- Easy to understand and debug
Cons:
- Burst vulnerability: A client can make 1000 requests at 10:59 AM and another 1000 at 11:00 AM (2000 requests in 2 minutes)
- Uneven load distribution
Implementation:
```typescript
interface RateLimitState {
  count: number;
  resetTime: number;
}

class FixedWindowRateLimiter {
  private limits: Map<string, RateLimitState> = new Map();

  constructor(
    private maxRequests: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    const state = this.limits.get(clientId);

    // Reset if window expired
    if (!state || now >= state.resetTime) {
      this.limits.set(clientId, {
        count: 1,
        resetTime: now + this.windowMs
      });
      return true;
    }

    // Increment and check
    if (state.count < this.maxRequests) {
      state.count++;
      return true;
    }

    return false; // Rate limit exceeded
  }

  getRemainingRequests(clientId: string): number {
    const state = this.limits.get(clientId);
    if (!state || Date.now() >= state.resetTime) {
      return this.maxRequests;
    }
    return Math.max(0, this.maxRequests - state.count);
  }
}

// Usage
const limiter = new FixedWindowRateLimiter(1000, 3600000); // 1000 req/hour

async function handleRequest(userId: string) {
  if (await limiter.check(userId)) {
    // Process request
    return { success: true };
  } else {
    return {
      error: "Rate limit exceeded",
      remaining: limiter.getRemainingRequests(userId)
    };
  }
}
```
2. Sliding Window Log
How it works: Tracks the timestamp of every request. Counts requests within a rolling time window.
Pros:
- Eliminates burst vulnerability (true rolling window)
- Precise enforcement
- Fair distribution of requests
Cons:
- High memory usage (stores all request timestamps)
- Expensive cleanup of old timestamps
- Doesn't scale well with high request volumes
Implementation:
```typescript
class SlidingWindowLogLimiter {
  private logs: Map<string, number[]> = new Map();

  constructor(
    private maxRequests: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    const log = this.logs.get(clientId) || [];

    // Remove timestamps outside window
    const validTimestamps = log.filter(
      timestamp => now - timestamp < this.windowMs
    );

    if (validTimestamps.length < this.maxRequests) {
      validTimestamps.push(now);
      this.logs.set(clientId, validTimestamps);
      return true;
    }

    this.logs.set(clientId, validTimestamps);
    return false;
  }

  getResetTime(clientId: string): number {
    const log = this.logs.get(clientId) || [];
    if (log.length === 0) return Date.now();
    // The oldest tracked request ages out of the window at this time
    return log[0] + this.windowMs;
  }
}
```
3. Sliding Window Counter (Hybrid)
How it works: Combines fixed window and sliding window. Tracks current and previous window counts, interpolates based on time elapsed.
Pros:
- Memory efficient (only 2 counters per client)
- Smooth rate limiting (no burst vulnerability)
- Balances precision and performance
Cons:
- Slightly more complex logic
- Not perfectly precise (but close enough for most use cases)
Implementation:
```typescript
interface SlidingWindowState {
  currentCount: number;
  previousCount: number;
  resetTime: number;
}

class SlidingWindowCounterLimiter {
  private states: Map<string, SlidingWindowState> = new Map();

  constructor(
    private maxRequests: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    const state = this.states.get(clientId);

    // Initialize or roll the window forward
    if (!state || now >= state.resetTime) {
      // Only carry the old count if the previous window still overlaps
      // the rolling lookback; after a full window of inactivity it is stale
      const isAdjacentWindow = state && now < state.resetTime + this.windowMs;
      this.states.set(clientId, {
        currentCount: 1,
        previousCount: isAdjacentWindow ? state!.currentCount : 0,
        resetTime: now + this.windowMs
      });
      return true;
    }

    // Weight the previous window's count by how much of it still
    // overlaps the rolling window
    const timeElapsedInWindow = now - (state.resetTime - this.windowMs);
    const percentageIntoWindow = timeElapsedInWindow / this.windowMs;
    const previousWeight = 1 - percentageIntoWindow;
    const estimatedCount =
      state.currentCount + state.previousCount * previousWeight;

    if (estimatedCount < this.maxRequests) {
      state.currentCount++;
      return true;
    }

    return false;
  }
}
```
Why this is often the best choice for APIs: the sliding window counter closely approximates a true sliding window while storing only two counters per client instead of every request timestamp. Cloudflare has described using a variation of this algorithm, and similar approaches are common at large API providers.
4. Token Bucket
How it works: Each client has a bucket that holds tokens. Tokens are added at a constant rate. Each request consumes a token. When the bucket is empty, requests are rejected.
Pros:
- Allows controlled bursts (up to bucket capacity)
- Smooth long-term rate limiting
- Great for APIs with bursty traffic patterns
Cons:
- Naive implementations need a background refill process (refilling lazily on each request, as below, avoids this)
- Slightly more complex than window counters
Implementation:
```typescript
interface TokenBucketState {
  tokens: number;
  lastRefill: number;
}

class TokenBucketLimiter {
  private buckets: Map<string, TokenBucketState> = new Map();

  constructor(
    private maxTokens: number,
    private refillRate: number, // tokens added per refill interval
    private refillIntervalMs: number = 1000
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    let bucket = this.buckets.get(clientId);

    // Initialize bucket
    if (!bucket) {
      bucket = {
        tokens: this.maxTokens,
        lastRefill: now
      };
      this.buckets.set(clientId, bucket);
    }

    // Refill tokens lazily based on time elapsed (no background job needed)
    const timeSinceRefill = now - bucket.lastRefill;
    const tokensToAdd =
      (timeSinceRefill / this.refillIntervalMs) * this.refillRate;
    bucket.tokens = Math.min(
      this.maxTokens,
      bucket.tokens + tokensToAdd
    );
    bucket.lastRefill = now;

    // Consume token if available
    if (bucket.tokens >= 1) {
      bucket.tokens--;
      return true;
    }

    return false;
  }

  getRemainingTokens(clientId: string): number {
    const bucket = this.buckets.get(clientId);
    if (!bucket) return this.maxTokens;

    const now = Date.now();
    const timeSinceRefill = now - bucket.lastRefill;
    const tokensToAdd =
      (timeSinceRefill / this.refillIntervalMs) * this.refillRate;
    return Math.min(this.maxTokens, bucket.tokens + tokensToAdd);
  }
}

// Usage
const limiter = new TokenBucketLimiter(
  100,  // 100 token bucket capacity (allows bursts)
  10,   // Refill 10 tokens per second
  1000  // Refill interval
);
```
When to use token bucket: Best for APIs where legitimate use cases involve bursts (image processing, batch operations, webhook deliveries). AWS, Twilio, and SendGrid use token bucket algorithms.
5. Leaky Bucket
How it works: Requests enter a queue (bucket). The bucket processes requests at a constant rate (leaks). When the queue is full, new requests are rejected.
Pros:
- Guarantees smooth, consistent request rate
- Prevents sudden traffic spikes
- Great for protecting downstream services
Cons:
- Adds latency (requests wait in queue)
- Complexity in managing queue state
- Can delay legitimate urgent requests
Implementation:
```typescript
interface QueuedRequest {
  timestamp: number;
  resolve: () => void;
  reject: () => void;
}

class LeakyBucketLimiter {
  private queues: Map<string, QueuedRequest[]> = new Map();

  constructor(
    private maxQueueSize: number,
    private processRateMs: number // time between processing requests
  ) {
    // Start background processor (the "leak")
    setInterval(() => this.processQueues(), this.processRateMs);
  }

  async check(clientId: string): Promise<boolean> {
    return new Promise(resolve => {
      const queue = this.queues.get(clientId) || [];

      if (queue.length >= this.maxQueueSize) {
        resolve(false); // Queue full, reject immediately
        return;
      }

      queue.push({
        timestamp: Date.now(),
        resolve: () => resolve(true),
        reject: () => resolve(false)
      });
      this.queues.set(clientId, queue);
    });
  }

  private processQueues() {
    for (const [clientId, queue] of this.queues) {
      if (queue.length > 0) {
        const request = queue.shift()!;
        request.resolve(); // Process one request per interval
      }
    }
  }
}
```
When to use leaky bucket: Best when you need guaranteed smooth traffic to downstream services (database writes, external API calls). Less common for public APIs due to latency concerns.
Rate Limiting HTTP Headers (Industry Standard)
Your API should always return these headers:
```
X-RateLimit-Limit: 1000        # Max requests per window
X-RateLimit-Remaining: 847     # Requests left in current window
X-RateLimit-Reset: 1678901234  # Unix timestamp when limit resets
Retry-After: 3600              # Seconds until retry (when rate limited)
```
When the rate limit is exceeded, return:

```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1678901234
Retry-After: 3600

{
  "error": "rate_limit_exceeded",
  "message": "You've exceeded the rate limit of 1000 requests per hour",
  "reset_at": "2023-03-15T17:27:14Z",
  "documentation_url": "https://apistatuscheck.com/docs/rate-limiting"
}
```
Client-Side Rate Limit Handling
1. Exponential Backoff with Jitter
```typescript
async function fetchWithRateLimit(
  url: string,
  options: RequestInit = {},
  maxRetries: number = 5
): Promise<Response> {
  let retries = 0;

  while (retries < maxRetries) {
    const response = await fetch(url, options);

    // Success
    if (response.ok) {
      return response;
    }

    // Rate limited
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After');
      const resetTime = response.headers.get('X-RateLimit-Reset');

      let waitMs: number;
      if (retryAfter) {
        waitMs = parseInt(retryAfter, 10) * 1000;
      } else if (resetTime) {
        // X-RateLimit-Reset is a Unix timestamp in seconds; clamp to
        // zero in case of clock skew
        waitMs = Math.max(0, parseInt(resetTime, 10) * 1000 - Date.now());
      } else {
        // Exponential backoff with jitter
        waitMs = Math.min(
          1000 * Math.pow(2, retries) + Math.random() * 1000,
          60000 // Max 60 seconds
        );
      }

      console.log(`Rate limited. Retrying in ${waitMs}ms...`);
      await new Promise(resolve => setTimeout(resolve, waitMs));
      retries++;
      continue;
    }

    // Other error
    throw new Error(`Request failed: ${response.statusText}`);
  }

  throw new Error('Max retries exceeded');
}

// Usage
try {
  const response = await fetchWithRateLimit('https://api.example.com/data');
  const data = await response.json();
} catch (error) {
  console.error('Request failed after retries:', error);
}
```
2. Client-Side Rate Limiter (Proactive)
Instead of waiting for 429 errors, proactively throttle requests:
```typescript
class ClientSideRateLimiter {
  private queue: Array<() => Promise<any>> = [];
  private processing = false;

  constructor(
    private requestsPerSecond: number
  ) {}

  async enqueue<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try {
          const result = await fn();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });
      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.queue.length === 0) return;
    this.processing = true;

    const intervalMs = 1000 / this.requestsPerSecond;

    while (this.queue.length > 0) {
      const fn = this.queue.shift()!;
      await fn();
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }

    this.processing = false;
  }
}

// Usage
const limiter = new ClientSideRateLimiter(10); // 10 requests/second

// All requests are automatically throttled
for (let i = 0; i < 100; i++) {
  limiter.enqueue(() =>
    fetch(`https://api.example.com/item/${i}`)
  ).then(response => response.json())
   .then(data => console.log(data));
}
```
Server-Side Implementation Patterns
Express.js Middleware
```typescript
import express from 'express';
import { SlidingWindowCounterLimiter } from './rate-limiters';

const app = express();
const limiter = new SlidingWindowCounterLimiter(
  100,   // 100 requests
  60000  // per minute
);

// Apply to all routes
app.use(async (req, res, next) => {
  // Behind a proxy, enable Express's 'trust proxy' setting so req.ip
  // reflects the original client address
  const clientId = req.ip || (req.headers['x-forwarded-for'] as string);
  const allowed = await limiter.check(clientId);

  if (!allowed) {
    res.status(429).json({
      error: 'rate_limit_exceeded',
      message: 'Too many requests, please try again later'
    });
    return;
  }

  next();
});

// Apply a stricter limit to a specific endpoint by registering the
// check as route middleware ahead of the actual handler
const strictLimiter = new SlidingWindowCounterLimiter(10, 60000);

app.post(
  '/api/expensive-operation',
  async (req, res, next) => {
    const clientId = req.headers['x-api-key'] as string;

    if (!await strictLimiter.check(clientId)) {
      res.status(429).json({
        error: 'rate_limit_exceeded',
        message: 'This endpoint allows 10 requests per minute'
      });
      return;
    }

    next();
  },
  (req, res) => {
    // Actual handler runs only when the strict limit allows it
    res.json({ success: true });
  }
);
```
Redis-Based Distributed Rate Limiting
For multi-server deployments:
```typescript
import Redis from 'ioredis';

class RedisRateLimiter {
  private redis: Redis;

  constructor(
    redisUrl: string,
    private maxRequests: number,
    private windowSeconds: number
  ) {
    this.redis = new Redis(redisUrl);
  }

  async check(clientId: string): Promise<{
    allowed: boolean;
    remaining: number;
    resetAt: number;
  }> {
    const key = `ratelimit:${clientId}`;
    const now = Date.now();
    const windowStart = now - this.windowSeconds * 1000;

    // Use a Redis sorted set to track request timestamps in the time
    // window; all four commands run atomically in one round trip
    const multi = this.redis.multi();
    multi.zremrangebyscore(key, 0, windowStart);     // drop requests outside window
    multi.zcard(key);                                // count requests in window
    multi.zadd(key, now, `${now}-${Math.random()}`); // record this request
    multi.expire(key, this.windowSeconds);           // let idle keys expire

    const results = await multi.exec();
    const count = results![1][1] as number;

    // Note: the request is recorded before the check, so rejected requests
    // still count toward the window -- a deliberate simplification that
    // penalizes clients who keep hammering a full window
    const allowed = count < this.maxRequests;
    const remaining = Math.max(0, this.maxRequests - count - 1);
    // Approximation: the true reset is when the oldest entry ages out
    const resetAt = now + this.windowSeconds * 1000;

    return { allowed, remaining, resetAt };
  }
}

// Usage in Express
const limiter = new RedisRateLimiter(
  'redis://localhost:6379',
  1000, // requests
  3600  // per hour
);

app.use(async (req, res, next) => {
  const apiKey = req.headers['x-api-key'] as string;
  const result = await limiter.check(apiKey);

  res.setHeader('X-RateLimit-Limit', '1000');
  res.setHeader('X-RateLimit-Remaining', result.remaining.toString());
  res.setHeader('X-RateLimit-Reset', Math.floor(result.resetAt / 1000).toString());

  if (!result.allowed) {
    res.status(429).json({ error: 'rate_limit_exceeded' });
    return;
  }

  next();
});
```
Rate Limiting Best Practices
1. Different Limits for Different Tiers
```typescript
const RATE_LIMITS = {
  free:       { requests: 100,    windowMs: 3600000 }, // 100/hour
  basic:      { requests: 1000,   windowMs: 3600000 }, // 1,000/hour
  pro:        { requests: 10000,  windowMs: 3600000 }, // 10,000/hour
  enterprise: { requests: 100000, windowMs: 3600000 }  // 100k/hour
};

async function getRateLimitForUser(apiKey: string) {
  const user = await getUserByApiKey(apiKey);
  return RATE_LIMITS[user.tier];
}
```
2. Endpoint-Specific Limits
```typescript
const ENDPOINT_LIMITS = {
  'GET /api/users':     { requests: 1000, windowMs: 60000 }, // 1000/min
  'POST /api/users':    { requests: 100,  windowMs: 60000 }, // 100/min
  'DELETE /api/users':  { requests: 50,   windowMs: 60000 }, // 50/min
  'POST /api/webhooks': { requests: 10,   windowMs: 60000 }  // 10/min
};
```
3. Whitelist Critical Clients
```typescript
const WHITELISTED_KEYS = new Set([
  'internal-service-key-1',
  'monitoring-service-key',
  'critical-partner-key'
]);

app.use(async (req, res, next) => {
  const apiKey = req.headers['x-api-key'] as string;

  // Skip rate limiting for whitelisted keys
  if (WHITELISTED_KEYS.has(apiKey)) {
    next();
    return;
  }

  // Apply rate limiting for everyone else
  // ...
});
```
4. Monitor Rate Limit Metrics
Track these metrics to tune your limits:
```typescript
interface RateLimitMetrics {
  totalRequests: number;
  rateLimitedRequests: number;
  uniqueClients: number;
  p50ResponseTime: number;
  p99ResponseTime: number;
  topConsumers: Array<{ clientId: string; requests: number }>;
}

class RateLimitMonitor {
  private metrics: RateLimitMetrics = {
    totalRequests: 0,
    rateLimitedRequests: 0,
    uniqueClients: 0,
    p50ResponseTime: 0,
    p99ResponseTime: 0,
    topConsumers: []
  };

  recordRequest(clientId: string, wasRateLimited: boolean, responseTimeMs: number) {
    this.metrics.totalRequests++;
    if (wasRateLimited) {
      this.metrics.rateLimitedRequests++;
    }

    // Send to metrics system (DataDog, Prometheus, etc.)
    this.sendToMetrics({
      metric: 'api.rate_limit.requests',
      value: 1,
      tags: [
        `client:${clientId}`,
        `rate_limited:${wasRateLimited}`,
        `response_time:${responseTimeMs}`
      ]
    });
  }

  getMetrics(): RateLimitMetrics {
    return this.metrics;
  }

  private sendToMetrics(data: any) {
    // Implement sending to your metrics backend
  }
}
```
5. Graceful Degradation
Instead of hard rejecting, consider soft limits:
```typescript
class TieredRateLimiter {
  constructor(
    private softLimit: number,
    private hardLimit: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<{
    allowed: boolean;
    throttled: boolean;
    delay: number;
  }> {
    const count = await this.getRequestCount(clientId);

    // Under soft limit: full speed
    if (count < this.softLimit) {
      return { allowed: true, throttled: false, delay: 0 };
    }

    // Between soft and hard limit: throttled
    if (count < this.hardLimit) {
      const throttleDelay = 1000; // 1 second delay
      return { allowed: true, throttled: true, delay: throttleDelay };
    }

    // Over hard limit: rejected
    return { allowed: false, throttled: false, delay: 0 };
  }

  private async getRequestCount(clientId: string): Promise<number> {
    // Back this with any of the counters above
    // (e.g. a sliding window counter in Redis)
    return 0;
  }
}
```
Real-World Rate Limiting Examples
GitHub API
- 5,000 requests/hour for authenticated requests
- 60 requests/hour for unauthenticated requests
- Separate limits for GraphQL API
- Different limits for GitHub Apps
Stripe API
- No hard rate limit published
- Uses adaptive rate limiting based on account history
- Returns 429 when limits exceeded
- Recommends exponential backoff
Twitter API (X)
- 15 requests per 15-minute window (free tier)
- 50 requests per 15-minute window (basic tier)
- Separate limits per endpoint
- App-based and user-based limits
OpenAI API
- Requests per minute (RPM) and tokens per minute (TPM) limits
- Different limits per model tier
- GPT-4: 500 RPM, 10,000 TPM (tier 1)
- GPT-3.5: 3,500 RPM, 90,000 TPM (tier 1)
Testing Rate Limits
Load Testing Script
```typescript
async function testRateLimit(
  url: string,
  requestsPerSecond: number,
  durationSeconds: number
) {
  const results = {
    total: 0,
    successful: 0,
    rateLimited: 0,
    errors: 0
  };

  const startTime = Date.now();
  const endTime = startTime + durationSeconds * 1000;
  const intervalMs = 1000 / requestsPerSecond;

  while (Date.now() < endTime) {
    results.total++;

    try {
      const response = await fetch(url);

      if (response.ok) {
        results.successful++;
      } else if (response.status === 429) {
        results.rateLimited++;
      } else {
        results.errors++;
      }
    } catch (error) {
      results.errors++;
    }

    // Note: awaiting each request serially means the achieved rate is
    // slightly below requestsPerSecond once response latency is included
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }

  console.log('Rate Limit Test Results:');
  console.log(`Total requests: ${results.total}`);
  console.log(`Successful: ${results.successful} (${(results.successful / results.total * 100).toFixed(1)}%)`);
  console.log(`Rate limited: ${results.rateLimited} (${(results.rateLimited / results.total * 100).toFixed(1)}%)`);
  console.log(`Errors: ${results.errors} (${(results.errors / results.total * 100).toFixed(1)}%)`);

  return results;
}

// Test with gradually increasing load
async function loadTest() {
  console.log('Testing at 10 req/s...');
  await testRateLimit('https://api.example.com/endpoint', 10, 60);

  console.log('\nTesting at 50 req/s...');
  await testRateLimit('https://api.example.com/endpoint', 50, 60);

  console.log('\nTesting at 100 req/s...');
  await testRateLimit('https://api.example.com/endpoint', 100, 60);
}
```
Monitoring Rate Limit Health
Build a dashboard tracking:
Rate limit hit rate — What % of requests are being rate limited?
- Target: <1% for production APIs
- >5% suggests limits are too strict or clients are misbehaving
Top consumers — Which clients hit limits most often?
- Helps identify misconfigured integrations
- Candidates for dedicated limits or throttling
Limit utilization — How close are users to their limits?
- If 90% of users stay under 20% of limit, limits might be too generous
- If 50% of users hit limits regularly, limits might be too strict
Response time correlation — Does rate limiting affect latency?
- Measure P50/P95/P99 response times for rate-limited vs. allowed requests
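The first two dashboard metrics fall out of simple counters. A minimal sketch (function names are illustrative, separate from the monitor class above):

```typescript
// Fraction of requests that were rate limited; target is < 0.01 (1%)
function hitRate(totalRequests: number, rateLimitedRequests: number): number {
  return totalRequests === 0 ? 0 : rateLimitedRequests / totalRequests;
}

// Per-client fraction of the configured limit actually used in a window,
// capped at 1.0 -- useful for spotting limits that are far too generous
function utilization(
  requestsByClient: Map<string, number>,
  limit: number
): Map<string, number> {
  const out = new Map<string, number>();
  for (const [clientId, count] of requestsByClient) {
    out.set(clientId, Math.min(1, count / limit));
  }
  return out;
}
```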
Common Rate Limiting Mistakes
❌ Mistake 1: Not Communicating Limits Clearly
Bad:
```json
{ "error": "Too many requests" }
```
Good:
```json
{
  "error": "rate_limit_exceeded",
  "message": "You've made 1,247 requests in the last hour. Your limit is 1,000 requests per hour.",
  "limit": 1000,
  "window": "1 hour",
  "reset_at": "2026-03-09T05:05:00Z",
  "documentation": "https://docs.example.com/rate-limits"
}
```
❌ Mistake 2: Global Limits Only
Having a single global limit (e.g., 10,000 req/hour) means a single misbehaving endpoint can exhaust the entire quota. Use per-endpoint limits.
❌ Mistake 3: Not Handling Distributed Systems
If you have multiple API servers, each tracking limits independently, users get N × your intended limit. Use Redis or a distributed rate limiter.
❌ Mistake 4: Resetting at Clock Hour
Fixed windows that reset at predictable times (midnight, top of the hour) create traffic spikes. Use sliding windows or reset based on first request time.
❌ Mistake 5: No Burst Allowance
Strict per-second rate limiting (10 req/s) prevents legitimate bursts. Token bucket allows (e.g.) 100 req burst, then 10 req/s sustained.
Rate Limiting for Different API Types
REST APIs
- Use HTTP 429 status code
- Return standard headers (X-RateLimit-*)
- Endpoint-specific limits (GET vs POST)
GraphQL APIs
- Complexity-based limits (query depth, field count)
- Cost-based limits (expensive resolvers cost more)
- Can't rely on HTTP method (always POST)
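Cost-based GraphQL limiting can be sketched as below. The cost table and flattened field list are deliberate simplifications (a real implementation would walk the parsed query AST and account for nesting and pagination arguments):

```typescript
// Assign each field a cost and reject queries whose total exceeds a budget.
type FieldCost = Record<string, number>;

// Illustrative costs: expensive resolvers cost more
const COSTS: FieldCost = { user: 1, posts: 5, comments: 10 };

// Sum the cost of the requested fields; unknown fields default to 1
function queryCost(fields: string[], costs: FieldCost): number {
  return fields.reduce((sum, field) => sum + (costs[field] ?? 1), 0);
}

// A query is allowed only if it fits within the client's remaining budget
function allowQuery(fields: string[], budget: number): boolean {
  return queryCost(fields, COSTS) <= budget;
}
```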
WebSocket APIs
- Message-based limits (N messages per minute)
- Connection-based limits (max concurrent connections)
- Bandwidth limits (bytes per second)
Webhook Deliveries
- Retry limits (max 3 retries with backoff)
- Delivery rate limits (max 100 webhooks/min per endpoint)
- Failure thresholds (pause after 10 consecutive failures)
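The retry and failure-threshold rules above can be sketched as follows; the `send` callback and the pause behavior are illustrative assumptions, not a prescribed delivery API:

```typescript
// Deliver a webhook with up to 3 retries and exponential backoff,
// pausing an endpoint after 10 consecutive failed deliveries.
const MAX_RETRIES = 3;
const FAILURE_THRESHOLD = 10;

// Backoff schedule: 1s, 2s, 4s, ...
function backoffMs(attempt: number): number {
  return 1_000 * 2 ** attempt;
}

const consecutiveFailures = new Map<string, number>();

async function deliverWebhook(
  endpoint: string,
  send: () => Promise<boolean> // returns true on a 2xx response
): Promise<boolean> {
  if ((consecutiveFailures.get(endpoint) ?? 0) >= FAILURE_THRESHOLD) {
    return false; // endpoint paused after too many consecutive failures
  }

  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    if (await send()) {
      consecutiveFailures.set(endpoint, 0); // success resets the streak
      return true;
    }
    if (attempt < MAX_RETRIES) {
      await new Promise(resolve => setTimeout(resolve, backoffMs(attempt)));
    }
  }

  consecutiveFailures.set(endpoint, (consecutiveFailures.get(endpoint) ?? 0) + 1);
  return false;
}
```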
Conclusion
Rate limiting is essential for API stability, but there's no one-size-fits-all solution. Choose your algorithm based on your use case:
- Fixed window — Simple, low-memory, acceptable for internal APIs
- Sliding window counter — Best balance for most public APIs
- Token bucket — Great for APIs with legitimate bursty traffic
- Leaky bucket — When you need guaranteed smooth downstream load
Always communicate limits clearly, provide generous Retry-After headers, and monitor rate limit metrics to tune your thresholds.
Related resources:
- API Dependency Monitoring Guide
- Understanding API SLAs
- API Error Codes Explained
Need to monitor API rate limit consumption across your dependencies? APIStatusCheck tracks status and performance for 200+ popular APIs.