API Rate Limiting Cheat Sheet: Headers, Patterns & Best Practices

Quick Answer: API rate limiting controls how many requests a client can make to an API within a specific time window (e.g., 100 requests per minute). Essential response headers include X-RateLimit-Limit (max requests), X-RateLimit-Remaining (requests left), and Retry-After (seconds to wait before retrying). Implement exponential backoff when hitting 429 errors and monitor rate limit consumption to avoid service disruption.

Rate limiting is the invisible traffic cop of the API world—when implemented correctly, you'll never notice it's there. When misunderstood or ignored, it can bring your application to a grinding halt at the worst possible moment. Whether you're integrating with Stripe's payment APIs, GitHub's webhooks, or OpenAI's GPT models, understanding rate limits is non-negotiable for production applications.

This comprehensive reference guide covers everything developers need to know about API rate limiting: how it works, common implementation patterns, rate limits for major APIs, code examples for handling limits gracefully, and monitoring strategies to prevent surprises.

What is API Rate Limiting?

API rate limiting is a technique used by API providers to control the number of requests a client (user, application, or IP address) can make within a specified time window. It serves multiple critical purposes:

Performance Protection: Prevents any single client from monopolizing server resources and degrading performance for other users. Without rate limits, a buggy integration making thousands of requests per second could bring down an entire service.

Cost Control: API calls consume computational resources (database queries, processing time, bandwidth). Rate limiting helps providers manage infrastructure costs and prevent abuse that could result in unexpected billing spikes for serverless architectures.

Security Defense: Acts as a first line of defense against denial-of-service (DoS) attacks, credential stuffing attempts, and malicious bots attempting to scrape data or exploit vulnerabilities.

Fair Usage Enforcement: Ensures equitable access to shared resources across all customers. Enterprise plans typically get higher limits than free tiers, creating a natural upgrade path.

Data Integrity: For APIs that modify data (POST, PUT, DELETE), rate limiting prevents accidental duplicate operations caused by retry storms or misconfigured automation.

Common Rate Limit HTTP Headers

When you make an API request, most providers include standardized headers in the response that tell you about your current rate limit status. Understanding these headers is crucial for implementing proactive rate limit handling.

Standard Rate Limit Headers

| Header Name | Description | Example Value |
|---|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window | 5000 |
| X-RateLimit-Remaining | Number of requests remaining in the current window | 4273 |
| X-RateLimit-Reset | Unix timestamp when the limit resets | 1738627200 |
| X-RateLimit-Used | Number of requests consumed in the current window | 727 |
| Retry-After | Seconds to wait before retrying (sent with 429 errors) | 45 |
| RateLimit-Policy | Describes the rate limit policy in structured format | 100;w=60 |

Parsing Rate Limit Headers in Code

JavaScript/Node.js:

// Assumes the RateLimitError class and sleep() helper defined later in this guide.
async function makeAPIRequest(url, options) {
  const response = await fetch(url, options);
  
  // Handle the hard limit first
  if (response.status === 429) {
    const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
    throw new RateLimitError(`Rate limited. Retry after ${retryAfter}s`, retryAfter);
  }
  
  const limit = parseInt(response.headers.get('X-RateLimit-Limit'), 10);
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);
  const reset = parseInt(response.headers.get('X-RateLimit-Reset'), 10);
  
  console.log(`Rate limit: ${remaining}/${limit} remaining, resets at ${new Date(reset * 1000)}`);
  
  // Proactive throttling when fewer than 10% of requests remain
  if (remaining < limit * 0.1) {
    console.warn('⚠️  Approaching rate limit, implementing backoff');
    await sleep(1000);
  }
  
  return response.json();
}

Python:

import requests
import time
from datetime import datetime

# Assumes the RateLimitException class defined later in this guide.
def make_api_request(url, headers=None):
    response = requests.get(url, headers=headers)
    
    # Handle the hard limit first
    if response.status_code == 429:
        retry_after = int(response.headers.get('Retry-After', 60))
        raise RateLimitException(f"Rate limited. Retry after {retry_after}s", retry_after)
    
    # Extract rate limit info
    limit = int(response.headers.get('X-RateLimit-Limit', 0))
    remaining = int(response.headers.get('X-RateLimit-Remaining', 0))
    reset = int(response.headers.get('X-RateLimit-Reset', 0))
    
    reset_time = datetime.fromtimestamp(reset)
    print(f"Rate limit: {remaining}/{limit} remaining, resets at {reset_time}")
    
    # Proactive throttling when fewer than 10% of requests remain
    if remaining < limit * 0.1:
        print("⚠️  Approaching rate limit, slowing down")
        time.sleep(1)
    
    return response.json()

The 429 Status Code

When you exceed a rate limit, APIs return a 429 Too Many Requests status code. This is your signal to stop making requests and wait. Never ignore 429 responses—continued requests will often result in temporary or permanent IP bans.

Proper 429 handling:

  1. Immediately stop making requests
  2. Check the Retry-After header for wait duration
  3. Implement exponential backoff if Retry-After is not provided
  4. Log the incident for monitoring and analysis
  5. Alert your team if 429s become frequent
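The steps above can be sketched as a small helper wired into your client (the `handle_429` name, logger, and retry budget are illustrative, not a specific library's API):

```python
import logging
import random

logger = logging.getLogger("api.client")

def handle_429(headers, attempt, max_attempts=5):
    """Return how long to wait after a 429, following the steps above."""
    if attempt >= max_attempts:
        raise RuntimeError("429 retry budget exhausted")  # step 5: escalate
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        delay = float(retry_after)  # step 2: honor Retry-After
    else:
        # step 3: exponential backoff with jitter when Retry-After is absent
        delay = min(2 ** attempt, 32) + random.random()
    # step 4: log the incident for monitoring
    logger.warning("429 received; waiting %.1fs (attempt %d)", delay, attempt)
    return delay
```

The caller performs step 1 implicitly: it stops issuing requests and sleeps for the returned duration before retrying.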

Rate Limits for Popular APIs

Understanding the specific rate limits for APIs you depend on is critical for capacity planning and avoiding production incidents. Here's a comprehensive reference for major API providers:

Payment & Financial APIs

| API | Free/Basic Tier | Standard Tier | Enterprise |
|---|---|---|---|
| Stripe | 100 req/sec (test), 25 req/sec (live) | 100 req/sec (live) | Custom (1000+ req/sec) |
| PayPal | 50 req/sec | 100 req/sec | Custom |
| Plaid | 10 req/sec, 1000 req/hour | 50 req/sec | Custom |
| Square | 10 req/sec | 40 req/sec | Custom |

Stripe specifics:

  • Rate limits apply per API key (test vs live keys have separate limits)
  • Burst allowance: short bursts up to 200 req/sec tolerated
  • Search API: Lower limits (20 req/sec)
  • Connected accounts: Separate limit pools

Note: Always check Is Stripe Down if you're experiencing consistent API errors beyond rate limiting.

Communication & Messaging APIs

| API | Free/Basic Tier | Standard Tier | Notes |
|---|---|---|---|
| Twilio | 1 msg/sec (trial) | 100 msg/sec | Per account SID |
| SendGrid | 100 emails/day (free) | 100 req/sec | Based on plan tier |
| Slack | Tier 1: 1 req/min, Tier 2: 20 req/min | Tier 3: 50 req/min, Tier 4: 100 req/min | Method-specific tiers |
| Discord | 50 req/sec | Same for all | Global and per-route limits |

SendGrid specifics:

  • API calls and email sends have separate limits
  • Marketing campaigns: 2,000 req/hour
  • Email validation: 500 req/hour

Slack specifics:

  • Different methods in different tiers
  • chat.postMessage: Tier 3 (50/min)
  • users.list: Tier 2 (20/min)
  • Workspace token limits apply to entire workspace

Check Is SendGrid Down or Is Slack Down for real-time status.

Developer Platform APIs

| API | Authenticated | Unauthenticated | Special Limits |
|---|---|---|---|
| GitHub | 5,000 req/hour | 60 req/hour | Search: 30 req/min; GraphQL: 5,000 points/hour |
| GitLab | 2,000 req/min | 10 req/min | Depends on plan tier |
| Bitbucket | 1,000 req/hour | 60 req/hour | Per OAuth consumer |

GitHub specifics:

  • GraphQL API uses point system (each field costs points)
  • Secondary rate limits for content creation (80 POST/PUT/DELETE per minute)
  • Conditional requests (304 Not Modified) don't count against limit
  • Enterprise Cloud: Higher limits available
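Because 304 Not Modified responses don't count against the limit, caching ETags and sending If-None-Match can stretch your quota considerably. A minimal sketch (the cache layout and helper names are illustrative, not GitHub's SDK):

```python
_etag_cache = {}  # url -> (etag, body)

def conditional_headers(etag=None, token=None):
    """Headers for a conditional GitHub request; a 304 reply costs no quota."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    if etag:
        headers["If-None-Match"] = etag
    return headers

def get_cached(url, token=None):
    """Fetch url, reusing the cached body when GitHub answers 304."""
    import requests  # deferred so the helper above stays dependency-free

    etag, body = _etag_cache.get(url, (None, None))
    resp = requests.get(url, headers=conditional_headers(etag, token))
    if resp.status_code == 304:
        return body  # served from cache; no rate-limit cost
    resp.raise_for_status()
    data = resp.json()
    if resp.headers.get("ETag"):
        _etag_cache[url] = (resp.headers["ETag"], data)
    return data
```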

AI & Machine Learning APIs

| API | Model | Requests Per Minute (RPM) | Tokens Per Minute (TPM) |
|---|---|---|---|
| OpenAI | GPT-4 (Free tier) | 3 RPM | 40,000 TPM |
| OpenAI | GPT-4 (Tier 1) | 500 RPM | 30,000 TPM |
| OpenAI | GPT-4 (Tier 5) | 10,000 RPM | 300,000,000 TPM |
| OpenAI | GPT-3.5-Turbo (Tier 1) | 3,500 RPM | 60,000 TPM |
| Anthropic | Claude (Free) | 5 RPM | 40,000 TPM |
| Anthropic | Claude (Pro) | 1,000 RPM | Varies by model |

OpenAI specifics:

  • Tiered limits based on usage history and payment
  • Both RPM (requests) and TPM (tokens) limits apply
  • Batch API: Higher throughput, lower priority
  • Different limits for different models
  • Image generation: Separate limits (50 images/min for DALL-E 3)
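Since RPM and TPM are enforced together, a client has to budget both before each call. A rough fixed-window sketch (the class and numbers are illustrative; OpenAI's actual accounting is more granular):

```python
import time

class DualBudget:
    """Tracks request and token spend against per-minute caps (RPM + TPM)."""

    def __init__(self, rpm, tpm):
        self.rpm, self.tpm = rpm, tpm
        self.window_start = time.monotonic()
        self.requests = 0
        self.tokens = 0

    def wait_time(self, tokens_needed):
        """Seconds to wait before a call spending tokens_needed tokens."""
        now = time.monotonic()
        if now - self.window_start >= 60:  # new minute window: reset both meters
            self.window_start, self.requests, self.tokens = now, 0, 0
        if self.requests + 1 > self.rpm or self.tokens + tokens_needed > self.tpm:
            return 60 - (now - self.window_start)  # wait for the window to reset
        self.requests += 1
        self.tokens += tokens_needed
        return 0.0
```

A caller sleeps for whatever `wait_time` returns before issuing the request, which keeps it under both caps even when one is hit long before the other.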

E-commerce & Marketplace APIs

| API | Standard Limit | Notes |
|---|---|---|
| Shopify | 2 req/sec (REST), 1000 cost points/sec (GraphQL) | Leaky bucket algorithm |
| WooCommerce | No official limit | Self-hosted, server-dependent |
| Amazon SP-API | Varies by endpoint | 1-200 req/sec depending on operation |
| eBay | 5,000 req/day (free) | Varies by API and tier |

Shopify specifics:

  • Shopify Plus: 4 req/sec
  • GraphQL uses cost calculation (each query has points)
  • Bulk operations: Separate limits
  • REST Admin API: 2 calls/sec sustained, bursts allowed

Monitor Is Shopify Down for platform-wide issues beyond rate limiting.

Cloud Infrastructure APIs

| Provider | Service | Rate Limit |
|---|---|---|
| AWS | API Gateway | 10,000 req/sec (default) |
| AWS | Lambda | 1,000 concurrent executions |
| AWS | DynamoDB | 40,000 RCU / 40,000 WCU per table |
| Google Cloud | Cloud Functions | 1,000 req/sec per function |
| Google Cloud | Firestore | 10,000 writes/sec per database |
| Azure | Functions | 200 concurrent instances (Consumption plan) |

Important: Cloud provider limits are often per-region and per-service. Always check specific service documentation and request limit increases through support tickets if needed.

Rate Limiting Implementation Patterns

API providers use various algorithms to implement rate limiting, each with different characteristics and use cases. Understanding these patterns helps you predict behavior and optimize your integration strategy.

1. Fixed Window

How it works: Divides time into fixed intervals (e.g., 1-minute windows). You get a fixed quota at the start of each window.

Example: 100 requests per minute, window resets at :00 seconds

Minute 1 (00:00-00:59): 100 requests available
Minute 2 (01:00-01:59): 100 requests available (resets at 01:00)

Pros:

  • Simple to implement and understand
  • Predictable reset times
  • Low memory footprint

Cons:

  • Burst vulnerability: a user can make 200 requests in 2 seconds (100 at 00:59, 100 at 01:00)
  • Cliff effect: users with an exhausted quota must wait until the window resets
  • Uneven traffic distribution

Used by: GitHub (hourly window), many simple APIs
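A fixed window counter takes only a few lines. A minimal sketch (the injectable clock is just for testability; illustrative, not any provider's implementation):

```python
import time

class FixedWindowLimiter:
    """Fixed-window counter: the quota resets at each window boundary."""

    def __init__(self, limit, window_sec=60, clock=time.time):
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock
        self.window = None  # index of the current window
        self.count = 0

    def allow(self):
        window = int(self.clock() // self.window_sec)  # e.g. minute number
        if window != self.window:  # boundary crossed: fresh quota
            self.window, self.count = window, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

The burst vulnerability above falls directly out of this code: two separate `window` values are only one second apart at the boundary, yet each grants a full quota.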

2. Sliding Window

How it works: Considers requests made in the past N time units from the current moment, providing smoother rate limiting.

Example: 100 requests per 60-second sliding window

At 12:30:45, checks all requests since 12:29:45
At 12:30:46, checks all requests since 12:29:46

Pros:

  • Prevents burst exploitation
  • Smoother rate limiting experience
  • More accurate representation of "requests per time unit"

Cons:

  • More complex implementation
  • Higher memory usage (must track timestamps)
  • Computationally more expensive

Used by: Stripe, Redis-based rate limiters

Implementation (Redis + Node.js):

const Redis = require('ioredis');
const redis = new Redis();

async function checkRateLimit(userId, limit = 100, windowSec = 60) {
  const now = Date.now();
  const windowStart = now - (windowSec * 1000);
  
  const key = `ratelimit:${userId}`;
  
  // Pipeline the commands so they run back-to-back; a Lua script
  // is the fully race-free option under heavy concurrency.
  const results = await redis
    .multi()
    // Add current request (random suffix keeps same-millisecond members unique)
    .zadd(key, now, `${now}-${Math.random()}`)
    // Remove old requests outside the window
    .zremrangebyscore(key, '-inf', windowStart)
    // Count requests in the window
    .zcard(key)
    // Expire the key along with the window
    .expire(key, windowSec)
    .exec();
  
  const count = results[2][1]; // zcard result
  
  return {
    allowed: count <= limit,
    remaining: Math.max(0, limit - count),
    resetAt: now + (windowSec * 1000)
  };
}

3. Token Bucket

How it works: A bucket holds tokens (representing requests). Tokens are added at a fixed rate. Each request consumes one token. If bucket is empty, request is denied.

Example: Bucket capacity: 100 tokens, refill rate: 10 tokens/second

Initial: 100 tokens available
Make 20 requests: 80 tokens remain
Wait 5 seconds: 80 + (5 * 10) = 100 tokens (capped at bucket size)

Pros:

  • Allows controlled bursts (up to bucket capacity)
  • Smooth refill behavior
  • Works well for varying traffic patterns
  • Easy to reason about

Cons:

  • Requires tracking state (bucket level, last refill time)
  • Can be exploited with careful timing
  • Bucket size tuning requires experimentation

Used by: AWS API Gateway, Shopify (leaky bucket variant), many enterprise APIs

Implementation (Python):

import time
import threading

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.last_refill = time.time()
        self.lock = threading.Lock()
    
    def consume(self, tokens=1):
        with self.lock:
            self._refill()
            
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False
    
    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        tokens_to_add = elapsed * self.refill_rate
        
        self.tokens = min(self.capacity, self.tokens + tokens_to_add)
        self.last_refill = now
    
    def get_status(self):
        with self.lock:
            self._refill()
            return {
                'tokens': self.tokens,
                'capacity': self.capacity,
                'remaining_percent': (self.tokens / self.capacity) * 100
            }

# Usage
bucket = TokenBucket(capacity=100, refill_rate=10)

if bucket.consume(1):
    print("Request allowed")
else:
    print("Rate limited, please wait")
    
status = bucket.get_status()
print(f"Tokens remaining: {status['tokens']}/{status['capacity']}")

4. Leaky Bucket

How it works: Similar to token bucket, but enforces a constant output rate. Requests enter a queue (bucket) and are processed at a fixed rate. If bucket overflows, requests are rejected.

Example: Queue capacity: 100, processing rate: 10 requests/second

Burst of 50 requests arrives: All queued
Processing: 10 requests/sec drain from queue
Another 60 requests arrive immediately: 50 queued (bucket full at 100), 10 rejected (overflow)

Pros:

  • Smooths traffic spikes
  • Protects downstream systems from bursts
  • Predictable output rate

Cons:

  • Can increase latency (queuing delay)
  • Rejected requests during overflow
  • Complex to implement correctly

Used by: Shopify (GraphQL cost calculation), network traffic shaping
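A minimal sketch of the "meter" variant of a leaky bucket, which rejects on overflow instead of queuing (class name and parameters are illustrative):

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: the level drains at a fixed rate;
    arrivals that would overflow the bucket are rejected."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate = leak_rate  # requests drained per second
        self.clock = clock
        self.level = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain the bucket for the time elapsed since the last check
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False  # bucket overflow: reject
```

The queuing variant described above differs only in that overflowing requests wait in line instead of being dropped, trading rejections for latency.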

Handling Rate Limits in Your Code

When you inevitably hit rate limits, how you respond determines whether you experience minor delays or complete service disruption. Here are battle-tested patterns for graceful rate limit handling.

Exponential Backoff with Jitter

The gold standard for retry logic. Wait progressively longer between attempts, with randomization to prevent thundering herd problems.

JavaScript/Node.js Implementation:

class RateLimitError extends Error {
  constructor(message, retryAfter) {
    super(message);
    this.retryAfter = retryAfter;
    this.name = 'RateLimitError';
  }
}

async function exponentialBackoff(fn, maxRetries = 5) {
  let retries = 0;
  
  while (retries < maxRetries) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 || error.name === 'RateLimitError') {
        retries++;
        
        if (retries >= maxRetries) {
          throw new Error(`Max retries (${maxRetries}) exceeded`);
        }
        
        // Calculate backoff: min(1000 * 2^retries, 32000) + random jitter
        const baseDelay = Math.min(1000 * Math.pow(2, retries), 32000);
        const jitter = Math.random() * 1000;
        const delay = baseDelay + jitter;
        
        console.log(`Rate limited. Retry ${retries}/${maxRetries} after ${delay.toFixed(0)}ms`);
        
        // Respect Retry-After header if provided
        const retryAfter = error.retryAfter ? error.retryAfter * 1000 : delay;
        
        await sleep(retryAfter);
      } else {
        // Non-rate-limit error, throw immediately
        throw error;
      }
    }
  }
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Usage
async function fetchUserData(userId) {
  return exponentialBackoff(async () => {
    const response = await fetch(`https://api.example.com/users/${userId}`);
    
    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') || '0');
      throw new RateLimitError('Rate limit exceeded', retryAfter);
    }
    
    if (!response.ok) {
      throw new Error(`API error: ${response.status}`);
    }
    
    return response.json();
  });
}

Python Implementation with Decorators:

import time
import random
from functools import wraps

class RateLimitException(Exception):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after

def exponential_backoff(max_retries=5, base_delay=1.0):
    """Decorator for exponential backoff retry logic"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            
            while retries < max_retries:
                try:
                    return func(*args, **kwargs)
                except RateLimitException as e:
                    retries += 1
                    
                    if retries >= max_retries:
                        raise Exception(f"Max retries ({max_retries}) exceeded")
                    
                    # Calculate backoff with jitter
                    backoff = min(base_delay * (2 ** retries), 32)
                    jitter = random.uniform(0, 1)
                    delay = backoff + jitter
                    
                    # Use Retry-After if provided
                    if e.retry_after:
                        delay = e.retry_after
                    
                    print(f"Rate limited. Retry {retries}/{max_retries} after {delay:.2f}s")
                    time.sleep(delay)
            
            raise Exception("Retry loop completed without success")
        
        return wrapper
    return decorator

# Usage
@exponential_backoff(max_retries=5)
def fetch_user_data(user_id):
    import requests
    
    response = requests.get(f'https://api.example.com/users/{user_id}')
    
    if response.status_code == 429:
        retry_after = int(response.headers.get('Retry-After', 0))
        raise RateLimitException('Rate limit exceeded', retry_after)
    
    response.raise_for_status()
    return response.json()

# Make the call
try:
    user = fetch_user_data(12345)
    print(f"User: {user}")
except Exception as e:
    print(f"Failed to fetch user: {e}")

Request Queuing and Rate Smoothing

For applications making many API calls, implement a queue that smooths requests to stay within limits proactively.

JavaScript Queue Manager:

class RateLimitedQueue {
  constructor(requestsPerSecond) {
    this.queue = [];
    this.processing = false;
    this.requestsPerSecond = requestsPerSecond;
    this.delayBetweenRequests = 1000 / requestsPerSecond;
  }
  
  async enqueue(fn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      this.processQueue();
    });
  }
  
  async processQueue() {
    if (this.processing || this.queue.length === 0) {
      return;
    }
    
    this.processing = true;
    
    while (this.queue.length > 0) {
      const { fn, resolve, reject } = this.queue.shift();
      const startTime = Date.now();
      
      try {
        const result = await fn();
        resolve(result);
      } catch (error) {
        reject(error);
      }
      
      // Enforce rate limit delay
      const elapsed = Date.now() - startTime;
      const remainingDelay = Math.max(0, this.delayBetweenRequests - elapsed);
      
      if (this.queue.length > 0 && remainingDelay > 0) {
        await new Promise(r => setTimeout(r, remainingDelay));
      }
    }
    
    this.processing = false;
  }
  
  getQueueSize() {
    return this.queue.length;
  }
}

// Usage
const apiQueue = new RateLimitedQueue(10); // 10 requests per second

// Make 100 requests that will be automatically rate-limited
const promises = [];
for (let i = 0; i < 100; i++) {
  const promise = apiQueue.enqueue(async () => {
    const response = await fetch(`https://api.example.com/items/${i}`);
    return response.json();
  });
  promises.push(promise);
}

console.log(`Queue size: ${apiQueue.getQueueSize()}`);
const results = await Promise.all(promises);
console.log(`Completed ${results.length} requests without hitting rate limits`);

Circuit Breaker Pattern

When an API consistently returns 429 errors, implement a circuit breaker to avoid wasting resources on requests that will fail.

class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failureThreshold = threshold;
    this.timeout = timeout;
    this.failureCount = 0;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.nextAttempt = Date.now();
  }
  
  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN. Service temporarily unavailable.');
      }
      this.state = 'HALF_OPEN';
    }
    
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  
  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }
  
  onFailure() {
    this.failureCount++;
    
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
      console.error(`Circuit breaker opened. Will retry after ${this.timeout}ms`);
    }
  }
  
  getState() {
    return {
      state: this.state,
      failures: this.failureCount,
      nextAttempt: new Date(this.nextAttempt)
    };
  }
}

// Usage
const breaker = new CircuitBreaker(5, 60000);

async function makeAPICall() {
  return breaker.execute(async () => {
    const response = await fetch('https://api.example.com/data');
    
    if (response.status === 429) {
      throw new Error('Rate limited');
    }
    
    return response.json();
  });
}

Monitoring Rate Limits with API Status Check

Proactive monitoring prevents rate limit surprises that can disrupt your production services. While implementing proper retry logic is essential, knowing before you hit limits allows you to scale gracefully.

Why Monitor Rate Limit Consumption?

Capacity Planning: Understanding your rate limit usage patterns helps you predict when you'll need to upgrade to a higher tier or implement request optimization.

Early Warning: Getting alerts when you reach 80% of your rate limit gives you time to investigate and optimize before hitting hard limits.

Incident Response: When troubleshooting API errors, knowing whether rate limits are involved saves hours of debugging time. Check status dashboards for Stripe, Twilio, Slack, and other critical APIs.

Cost Optimization: High rate limit consumption might indicate inefficient API usage (unnecessary requests, missing caching, duplicate calls) that's costing you money and performance.
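The early-warning check described above can be sketched as a small helper run after every response (function name, threshold, and message format are illustrative):

```python
def rate_limit_alert(headers, threshold=0.8):
    """Return an alert message when consumption crosses the threshold, else None.
    Expects the standard X-RateLimit-* response headers."""
    limit = int(headers.get("X-RateLimit-Limit", 0))
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    if limit == 0:
        return None  # headers absent: nothing to report
    used_fraction = 1 - remaining / limit
    if used_fraction >= threshold:
        return f"Rate limit {used_fraction:.0%} consumed ({remaining}/{limit} left)"
    return None
```

Pipe the returned message into whatever alerting channel you already use (Slack, PagerDuty, email) rather than logging it and hoping someone reads the logs.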

Setting Up Rate Limit Monitoring

1. Log rate limit headers from every request:

const winston = require('winston');

const logger = winston.createLogger({
  transports: [new winston.transports.File({ filename: 'api-metrics.log' })]
});

async function monitoredAPICall(url) {
  const response = await fetch(url);
  
  logger.info('API Request', {
    url,
    status: response.status,
    rateLimit: {
      limit: response.headers.get('X-RateLimit-Limit'),
      remaining: response.headers.get('X-RateLimit-Remaining'),
      reset: response.headers.get('X-RateLimit-Reset'),
      percentUsed: (
        (1 - parseInt(response.headers.get('X-RateLimit-Remaining'), 10) /
             parseInt(response.headers.get('X-RateLimit-Limit'), 10)) * 100
      ).toFixed(2)
    }
  });
  
  return response;
}

2. Track 429 errors with alerting:

const Sentry = require('@sentry/node');

if (response.status === 429) {
  Sentry.captureException(new Error('Rate Limit Exceeded'), {
    extra: {
      api: 'stripe',
      endpoint: url,
      retryAfter: response.headers.get('Retry-After'),
      rateLimitRemaining: response.headers.get('X-RateLimit-Remaining')
    },
    level: 'warning'
  });
}

3. Use API Status Check for multi-API monitoring:

API Status Check monitors response times, error rates, and availability for 100+ popular APIs.

Get instant alerts when:

  • API response times spike (potential rate limit throttling)
  • Error rates increase (429s or 5xx errors)
  • Complete outages occur (not just rate limits)

Start monitoring your critical APIs →

Rate Limit Dashboards

Build internal dashboards to visualize rate limit consumption across all your API integrations:

Metrics to track:

  • Requests per minute/hour/day
  • Percentage of rate limit consumed
  • Time until rate limit reset
  • Number of 429 errors
  • Retry attempt counts
  • Circuit breaker state changes

Popular monitoring tools:

  • Datadog: Built-in API monitoring with rate limit tracking
  • Grafana: Custom dashboards for rate limit metrics
  • New Relic: APM with API performance monitoring
  • API Status Check: Multi-API monitoring with instant alerts

Frequently Asked Questions

What's the difference between rate limiting and throttling?

Rate limiting enforces hard limits—once you hit the cap, requests are rejected with 429 errors until the limit resets. Throttling slows down requests gradually as you approach limits, introducing artificial delays but still processing them. Rate limiting is binary (allowed/denied), while throttling is progressive (fast/slow/slower). Most APIs use rate limiting, though some (like AWS) implement both.

How do I request a rate limit increase?

Most API providers allow enterprise customers to request higher rate limits:

  1. Document your use case: Explain why you need higher limits with specific metrics (current usage, projected growth)
  2. Contact support or sales: Free tier users may need to upgrade first
  3. Propose optimization: Show you've already optimized (caching, batch requests)
  4. Negotiate pricing: Higher limits often come with higher costs
  5. Start small: Request a modest increase (2x-5x) rather than 100x

Providers are more likely to approve increases for customers with payment history and legitimate business use cases.

Should I implement rate limiting on my own API?

Yes, absolutely. Even internal APIs benefit from rate limiting to prevent abuse, bugs, and resource exhaustion. Implement rate limiting when:

  • Your API is publicly accessible or serves multiple clients
  • Backend resources are expensive (database queries, third-party API calls)
  • You need to enforce fair usage across customers or teams
  • Security is a concern (preventing brute force attacks)

Use established libraries (express-rate-limit, Django Ratelimit, Kong) rather than building from scratch.

What happens to webhooks during rate limits?

Webhook delivery usually has separate rate limits from API calls. During rate limit issues:

If your webhook endpoint is rate limited: The API provider will retry delivery using exponential backoff. Eventually retries exhaust and events are lost (check retention policies).

If the API provider is rate limited: Webhook delivery may be delayed but typically continues. Providers prioritize webhook delivery over synchronous API responses.

Best practice: Implement idempotent webhook processing and track processed event IDs to handle duplicate deliveries gracefully.
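The idempotency best practice can be sketched as follows (an in-memory set stands in for the durable event-ID store you'd use in production):

```python
_processed_events = set()  # in production: Redis, a database table, etc.

def process_webhook(event):
    """Process a webhook event at most once, keyed on its event ID."""
    event_id = event["id"]
    if event_id in _processed_events:
        return "duplicate"  # redelivery: safely ignored
    _processed_events.add(event_id)
    # ... handle event["type"] / event["data"] here ...
    return "processed"
```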

Can I be banned for hitting rate limits too often?

Yes, potentially. While occasional 429 errors are expected, persistent or aggressive violations may result in:

  • Temporary IP bans (minutes to hours)
  • API key suspension (requires support contact to restore)
  • Account review or termination (for clear abuse)

Implement proper backoff logic and respect Retry-After headers. If you're consistently hitting limits, optimize your integration or upgrade your plan—don't try to circumvent limits.

How do concurrent request limits differ from rate limits?

Rate limits control requests per time unit (100 req/min). Concurrent limits control simultaneous in-flight requests (10 concurrent connections). You can hit concurrent limits even with low request rates if requests take a long time to complete. Solutions:

  • Connection pooling (reuse connections instead of creating new ones)
  • Request queuing (wait for slots before making new requests)
  • Async processing (don't block while waiting for responses)

AWS Lambda, for example, has both invocation rate limits AND concurrent execution limits.

What's the best rate limiting algorithm to implement?

For most applications: Token bucket provides the best balance of burst handling and simplicity. It allows legitimate traffic spikes while preventing sustained abuse.

For strict fairness: Sliding window prevents burst exploitation but requires more memory and computation.

For simplicity: Fixed window is easiest to implement but vulnerable to burst attacks.

For traffic smoothing: Leaky bucket when you need to protect downstream systems from spikes.

Choose based on your specific requirements. If unsure, start with token bucket—it's what AWS, Shopify, and most major platforms use.

How do I test my rate limit handling code?

1. Use test endpoints: Many APIs provide dedicated test endpoints that allow higher request rates or faster limit resets.

2. Mock API responses: Simulate 429 errors in your test suite:

// Jest mock example
global.fetch = jest.fn(() => 
  Promise.resolve({
    status: 429,
    headers: {
      get: (name) => name === 'Retry-After' ? '5' : null
    }
  })
);

3. Use rate limit testing services: Tools like Artillery and k6 can generate controlled load to test rate limit behavior.

4. Implement chaos engineering: Randomly inject 429 responses in staging environments to verify resilience.

5. Monitor production carefully: Use gradual rollouts and robust observability when deploying rate limit handling changes.

Stay Ahead of API Issues

Don't wait for rate limit errors or outages to disrupt your production services. Proactive monitoring helps you catch issues before they impact users.

Monitor your critical APIs with API Status Check:

  • Real-time health checks every 60 seconds
  • Instant alerts via email, Slack, Discord, or webhook
  • Track response times and error rates across 100+ APIs
  • Historical uptime data and incident reports
  • No configuration required—start monitoring in 30 seconds

Start monitoring for free →


Last updated: February 4, 2026. Rate limit information is based on publicly documented policies and subject to change. Always refer to official API documentation for the most current limits.

Monitor Your APIs

Check the real-time status of 100+ popular APIs used by developers.

View API Status →