API Rate Limiting: Complete Implementation Guide for Developers
Rate limiting is the unsung hero of API stability. Too lenient, and a single misbehaving client can take down your entire service. Too strict, and legitimate users suffer. Getting it right is critical for both API providers and consumers.
This guide covers everything you need to implement effective rate limiting, from choosing the right algorithm to handling limits gracefully in production.
What is API Rate Limiting?
Rate limiting controls how many requests a client can make to an API within a specific time window. It prevents abuse, ensures fair resource allocation, and protects infrastructure from overload.
Common rate limit patterns:
- Per-user limits: 1,000 requests/hour per API key
- Endpoint-specific limits: 100 writes/min, 10,000 reads/min
- Global limits: 100,000 requests/hour across all users
- Burst allowances: 100 requests in 10 seconds, then throttle
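These patterns are usually combined into a single policy per API key. As a sketch (the shape and names here are illustrative, not from any particular library), such a policy might look like:

```typescript
// Hypothetical policy object combining the four patterns above.
interface RateLimitPolicy {
  perUser: { requests: number; windowMs: number };                  // per API key
  perEndpoint: Record<string, { requests: number; windowMs: number }>;
  globalPerHour: number;                                            // across all users
  burst: { requests: number; windowMs: number };                    // short-window allowance
}

const defaultPolicy: RateLimitPolicy = {
  perUser: { requests: 1_000, windowMs: 3_600_000 },     // 1,000/hour
  perEndpoint: {
    "POST /write": { requests: 100, windowMs: 60_000 },  // 100 writes/min
    "GET /read": { requests: 10_000, windowMs: 60_000 }  // 10,000 reads/min
  },
  globalPerHour: 100_000,
  burst: { requests: 100, windowMs: 10_000 }             // 100 requests in 10s
};
```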
Why Rate Limiting Matters
For API providers:
- Prevents denial-of-service attacks (accidental or malicious)
- Ensures fair resource allocation across clients
- Protects infrastructure costs from runaway scripts
- Enables sustainable free tiers
For API consumers:
- Forces efficient request patterns (batching, caching)
- Predictable costs and performance
- Clear feedback when hitting limits
Real-world impact: In 2023, a misconfigured script at a Fortune 500 company made 47 million requests to Stripe's API in 6 hours. Rate limiting prevented service degradation for other customers and automatically throttled the runaway client.
Rate Limiting Algorithms Explained
1. Fixed Window Counter
How it works: Allows N requests per fixed time window (e.g., 1000 requests per hour starting at each clock hour).
Pros:
- Simple to implement (single counter + reset timestamp)
- Low memory footprint
- Easy to understand and debug
Cons:
- Burst vulnerability: A client can make 1000 requests at 10:59 AM and another 1000 at 11:00 AM (2000 requests in 2 minutes)
- Uneven load distribution
Implementation:
```typescript
interface RateLimitState {
  count: number;
  resetTime: number;
}

class FixedWindowRateLimiter {
  private limits: Map<string, RateLimitState> = new Map();

  constructor(
    private maxRequests: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    const state = this.limits.get(clientId);

    // Reset if window expired
    if (!state || now >= state.resetTime) {
      this.limits.set(clientId, {
        count: 1,
        resetTime: now + this.windowMs
      });
      return true;
    }

    // Increment and check
    if (state.count < this.maxRequests) {
      state.count++;
      return true;
    }

    return false; // Rate limit exceeded
  }

  getRemainingRequests(clientId: string): number {
    const state = this.limits.get(clientId);
    if (!state || Date.now() >= state.resetTime) {
      return this.maxRequests;
    }
    return Math.max(0, this.maxRequests - state.count);
  }
}

// Usage
const limiter = new FixedWindowRateLimiter(1000, 3600000); // 1000 req/hour

async function handleRequest(userId: string) {
  if (await limiter.check(userId)) {
    // Process request
    return { success: true };
  } else {
    return {
      error: "Rate limit exceeded",
      remaining: limiter.getRemainingRequests(userId)
    };
  }
}
```
2. Sliding Window Log
How it works: Tracks the timestamp of every request. Counts requests within a rolling time window.
Pros:
- Eliminates burst vulnerability (true rolling window)
- Precise enforcement
- Fair distribution of requests
Cons:
- High memory usage (stores all request timestamps)
- Expensive cleanup of old timestamps
- Doesn't scale well with high request volumes
Implementation:
```typescript
class SlidingWindowLogLimiter {
  private logs: Map<string, number[]> = new Map();

  constructor(
    private maxRequests: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    const log = this.logs.get(clientId) || [];

    // Remove timestamps outside window
    const validTimestamps = log.filter(
      timestamp => now - timestamp < this.windowMs
    );

    if (validTimestamps.length < this.maxRequests) {
      validTimestamps.push(now);
      this.logs.set(clientId, validTimestamps);
      return true;
    }

    this.logs.set(clientId, validTimestamps);
    return false;
  }

  getResetTime(clientId: string): number {
    const log = this.logs.get(clientId) || [];
    if (log.length === 0) return Date.now();
    // The oldest tracked request ages out of the window at this time
    return log[0] + this.windowMs;
  }
}
```
3. Sliding Window Counter (Hybrid)
How it works: Combines fixed window and sliding window. Tracks current and previous window counts, interpolates based on time elapsed.
Pros:
- Memory efficient (only 2 counters per client)
- Smooth rate limiting (no burst vulnerability)
- Balances precision and performance
Cons:
- Slightly more complex logic
- Not perfectly precise (but close enough for most use cases)
Implementation:
```typescript
interface SlidingWindowState {
  currentCount: number;
  previousCount: number;
  resetTime: number;
}

class SlidingWindowCounterLimiter {
  private states: Map<string, SlidingWindowState> = new Map();

  constructor(
    private maxRequests: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    const state = this.states.get(clientId);

    // Initialize or roll the window forward
    if (!state || now >= state.resetTime) {
      // Only carry the old count if the previous window still overlaps
      // the rolling lookback; after a full window of inactivity it is stale
      const isAdjacentWindow = state && now < state.resetTime + this.windowMs;
      this.states.set(clientId, {
        currentCount: 1,
        previousCount: isAdjacentWindow ? state!.currentCount : 0,
        resetTime: now + this.windowMs
      });
      return true;
    }

    // Weight the previous window's count by how much of it still
    // overlaps the rolling window
    const timeElapsedInWindow = now - (state.resetTime - this.windowMs);
    const percentageIntoWindow = timeElapsedInWindow / this.windowMs;
    const previousWeight = 1 - percentageIntoWindow;
    const estimatedCount =
      state.currentCount + state.previousCount * previousWeight;

    if (estimatedCount < this.maxRequests) {
      state.currentCount++;
      return true;
    }

    return false;
  }
}
```
Why this is often the best choice for APIs: the sliding window counter closely approximates a true sliding window while storing only two counters per client instead of every request timestamp. Cloudflare has described using a variation of this algorithm, and similar approaches are common at large API providers.
4. Token Bucket
How it works: Each client has a bucket that holds tokens. Tokens are added at a constant rate. Each request consumes a token. When the bucket is empty, requests are rejected.
Pros:
- Allows controlled bursts (up to bucket capacity)
- Smooth long-term rate limiting
- Great for APIs with bursty traffic patterns
Cons:
- Naive implementations need a background refill process (refilling lazily on each request, as below, avoids this)
- Slightly more complex than window counters
Implementation:
```typescript
interface TokenBucketState {
  tokens: number;
  lastRefill: number;
}

class TokenBucketLimiter {
  private buckets: Map<string, TokenBucketState> = new Map();

  constructor(
    private maxTokens: number,
    private refillRate: number, // tokens added per refill interval
    private refillIntervalMs: number = 1000
  ) {}

  async check(clientId: string): Promise<boolean> {
    const now = Date.now();
    let bucket = this.buckets.get(clientId);

    // Initialize bucket
    if (!bucket) {
      bucket = {
        tokens: this.maxTokens,
        lastRefill: now
      };
      this.buckets.set(clientId, bucket);
    }

    // Refill tokens lazily based on time elapsed (no background job needed)
    const timeSinceRefill = now - bucket.lastRefill;
    const tokensToAdd =
      (timeSinceRefill / this.refillIntervalMs) * this.refillRate;
    bucket.tokens = Math.min(
      this.maxTokens,
      bucket.tokens + tokensToAdd
    );
    bucket.lastRefill = now;

    // Consume token if available
    if (bucket.tokens >= 1) {
      bucket.tokens--;
      return true;
    }

    return false;
  }

  getRemainingTokens(clientId: string): number {
    const bucket = this.buckets.get(clientId);
    if (!bucket) return this.maxTokens;

    const now = Date.now();
    const timeSinceRefill = now - bucket.lastRefill;
    const tokensToAdd =
      (timeSinceRefill / this.refillIntervalMs) * this.refillRate;
    return Math.min(this.maxTokens, bucket.tokens + tokensToAdd);
  }
}

// Usage
const limiter = new TokenBucketLimiter(
  100,  // 100 token bucket capacity (allows bursts)
  10,   // Refill 10 tokens per second
  1000  // Refill interval
);
```
When to use token bucket: Best for APIs where legitimate use cases involve bursts (image processing, batch operations, webhook deliveries). AWS, Twilio, and SendGrid use token bucket algorithms.
5. Leaky Bucket
How it works: Requests enter a queue (bucket). The bucket processes requests at a constant rate (leaks). When the queue is full, new requests are rejected.
Pros:
- Guarantees smooth, consistent request rate
- Prevents sudden traffic spikes
- Great for protecting downstream services
Cons:
- Adds latency (requests wait in queue)
- Complexity in managing queue state
- Can delay legitimate urgent requests
Implementation:
```typescript
interface QueuedRequest {
  timestamp: number;
  resolve: () => void;
  reject: () => void;
}

class LeakyBucketLimiter {
  private queues: Map<string, QueuedRequest[]> = new Map();

  constructor(
    private maxQueueSize: number,
    private processRateMs: number // time between processing requests
  ) {
    // Start background processor (the "leak")
    setInterval(() => this.processQueues(), this.processRateMs);
  }

  async check(clientId: string): Promise<boolean> {
    return new Promise(resolve => {
      const queue = this.queues.get(clientId) || [];

      if (queue.length >= this.maxQueueSize) {
        resolve(false); // Queue full, reject immediately
        return;
      }

      queue.push({
        timestamp: Date.now(),
        resolve: () => resolve(true),
        reject: () => resolve(false)
      });
      this.queues.set(clientId, queue);
    });
  }

  private processQueues() {
    for (const [clientId, queue] of this.queues) {
      if (queue.length > 0) {
        const request = queue.shift()!;
        request.resolve(); // Process one request per interval
      }
    }
  }
}
```
When to use leaky bucket: Best when you need guaranteed smooth traffic to downstream services (database writes, external API calls). Less common for public APIs due to latency concerns.
Rate Limiting HTTP Headers (Industry Standard)
Your API should always return these headers:
```
X-RateLimit-Limit: 1000        # Max requests per window
X-RateLimit-Remaining: 847     # Requests left in current window
X-RateLimit-Reset: 1678901234  # Unix timestamp when limit resets
Retry-After: 3600              # Seconds until retry (when rate limited)
```
When the rate limit is exceeded, return:

```
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1678901234
Retry-After: 3600

{
  "error": "rate_limit_exceeded",
  "message": "You've exceeded the rate limit of 1000 requests per hour",
  "reset_at": "2023-03-15T17:27:14Z",
  "documentation_url": "https://apistatuscheck.com/docs/rate-limiting"
}
```
Client-Side Rate Limit Handling
1. Exponential Backoff with Jitter
```typescript
async function fetchWithRateLimit(
  url: string,
  options: RequestInit = {},
  maxRetries: number = 5
): Promise<Response> {
  let retries = 0;

  while (retries < maxRetries) {
    const response = await fetch(url, options);

    // Success
    if (response.ok) {
      return response;
    }

    // Rate limited
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After');
      const resetTime = response.headers.get('X-RateLimit-Reset');

      let waitMs: number;
      if (retryAfter) {
        waitMs = parseInt(retryAfter, 10) * 1000;
      } else if (resetTime) {
        // X-RateLimit-Reset is a Unix timestamp in seconds; clamp to
        // zero in case of clock skew
        waitMs = Math.max(0, parseInt(resetTime, 10) * 1000 - Date.now());
      } else {
        // Exponential backoff with jitter
        waitMs = Math.min(
          1000 * Math.pow(2, retries) + Math.random() * 1000,
          60000 // Max 60 seconds
        );
      }

      console.log(`Rate limited. Retrying in ${waitMs}ms...`);
      await new Promise(resolve => setTimeout(resolve, waitMs));
      retries++;
      continue;
    }

    // Other error
    throw new Error(`Request failed: ${response.statusText}`);
  }

  throw new Error('Max retries exceeded');
}

// Usage
try {
  const response = await fetchWithRateLimit('https://api.example.com/data');
  const data = await response.json();
} catch (error) {
  console.error('Request failed after retries:', error);
}
```
2. Client-Side Rate Limiter (Proactive)
Instead of waiting for 429 errors, proactively throttle requests:
```typescript
class ClientSideRateLimiter {
  private queue: Array<() => Promise<any>> = [];
  private processing = false;

  constructor(
    private requestsPerSecond: number
  ) {}

  async enqueue<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try {
          const result = await fn();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });
      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.queue.length === 0) return;
    this.processing = true;

    const intervalMs = 1000 / this.requestsPerSecond;

    while (this.queue.length > 0) {
      const fn = this.queue.shift()!;
      await fn();
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }

    this.processing = false;
  }
}

// Usage
const limiter = new ClientSideRateLimiter(10); // 10 requests/second

// All requests are automatically throttled
for (let i = 0; i < 100; i++) {
  limiter.enqueue(() =>
    fetch(`https://api.example.com/item/${i}`)
  ).then(response => response.json())
   .then(data => console.log(data));
}
```
Server-Side Implementation Patterns
Express.js Middleware
```typescript
import express from 'express';
import { SlidingWindowCounterLimiter } from './rate-limiters';

const app = express();
const limiter = new SlidingWindowCounterLimiter(
  100,   // 100 requests
  60000  // per minute
);

// Apply to all routes
app.use(async (req, res, next) => {
  // Behind a proxy, enable Express's 'trust proxy' setting so req.ip
  // reflects the original client address
  const clientId = req.ip || (req.headers['x-forwarded-for'] as string);
  const allowed = await limiter.check(clientId);

  if (!allowed) {
    res.status(429).json({
      error: 'rate_limit_exceeded',
      message: 'Too many requests, please try again later'
    });
    return;
  }

  next();
});

// Apply a stricter limit to a specific endpoint by registering the
// check as route middleware ahead of the actual handler
const strictLimiter = new SlidingWindowCounterLimiter(10, 60000);

app.post(
  '/api/expensive-operation',
  async (req, res, next) => {
    const clientId = req.headers['x-api-key'] as string;

    if (!await strictLimiter.check(clientId)) {
      res.status(429).json({
        error: 'rate_limit_exceeded',
        message: 'This endpoint allows 10 requests per minute'
      });
      return;
    }

    next();
  },
  (req, res) => {
    // Actual handler runs only when the strict limit allows it
    res.json({ success: true });
  }
);
```
Redis-Based Distributed Rate Limiting
For multi-server deployments:
```typescript
import Redis from 'ioredis';

class RedisRateLimiter {
  private redis: Redis;

  constructor(
    redisUrl: string,
    private maxRequests: number,
    private windowSeconds: number
  ) {
    this.redis = new Redis(redisUrl);
  }

  async check(clientId: string): Promise<{
    allowed: boolean;
    remaining: number;
    resetAt: number;
  }> {
    const key = `ratelimit:${clientId}`;
    const now = Date.now();
    const windowStart = now - this.windowSeconds * 1000;

    // Use a Redis sorted set to track request timestamps in the time
    // window; all four commands run atomically in one round trip
    const multi = this.redis.multi();
    multi.zremrangebyscore(key, 0, windowStart);     // drop requests outside window
    multi.zcard(key);                                // count requests in window
    multi.zadd(key, now, `${now}-${Math.random()}`); // record this request
    multi.expire(key, this.windowSeconds);           // let idle keys expire

    const results = await multi.exec();
    const count = results![1][1] as number;

    // Note: the request is recorded before the check, so rejected requests
    // still count toward the window -- a deliberate simplification that
    // penalizes clients who keep hammering a full window
    const allowed = count < this.maxRequests;
    const remaining = Math.max(0, this.maxRequests - count - 1);
    // Approximation: the true reset is when the oldest entry ages out
    const resetAt = now + this.windowSeconds * 1000;

    return { allowed, remaining, resetAt };
  }
}

// Usage in Express
const limiter = new RedisRateLimiter(
  'redis://localhost:6379',
  1000, // requests
  3600  // per hour
);

app.use(async (req, res, next) => {
  const apiKey = req.headers['x-api-key'] as string;
  const result = await limiter.check(apiKey);

  res.setHeader('X-RateLimit-Limit', '1000');
  res.setHeader('X-RateLimit-Remaining', result.remaining.toString());
  res.setHeader('X-RateLimit-Reset', Math.floor(result.resetAt / 1000).toString());

  if (!result.allowed) {
    res.status(429).json({ error: 'rate_limit_exceeded' });
    return;
  }

  next();
});
```
Rate Limiting Best Practices
1. Different Limits for Different Tiers
```typescript
const RATE_LIMITS = {
  free:       { requests: 100,    windowMs: 3600000 }, // 100/hour
  basic:      { requests: 1000,   windowMs: 3600000 }, // 1,000/hour
  pro:        { requests: 10000,  windowMs: 3600000 }, // 10,000/hour
  enterprise: { requests: 100000, windowMs: 3600000 }  // 100k/hour
};

async function getRateLimitForUser(apiKey: string) {
  const user = await getUserByApiKey(apiKey);
  return RATE_LIMITS[user.tier];
}
```
2. Endpoint-Specific Limits
```typescript
const ENDPOINT_LIMITS = {
  'GET /api/users':     { requests: 1000, windowMs: 60000 }, // 1000/min
  'POST /api/users':    { requests: 100,  windowMs: 60000 }, // 100/min
  'DELETE /api/users':  { requests: 50,   windowMs: 60000 }, // 50/min
  'POST /api/webhooks': { requests: 10,   windowMs: 60000 }  // 10/min
};
```
3. Whitelist Critical Clients
```typescript
const WHITELISTED_KEYS = new Set([
  'internal-service-key-1',
  'monitoring-service-key',
  'critical-partner-key'
]);

app.use(async (req, res, next) => {
  const apiKey = req.headers['x-api-key'] as string;

  // Skip rate limiting for whitelisted keys
  if (WHITELISTED_KEYS.has(apiKey)) {
    next();
    return;
  }

  // Apply rate limiting for everyone else
  // ...
});
```
4. Monitor Rate Limit Metrics
Track these metrics to tune your limits:
```typescript
interface RateLimitMetrics {
  totalRequests: number;
  rateLimitedRequests: number;
  uniqueClients: number;
  p50ResponseTime: number;
  p99ResponseTime: number;
  topConsumers: Array<{ clientId: string; requests: number }>;
}

class RateLimitMonitor {
  private metrics: RateLimitMetrics = {
    totalRequests: 0,
    rateLimitedRequests: 0,
    uniqueClients: 0,
    p50ResponseTime: 0,
    p99ResponseTime: 0,
    topConsumers: []
  };

  recordRequest(clientId: string, wasRateLimited: boolean, responseTimeMs: number) {
    this.metrics.totalRequests++;
    if (wasRateLimited) {
      this.metrics.rateLimitedRequests++;
    }

    // Send to metrics system (DataDog, Prometheus, etc.)
    this.sendToMetrics({
      metric: 'api.rate_limit.requests',
      value: 1,
      tags: [
        `client:${clientId}`,
        `rate_limited:${wasRateLimited}`,
        `response_time:${responseTimeMs}`
      ]
    });
  }

  getMetrics(): RateLimitMetrics {
    return this.metrics;
  }

  private sendToMetrics(data: any) {
    // Implement sending to your metrics backend
  }
}
```
5. Graceful Degradation
Instead of hard rejecting, consider soft limits:
```typescript
class TieredRateLimiter {
  constructor(
    private softLimit: number,
    private hardLimit: number,
    private windowMs: number
  ) {}

  async check(clientId: string): Promise<{
    allowed: boolean;
    throttled: boolean;
    delay: number;
  }> {
    const count = await this.getRequestCount(clientId);

    // Under soft limit: full speed
    if (count < this.softLimit) {
      return { allowed: true, throttled: false, delay: 0 };
    }

    // Between soft and hard limit: throttled
    if (count < this.hardLimit) {
      const throttleDelay = 1000; // 1 second delay
      return { allowed: true, throttled: true, delay: throttleDelay };
    }

    // Over hard limit: rejected
    return { allowed: false, throttled: false, delay: 0 };
  }

  private async getRequestCount(clientId: string): Promise<number> {
    // Back this with any of the counters above
    // (e.g. a sliding window counter in Redis)
    return 0;
  }
}
```
Real-World Rate Limiting Examples
GitHub API
- 5,000 requests/hour for authenticated requests
- 60 requests/hour for unauthenticated requests
- Separate limits for GraphQL API
- Different limits for GitHub Apps
Stripe API
- No hard rate limit published
- Uses adaptive rate limiting based on account history
- Returns 429 when limits exceeded
- Recommends exponential backoff
Twitter API (X)
- 15 requests per 15-minute window (free tier)
- 50 requests per 15-minute window (basic tier)
- Separate limits per endpoint
- App-based and user-based limits
OpenAI API
- Requests per minute (RPM) and tokens per minute (TPM) limits
- Different limits per model tier
- GPT-4: 500 RPM, 10,000 TPM (tier 1)
- GPT-3.5: 3,500 RPM, 90,000 TPM (tier 1)
Testing Rate Limits
Load Testing Script
```typescript
async function testRateLimit(
  url: string,
  requestsPerSecond: number,
  durationSeconds: number
) {
  const results = {
    total: 0,
    successful: 0,
    rateLimited: 0,
    errors: 0
  };

  const startTime = Date.now();
  const endTime = startTime + durationSeconds * 1000;
  const intervalMs = 1000 / requestsPerSecond;

  while (Date.now() < endTime) {
    results.total++;

    try {
      const response = await fetch(url);

      if (response.ok) {
        results.successful++;
      } else if (response.status === 429) {
        results.rateLimited++;
      } else {
        results.errors++;
      }
    } catch (error) {
      results.errors++;
    }

    // Note: awaiting each request serially means the achieved rate is
    // slightly below requestsPerSecond once response latency is included
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }

  console.log('Rate Limit Test Results:');
  console.log(`Total requests: ${results.total}`);
  console.log(`Successful: ${results.successful} (${(results.successful / results.total * 100).toFixed(1)}%)`);
  console.log(`Rate limited: ${results.rateLimited} (${(results.rateLimited / results.total * 100).toFixed(1)}%)`);
  console.log(`Errors: ${results.errors} (${(results.errors / results.total * 100).toFixed(1)}%)`);

  return results;
}

// Test with gradually increasing load
async function loadTest() {
  console.log('Testing at 10 req/s...');
  await testRateLimit('https://api.example.com/endpoint', 10, 60);

  console.log('\nTesting at 50 req/s...');
  await testRateLimit('https://api.example.com/endpoint', 50, 60);

  console.log('\nTesting at 100 req/s...');
  await testRateLimit('https://api.example.com/endpoint', 100, 60);
}
```
Monitoring Rate Limit Health
Build a dashboard tracking:
Rate limit hit rate — What % of requests are being rate limited?
- Target: <1% for production APIs
- >5% suggests limits are too strict or clients are misbehaving
Top consumers — Which clients hit limits most often?
- Helps identify misconfigured integrations
- Candidates for dedicated limits or throttling
Limit utilization — How close are users to their limits?
- If 90% of users stay under 20% of limit, limits might be too generous
- If 50% of users hit limits regularly, limits might be too strict
Response time correlation — Does rate limiting affect latency?
- Measure P50/P95/P99 response times for rate-limited vs. allowed requests
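The first two dashboard metrics fall out of simple counters. A minimal sketch (function names are illustrative, separate from the monitor class above):

```typescript
// Fraction of requests that were rate limited; target is < 0.01 (1%)
function hitRate(totalRequests: number, rateLimitedRequests: number): number {
  return totalRequests === 0 ? 0 : rateLimitedRequests / totalRequests;
}

// Per-client fraction of the configured limit actually used in a window,
// capped at 1.0 -- useful for spotting limits that are far too generous
function utilization(
  requestsByClient: Map<string, number>,
  limit: number
): Map<string, number> {
  const out = new Map<string, number>();
  for (const [clientId, count] of requestsByClient) {
    out.set(clientId, Math.min(1, count / limit));
  }
  return out;
}
```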
Common Rate Limiting Mistakes
❌ Mistake 1: Not Communicating Limits Clearly
Bad:
```json
{ "error": "Too many requests" }
```
Good:
```json
{
  "error": "rate_limit_exceeded",
  "message": "You've made 1,247 requests in the last hour. Your limit is 1,000 requests per hour.",
  "limit": 1000,
  "window": "1 hour",
  "reset_at": "2026-03-09T05:05:00Z",
  "documentation": "https://docs.example.com/rate-limits"
}
```
❌ Mistake 2: Global Limits Only
Having a single global limit (e.g., 10,000 req/hour) means a single misbehaving endpoint can exhaust the entire quota. Use per-endpoint limits.
❌ Mistake 3: Not Handling Distributed Systems
If you have multiple API servers, each tracking limits independently, users get N × your intended limit. Use Redis or a distributed rate limiter.
❌ Mistake 4: Resetting at Clock Hour
Fixed windows that reset at predictable times (midnight, top of the hour) create traffic spikes. Use sliding windows or reset based on first request time.
❌ Mistake 5: No Burst Allowance
Strict per-second rate limiting (10 req/s) prevents legitimate bursts. Token bucket allows (e.g.) 100 req burst, then 10 req/s sustained.
Rate Limiting for Different API Types
REST APIs
- Use HTTP 429 status code
- Return standard headers (X-RateLimit-*)
- Endpoint-specific limits (GET vs POST)
GraphQL APIs
- Complexity-based limits (query depth, field count)
- Cost-based limits (expensive resolvers cost more)
- Can't rely on HTTP method (always POST)
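Cost-based GraphQL limiting can be sketched as below. The cost table and flattened field list are deliberate simplifications (a real implementation would walk the parsed query AST and account for nesting and pagination arguments):

```typescript
// Assign each field a cost and reject queries whose total exceeds a budget.
type FieldCost = Record<string, number>;

// Illustrative costs: expensive resolvers cost more
const COSTS: FieldCost = { user: 1, posts: 5, comments: 10 };

// Sum the cost of the requested fields; unknown fields default to 1
function queryCost(fields: string[], costs: FieldCost): number {
  return fields.reduce((sum, field) => sum + (costs[field] ?? 1), 0);
}

// A query is allowed only if it fits within the client's remaining budget
function allowQuery(fields: string[], budget: number): boolean {
  return queryCost(fields, COSTS) <= budget;
}
```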
WebSocket APIs
- Message-based limits (N messages per minute)
- Connection-based limits (max concurrent connections)
- Bandwidth limits (bytes per second)
Webhook Deliveries
- Retry limits (max 3 retries with backoff)
- Delivery rate limits (max 100 webhooks/min per endpoint)
- Failure thresholds (pause after 10 consecutive failures)
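The retry and failure-threshold rules above can be sketched as follows; the `send` callback and the pause behavior are illustrative assumptions, not a prescribed delivery API:

```typescript
// Deliver a webhook with up to 3 retries and exponential backoff,
// pausing an endpoint after 10 consecutive failed deliveries.
const MAX_RETRIES = 3;
const FAILURE_THRESHOLD = 10;

// Backoff schedule: 1s, 2s, 4s, ...
function backoffMs(attempt: number): number {
  return 1_000 * 2 ** attempt;
}

const consecutiveFailures = new Map<string, number>();

async function deliverWebhook(
  endpoint: string,
  send: () => Promise<boolean> // returns true on a 2xx response
): Promise<boolean> {
  if ((consecutiveFailures.get(endpoint) ?? 0) >= FAILURE_THRESHOLD) {
    return false; // endpoint paused after too many consecutive failures
  }

  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    if (await send()) {
      consecutiveFailures.set(endpoint, 0); // success resets the streak
      return true;
    }
    if (attempt < MAX_RETRIES) {
      await new Promise(resolve => setTimeout(resolve, backoffMs(attempt)));
    }
  }

  consecutiveFailures.set(endpoint, (consecutiveFailures.get(endpoint) ?? 0) + 1);
  return false;
}
```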
Conclusion
Rate limiting is essential for API stability, but there's no one-size-fits-all solution. Choose your algorithm based on your use case:
- Fixed window — Simple, low-memory, acceptable for internal APIs
- Sliding window counter — Best balance for most public APIs
- Token bucket — Great for APIs with legitimate bursty traffic
- Leaky bucket — When you need guaranteed smooth downstream load
Always communicate limits clearly, provide generous Retry-After headers, and monitor rate limit metrics to tune your thresholds.
Related resources:
- API Dependency Monitoring Guide
- Understanding API SLAs
- API Error Codes Explained
Need to monitor API rate limit consumption across your dependencies? APIStatusCheck tracks status and performance for 200+ popular APIs.