What is rate limiting?

Rate limiting controls how many requests a client can make in a given time period. It protects APIs from abuse, ensures fair usage, and prevents system overload.

What HTTP status code indicates rate limiting?

HTTP 429 Too Many Requests. The response often includes a Retry-After header that tells the client when it may retry.

What are common rate limiting algorithms?

Token Bucket, Sliding Window Log, Sliding Window Counter, Leaky Bucket, Fixed Window Counter.

Rate Limiting Cheat Sheet

Quick reference for API rate limiting: algorithms, HTTP headers, status codes, and implementation best practices.

Rate Limiting Algorithms

Algorithm	How It Works	Pros	Cons
Token Bucket	Tokens added at fixed rate; requests consume tokens	Allows bursting, smooth average rate	Complex to implement correctly
Sliding Window Log	Track timestamp of each request in window	Most accurate, no boundary issues	High memory usage
Sliding Window Counter	Weighted average of current + previous window	Low memory, smooth transitions	Approximation, not exact
Leaky Bucket	Requests queue and drain at fixed rate	Smooth output rate, prevents bursts	No bursting allowed
Fixed Window	Count requests per fixed time window (e.g., 100/min)	Simple, easy to implement	Boundary burst problem
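
As a rough illustration of two rows from the table, here is a minimal in-memory sketch of a token bucket and a sliding window counter. The class names, parameters, and the injectable `clock` argument are assumptions made for this example; a production limiter would also need locking and shared storage (for example Redis).

```python
import time


class TokenBucket:
    """Token bucket: tokens refill at a fixed rate, each request spends one."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.clock = clock                # injectable so tests can control time
        self.tokens = float(capacity)     # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost=1):
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowCounter:
    """Sliding window counter: weights the previous window by its overlap."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.current_start = clock()
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        windows_passed = int((now - self.current_start) // self.window)
        if windows_passed >= 1:
            # Roll forward; a gap of two or more windows clears the history.
            self.previous_count = self.current_count if windows_passed == 1 else 0
            self.current_count = 0
            self.current_start += windows_passed * self.window
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now - self.current_start) / self.window
        estimated = self.previous_count * overlap + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

The token bucket lets a client spend up to `capacity` requests at once and then settles to `refill_rate` requests per second. The sliding window counter keeps only two integers per client, which is where its low memory use and its approximation both come from.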

HTTP Rate Limit Headers

Header	Description	Example
X-RateLimit-Limit	Maximum requests allowed in window	1000
X-RateLimit-Remaining	Requests remaining in current window	42
X-RateLimit-Reset	Unix timestamp when window resets	1709726400
Retry-After	Seconds to wait before retrying (on 429)	60
RateLimit-Limit	IETF draft: limit (requests per window)	100;w=60
RateLimit-Remaining	IETF draft: remaining requests	10
RateLimit-Reset	IETF draft: seconds until reset	35

# Example response headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1709726400

# IETF draft headers (not yet a standard)
RateLimit-Limit: 100;w=60
RateLimit-Remaining: 10
RateLimit-Reset: 35
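
A client can read these headers to decide whether to slow down before it hits the limit. The sketch below assumes the semantics shown in the table (X-RateLimit-Reset as a Unix timestamp, RateLimit-Reset as seconds until reset); some APIs use different conventions, so check the provider's documentation. The function names and the 10% threshold are illustrative choices.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return seconds until the rate limit window resets, or None if unknown."""
    now = time.time() if now is None else now
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Draft-style header: the value is already a delay in seconds.
    if "ratelimit-reset" in lowered:
        return max(0.0, float(lowered["ratelimit-reset"]))

    # X- style header: assumed here to be a Unix timestamp.
    if "x-ratelimit-reset" in lowered:
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)

    return None


def is_near_limit(headers, threshold=0.1):
    """True when the remaining share of the quota is at or below threshold."""
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = lowered.get("x-ratelimit-limit")
    remaining = lowered.get("x-ratelimit-remaining")
    if limit is None or remaining is None:
        return False
    return int(remaining) <= int(limit) * threshold
```

With the example headers above (limit 1000, remaining 42), `is_near_limit` returns True because fewer than 10% of the requests are left.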

HTTP Status Codes

429 Too Many Requests	Rate limit exceeded. Include Retry-After header.
503 Service Unavailable	Server overloaded (may include Retry-After).
403 Forbidden	Quota exceeded (some APIs use this).

# 429 Response Example
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0

{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded your rate limit",
  "retry_after": 60
}
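
On the server side, a response like the one above could be assembled as follows. This is a framework-agnostic sketch: `build_429_response` is a hypothetical helper that returns a plain tuple, which you would adapt to your framework's response object.

```python
import json
import math


def build_429_response(limit, seconds_until_reset):
    """Return (status, headers, body) for a rate-limited request."""
    # Round up so a client that waits exactly this long is not early.
    retry_after = max(1, math.ceil(seconds_until_reset))
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": "You have exceeded your rate limit",
        "retry_after": retry_after,
    })
    return 429, headers, body
```

Keeping the `retry_after` field in the body equal to the Retry-After header avoids giving clients two conflicting values.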

Rate Limiting Strategies

Strategy	Use Case	Example
Per-User	Authenticated APIs	100 req/min per API key
Per-IP	Public endpoints, auth endpoints	10 req/min per IP
Per-Endpoint	Expensive operations	10 req/min for search, 100 req/min for other endpoints
Global	System-wide protection	10000 req/min total
Tiered	Freemium APIs	Free: 100/hr, Pro: 1000/hr
Burst + Sustained	Allow short bursts, limit average	Burst: 50, Sustained: 200/min
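
In practice, most of these strategies differ only in which key the counter is stored under and which limit applies to it. The sketch below illustrates that idea with a simple in-memory fixed window counter. The `limit_key` helper, the key formats, and the `TIER_LIMITS` values are assumptions for this example (the tier numbers mirror the Tiered row), and the counter never evicts old windows, so it is not suitable for production as written.

```python
import time
from collections import defaultdict

# Hypothetical plan limits: (max requests, window in seconds).
TIER_LIMITS = {"free": (100, 3600), "pro": (1000, 3600)}


def limit_key(strategy, api_key=None, ip=None, endpoint=None):
    """Build the counter key that decides which requests share a budget."""
    if strategy == "per_user":
        return f"user:{api_key}"
    if strategy == "per_ip":
        return f"ip:{ip}"
    if strategy == "per_endpoint":
        return f"user:{api_key}:endpoint:{endpoint}"
    if strategy == "global":
        return "global"
    raise ValueError(f"unknown strategy: {strategy}")


class FixedWindowLimiter:
    """In-memory fixed window counter, one counter per key and window."""

    def __init__(self, clock=time.time):
        self.clock = clock                 # injectable so tests can control time
        self.counts = defaultdict(int)

    def allow(self, key, limit, window_seconds):
        # All requests in the same window index share one counter.
        window_index = int(self.clock() // window_seconds)
        bucket = (key, window_index)
        if self.counts[bucket] >= limit:
            return False
        self.counts[bucket] += 1
        return True
```

A tiered, per-user check would then look like `limiter.allow(limit_key("per_user", api_key=key), *TIER_LIMITS[plan])`, where `key` and `plan` come from your authentication layer.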

Client-Side Rate Limit Handling

// JavaScript: Exponential backoff with retry-after
async function fetchWithRateLimit(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const res = await fetch(url);

    if (res.status === 429) {
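      // Retry-After is given in seconds; setTimeout expects milliseconds.
      const retryAfter = parseInt(res.headers.get('Retry-After'), 10);
      // Fall back to exponential backoff (1s, 2s, 4s) if the header is missing or not a number.
      const waitTime = (Number.isNaN(retryAfter) ? Math.pow(2, i) : retryAfter) * 1000;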
      console.log(`Rate limited. Waiting ${waitTime}ms...`);
      await new Promise(r => setTimeout(r, waitTime));
      continue;
    }

    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  }
  throw new Error('Max retries exceeded');
}

# Python: Rate limit aware HTTP client
import time
import requests

def fetch_with_backoff(url, max_retries=3):
    for i in range(max_retries):
        response = requests.get(url)

        if response.status_code == 429:
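            # Retry-After may be missing or an HTTP-date; fall back to exponential backoff.
            try:
                retry_after = int(response.headers['Retry-After'])
            except (KeyError, ValueError):
                retry_after = 2 ** i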
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue

        response.raise_for_status()
        return response.json()

    raise Exception("Max retries exceeded")
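
Retry-After can carry either a delay in seconds or an HTTP-date, so a client that only calls `int()` on it can fail on the date form. The helper below is a sketch of one way to handle both using only the standard library; the name `parse_retry_after` and the choice to return None on unparseable input are assumptions for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None when the value is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay in seconds, e.g. "60".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: HTTP-date, e.g. "Wed, 06 Mar 2024 12:00:00 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc) if now is None else now
    # A date in the past means the client may retry immediately.
    return max(0.0, (retry_at - now).total_seconds())
```

A caller can fall back to exponential backoff when the result is None, for example `delay = parse_retry_after(header) or 2 ** attempt`, keeping in mind that this also backs off when the parsed delay is 0.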

# cURL: Check rate limit headers
curl -i https://api.example.com/data

# Response headers:
# X-RateLimit-Limit: 1000
# X-RateLimit-Remaining: 42
# X-RateLimit-Reset: 1709726400

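# Sleep until reset (bash)
# Fetch the headers once and strip carriage returns so the values compare as integers.
headers=$(curl -sI https://api.example.com/data | tr -d '\r')
# Header names are case-insensitive (HTTP/2 sends them in lowercase), so use grep -i.
remaining=$(echo "$headers" | grep -i '^x-ratelimit-remaining:' | cut -d' ' -f2)
if [ -n "$remaining" ] && [ "$remaining" -lt 10 ]; then
  reset=$(echo "$headers" | grep -i '^x-ratelimit-reset:' | cut -d' ' -f2)
  sleep_time=$((reset - $(date +%s)))
  if [ "$sleep_time" -gt 0 ]; then
    echo "Rate limit low. Sleeping for $sleep_time seconds"
    sleep "$sleep_time"
  fi
fi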

Rate Limiting Tools & Middleware

Tool	Stack	Algorithm
express-rate-limit	Node.js/Express	Fixed Window
rate-limit-redis	Node.js + Redis	Sliding Window
django-ratelimit	Python/Django	Fixed Window
flask-limiter	Python/Flask	Multiple
Redis + Lua	Any (custom)	Token Bucket / Sliding
nginx limit_req	Nginx	Leaky Bucket
Cloudflare Rate Limiting	Edge/CDN	Sliding Window

Best Practices

✓ Always include rate limit headers in responses (even successful ones)
✓ Return 429 with a Retry-After header, not 503 or 403
✓ Document rate limits clearly in API docs
✓ Use tiered limits for different user plans (free vs paid)
✓ Consider higher limits for read-only vs write operations
✓ Implement graceful degradation, not hard cutoffs
✗ Don't hide rate limits: transparency builds trust
✗ Avoid per-endpoint limits that are too complex to understand
