What is API Rate Limiting?
API rate limiting is a technique that controls the number of requests a client can make to your API within a specified time period. It's essential for protecting your infrastructure from abuse, ensuring fair usage among users, and maintaining service quality for all clients.
Without rate limiting, a single misbehaving client could overwhelm your servers, degrade performance for other users, or rack up massive infrastructure costs. Rate limiting acts as a safety valve, gracefully rejecting excess requests with HTTP 429 (Too Many Requests) responses.
Why Rate Limiting Matters
- Protection: Prevents DDoS attacks and accidental abuse
- Fairness: Ensures all users get equal access to resources
- Cost Control: Limits infrastructure spending from runaway clients
- Security: Throttles brute-force attacks on authentication endpoints
- Compliance: Meets SLA requirements for uptime and performance
Rate Limiting Algorithms
1. Fixed Window Rate Limiting
Fixed window divides time into equal-sized windows and counts requests per window:
```
# Pseudo-code for fixed window
window_start = floor(current_time / window_size)
request_count = redis.get(user_id + ":" + window_start)
if request_count < limit:
    redis.incr(user_id + ":" + window_start)
    return allow
else:
    return reject_with_429()
```
Pros: Simple to implement, memory efficient. Cons: Allows 2x burst at window boundaries.
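To make the idea concrete, here is a minimal in-memory sketch of a fixed-window limiter. It swaps Redis for a plain dictionary and takes an injectable clock for testing; a production deployment would use Redis (as in the pseudo-code) so counters are shared across servers.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """In-memory fixed-window limiter; real deployments would back this with Redis."""

    def __init__(self, limit, window_size, clock=time.time):
        self.limit = limit
        self.window_size = window_size  # seconds
        self.clock = clock
        self.counts = defaultdict(int)  # (user_id, window index) -> request count

    def allow(self, user_id):
        # All requests in the same window share one counter
        window = int(self.clock() // self.window_size)
        key = (user_id, window)
        if self.counts[key] < self.limit:
            self.counts[key] += 1
            return True
        return False  # caller should respond with HTTP 429
```

Note the boundary weakness from the cons above: a client can spend its full quota at the end of one window and again at the start of the next.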
2. Sliding Window Rate Limiting
Sliding window uses a rolling time period, eliminating the boundary problem:
```
# Pseudo-code for sliding window
current_time = now()
window_start = current_time - window_size
# Remove old entries
redis.zremrangebyscore(user_id, 0, window_start)
# Count current requests
request_count = redis.zcard(user_id)
if request_count < limit:
    redis.zadd(user_id, {request_id: current_time})
    return allow
else:
    return reject_with_429()
```
Pros: Smooth rate limiting, no boundary bursts. Cons: More memory usage.
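The same rolling-log approach can be sketched in memory with a deque of accepted-request timestamps standing in for the Redis sorted set. This is an illustrative sketch, not a distributed implementation.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """In-memory sliding-window log; Redis sorted sets play this role in production."""

    def __init__(self, limit, window_size, clock=time.time):
        self.limit = limit
        self.window_size = window_size  # seconds
        self.clock = clock
        self.logs = defaultdict(deque)  # user_id -> timestamps of accepted requests

    def allow(self, user_id):
        now = self.clock()
        log = self.logs[user_id]
        # Drop entries that have aged out of the rolling window
        while log and log[0] <= now - self.window_size:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False  # caller should respond with HTTP 429
```

The memory cost noted above is visible here: the limiter stores one timestamp per accepted request, rather than a single counter.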
3. Token Bucket Algorithm
Token bucket adds tokens at a fixed rate; each request consumes one token:
```
# Pseudo-code for token bucket
bucket = redis.get(user_id + ":bucket")
current_time = now()
# Add tokens based on elapsed time, capped at capacity
elapsed = current_time - bucket.last_update
bucket.tokens = min(bucket.capacity, bucket.tokens + elapsed * refill_rate)
bucket.last_update = current_time
if bucket.tokens >= 1:
    bucket.tokens -= 1
    redis.set(user_id + ":bucket", bucket)
    return allow
else:
    redis.set(user_id + ":bucket", bucket)
    return reject_with_429()
```
Pros: Allows controlled bursting, smooth traffic shaping. Cons: More complex state management.
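Here is a minimal single-bucket sketch of the lazy-refill pattern: tokens are topped up based on elapsed time only when a request arrives, so no background timer is needed. It keeps state in the object itself rather than Redis, purely for illustration.

```python
import time

class TokenBucket:
    """Token bucket with lazy refill; one instance per user/key in practice."""

    def __init__(self, capacity, refill_rate, clock=time.time):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.last_update = clock()
        self.clock = clock

    def allow(self):
        now = self.clock()
        elapsed = now - self.last_update
        # Refill lazily based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_update = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429
```

Starting the bucket full is a design choice: it permits a burst up to `capacity` immediately, after which the sustained rate is bounded by `refill_rate`.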
4. Leaky Bucket Algorithm
Leaky bucket processes requests at a constant rate, like water dripping from a bucket:
```
# Pseudo-code for leaky bucket
bucket = redis.get(user_id + ":leaky")
current_time = now()
# Drain the queue at the constant leak rate
leaked = (current_time - bucket.last_leak) * leak_rate
bucket.queue = max(0, bucket.queue - leaked)
bucket.last_leak = current_time
if bucket.queue + 1 <= bucket.capacity:
    bucket.queue += 1
    redis.set(user_id + ":leaky", bucket)
    return allow
else:
    redis.set(user_id + ":leaky", bucket)
    return reject_with_429()
```
Pros: Enforces constant output rate. Cons: Can be too restrictive for bursty traffic.
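The leaky bucket can be sketched the same way: the "queue" is a level that drains at a constant rate, and a request is admitted only if adding it would not overflow the bucket. This is the "leaky bucket as a meter" variant, which rejects overflow rather than delaying requests.

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: the level drains at leak_rate requests/second."""

    def __init__(self, capacity, leak_rate, clock=time.time):
        self.capacity = capacity
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = 0.0            # current bucket level
        self.last_leak = clock()
        self.clock = clock

    def allow(self):
        now = self.clock()
        # Drain the bucket at the constant leak rate since the last check
        self.queue = max(0.0, self.queue - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.queue + 1 <= self.capacity:
            self.queue += 1
            return True
        return False  # caller should respond with HTTP 429
```

Unlike the token bucket, a full leaky bucket rejects bursts outright, which is exactly the restrictiveness mentioned in the cons above.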
HTTP Rate Limit Headers
Communicate rate limit status to clients with standard headers:
```
X-RateLimit-Limit: 1000        # Maximum requests per window
X-RateLimit-Remaining: 950     # Requests left in current window
X-RateLimit-Reset: 1678901234  # Unix timestamp when window resets
Retry-After: 60                # Seconds to wait before retrying (on 429)
```
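A small helper can assemble these headers consistently for every response. This sketch returns a plain dict (framework-agnostic); the function name and signature are illustrative, not from any particular library.

```python
def rate_limit_headers(limit, remaining, reset_ts, retry_after=None):
    """Build X-RateLimit-* response headers; include Retry-After only on 429s."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_ts),  # Unix timestamp when the window resets
    }
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)  # seconds to wait before retrying
    return headers
```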
Implementation Best Practices
- Use Redis for distributed rate limiting: Ensures consistency across multiple servers
- Set reasonable defaults: 1000 requests/hour is common for free tiers
- Implement tiered limits: Free vs Pro vs Enterprise tiers
- Log rejected requests: Monitor for abuse patterns
- Use exponential backoff: Guide clients to slow down gracefully
- Whitelist trusted IPs: Bypass limits for internal services
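On the client side, the exponential-backoff practice above can be sketched as a retry wrapper. Here `send_request` is a hypothetical callable returning `(status, retry_after)`; the wrapper honors `Retry-After` when the server supplies it and otherwise doubles the delay each attempt, with a little jitter to avoid synchronized retries.

```python
import random
import time

def request_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff and jitter.

    `send_request` is a hypothetical zero-argument callable returning
    (http_status, retry_after_seconds_or_None).
    """
    status = 429
    for attempt in range(max_retries):
        status, retry_after = send_request()
        if status != 429:
            return status
        # Honor Retry-After if provided; otherwise back off exponentially
        delay = retry_after if retry_after else base_delay * (2 ** attempt)
        time.sleep(delay + random.uniform(0, 0.1 * delay))
    return status  # give up after max_retries
```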
Rate Limiting Strategies by Endpoint
Not all endpoints need the same limits:
| Endpoint Type | Recommended Limit | Reasoning |
|---|---|---|
| Authentication | 5-10/minute | Prevent brute force |
| Read operations | 100-1000/minute | Generally safe |
| Write operations | 10-100/minute | Higher cost, more risk |
| Expensive queries | 1-10/minute | Resource-intensive |
| Webhooks | 1000+/minute | Automated, expected volume |
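Per-endpoint limits like those in the table are often expressed as a simple lookup keyed by endpoint type. The category names and numbers below are illustrative (taken from the table's recommended ranges), not a prescription.

```python
# Illustrative per-endpoint limits (requests per minute); names are hypothetical
ENDPOINT_LIMITS = {
    "auth":      10,    # prevent brute force on login
    "read":      1000,  # cheap, generally safe
    "write":     100,   # higher cost, more risk
    "expensive": 10,    # resource-intensive queries
    "webhook":   5000,  # automated, expected volume
}

def limit_for(endpoint_type, default=100):
    """Look up the per-minute limit, falling back to a conservative default."""
    return ENDPOINT_LIMITS.get(endpoint_type, default)
```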
Common Rate Limiting Mistakes
- Rate limiting by IP only: Behind NAT or corporate proxies, many users share one IP
- No visible limits: Clients can't optimize without headers
- Too aggressive limits: Blocks legitimate traffic
- Ignoring API keys: Always rate limit by API key, not just IP
- No bypass mechanism: Need escape hatch for false positives
Testing Rate Limiting
Before deploying, test your rate limiting thoroughly:
- Use tools like `ab` (Apache Bench) or `wrk` to send bulk requests
- Verify 429 responses occur at the correct threshold
- Check that Retry-After headers are accurate
- Test recovery after the window resets
- Verify distributed consistency across multiple servers
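The threshold check in the list above can be automated with a small harness. Here `allow` is any zero-argument callable returning True/False (whatever limiter is under test); the helper fires a burst and verifies rejections start exactly at the limit.

```python
def verify_threshold(allow, limit, n_requests):
    """Fire n_requests through `allow` and check rejection starts exactly at `limit`.

    `allow` is a zero-argument callable returning True (accepted) or False (429).
    """
    results = [allow() for _ in range(n_requests)]
    accepted = sum(results)
    # Exactly `limit` requests should pass, and they should be the first ones
    return accepted == min(limit, n_requests) and all(results[:accepted])
```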
Frequently Asked Questions
What HTTP status code should I return when rate limited?
Use HTTP 429 Too Many Requests. This is the standard status code for rate limiting and is widely understood by clients and monitoring tools.
Should I rate limit by IP address or API key?
Use both. Rate limit by API key for authenticated requests (more accurate user tracking) and by IP for unauthenticated endpoints (DDoS protection).
How do I handle rate limit bypass for premium users?
Implement tiered rate limiting where premium tiers have higher or unlimited quotas. Check the user's tier before applying rate limit logic.
What's the best rate limit for a new API?
Start conservative: 100 requests/minute for standard users, 1000/minute for premium. Monitor usage patterns and adjust based on actual traffic and infrastructure capacity.
How can I prevent rate limit abuse?
Use fingerprinting techniques (combine IP, user agent, API key), implement progressive penalties for repeat offenders, and use CAPTCHA challenges for suspicious patterns.
Should rate limits reset at fixed times or rolling windows?
Rolling windows (sliding window, token bucket) provide smoother rate limiting without boundary bursts. Use fixed windows only for simplicity or billing purposes.