Design a Rate Limiter
Design a distributed rate limiting service that protects APIs from abuse. Consider token bucket, sliding window, and fixed window algorithms. Handle distributed coordination across multiple server instances.
Functional Requirements
- Identify users by ID, IP, or API key
- Limit requests based on configurable rules
- Return proper error headers and status codes
Non-Functional Requirements
- Prioritize availability over consistency
- Low-latency rate limit checks (< 10 ms)
- Scalable to 1M requests per second
Frequently Asked Questions
What is a rate limiter and why is it important?
A rate limiter controls how many requests a client can make to an API within a given time window. It protects backend services from abuse, prevents resource exhaustion, and ensures fair usage across clients.
What are the main rate limiting algorithms?
The most common algorithms are token bucket (allows bursts up to a limit), sliding window log (tracks exact request timestamps), sliding window counter (approximates sliding window with less memory), and fixed window counter (simplest but allows burst at window boundaries).
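Of these, token bucket is the one most often asked for in interviews because it is compact and naturally models bursts. A minimal single-node sketch (class name, parameters, and the injectable clock are illustrative choices, not a fixed API):

```python
import time

class TokenBucket:
    """Token bucket: tokens refill at a steady rate; a request is allowed
    if a token is available, so bursts up to `capacity` are permitted."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # max tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start full
        self.clock = clock              # injectable for testing
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Lazy refill: add tokens proportional to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Note the lazy-refill trick: rather than a background timer topping up every bucket, tokens are computed on demand from the elapsed time, which keeps per-client state to just two numbers.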
How do you handle rate limiting in a distributed system?
In distributed setups, you need a centralized store like Redis to track request counts across all server instances. Alternatives include sticky sessions (routing a client to the same server) or approximate algorithms that tolerate slight inconsistency for better performance.
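The sliding window counter mentioned above is a common choice for the centralized-store approach because it needs only two counters per client. The sketch below uses a plain dict as a stand-in for the shared store; in a real deployment each increment would be an atomic Redis `INCR` with a TTL (names and structure here are illustrative):

```python
import time

class SlidingWindowCounter:
    """Approximates a sliding window by weighting the previous fixed
    window's count by how much of it still overlaps the sliding window.
    `self.store` stands in for a shared store such as Redis, where each
    (key, window) counter would be an atomic INCR with an expiry."""

    def __init__(self, limit, window_secs, clock=time.monotonic):
        self.limit = limit
        self.window = window_secs
        self.clock = clock
        self.store = {}  # (client_key, window_index) -> request count

    def allow(self, key):
        now = self.clock()
        idx = int(now // self.window)
        curr = self.store.get((key, idx), 0)
        prev = self.store.get((key, idx - 1), 0)
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = curr + prev * overlap
        if estimated >= self.limit:
            return False
        self.store[(key, idx)] = curr + 1  # Redis equivalent: INCR + EXPIRE
        return True
```

Because the previous window's requests are assumed to be evenly distributed, the count is approximate; that slight inaccuracy is the trade-off that buys the two-counters-per-client memory footprint.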
What HTTP status code should a rate limiter return?
A rate limiter should return HTTP 429 (Too Many Requests) with a Retry-After header indicating when the client can retry. Including X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers helps clients self-regulate.
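A small helper that assembles those headers might look like the sketch below (the function name is illustrative; `X-RateLimit-*` headers are a widely used convention rather than a formal standard, and `Reset` is expressed here as a Unix timestamp, one of several conventions in the wild):

```python
import math
import time

def rate_limit_headers(limit, remaining, reset_epoch, now=None):
    """Build response headers for a rate-limited endpoint.

    limit       -- max requests allowed in the window
    remaining   -- requests the client has left (<= 0 means limited)
    reset_epoch -- Unix time when the window resets
    """
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if remaining <= 0:
        # A 429 response should tell the client how many seconds to back off.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers
```

Returning these headers on every response, not just on 429s, is what lets well-behaved clients slow down before they ever hit the limit.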
Ready to design this system?