System Design: Implementing Rate Limiting
Rate limiting. It sounds simple, but it’s surprisingly complex when you need to do it *well* in a distributed system. It’s a frequent topic in system design interviews, and more importantly, it’s a…
System Design: Implementing Rate Limiting
Rate limiting. It sounds simple, but it’s surprisingly complex when you need to do it *well* in a distributed system. It’s a frequent topic in system design interviews, and more importantly, it’s a crucial component for protecting your APIs and infrastructure. Let’s break down why it matters, how different algorithms work, and how you might actually implement it.
Why Rate Limiting?
Think about a popular API endpoint. Without rate limiting, a malicious actor (or even just a poorly written client) could bombard it with requests, potentially:
Rate limiting prevents these issues by controlling how many requests a user (or IP address, or API key) can make within a specific timeframe. It’s a fundamental building block for building resilient and scalable applications. It's not just about *stopping* bad actors, though. It's also about providing a fair experience for all users.
Rate Limiting Algorithms: The Core Concepts
There are several algorithms you can use for rate limiting. Each has its trade-offs in terms of accuracy, complexity, and resource usage.
1. Token Bucket
This is a classic and widely used algorithm. Imagine a bucket that holds tokens. Each request consumes a token. Tokens are added to the bucket at a fixed rate. If the bucket is empty, the request is rejected (or delayed).
import timeclass TokenBucket:
def __init__(self, capacity, refill_rate):
self.capacity = capacity
self.tokens = capacity
self.refill_rate = refill_rate # tokens per second
self.last_refill = time.time()
def consume(self, tokens=1):
now = time.time()
time_passed = now - self.last_refill
self.tokens = min(self.capacity, self.tokens + time_passed * self.refill_rate)
self.last_refill = now
if self.tokens >= tokens:
self.tokens -= tokens
return True # Request allowed
else:
return False # Request rejected
Pros: Smooths out bursts of traffic. Easy to understand and implement. Cons: Can allow short bursts exceeding the average rate if the bucket is full.
2. Leaky Bucket
Similar to the token bucket, but instead of adding tokens, requests are added to the bucket. The bucket "leaks" requests at a fixed rate. If the bucket is full, requests are dropped.
Pros: Guarantees a consistent outflow rate. Cons: Less flexible than the token bucket. Doesn't handle bursts as gracefully.
3. Fixed Window Counter
This is the simplest approach. You divide time into fixed-size windows (e.g., 1 minute). You count the number of requests within each window. If the count exceeds the limit, you reject requests.
import timeclass FixedWindowCounter:
def __init__(self, limit, window_size):
self.limit = limit
self.window_size = window_size # in seconds
self.counts = {} # {window_start_time: count}
def is_allowed(self):
now = int(time.time())
window_start = now - (now % self.window_size)
if window_start not in self.counts:
self.counts[window_start] = 0
if self.counts[window_start] < self.limit:
self.counts[window_start] += 1
return True
else:
return False
Pros: Easy to implement. Low overhead. Cons: Can allow double the allowed requests at window boundaries. For example, if the limit is 10 requests per minute, a user could make 10 requests at the end of one minute and another 10 at the beginning of the next.
4. Sliding Window Log
This improves on the fixed window counter by keeping a log of request timestamps. To determine if a request is allowed, you count the number of requests within the sliding window (e.g., the last minute).
Pros: More accurate than the fixed window counter. Cons: Higher memory usage due to the log. More complex to implement.
5. Sliding Window Counter
A hybrid approach that combines the simplicity of the fixed window counter with the accuracy of the sliding window log. It uses multiple fixed windows and averages the request counts across them.
Pros: Good balance of accuracy and performance. Cons: More complex than the fixed window counter.
Implementing Rate Limiting in a Distributed System
The algorithms above are relatively straightforward to implement in a single process. But what happens when you have multiple servers handling requests? You need a way to share rate limiting state across your infrastructure.
Here are a few common approaches:
INCR, EXPIRE) to safely increment counters and set expiration times. Each server can access the shared Redis instance.Example (Redis with Python):
import redisredis_client = redis.Redis(host='localhost', port=6379, db=0)
def is_rate_limited(user_id, limit, window_size):
key = f"rate_limit:{user_id}"
now = int(time.time())
pipe = redis_client.pipeline()
pipe.incr(key, 1)
pipe.expire(key, window_size)
count = pipe.execute()[0]
return count > limit
Practical Tips
Next Steps
Rate limiting is a complex topic, and this is just a starting point. Here are some things you can do to learn more:
INCR and EXPIRE safely.ratelimit in Python).Building robust rate limiting into your applications is an investment that will pay off in terms of stability, security, and user experience. Don't treat it as an afterthought!