
Design a Rate Limiter

Difficulty: Practical Design Problems
Generated on: 2025-07-13 02:57:25
Category: System Design Cheatsheet


What is it? A rate limiter controls the rate at which users or services can make requests to an API or service. It prevents abuse, protects infrastructure from overload, and ensures fair usage.

Why is it important? Without rate limiting, your system is vulnerable to:

  • Denial-of-Service (DoS) attacks
  • Brute-force attacks
  • Resource exhaustion
  • Unfair resource consumption by a single user

How does it work? A typical rate limiter involves:

  • Identification: Identifying the users or services making requests (e.g., by IP address, user ID, or API key).
  • Counting: Tracking the number of requests made within a specific time window.
  • Decision: Deciding whether to allow or reject a request based on the configured limits.
  • Action: Enforcing the decision (e.g., allowing the request, returning an error code like 429 Too Many Requests).
  • Configuration: Defining the rate limits based on various criteria (e.g., user type, API endpoint, time window).
  • Storage: Storing request counts and potentially other metadata.
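The identification, counting, decision, and action steps above fit in a few lines. A minimal in-memory sketch using a fixed-window counter (the class and names here are illustrative, not from any particular library):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counters = defaultdict(int)  # (key, window_id) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)      # Counting: bucket by window
        bucket = (key, window_id)
        if self.counters[bucket] >= self.limit:  # Decision
            return False                         # Action: caller returns 429
        self.counters[bucket] += 1
        return True

limiter = FixedWindowLimiter(limit=3, window=60)
results = [limiter.allow("user:123", now=0) for _ in range(4)]
print(results)  # [True, True, True, False]
```

Note that this sketch never evicts old windows (a real implementation would expire stale keys), and fixed windows have the boundary problem discussed in the trade-offs below.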

Basic Rate Limiter Architecture

sequenceDiagram
participant User
participant Client
participant RateLimiter
participant APIServer as API Server
User->>Client: Makes Request
Client->>RateLimiter: Request to API
RateLimiter->>RateLimiter: Check Rate Limit
alt Limit Exceeded
RateLimiter->>Client: Return 429 Error
else Limit Not Exceeded
RateLimiter->>APIServer: Forward Request
APIServer->>RateLimiter: Response
RateLimiter->>Client: Return Response
end

Distributed Rate Limiter with Redis

sequenceDiagram
participant User
participant APIServer as API Server
participant RateLimiter
participant Redis
User->>APIServer: Makes Request
APIServer->>RateLimiter: Request to check rate
RateLimiter->>Redis: Increment Counter (e.g., INCR user:123)
Redis-->>RateLimiter: Counter Value
alt Limit Exceeded
RateLimiter->>APIServer: Return 429 Error
APIServer->>User: Return 429 Error
else Limit Not Exceeded
RateLimiter->>APIServer: Allow Request
APIServer->>APIServer: Process Request
APIServer->>User: Return Response
end
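The INCR-based flow in the diagram can be sketched as below. The in-memory stand-in class exists only so the example runs without a Redis server; with redis-py you would pass a real client, whose `incr` and `expire` methods have the same shape. Because INCR and EXPIRE are two separate commands here, a crash between them could leave a key with no TTL; a Lua script (or `SET key 1 NX EX window` plus INCR) makes the pair atomic.

```python
def check_rate(client, user_id, limit, window_s):
    """Return True if the request is allowed, False if it should get a 429.

    `client` is any object exposing Redis-style incr/expire.
    """
    key = f"rate:{user_id}"
    count = client.incr(key)          # atomic increment; creates the key at 1
    if count == 1:
        client.expire(key, window_s)  # start the window on the first request
    return count <= limit

class FakeRedis:
    """In-memory stand-in for Redis, for demonstration only."""
    def __init__(self):
        self.store = {}
    def incr(self, key):
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]
    def expire(self, key, seconds):
        pass  # TTL behavior omitted in the stub

r = FakeRedis()
print([check_rate(r, "user:123", limit=2, window_s=60) for _ in range(3)])
# [True, True, False]
```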

When to Use:

  • Protecting APIs from abuse (e.g., limiting requests per user per minute).
  • Preventing DDoS attacks.
  • Ensuring fair usage of resources.
  • Throttling requests to backend systems.
  • Controlling costs by limiting API usage.
  • Preventing brute-force attacks on login endpoints.

When to Avoid (or Re-evaluate):

  • For internal services where trust is implicitly assumed (e.g., microservices within a trusted network). Consider other mechanisms like circuit breakers or load shedding instead.
  • When the overhead of rate limiting outweighs the benefits (e.g., for very low-traffic services).
  • When you need extremely low latency and the rate limiter adds significant overhead.
Trade-offs:

  • Centralized. Pros: simpler to implement and manage; easier to enforce global rate limits. Cons: single point of failure; can become a bottleneck; adds latency.
  • Distributed. Pros: more scalable and resilient; lower latency (if implemented correctly). Cons: more complex to implement and manage; requires coordination between nodes; data consistency challenges.
  • Client-Side. Pros: reduces load on the server; fast response times for the client. Cons: easily bypassed; unreliable for security purposes.
  • Server-Side. Pros: more reliable and secure; easier to enforce policies. Cons: adds overhead to the server; can increase latency.
  • Token Bucket. Pros: handles burst traffic well; simple to understand and implement. Cons: can allow bursts that exceed the average rate limit.
  • Leaky Bucket. Pros: smoother traffic flow; prevents bursts; more predictable. Cons: can drop requests when the bucket is full.
  • Fixed Window. Pros: simplest to implement. Cons: can allow double the rate limit at window boundaries.
  • Sliding Window. Pros: more accurate; avoids the double-limit problem at window boundaries. Cons: more complex to implement.
  • Storage (Redis). Pros: fast reads and writes; well suited to distributed rate limiting; expiring keys. Cons: cost; operational overhead of managing a Redis cluster.
  • Storage (Database). Pros: persistent storage; easier to query and analyze data. Cons: slower than in-memory solutions; potential for database overload.
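To make the token-bucket trade-off concrete, here is a minimal sketch with lazy refill (illustrative code, not a production implementation): the bucket refills continuously at the average rate, and the capacity is what permits bursts.

```python
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; each request costs one token."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=2, now=0.0)   # 1 req/s average, bursts of 2
print([bucket.allow(now=0.0) for _ in range(3)])    # [True, True, False]
print(bucket.allow(now=1.0))                        # True: one token refilled
```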
Scalability and Performance:

  • Centralized Rate Limiter: Scales vertically (a more powerful server) until it becomes a bottleneck. Not ideal for large-scale systems. Consider caching or read replicas to improve read performance.

  • Distributed Rate Limiter: Scales horizontally by adding more rate limiter instances. Requires a shared storage solution like Redis or a distributed database.

    • Sharding: Partition requests across rate limiter instances based on a consistent hashing algorithm (e.g., hash(user_id) % num_instances).
    • Replication: Replicate data across multiple nodes for fault tolerance and read scalability.
  • Performance Considerations:

    • Latency: Minimize the latency added by the rate limiter. Use efficient data structures and algorithms. Consider caching frequently accessed data.
    • Throughput: Ensure the rate limiter can handle the expected request volume. Load test the system to identify bottlenecks.
    • Storage: Choose a storage solution that can handle the required read/write operations with low latency.
    • Concurrency: Handle concurrent requests efficiently using threads, asynchronous programming, or non-blocking I/O.
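As an illustration of the accuracy-versus-complexity trade-off, a minimal in-memory sliding-window-log limiter for a single key (names are illustrative; a per-user version would keep one log per key, and a Redis version typically uses a sorted set with ZREMRANGEBYSCORE, ZCARD, and ZADD):

```python
from collections import deque

class SlidingWindowLog:
    """Allow a request if fewer than `limit` requests fall in the last `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now):
        # Evict timestamps that have slid out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True

sw = SlidingWindowLog(limit=2, window=10)
print([sw.allow(t) for t in (0, 1, 2, 11)])  # [True, True, False, True]
```

The per-request timestamp log is what makes this accurate at window boundaries, and also what makes it cost more memory than a simple counter.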
Real-World Examples:

  • Twitter: Uses rate limiting to prevent spam and abuse. Limits the number of tweets, follows, and API calls per user per unit of time.
  • GitHub: Rate limits API requests to prevent abuse and ensure fair usage. Uses different rate limits for authenticated and unauthenticated users.
  • Stripe: Rate limits API requests based on the API key and the type of request.
  • AWS: Rate limits API calls to protect its infrastructure and prevent abuse. Provides different rate limits for different services.
  • Cloudflare: Rate limits requests to protect websites from DDoS attacks and other malicious traffic.
Common Interview Questions:

  • Design a rate limiter for a web API. (Most common)
  • How would you handle distributed rate limiting?
  • What are the different algorithms for rate limiting? (Token Bucket, Leaky Bucket, Fixed Window, Sliding Window)
  • How would you choose the appropriate rate limit for an API?
  • What are the trade-offs between different rate limiting approaches?
  • How do you handle rate limit exceptions (429 errors)?
  • How do you monitor and alert on rate limiting?
  • How do you prevent a single user from consuming all the rate limit tokens? (Sticky sessions, consistent hashing)
  • How would you design a rate limiter that can handle different types of requests with different costs? (Weighted rate limiting)
  • How do you handle different levels of service (e.g., free vs. paid tiers) with different rate limits?
  • What data structures would you use to implement a rate limiter? (Hash maps, sets, sorted sets)
  • How do you handle clock synchronization issues in a distributed rate limiter? (NTP, atomic operations)
  • How would you test your rate limiter? (Unit tests, integration tests, load tests)
  • Discuss the CAP theorem in the context of distributed rate limiting. (Consistency vs. Availability in the face of network partitions)
  • Explain the difference between rate limiting and throttling. (Rate limiting is a hard limit, throttling adjusts the rate dynamically)

Example Interview Scenario:

“Design a rate limiter for a login API to prevent brute-force attacks. Consider scalability, performance, and security.”

Possible Answer Outline:

  1. Requirements: Prevent brute-force attacks, handle high volume of login attempts, low latency, scalable.

  2. Algorithm: Token Bucket or Sliding Window. Sliding window is more accurate for preventing bursts over window boundaries.

  3. Storage: Redis for fast read/write performance and expiring keys (to automatically remove user data after a certain period of inactivity).

  4. Architecture: Distributed rate limiter with Redis cluster. Use consistent hashing to distribute users across rate limiter instances.

  5. Implementation Details:

    • Key: login:ip_address or login:user_id (or both, depending on requirements).
    • Increment counter in Redis on each login attempt.
    • Check if counter exceeds the limit.
    • Return 429 error if limit is exceeded.
    • Set TTL (Time-To-Live) on the key to expire after a certain period (e.g., 1 hour).
  6. Scalability: Redis cluster can be scaled horizontally. Consistent hashing ensures even distribution of load.

  7. Security: Rate limiting is only one layer of defense; combine it with other measures such as CAPTCHA, account lockout, and strong password policies.

  8. Monitoring: Monitor the number of 429 errors and Redis performance.

  9. Trade-offs: Redis adds operational overhead. Consider using a database if persistence is required.
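Steps 4 and 6 of the outline lean on consistent hashing. A minimal hash-ring sketch (instance names and replica count are illustrative): unlike `hash(key) % N`, adding or removing an instance remaps only about 1/N of the keys.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring mapping each key to a rate-limiter instance."""

    def __init__(self, instances, replicas=100):
        # `replicas` virtual nodes per instance smooth the distribution.
        self.ring = sorted(
            (self._hash(f"{inst}#{i}"), inst)
            for inst in instances
            for i in range(replicas)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First virtual node clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.keys)
        return self.ring[idx][1]

ring = HashRing(["limiter-1", "limiter-2", "limiter-3"])
node = ring.node_for("user:123")
print(node in {"limiter-1", "limiter-2", "limiter-3"})  # True
print(ring.node_for("user:123") == node)  # True: mapping is deterministic
```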

This cheatsheet provides a comprehensive overview of rate limiter design. Remember to tailor your approach to the specific requirements of your system. Good luck!