Rate Limiter
Motivation
- Prevent resource starvation, e.g. from a DoS attack.
- Reduce costs
Design
Locations where the rate limiter can live:
- Client - unreliable; easy for users to subvert.
- Server - on the server alongside the API itself.
- Middleware - e.g. an API gateway in a microservices architecture.
Rate limiting could simply be delegated to an API gateway, but a from-scratch design might look like this (a sketch of the check logic follows the list):
- A Lambda acts as the rate limiter.
- Rate limiting rules are stored in S3. A Lambda loads new rules into the cache to allow for fast responses.
- Counters are stored in the Redis cache, again for speed.
- Allowed requests are forwarded to the API e.g. another Lambda.
- Rejected requests are either dropped or stored in a queue to be processed later.
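As a rough illustration, here is a minimal sketch of what the rate-limiter Lambda's handler might do, assuming the redis-py client and the API Gateway proxy event shape; RULES stands in for the rule cache refreshed from S3, and all names are hypothetical.

```python
import json
import os

import redis  # assumes the redis-py client is packaged with the Lambda

# Hypothetical stand-in for the rule cache that a separate Lambda refreshes from S3.
RULES = {"default": {"limit": 100, "window_seconds": 60}}

r = redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"), port=6379)


def handler(event, context):
    # Assumes the API Gateway proxy event shape for the caller's IP.
    client_ip = event["requestContext"]["identity"]["sourceIp"]
    rule = RULES["default"]
    key = f"ratelimit:{client_ip}"

    # Count this request in the current fixed window; start the window's
    # expiry clock the first time the key is created.
    count = r.incr(key)
    if count == 1:
        r.expire(key, rule["window_seconds"])

    if count > rule["limit"]:
        # Rejected: drop it, or enqueue it to be processed later.
        return {"statusCode": 429, "body": json.dumps({"error": "rate limited"})}

    # Allowed: forward to the real API, e.g. invoke another Lambda here.
    return {"statusCode": 200, "body": json.dumps({"forwarded": True})}
```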
Implementation
Lots of different algorithms exist, e.g. Token Bucket (sketched after this list). When implementing, also consider whether to use:
- a global bucket for all requests
- a bucket per use case (e.g. per IP address)
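For example, a minimal in-process Token Bucket with one bucket per key (here an IP address); the capacity and refill rate below are illustrative, not prescribed.

```python
import time


class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False


# One bucket per use case, e.g. keyed by IP address.
buckets: dict[str, TokenBucket] = {}


def allow_request(ip: str) -> bool:
    bucket = buckets.setdefault(ip, TokenBucket(capacity=10, rate=5))
    return bucket.allow()
```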
Where to store counters?
- Disk is slow, so not in a database.
- Use an in-memory cache, e.g. Redis (sketch below).
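A minimal sketch of a Redis-backed fixed-window counter, again assuming the redis-py client; `SET ... NX EX` creates the counter together with its expiry, so the window's TTL is set atomically on first use.

```python
import redis  # assumes the redis-py client

r = redis.Redis()  # hypothetical connection; defaults to localhost:6379


def check(key: str, limit: int, window_seconds: int) -> tuple[bool, int]:
    """Returns (allowed, seconds until the current window resets)."""
    # Create the counter with its expiry if it doesn't exist yet, then bump it.
    r.set(key, 0, ex=window_seconds, nx=True)
    used = r.incr(key)
    retry_after = r.ttl(key)
    return used <= limit, retry_after
```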
Rate Limiting
Reject throttled requests with HTTP 429 Too Many Requests.
Provide information to the client via headers (an example response follows the list):
- X-Ratelimit-Remaining - the number of requests remaining in the current window
- X-Ratelimit-Limit - the total number of requests allowed per window
- X-Ratelimit-Retry-After - when the current throttling stops
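Putting this together, a hypothetical helper that builds the throttled response in the same Lambda proxy shape used in the earlier sketch:

```python
def throttled_response(limit: int, retry_after_seconds: int) -> dict:
    # 429 Too Many Requests, with the headers described above.
    return {
        "statusCode": 429,
        "headers": {
            "X-Ratelimit-Remaining": "0",                         # none left in this window
            "X-Ratelimit-Limit": str(limit),                      # total allowed per window
            "X-Ratelimit-Retry-After": str(retry_after_seconds),  # when throttling stops
        },
        "body": '{"error": "rate limit exceeded"}',
    }
```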
Algorithms