Rate Limiter
Motivation
- Prevent resource starvation, e.g. from a DoS attack.
- Reduce costs
Design
Locations where the rate limiter can live:
- Client - unreliable; easy for users to subvert.
- Server - on the server alongside the API itself.
- Middleware - e.g. an API gateway in a microservices architecture.
Rate limiting could simply be delegated to an API gateway, but a from-scratch design might look like this (a sketch of the check logic follows the list):
- A Lambda acts as the rate limiter.
- Rate limiting rules are stored in S3. A Lambda loads new rules into the cache to allow for fast responses.
- Counters are stored in the Redis cache, again for speed.
- Allowed requests are forwarded to the API e.g. another Lambda.
- Rejected requests are either dropped or stored in a queue to be processed later.
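As a rough illustration, here is a minimal sketch of what the rate-limiter Lambda's handler might do, assuming the redis-py client and the API Gateway proxy event shape; RULES stands in for the rule cache refreshed from S3, and all names are hypothetical.

```python
import json
import os

import redis  # assumes the redis-py client is packaged with the Lambda

# Hypothetical stand-in for the rule cache that a separate Lambda refreshes from S3.
RULES = {"default": {"limit": 100, "window_seconds": 60}}

r = redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"), port=6379)


def handler(event, context):
    # Assumes the API Gateway proxy event shape for the caller's IP.
    client_ip = event["requestContext"]["identity"]["sourceIp"]
    rule = RULES["default"]
    key = f"ratelimit:{client_ip}"

    # Count this request in the current fixed window; start the window's
    # expiry clock the first time the key is created.
    count = r.incr(key)
    if count == 1:
        r.expire(key, rule["window_seconds"])

    if count > rule["limit"]:
        # Rejected: drop it, or enqueue it to be processed later.
        return {"statusCode": 429, "body": json.dumps({"error": "rate limited"})}

    # Allowed: forward to the real API, e.g. invoke another Lambda here.
    return {"statusCode": 200, "body": json.dumps({"forwarded": True})}
```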
Implementation
Lots of different algorithms exist, e.g. Token Bucket (sketched after this list). When implementing, also consider whether to use:
- a global bucket for all requests
- a bucket per use case (e.g. per IP address)
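For example, a minimal in-process Token Bucket with one bucket per key (here an IP address); the capacity and refill rate below are illustrative, not prescribed.

```python
import time


class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False


# One bucket per use case, e.g. keyed by IP address.
buckets: dict[str, TokenBucket] = {}


def allow_request(ip: str) -> bool:
    bucket = buckets.setdefault(ip, TokenBucket(capacity=10, rate=5))
    return bucket.allow()
```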
Where to store counters?
- Disk is slow, so not in a database.
- Use an in-memory cache, e.g. Redis (sketch below).
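A minimal sketch of a Redis-backed fixed-window counter, again assuming the redis-py client; `SET ... NX EX` creates the counter together with its expiry, so the window's TTL is set atomically on first use.

```python
import redis  # assumes the redis-py client

r = redis.Redis()  # hypothetical connection; defaults to localhost:6379


def check(key: str, limit: int, window_seconds: int) -> tuple[bool, int]:
    """Returns (allowed, seconds until the current window resets)."""
    # Create the counter with its expiry if it doesn't exist yet, then bump it.
    r.set(key, 0, ex=window_seconds, nx=True)
    used = r.incr(key)
    retry_after = r.ttl(key)
    return used <= limit, retry_after
```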
Rate Limiting
Reject throttled requests with HTTP 429 Too Many Requests.
Provide information to the client via headers (an example response follows the list):
- X-Ratelimit-Remaining - the number of requests remaining in the current window
- X-Ratelimit-Limit - the total number of requests allowed per window
- X-Ratelimit-Retry-After - when the current throttling stops
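Putting this together, a hypothetical helper that builds the throttled response in the same Lambda proxy shape used in the earlier sketch:

```python
def throttled_response(limit: int, retry_after_seconds: int) -> dict:
    # 429 Too Many Requests, with the headers described above.
    return {
        "statusCode": 429,
        "headers": {
            "X-Ratelimit-Remaining": "0",                         # none left in this window
            "X-Ratelimit-Limit": str(limit),                      # total allowed per window
            "X-Ratelimit-Retry-After": str(retry_after_seconds),  # when throttling stops
        },
        "body": '{"error": "rate limit exceeded"}',
    }
```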
Algorithms