Rate limiting controls how many requests a client can make to a service in a given time period. It prevents abuse and ensures fair resource allocation. A public API might allow 100 requests per minute per client; exceed the limit and further requests are rejected. This protects against denial-of-service attacks, where someone floods your service with requests.
It also ensures that one client using the API excessively doesn't degrade the experience for others. Token bucket is a common rate-limiting algorithm. Each client gets a bucket that fills with tokens at a fixed rate, and each request consumes a token. If the bucket is empty, the request is rejected. This allows bursts (a client can spend several tokens at once) while still enforcing an average rate. Sliding window is another approach.
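A minimal sketch of the token bucket idea, assuming an in-memory limiter with hypothetical `rate` and `capacity` parameters (a production version would need per-client buckets and thread safety):

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second (average rate)
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=10)
# A burst of 10 requests drains the full bucket at once;
# after that, requests are admitted at roughly 1 per second.
```

The capacity sets the burst size and the refill rate sets the long-run average, which is why the two are tuned independently.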
Count requests within a sliding time window; if the count exceeds the limit, reject new requests until older ones age out of the window. Rate limiting is essential for public APIs. Without it, abuse is trivial. With it, you protect your service while enabling legitimate use. Different APIs set different limits: AWS API Gateway might allow 1000 requests per second.
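The sliding-window approach can be sketched as a log of request timestamps; this is a simplified in-memory illustration (the `limit` and `window` parameters are illustrative, and a real service would store the log per client, typically in something like Redis):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window`-second span."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.timestamps: deque[float] = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=3, window=1.0)
# The first 3 requests within a second are admitted; the 4th is rejected
# until one of the earlier timestamps falls out of the window.
```

Compared with a token bucket, the sliding-window log enforces the limit exactly over any trailing interval, at the cost of storing one timestamp per recent request.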
The GitHub REST API allows 60 requests per hour unauthenticated and 5000 per hour authenticated. Rate limits are part of API design.