Rate Limiting
How the Gateway protects your upstream services and enforces consumer quotas using a hierarchical priority system.
The Hierarchical Priority Chain
The system evaluates rate limits from the most specific context to the broadest; the first limit found in the chain is the one that is enforced.
1. Endpoint Level
Limits specific to a single URL path (e.g., /auth/login).
2. Catalog Level
Applies to all endpoints within a specific Catalog (e.g., all Payments API endpoints).
3. Tenant Level
The broadest limit, covering every request within a Tenant's namespace.
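The chain above can be sketched as a simple first-match lookup. This is an illustrative sketch, not the Gateway's actual implementation; the function name `resolve_limit` and the dict-based configuration are assumptions for the example.

```python
from typing import Optional

def resolve_limit(
    endpoint_limits: dict,        # path -> seconds between requests (hypothetical config shape)
    catalog_limits: dict,         # catalog name -> seconds between requests
    tenant_limit: Optional[float],
    path: str,
    catalog: str,
) -> Optional[float]:
    """Walk the chain from most specific to broadest; the first limit found wins."""
    if path in endpoint_limits:        # 1. Endpoint level
        return endpoint_limits[path]
    if catalog in catalog_limits:      # 2. Catalog level
        return catalog_limits[catalog]
    return tenant_limit                # 3. Tenant level (broadest)

# An endpoint-level limit on /auth/login shadows both the Catalog and Tenant limits.
limit = resolve_limit({"/auth/login": 1.0}, {"Payments": 0.5}, 10.0,
                      "/auth/login", "Payments")
```

Note that a more specific limit always wins even if it is looser than the broader one; specificity, not strictness, decides.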
How Limits are Expressed
Limits are defined as Seconds Between Requests. This provides a steady, predictable flow of traffic to your backend rather than allowing "bursts" that might overwhelm cold Lambda functions or small containers.
- 1 Second: equivalent to 60 Requests Per Minute (RPM).
- 60 Seconds: equivalent to 1 Request Per Minute.
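The conversion between the two units is a simple division, shown here as a small helper (the function names are illustrative, not part of the Gateway API):

```python
def seconds_to_rpm(seconds_between_requests: float) -> float:
    """Convert a 'seconds between requests' limit to requests per minute."""
    return 60.0 / seconds_between_requests

def rpm_to_seconds(rpm: float) -> float:
    """Convert a requests-per-minute target back to 'seconds between requests'."""
    return 60.0 / rpm

seconds_to_rpm(1)    # 1 second  -> 60 RPM
seconds_to_rpm(60)   # 60 seconds -> 1 RPM
```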
Enforcement Details
- Granularity: Limits are enforced per Consumer Access ID. A single app might have different throughput for different Catalogs.
- Status Code: When a limit is exceeded, the Proxy returns 429 Too Many Requests.
- Stateless Caching: Rate limit configurations are cached for 5 minutes, ensuring enforcement logic doesn't slow down the request path.
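Putting the enforcement details together, a minimal in-memory sketch might look like the following. This is an assumption-laden illustration (the `RateLimiter` class, `fetch_config` callback, and `check` method are invented for the example), not the Gateway's code:

```python
import time

CONFIG_TTL = 300.0  # rate-limit configurations are cached for 5 minutes

class RateLimiter:
    """Sketch: enforce a minimum interval between requests per Consumer Access ID."""

    def __init__(self, fetch_config):
        self._fetch_config = fetch_config  # callback returning seconds-between-requests
        self._config_cache = {}            # access_id -> (interval, fetched_at)
        self._last_seen = {}               # access_id -> timestamp of last allowed request

    def _interval(self, access_id: str, now: float) -> float:
        cached = self._config_cache.get(access_id)
        if cached is None or now - cached[1] > CONFIG_TTL:
            # Cache miss or stale entry: refresh the configuration.
            cached = (self._fetch_config(access_id), now)
            self._config_cache[access_id] = cached
        return cached[0]

    def check(self, access_id: str, now=None) -> int:
        """Return the HTTP status the proxy would send: 200 allowed, 429 limited."""
        now = time.monotonic() if now is None else now
        interval = self._interval(access_id, now)
        last = self._last_seen.get(access_id)
        if last is not None and now - last < interval:
            return 429  # Too Many Requests
        self._last_seen[access_id] = now
        return 200
```

Because state is keyed by Consumer Access ID, two apps calling the same Catalog are throttled independently, matching the granularity described above.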
Distributed Rate Limiting
Rate limiting is enforced locally in each Proxy instance for maximum performance; instances do not share counters. If you scale to multiple Proxy instances, the total effective limit across your cluster will be Limit * InstanceCount.
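As a worked example of that scaling behavior (the helper name is illustrative):

```python
def effective_cluster_rpm(seconds_between_requests: float, instance_count: int) -> float:
    """Each Proxy instance enforces the limit independently, so the
    cluster-wide ceiling scales linearly with the instance count."""
    return (60.0 / seconds_between_requests) * instance_count

# A 1-second limit (60 RPM per instance) across 3 instances
# allows up to 180 RPM cluster-wide.
effective_cluster_rpm(1.0, 3)
```

If you need a hard cluster-wide ceiling, divide the intended total by the instance count when configuring the per-instance limit.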