Rate Limiting
How the Gateway protects your upstream services and enforces consumer quotas using a hierarchical priority system.
The Hierarchical Priority Chain
The system evaluates rate limits from the most specific context to the broadest; the first limit found in the chain is the one that is enforced.
1. Endpoint Level
Limits specific to a single URL path (e.g., /auth/login).
2. Catalog Level
Applies to all endpoints within a specific Catalog (e.g., all Payments API endpoints).
3. Tenant Level
The broadest limit, covering every request within a Tenant's namespace.
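The chain above can be sketched as a simple first-match lookup. This is an illustrative sketch, not the Gateway's actual implementation; the function name `resolve_limit` and the dict-based configuration are assumptions for the example.

```python
from typing import Optional

def resolve_limit(
    endpoint_limits: dict,        # path -> seconds between requests (hypothetical config shape)
    catalog_limits: dict,         # catalog name -> seconds between requests
    tenant_limit: Optional[float],
    path: str,
    catalog: str,
) -> Optional[float]:
    """Walk the chain from most specific to broadest; the first limit found wins."""
    if path in endpoint_limits:        # 1. Endpoint level
        return endpoint_limits[path]
    if catalog in catalog_limits:      # 2. Catalog level
        return catalog_limits[catalog]
    return tenant_limit                # 3. Tenant level (broadest)

# An endpoint-level limit on /auth/login shadows both the Catalog and Tenant limits.
limit = resolve_limit({"/auth/login": 1.0}, {"Payments": 0.5}, 10.0,
                      "/auth/login", "Payments")
```

Note that a more specific limit always wins even if it is looser than the broader one; specificity, not strictness, decides.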
How Limits are Expressed
Limits are defined as Seconds Between Requests. This provides a steady, predictable flow of traffic to your backend rather than allowing "bursts" that might overwhelm cold Lambda functions or small containers.
- 1 Second: equivalent to 60 Requests Per Minute (RPM).
- 60 Seconds: equivalent to 1 Request Per Minute.
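The conversion between the two units is a simple division, shown here as a small helper (the function names are illustrative, not part of the Gateway API):

```python
def seconds_to_rpm(seconds_between_requests: float) -> float:
    """Convert a 'seconds between requests' limit to requests per minute."""
    return 60.0 / seconds_between_requests

def rpm_to_seconds(rpm: float) -> float:
    """Convert a requests-per-minute target back to 'seconds between requests'."""
    return 60.0 / rpm

seconds_to_rpm(1)    # 1 second  -> 60 RPM
seconds_to_rpm(60)   # 60 seconds -> 1 RPM
```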
Enforcement Details
- Granularity: Limits are enforced per Consumer Access ID. A single app might have different throughput for different Catalogs.
- Status Code: When a limit is exceeded, the Proxy returns 429 Too Many Requests.
- Stateless Caching: Rate limit configurations are cached for 5 minutes, ensuring enforcement logic doesn't slow down the request path.
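Putting the enforcement details together, a minimal in-memory sketch might look like the following. This is an assumption-laden illustration (the `RateLimiter` class, `fetch_config` callback, and `check` method are invented for the example), not the Gateway's code:

```python
import time

CONFIG_TTL = 300.0  # rate-limit configurations are cached for 5 minutes

class RateLimiter:
    """Sketch: enforce a minimum interval between requests per Consumer Access ID."""

    def __init__(self, fetch_config):
        self._fetch_config = fetch_config  # callback returning seconds-between-requests
        self._config_cache = {}            # access_id -> (interval, fetched_at)
        self._last_seen = {}               # access_id -> timestamp of last allowed request

    def _interval(self, access_id: str, now: float) -> float:
        cached = self._config_cache.get(access_id)
        if cached is None or now - cached[1] > CONFIG_TTL:
            # Cache miss or stale entry: refresh the configuration.
            cached = (self._fetch_config(access_id), now)
            self._config_cache[access_id] = cached
        return cached[0]

    def check(self, access_id: str, now=None) -> int:
        """Return the HTTP status the proxy would send: 200 allowed, 429 limited."""
        now = time.monotonic() if now is None else now
        interval = self._interval(access_id, now)
        last = self._last_seen.get(access_id)
        if last is not None and now - last < interval:
            return 429  # Too Many Requests
        self._last_seen[access_id] = now
        return 200
```

Because state is keyed by Consumer Access ID, two apps calling the same Catalog are throttled independently, matching the granularity described above.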
Distributed Rate Limiting
Rate limiting is enforced locally in each Proxy instance for maximum performance; instances do not share counters. If you scale to multiple Proxy instances, the total effective limit across your cluster will be Limit * InstanceCount.
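As a worked example of that scaling behavior (the helper name is illustrative):

```python
def effective_cluster_rpm(seconds_between_requests: float, instance_count: int) -> float:
    """Each Proxy instance enforces the limit independently, so the
    cluster-wide ceiling scales linearly with the instance count."""
    return (60.0 / seconds_between_requests) * instance_count

# A 1-second limit (60 RPM per instance) across 3 instances
# allows up to 180 RPM cluster-wide.
effective_cluster_rpm(1.0, 3)
```

If you need a hard cluster-wide ceiling, divide the intended total by the instance count when configuring the per-instance limit.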