Comprehensive Guide to Rate Limiting
Rate limiting is a fundamental mechanism for controlling flow. It appears almost everywhere, from queues outside retail stores to metering lights for cars entering the freeway.
In the digital world, rate limiting controls the amount of traffic flowing to a resource.
Excessive requests can cause a denial of service: the resource goes down because it cannot keep up with the incoming rate. Rate limiting prevents a resource from getting overwhelmed, whether that resource is a piece of network equipment, a server, an endpoint on a server, or an API.
Uncontrolled traffic can also make access to a resource unfair. Imagine students registering for classes in a semester: a flood of requests from a few users can prevent other students from registering. Or imagine being unable to buy tickets to your favorite game or concert simply because too much traffic is being directed at the ticketing server.
The risk of cyberattacks has grown over time, so much so that Gartner predicts security investments are on pace to exceed $215B.
Rate limiting can be a key mechanism for protecting against cyberattacks. By limiting malicious traffic, it can blunt or entirely neutralize many attacks.
Rate limiting is a simple but effective way to protect resources, be they networks, servers, or APIs. It is widely used and offers several benefits.
Appropriate rate limiting is effective in thwarting denial-of-service attacks. Often, to reduce overhead on the server, rate limiting is implemented in hardware, e.g., by programming a Field Programmable Gate Array (FPGA). Such mechanisms enforce limits without software having to process requests, making it difficult for an attacker to overwhelm a server: unimportant traffic gets dropped in hardware at line rate. A more intelligent approach that forwards prioritized traffic to software for finer control can also be implemented.
Ensuring fairness and meeting SLAs improves the user experience and prevents unnecessary delays in serving a client. Too much traffic on a system can degrade the service experience, e.g., a delayed checkout from a cart.
Meeting SLAs is another reason to rate limit. SLAs may be associated with the general availability of a service, which must meet or exceed them. SLAs may also vary by client: depending on, say, the tier of service, each client may be eligible for a different SLA. Rate limiting helps you achieve these SLAs.
Resource consumption can increase radically if not kept in check. Capacity planning for a service requires keeping incoming traffic within bounds, and rate limits do exactly that. Without them, resource consumption can grow unchecked; for example, with auto-scaling enabled, a surge of traffic can drive up the scale factor and, with it, costs.
Fairness in serving clients can also be achieved with rate limiting: everyone gets a fair chance at accessing the resource while the clients sending disproportionately many requests are throttled.
Efficient utilization of resources is a key benefit of rate limiting: it prevents resource starvation and keeps resources available to all users.
Rate limiting is an important tool against cyberattacks and distributed denial of service (DDoS). For instance, a limit can be applied to a flood of traffic from a single IP address. For a DDoS attack originating from many IP addresses, the traffic can be split and inspected to characterize the attack (as coming from a rogue actor or a particular geography) and circumvent it.
Stolen credentials often end up being used by bots, which stuff them into login forms to gain access to credential-protected resources. Rate limiting can identify such credential-stuffing attacks and thwart them.
In another kind of attack, the credentials are not known but simply guessed, with many attempts made to gain access. Such brute-force attacks can be detected and mitigated with rate limiting. Their typical identifying characteristic is a very high number of failed attempts, which consumes significant resources if the attack isn't stopped.
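As an illustration, this brute-force pattern can be caught by counting recent authentication failures per source and blocking sources that exceed a threshold. This is a minimal sketch: the window length, failure threshold, and keying by IP address are all assumptions chosen for the example.

```python
from collections import defaultdict, deque

WINDOW = 60.0      # seconds to remember failures (illustrative)
MAX_FAILURES = 5   # failures allowed per window before blocking (illustrative)

# source (e.g., IP address) -> timestamps of recent failed attempts
failures = defaultdict(deque)

def record_failure(source, now):
    """Note a failed authentication attempt from `source` at time `now`."""
    failures[source].append(now)

def is_blocked(source, now):
    """Block `source` if it accumulated too many failures in the window."""
    recent = failures[source]
    # Discard failures that have aged out of the window.
    while recent and now - recent[0] >= WINDOW:
        recent.popleft()
    return len(recent) >= MAX_FAILURES
```

A real deployment would also expire idle sources and likely keep this state in a shared store, but the core signal, failures per source per window, is the same.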
For retailers running e-commerce websites, scraping is a problem: bots harvest deals and pricing information from the site. Scraping involves a large number of requests from a single client, so rate limiting can be used to prevent it.
There are several algorithms used for rate limiting; we discuss a few common ones here.
Sliding window rate limiting is similar to the sliding window protocol used for flow control in TCP. Each client, IP, or user is given a window of allowed requests. As requests complete, the window slides forward and more requests are allowed. This caps the number of requests outstanding at any given time at the size of the window.
The rate of requests can be tuned by adjusting the size of the window and the duration it covers. While the window tracks the rate, its size can also change as time passes, giving finer control over the limit.
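One common way to realize a sliding window is to keep a log of recent request timestamps and count only those inside the trailing window. The sketch below is illustrative (the window size and duration are arbitrary parameters, not values from any particular system):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_requests` within the trailing `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()  # arrival times of accepted requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Slide the window: drop requests older than the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Because it stores one timestamp per accepted request, this variant is accurate but memory-hungry at high rates; coarser counter-based approximations trade accuracy for less state.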
The leaky bucket and token bucket algorithms both express limits in terms of data or tokens. A leaky bucket starts with a fixed amount of data, which is drained as requests are made. Similarly, a token bucket starts with a fixed number of tokens, which are removed as requests are made.
Both data and tokens are replenished periodically, ensuring a steady rate of data or tokens flowing through the system.
Because the unit of measure in a leaky bucket is data, it may not map to a fixed number of requests over time. The rate at which requests are made may differ from the rate at which data arrives, which can give some requests a longer wait.
A token bucket, by contrast, tracks the number of requests, mapping each request to one or more tokens.
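A token bucket can be sketched in a few lines: tokens accumulate at a fixed rate up to the bucket's capacity, and each request consumes a token. This is a minimal illustrative implementation, not tied to any particular library; the rate and capacity are parameters you would tune for your service.

```python
class TokenBucket:
    """Tokens are replenished at `rate` per second, up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # time of the last replenishment

    def allow(self, now, cost=1):
        # Replenish tokens for the time elapsed since the last check,
        # never exceeding the bucket's capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The capacity controls how large a burst is tolerated, while the rate controls the sustained long-term limit.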
Fixed window rate limiting is time based: time is divided into a sequence of windows, each allowing only a fixed count of requests. Any requests over that count must wait for the next window.
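A fixed window limiter needs only the index of the current window and a counter, which makes it very cheap. A minimal sketch, with illustrative parameters:

```python
class FixedWindowLimiter:
    """Allow at most `max_requests` per fixed window of `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window = int(now // self.window_seconds)  # index of this time window
        if window != self.current_window:
            # A new window has started; reset the counter.
            self.current_window, self.count = window, 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False
```

The known weakness of fixed windows is the boundary: a client can send a full quota at the end of one window and another full quota at the start of the next, briefly doubling the effective rate, which is one reason sliding windows exist.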
With a weighted token bucket, every request carries a weight; for instance, resource-intensive requests may weigh more than lightweight ones. Requests are then rate limited according to their weight.
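A weighted token bucket differs from a plain one only in that each request type consumes a different number of tokens. The request kinds and weights below are hypothetical, chosen purely to illustrate the idea:

```python
class WeightedTokenBucket:
    """Token bucket where each request consumes tokens equal to its weight."""

    # Hypothetical weights: heavier operations drain the budget faster.
    WEIGHTS = {"read": 1, "write": 5, "report": 20}

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, kind, now):
        # Replenish for elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        cost = self.WEIGHTS.get(kind, 1)  # unknown kinds cost 1 token
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With this scheme a single expensive "report" request counts as much as twenty cheap "read" requests against the same budget.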
Simple rate limiting counts and limits bytes, packets, or requests. Below we discuss how rate limiting can meet the requirements of different scenarios and how it can limit traffic more intelligently.
To rate limit, we first need to measure and track the incoming rate. But tracking the rate is only one aspect: discovering more details, context, and characteristics of the arriving traffic helps us better understand the flow, and therefore better control it.
At a high level, we can examine the incoming traffic and try to characterize it by its source, the type of request, and the resource it targets.
Understanding what is being rate limited also matters. At a high level we might simply be limiting access to a server. But in a publish/subscribe scenario, we may want to limit the number of messages published overall, the number published by a specific client, or the number published by a specific client to a specific topic.
In the case of an API, we may want to limit the number of requests made to an endpoint, the kind of requests made to that endpoint (e.g., limit the number of POSTs versus GETs), or the number of requests from a specific user (tracked by IP address or JWT).
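In practice, this characterization becomes the rate-limit key: each distinct combination of user, method, and endpoint gets its own counter. The keying scheme and per-method limits below are assumptions for illustration, and the plain counter could be replaced by any of the algorithms discussed earlier:

```python
from collections import defaultdict

def limit_key(user_id, method, endpoint):
    """Hypothetical keying scheme: one counter per (user, method, endpoint)."""
    return f"{user_id}:{method}:{endpoint}"

# Illustrative per-method limits: POSTs are limited more tightly than GETs.
LIMITS = {"POST": 10, "GET": 100}
DEFAULT_LIMIT = 50

counters = defaultdict(int)  # independent count per key

def allow(user_id, method, endpoint):
    key = limit_key(user_id, method, endpoint)
    if counters[key] < LIMITS.get(method, DEFAULT_LIMIT):
        counters[key] += 1
        return True
    return False
```

The important design point is that the key, not the algorithm, decides what gets limited: the same token bucket behaves very differently keyed per IP versus keyed per (user, endpoint).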
Once we can clearly characterize the incoming traffic, we can rate limit it appropriately for the use case.
To implement an efficient rate-limiting mechanism, it is important to consider the characteristics of the incoming traffic and the resource being protected.
Understanding traffic characteristics lets us better judge how much traffic to expect, how bursty it is, and which clients generate it. Designing around these characteristics and their tradeoffs ensures that key properties of the system, such as accurate classification, bounded resource usage, and fairness, are always met.
There are several aspects to consider when getting rate limits right. Here we enumerate some of the key ones.
Traffic identified as excess is rate limited. But what traffic qualifies as excess? How do you ensure that traffic that should be allowed to pass through isn't rate limited?
One way is to build appropriate context for the incoming traffic, letting us better distinguish good traffic from bad. This brings us to the next aspect of rate limiting: how much state must be stored to classify traffic appropriately?
Inspecting traffic at high rates and classifying it for better limiting adds to the state and resources needed before any rate-limiting action is taken. It is vital that the rate-limiting system itself does not exhaust those resources.
Even with the right tradeoffs between classification accuracy and stored state, it is vital that good traffic isn't dropped. This brings us to another aspect of rate limiting: avoiding false positives.
If the rate-limiting algorithms misidentify traffic as bad, legitimate users suffer a denial of service. This should be monitored, and appropriate changes made to improve the accuracy of identifying malicious traffic.
Sometimes there is no malicious traffic at all; the good traffic is simply bursty. Bursty traffic can give the impression of a very high rate even when the average rate is modest, so rate limiting must handle bursts, for example through appropriate buffering or queuing to smooth them out.
One of the hardest parts of rate limiting is getting the limits right. Several external factors determine the traffic rate: the time of day, available bandwidth, the type of service, the number of users, and so on. Set the limits too high and the system may be unable to handle the traffic that gets through, bringing the service down; set them too low and legitimate users are effectively denied service.
Dynamic rate limiting, possibly guided by machine learning, can help anticipate the appropriate limit settings and apply them automatically.
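As a simple illustration of dynamic limiting, a controller might tighten the limit when an observed health signal (here, latency) exceeds a target and relax it otherwise, in the spirit of AIMD (additive increase, multiplicative decrease). All thresholds and step sizes below are assumptions for the example, not a recommended policy:

```python
def adjust_limit(current_limit, observed_latency_ms,
                 target_latency_ms=100, min_limit=10, max_limit=1000):
    """Return a new request limit based on observed latency.

    Multiplicative decrease when the service is struggling,
    additive increase when it is healthy; the result is clamped
    to [min_limit, max_limit].
    """
    if observed_latency_ms > target_latency_ms:
        new_limit = int(current_limit * 0.5)  # back off quickly
    else:
        new_limit = current_limit + 10        # recover gradually
    return max(min_limit, min(max_limit, new_limit))
```

Running this periodically against live latency measurements lets the limit track actual capacity instead of a guess made at deployment time.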
Some key considerations are fundamental to designing a rate-limiting system, the most important being an understanding of the service, the system, and the expectations of its users.
Looking at how popular services implement rate limits is a good way to learn from best practices. EnRoute has a flexible, highly configurable rate-limit engine baked into the system. EnRoute can program Envoy's rate-limit API and is also integrated with Lyft's rate-limit service over gRPC. No additional configuration is required to enable the service; rate limiting just works alongside the other functionality EnRoute provides, including authentication and authorization.
More information about EnRoute’s rate limit engine and real world rate limit examples can be found in another article - “Why Every API Needs a Clock”