Understanding Concurrency And Throttling In AWS Lambda Functions: Scaling Up For Serverless Efficiency
If you are writing Lambda functions for production, it is important to think about how your function scales and how performance can be affected by usage.
Yeah, I know. You're probably thinking that because Lambda is a "serverless" service, you don't have to worry about provisioning and scaling. Sorry to burst your bubble, but you definitely have to think about scaling when deploying production-level Lambda functions.
Concurrency in AWS Lambda is the number of requests that your function can process simultaneously.
Generally, a new execution environment (container) is spun up when your Lambda function is invoked and no warm environment is available to handle the request. So for each invocation, resources are created behind the scenes to serve it, and if more than one invocation arrives at any given point in time, additional environments are spun up to handle these "concurrent" requests.
Concurrency is a fundamental concept in AWS Lambda, the serverless compute service offered by Amazon Web Services. It refers to the number of Lambda function executions that can be handled simultaneously. This directly impacts how your serverless applications respond to incoming requests and scale to meet demand.
In essence, each concurrent execution of a Lambda function runs within a separate instance of the Lambda execution environment. These environments are ephemeral containers pre-provisioned with the necessary resources to run your function code. When a Lambda function is invoked, Lambda allocates an execution environment to process the request.
By default, Lambda operates on a shared concurrency model. This means that all your Lambda functions in a specific region share a pool of concurrency units (typically 1000 per region). As your functions receive more requests, Lambda automatically scales the number of execution environments provisioned to handle them, up to the limit of available concurrency.
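It helps to quantify how traffic translates into concurrency: at steady state, the concurrency a function needs is roughly its request rate multiplied by its average duration. Here's a tiny sketch of that rule of thumb (the numbers are illustrative):

```python
def estimated_concurrency(requests_per_second: float, avg_duration_seconds: float) -> float:
    """Steady-state concurrency is roughly arrival rate times average execution time."""
    return requests_per_second * avg_duration_seconds

# 100 requests/s with a 500 ms average duration needs about 50 concurrent
# executions -- comfortably under the default 1000-unit regional pool.
print(estimated_concurrency(100, 0.5))
```

This back-of-the-envelope estimate tells you how close a function runs to the regional limit before you ever see a throttle.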
Concurrency is a very important aspect to consider, because it can cause your applications to fail through a mechanism called throttling.
By default, each AWS account is allocated 1,000 units of concurrency per region. This capacity is shared by all the functions in your account in that region. What this implies is that your functions collectively cannot exceed 1,000 simultaneous executions: if several functions are being invoked at the same time, they all draw from the same 1,000 available units.
Of course, this is a soft limit and can be raised by requesting a quota increase through AWS Service Quotas or a support ticket.
Throttling in AWS Lambda occurs when function invocations exceed the available concurrency units. The throttled invocations don't run: synchronous callers receive a 429 TooManyRequestsException (Rate Exceeded) error, while asynchronous invocations are retried by Lambda for a period before being discarded.
Lambda concurrency can be broadly divided into three categories: unreserved concurrency, reserved concurrency, and provisioned concurrency.
Unreserved concurrency is the default category your functions belong to: all Lambda functions in your account share a common concurrency pool. Let's say you have three functions in your account. If function A is getting 900 simultaneous invocations and function B is getting 100 invocations at a time, function C is essentially going to get throttled and cannot be invoked, since the maximum concurrency units allocated to your account are already used up by functions A and B.
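The arithmetic of the shared pool can be sketched as a toy model. This is not an AWS API, just an illustration of how in-flight executions exhaust the pool and leave later arrivals throttled:

```python
def simulate_shared_pool(in_flight: dict, new_requests: dict, pool_size: int = 1000) -> dict:
    """Toy model of Lambda's shared (unreserved) concurrency pool.

    Given executions already in flight and newly arriving requests per function,
    returns how many of each function's new requests would be throttled.
    """
    available = pool_size - sum(in_flight.values())
    throttled = {}
    for name, count in new_requests.items():
        granted = min(count, max(available, 0))
        available -= granted
        throttled[name] = count - granted
    return throttled

# Functions A and B already occupy the full 1000-unit pool,
# so every new invocation of function C is throttled.
print(simulate_shared_pool({"A": 900, "B": 100}, {"C": 5}))  # {'C': 5}
```

The same model shows that when the pool has headroom, new requests sail through untouched.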
Here's a breakdown of how it works:

- All functions in a region draw from the same shared pool (1,000 units by default).
- Lambda automatically scales execution environments up as requests arrive, as long as the pool has capacity.
- Once the pool is exhausted, further invocations of any function in the region are throttled until capacity frees up.

Unreserved concurrency is suitable for functions that:

- Have variable or unpredictable workloads.
- Perform non-critical tasks that can tolerate occasional throttling.
- Don't need guaranteed capacity or strict latency targets.

In essence, unreserved concurrency offers a pay-as-you-go approach for Lambda function executions, making it a good choice for functions with variable workloads and non-critical tasks.
Reserved concurrency is how you provide guaranteed concurrency to a Lambda function. Remember that with unreserved concurrency, if a few busy functions use up the shared pool, the other functions in your account will be throttled by the Lambda service. With reserved concurrency, you deduct some concurrency units from the overall capacity and allocate them to one function, giving it exclusive access to the reserved units.
For instance, if we have functions A, B, and C and we give function A 200 reserved concurrency units, we now have 800 concurrency units left for functions B and C to share, while function A has exclusive access to its 200 units. Keep in mind that with reserved concurrency, your function can't casually go above its allotted capacity: in this example, if function A is invoked more than 200 times at any instant, the extra invocations are throttled and receive a Rate Exceeded error.
Reserved concurrency guarantees a set number of concurrent executions for your function. It carves out a portion of the shared concurrency pool in a specific region for your function's exclusive use. This translates to a few key benefits:

- Guaranteed capacity: the reserved units are always available to your function, no matter how busy other functions in the account are.
- Isolation: a traffic spike on one function can't starve a reserved function of concurrency, and vice versa.
- A built-in ceiling: the reservation also acts as a maximum, which is useful for protecting downstream systems (like a database) from being overwhelmed.
Here's a scenario where reserved concurrency proves valuable:
Imagine a critical function that processes financial transactions. You want to ensure this function experiences minimal latency and can handle a surge in requests without impacting performance. By setting reserved concurrency for this function, you guarantee it has dedicated resources to handle incoming transactions promptly, even during peak loads.
In summary, reserved concurrency in AWS Lambda offers a way to secure prioritized execution and control resource allocation for your serverless functions. Note that, unlike provisioned concurrency, it does not pre-warm execution environments, so it doesn't mitigate cold starts.
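Setting reserved concurrency is a one-liner with the AWS CLI. The function name below is a placeholder; also note that Lambda requires at least 100 units to remain in the unreserved pool, so you can't reserve the entire regional quota:

```shell
# Reserve 200 concurrent executions for one function
aws lambda put-function-concurrency \
    --function-name my-transaction-processor \
    --reserved-concurrent-executions 200

# Inspect the account-wide pool and how much remains unreserved
aws lambda get-account-settings
```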
Provisioned concurrency is a newer option that mostly deals with latency and helps prevent the problem of cold starts by providing dedicated, pre-initialized execution environments. In essence, provisioned concurrency is like reserved concurrency, except that Lambda actually provisions a specified number of execution environments, equal to the specified concurrency units. Hence, with an increase in invocations, your clients do not experience any extra latency, as the execution environments have been provisioned beforehand, essentially eliminating cold start issues.
Provisioned concurrency is a technique for ensuring low-latency execution of your code. It tackles the challenge of cold starts, which can significantly slow down the initial execution of a function.
Here's a breakdown of how it works:

- You specify the number of execution environments to keep initialized for a specific function version or alias.
- Lambda provisions those environments ahead of time, running your initialization code so they are ready to respond immediately.
- Incoming requests are routed to the pre-warmed environments first; traffic beyond the provisioned level falls back to regular on-demand scaling, with its usual cold starts.
The benefits are significant. When a request arrives, it can be routed to a pre-provisioned environment, eliminating the cold start delay. This translates to much faster response times, typically in double-digit milliseconds, for your functions.
There's a trade-off to consider, however. While provisioned concurrency brings performance benefits, it comes at an additional cost: you're billed for the duration these pre-warmed environments are kept running, regardless of whether they handle any requests.
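Provisioned concurrency is configured per published version or alias (it can't target $LATEST). A sketch with the AWS CLI, using a placeholder function name and a `prod` alias:

```shell
# Pre-warm 100 execution environments for the "prod" alias
aws lambda put-provisioned-concurrency-config \
    --function-name my-api-handler \
    --qualifier prod \
    --provisioned-concurrent-executions 100

# Check whether the environments have finished initializing
aws lambda get-provisioned-concurrency-config \
    --function-name my-api-handler \
    --qualifier prod
```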
Here are some use cases where provisioned concurrency shines:

- Latency-sensitive, user-facing APIs and interactive web or mobile backends.
- Applications with predictable traffic spikes (product launches, scheduled batch windows), where you can raise the provisioned level ahead of time.
- Functions with heavy initialization code, where cold starts are especially costly.
Overall, provisioned concurrency is a powerful tool for enhancing the performance of serverless applications when low latency is paramount. But it's essential to weigh the performance gains against the added cost before implementing it.

Beyond choosing the right concurrency mode, several strategies can help you optimize concurrency and prevent throttling in your Lambda functions:

- Set reserved concurrency on critical functions so noisy neighbours can't starve them.
- Use provisioned concurrency for latency-sensitive endpoints.
- Buffer bursty workloads through a queue (for example, Amazon SQS) so Lambda can drain requests at a sustainable rate.
- Retry throttled invocations with exponential backoff.
- Monitor the ConcurrentExecutions and Throttles metrics in Amazon CloudWatch, and request a quota increase before you hit the ceiling.
By understanding concurrency and implementing these optimization techniques, you can create serverless applications on AWS Lambda that are scalable, cost-efficient, and resilient to varying workloads.
Happy Clouding!!!
Did you like this post?
If you did, please buy me coffee 😊