Understanding Concurrency And Throttling In AWS Lambda Functions: Scaling Up For Serverless Efficiency

If you are writing Lambda functions for production, it is important to think about how your function scales and how performance can be affected by usage.
Yeah, I know. You're probably thinking that because Lambda is a "serverless" service, you don't have to worry about provisioning and scaling. Sorry to burst your bubble: you definitely have to think about scaling when deploying production-level Lambda functions.

Concurrency in AWS Lambda is the number of requests that your function can process simultaneously.
Generally, each execution environment (container) handles one request at a time, and new containers are spun up behind the scenes when invocations arrive while the existing ones are busy. So if more than one invocation hits your function at any given instant, additional containers are spun up to handle these "concurrent" requests.

Concurrency is a fundamental concept in AWS Lambda, the serverless compute service offered by Amazon Web Services. It refers to the number of Lambda function executions that can be handled simultaneously. This directly impacts how your serverless applications respond to incoming requests and scale to meet demand.


Understanding Concurrency in Lambda

In essence, each concurrent execution of a Lambda function runs within a separate instance of the Lambda execution environment. These environments are ephemeral containers pre-provisioned with the necessary resources to run your function code. When a Lambda function is invoked, Lambda allocates an execution environment to process the request.
By default, Lambda operates on a shared concurrency model. This means that all your Lambda functions in a specific region share a pool of concurrency units (typically 1000 per region). As your functions receive more requests, Lambda automatically scales the number of execution environments provisioned to handle them, up to the limit of available concurrency.
Concurrency is a very important aspect to consider, because exceeding your concurrency limit can cause your application to fail through a mechanism called throttling.

By default, 1,000 units of concurrency are allocated to each AWS account per region. This capacity is shared by all the functions in your account in that region, which means your functions collectively cannot exceed 1,000 simultaneous executions. If several functions are invoked at the same time, they all draw from the same 1,000 available units.
Of course, this is a soft limit and can be raised by filing a support ticket with AWS.
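A useful rule of thumb from AWS's scaling documentation: the concurrency a function needs is roughly its request rate multiplied by its average duration. Here is a minimal sketch of that arithmetic (the numbers are illustrative, not from the article):

```python
def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> float:
    """Estimate the concurrent executions needed to sustain a given request rate.

    Lambda concurrency follows Little's law:
    concurrency = arrival rate x average duration.
    """
    return requests_per_second * avg_duration_seconds


# 100 requests/second with a 2-second average duration needs ~200 concurrent
# executions -- comfortably within the default 1,000-unit regional pool.
print(required_concurrency(100, 2))    # 200.0

# 800 requests/second at 1.5 seconds needs 1,200 -- above the default limit,
# so some invocations would be throttled until the quota is raised.
print(required_concurrency(800, 1.5))  # 1200.0
```

Running this estimate against your expected peak traffic is a quick way to tell whether you need to file that limit-increase ticket before launch.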

Throttling in AWS Lambda occurs when function invocations exceed the available concurrency units. The throttled invocation does not run, and the caller receives a throttling error (an HTTP 429 TooManyRequestsException with the message "Rate Exceeded").
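Because throttling errors are transient, callers typically retry with exponential backoff. Here is a minimal local sketch of that pattern; `ThrottledError` and `flaky_invoke` are stand-ins I made up for the demo, not real SDK objects (in boto3 the equivalent exception is Lambda's `TooManyRequestsException`):

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for Lambda's TooManyRequestsException (HTTP 429, 'Rate Exceeded')."""


def invoke_with_backoff(invoke, max_attempts: int = 5) -> str:
    """Retry a throttled invocation with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Back off 0.1s, 0.2s, 0.4s ... with jitter, capped for the demo.
            time.sleep(min(0.1 * 2 ** attempt, 1.0) * random.uniform(0.5, 1.0))


# Demo: a fake invoker that is throttled twice before succeeding.
calls = {"n": 0}

def flaky_invoke() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError("Rate Exceeded")
    return "ok"


print(invoke_with_backoff(flaky_invoke))  # ok (after 2 throttled attempts)
```

In practice the AWS SDKs already apply a retry policy like this by default, so for most callers it is a matter of tuning the retry configuration rather than writing the loop yourself.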

Benefits of concurrency 

  • Automatic Scaling: Lambda seamlessly scales concurrency in response to traffic surges, eliminating the need for manual server provisioning or management.
  • Cost-Effectiveness: You only pay for the compute time your Lambda functions utilize, making it an economical approach for applications with variable workloads.
  • High Availability: By provisioning multiple execution environments, Lambda inherently enhances the availability of your serverless applications.

Lambda concurrency can be broadly divided into three: Unreserved concurrency, reserved concurrency and provisioned concurrency.

Unreserved concurrency 

This is the default category for your functions. With unreserved concurrency, all Lambda functions in your account share a common concurrency pool. Say you have three functions in your account: if function A is getting 900 simultaneous invocations and function B is getting 100 invocations at the same time, function C is going to get throttled and cannot be invoked, since the maximum concurrency units allocated to your account are already used up by functions A and B.

Here's a breakdown of how it works:

  • Shared Pool: A limited number of concurrent executions are available for all Lambda functions in your account. This pool is dynamically allocated based on regional capacity.
  • First-come, first-served: When a function receives an invocation request, Lambda attempts to use an available concurrency slot from the shared pool. Requests are processed on a first-come, first-served basis.
  • Potential throttling: If the shared pool is exhausted when a new request arrives, Lambda throttles the function. Synchronous callers receive a throttling error they must handle or retry, while asynchronous invocations are queued and retried by Lambda until a slot becomes available.
  • Cost-effective: Unreserved concurrency is the most cost-effective option, as you only pay for the actual executions your functions incur.
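The first-come, first-served behavior can be modeled as a shared counter. This is a deliberately simplified local sketch of the pool semantics, not how the service is implemented:

```python
class SharedPool:
    """Toy model of a regional unreserved concurrency pool."""

    def __init__(self, limit: int = 1000):
        self.limit = limit
        self.in_flight = 0

    def try_invoke(self) -> bool:
        """Claim one concurrency slot; False means the request is throttled."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def finish(self) -> None:
        """Release a slot when an execution completes."""
        self.in_flight -= 1


pool = SharedPool(limit=1000)
# Function A holds 900 slots and function B holds 100 ...
a = sum(pool.try_invoke() for _ in range(900))
b = sum(pool.try_invoke() for _ in range(100))
# ... so a simultaneous request to function C finds the pool exhausted.
print(a, b, pool.try_invoke())  # 900 100 False
```

The key takeaway: function C is throttled not because of anything it did, but because its neighbors drained the shared pool first.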

Unreserved concurrency is suitable for functions that:

  • Experience unpredictable traffic patterns: If your function's workload varies significantly, unreserved concurrency scales automatically without incurring the cost of pre-provisioned resources.
  • Are tolerant of occasional latency spikes: As functions compete for shared resources, occasional delays might occur due to throttling. If low latency is not critical for your function, unreserved concurrency can be a viable option.
  • Are non-critical: For non-critical functions that don't demand guaranteed performance, unreserved concurrency provides a cost-efficient way to handle sporadic workloads.

In essence, unreserved concurrency offers a pay-as-you-go approach for Lambda function executions, making it a good choice for functions with variable workloads and non-critical tasks.

Reserved Concurrency

This is how you provide guaranteed concurrency to a Lambda function. Remember that with unreserved concurrency, if one function consumes most of the shared pool, other functions in your account can get throttled by the Lambda service. With reserved concurrency, you deduct some concurrency units from the overall capacity and allocate them to one function, giving that function exclusive access to the reserved units.

For instance, if we have functions A, B and C and we give function A 200 reserved concurrency units, we now have 800 units left for functions B and C to share, while function A has exclusive access to its 200. Keep in mind that with reserved concurrency, your function also can't go above its allotted capacity: in our example, if function A gets invoked more than 200 times at any instant, further invocations are throttled and receive a rate-exceeded error.
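The arithmetic above can be sketched in a few lines. The function names and numbers are just the article's running example:

```python
TOTAL = 1000                 # default regional pool
reservations = {"A": 200}    # function A gets 200 reserved units

# Reservations are carved out of the shared pool up front.
unreserved_pool = TOTAL - sum(reservations.values())
print(unreserved_pool)  # 800 -- left for B and C to share


def max_concurrency(fn: str) -> int:
    """A reserved function is capped at its reservation; others share the rest."""
    return reservations.get(fn, unreserved_pool)


print(max_concurrency("A"))  # 200 -- exclusive, but also a hard ceiling
print(max_concurrency("B"))  # 800 -- shared with every other unreserved function
# Invocation number 201 of function A at the same instant is throttled,
# even if the unreserved pool still has spare capacity.
```

In the real service you set this with the `PutFunctionConcurrency` API, e.g. `aws lambda put-function-concurrency --function-name A --reserved-concurrent-executions 200` (the function name here is hypothetical).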

Reserved concurrency guarantees a set number of concurrent executions for your function. It carves out a portion of the shared concurrency pool in a specific region for your function's exclusive use. This translates to a few key benefits:

  • Prioritized execution: When your function receives an invocation request, Lambda uses one of its reserved slots. This ensures your function can start processing the request promptly, even if the account's overall concurrency is nearing capacity.
  • Guaranteed capacity: A noisy neighbor can never starve your function, because the reserved units are for its exclusive use. Note, however, that reserved concurrency does not pre-initialize execution environments, so it does not by itself eliminate cold starts; that is what provisioned concurrency (covered below) is for.
  • Controlled resource allocation: Reserved concurrency also acts as a cap on the number of instances of your function that can run concurrently. This can be useful for managing costs or preventing your function from overwhelming downstream resources.

Here's a scenario where reserved concurrency proves valuable:
Imagine a critical function that processes financial transactions. You want to ensure this function experiences minimal latency and can handle a surge in requests without impacting performance. By setting reserved concurrency for this function, you guarantee it has dedicated resources to handle incoming transactions promptly, even during peak loads.
In summary, reserved concurrency in AWS Lambda offers a way to secure prioritized execution, mitigate cold starts, and control resource allocation for your serverless functions.

Provisioned Concurrency

This is a newer category that mostly addresses latency issues and helps prevent cold starts by providing dedicated execution environments. In essence, provisioned concurrency is like reserved concurrency, except that it also provisions a specified number of execution environments, equal to the specified concurrency units, ahead of time. With an increase in invocations, your clients therefore don't experience extra latency, because the execution environments were provisioned beforehand, essentially eliminating cold-start issues.

Provisioned concurrency is a technique for ensuring low-latency execution of your code. It tackles the challenge of cold starts, which can significantly slow down the initial execution of a function.
Here's a breakdown of how it works:

  1. When a Lambda function is invoked for the first time, it experiences a cold start. This means the execution environment needs to be set up, including downloading the code and initializing libraries. This can lead to a noticeable delay in response time.
  2. Provisioned concurrency addresses this by pre-warming execution environments. You specify the number of environments you want to keep ready at all times. These environments are initialized and stay prepared to handle incoming requests.

 The benefits are significant. When a request arrives, it can be routed to a pre-provisioned environment, eliminating the cold start delay. This translates to much faster response times, typically in double-digit milliseconds, for your functions.
There's a trade-off to consider however. While provisioned concurrency brings performance benefits, it comes at an additional cost. You're billed for the duration these pre-warmed environments are running, regardless of whether they handle any requests.
Here are some use cases where provisioned concurrency shines:

  • Interactive services: Web and mobile backends that require a snappy user experience can leverage provisioned concurrency to guarantee low latency.
  • Latency-sensitive microservices: In microservices architectures, where functions interact frequently, fast response times are crucial. Provisioned concurrency can ensure smooth communication between services.
  • Synchronous APIs: APIs that need to return responses immediately benefit from the low latency that provisioned concurrency offers.
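The latency effect of a warm pool can be illustrated with a toy model. The millisecond figures below are assumptions chosen for the demo, not measured Lambda numbers:

```python
INIT_MS = 400    # assumed one-time cold-start cost (code download + runtime init)
HANDLER_MS = 20  # assumed per-request handler time


def invoke_latency_ms(warm_pool: int, request_no: int) -> int:
    """Latency of the Nth simultaneous request given `warm_pool` pre-warmed environments.

    Requests that fit in the warm pool skip initialization entirely;
    requests beyond it pay the full cold-start cost.
    """
    cold = request_no > warm_pool
    return HANDLER_MS + (INIT_MS if cold else 0)


# With provisioned concurrency of 5, the first five simultaneous requests skip init:
print([invoke_latency_ms(warm_pool=5, request_no=n) for n in range(1, 7)])
# [20, 20, 20, 20, 20, 420] -- only the sixth request pays the cold-start penalty
```

In the real service you would configure this on a published version or alias, e.g. `aws lambda put-provisioned-concurrency-config --function-name my-fn --qualifier live --provisioned-concurrent-executions 5` (function name and alias here are hypothetical), and remember that you pay for those five warm environments whether or not they receive traffic.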

Optimizing Concurrency for Your Lambda Functions

Several strategies can help you optimize concurrency and prevent throttling in your Lambda functions:

  • Code Optimization: By streamlining your function code and minimizing execution time, you can enable more concurrent executions within the available concurrency quota.
  • Batching: Process multiple units of work in a single Lambda invocation. This reduces the overall number of concurrent executions required to handle the same amount of work.
  • Asynchronous Invocation: For non-critical tasks, consider asynchronous invocation patterns using services like Amazon SQS or SNS. This offloads tasks without blocking the primary execution flow, smoothing out spikes and reducing concurrency needs.
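To make the batching point concrete, here is a minimal handler sketch that processes an SQS batch in one invocation. The event shape is simplified (real SQS events carry more fields), and the uppercase transform is just a placeholder for your real work:

```python
def handler(event, context=None):
    """Process a batch of SQS messages in a single Lambda invocation.

    With a batch size of 10, ten messages consume one concurrency unit
    instead of ten separate invocations consuming ten units.
    """
    results = [msg["body"].upper() for msg in event.get("Records", [])]
    return {"processed": len(results), "results": results}


# Simplified shape of an SQS batch event:
event = {"Records": [{"body": "a"}, {"body": "b"}, {"body": "c"}]}
print(handler(event))  # {'processed': 3, 'results': ['A', 'B', 'C']}
```

The batch size is set on the event source mapping (e.g. the SQS trigger's batch size setting), so trading invocation count for per-invocation work is a configuration change, not a rewrite.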

Overall, techniques like provisioned concurrency are powerful tools for enhancing the performance of serverless applications when low latency is paramount. But it's essential to weigh the performance gains against the added cost before implementing them.

 

By understanding concurrency and implementing these optimization techniques, you can create serverless applications on AWS Lambda that are scalable, cost-efficient, and resilient to varying workloads.

 

Happy Clouding!!!
