Leveraging the Latest Intel® Xeon® Scalable Processor Instances on Amazon EC2 for Serverless Workloads
Introduction
Serverless computing provides backend services on an as-needed basis for applications. Serverless is a development model that enables developers to build and run applications without having to manage compute infrastructure. Cloud providers such as Amazon Web Services (AWS) spin up and provision the required computing resources on demand when the code executes and spin them back down when execution stops. AWS Lambda is an event-driven, serverless computing platform provided by Amazon through AWS.
In a serverless architecture, applications are launched on an as-needed basis. The user pays only for the duration when the code executes, not for any idle time. Developers are absolved of routine tasks such as managing the operating system, storage, security, capacity planning and monitoring, as these are the responsibility of the cloud services provider.
Intel and Aible Benchmark Study Observations[i]:
Aible transforms how companies make strategic decisions, act optimally, react to changes, and align across the organization using AI as an enabler for collaboration at scale. The team behind Aible has collectively implemented thousands of successful AI projects over two decades across a wide variety of customers and industry segments. Partnering with Intel, Aible offered 25 organizations access to the AI solution via the Intel and Aible Immediate Impact Program. Each of the participating organizations, which include Fortune 100 firms, defined one business objective they wished to analyze and improve, and reported the value they identified from this project directly to Intel.
Organizations like Aible can take advantage of newer Intel technology to improve workload performance and further optimize their applications on serverless workloads. With server-oriented architectures, up to 70% of time and costs are tied up in infrastructure overhead, including resources for cluster scale-out, VM launch, establishing network connections, copying data and other latencies associated with managing the operation and costs of server infrastructure.
The study also demonstrated a better experience on serverless computing compared to traditional server architectures with comparable Intel processors.
Realized Results:
- 2–3x more cost-effective
- 3–4x lower Total Cost of Ownership (TCO)
- 2–3x faster than on a server architecture
This collaboration between Intel and Aible, running different workloads on AWS serverless architecture, clearly demonstrates the benefits of implementing a serverless solution. The cost savings realized by leveraging a serverless infrastructure are shown in Figure 1. Though server costs are higher, EC2 offers a wide variety of instances on which to run workloads. The AWS Lambda platform hides the actual hardware used to run workloads and does not necessarily provide the best-performing hardware for application requirements. If the serverless platform were able to run on newer Intel processors, it could offer customers the variety available in EC2 at a lower cost. Additional findings of the study indicated that AWS Lambda performance can be significantly improved by using newer-generation Intel processors. This solution was motivated by the Aible study and explores the impact of newer Intel processors on serverless workloads.
Solution Overview
The Aible study showed the tremendous TCO advantages of serverless over traditional server-based compute. In this solution, we look at serverless applications and their relative performance on newer Intel processor-based instances in the Amazon cloud. We compare two different workloads running on AWS Lambda and on a Firecracker-based serverless deployment running on the latest 3rd Gen Intel® Xeon® Scalable processors.[ii] The goal is to demonstrate the advantages of leveraging newer Intel processors for serverless workloads.
Solution Components:
AWS Lambda
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. It is a computing service that runs code in response to events and automatically manages the computing resources required by that code. AWS offers technologies for running code, managing data, and integrating applications, all without managing servers. Serverless technologies feature automatic scaling, built-in high availability, and a pay-for-use billing model to increase agility and optimize costs. These technologies also eliminate infrastructure management tasks like capacity provisioning and patching, to allow developers to focus on writing code that serves their organization.
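To make the execution model concrete, the following is a minimal sketch of a Python Lambda handler. The event fields are illustrative and not taken from the study's code (the actual functions are linked in Appendix A); Lambda invokes the handler in response to an event and bills only for execution time.

import json

def lambda_handler(event, context):
    # 'event' carries the trigger payload; 'context' exposes runtime
    # metadata such as the remaining execution time for this invocation.
    body = {
        "received": event,
        "remaining_ms": context.get_remaining_time_in_millis(),
    }
    return {"statusCode": 200, "body": json.dumps(body)}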
Firecracker Open Source:
Firecracker is an open-source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services.[iii] Firecracker enables you to deploy workloads in lightweight virtual machines, called microVMs, which provide enhanced security and workload isolation over traditional VMs, while enabling the speed and resource efficiency of containers. Firecracker was developed at Amazon Web Services to improve the customer experience of services like AWS Lambda and AWS Fargate. In this solution, Firecracker open source is used to emulate the AWS Lambda service on 3rd Gen Intel® Xeon® Scalable processor-based instances.
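As a hedged sketch of how a microVM is brought up, the snippet below boots Firecracker from Python using the binary's --config-file option. The kernel and rootfs paths are placeholders; a real setup needs /dev/kvm access, an uncompressed kernel image, and an ext4 root filesystem.

import json
import subprocess
import tempfile

# Minimal microVM definition; paths are placeholders, not real artifacts.
vm_config = {
    "boot-source": {
        "kernel_image_path": "/path/to/vmlinux",
        "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    },
    "drives": [{
        "drive_id": "rootfs",
        "path_on_host": "/path/to/rootfs.ext4",
        "is_root_device": True,
        "is_read_only": False,
    }],
    "machine-config": {"vcpu_count": 1, "mem_size_mib": 256},
}

# Write the VM definition to a temp file and hand it to the binary.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(vm_config, f)
    config_path = f.name

# Requires /dev/kvm; the same settings can also be applied at runtime
# through Firecracker's REST API on a Unix socket (--api-sock).
subprocess.run(["firecracker", "--no-api", "--config-file", config_path],
               check=True)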
3rd Gen Intel® Xeon® Scalable processors:
3rd Gen Intel® Xeon® Scalable processors are optimized for cloud, enterprise, HPC, network, security, and IoT workloads with 8 to 40 powerful cores and a wide range of frequency, feature, and power levels. Intel® Xeon® Scalable processors are designed to move data faster, store more, and process a wide range of workloads, and they provide significant performance improvements over the previous generation of processors. The L1/L2/L3 cache for each of the Ice Lake cores is 48 KB, 1.25 MB, and 1.5 MB, respectively, compared to the previous generation's 32 KB, 1 MB, and 1.375 MB. 3rd Gen Intel® Xeon® Scalable processors also feature two more memory channels and support for PCIe 4.0 with 16 more lanes than prior generations, along with increased memory speeds from DDR4-3200 DIMMs.
Amazon EC2 C6i instances are powered by 3rd Gen Intel® Xeon® Scalable processors (code named Ice Lake) with an all-core turbo frequency of 3.5 GHz and include support for always-on memory encryption using Intel® Total Memory Encryption (Intel® TME). C6i instances also support new Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions for faster execution of cryptographic algorithms. C6i instances are also available with local NVMe-based SSD block-level storage (C6id instances) for applications that need high-speed, low-latency local storage.
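One quick way to confirm which of these instruction-set extensions an instance actually exposes is to read the CPU flags on Linux; a minimal sketch follows, with an illustrative (not exhaustive) set of flags.

def cpu_flags():
    # Parse the first "flags" line from /proc/cpuinfo (Linux only).
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
for feature in ("avx512f", "avx512vl", "vaes", "avx512_vnni"):
    print(feature, "yes" if feature in flags else "no")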
The ubiquity of Intel processors and their constant evolution make them an ideal platform to run the wide variety of serverless applications. The accompanying figure shows the improvements across a wide spectrum of modern workloads for 3rd Gen Intel® Xeon® Scalable processors over their predecessors.
Solution Infrastructure:
vHive is an open-source framework for serverless experimentation.[iv] The vHive implementation with an underlying Firecracker microVM was leveraged for this project.
AWS C6i metal instances were used for testing serverless workloads on 3rd Gen Intel® Xeon® Scalable processors. Metal instances were required to run Firecracker-based microVMs.
The AWS Lambda service was used to run the same workloads with a similar memory profile to compare job latencies.
Two different serverless applications (Convolutional Neural Network (CNN) inference and HTML rendering) were used to compare the Firecracker-based serverless platform running on Intel® Xeon® Scalable processors with AWS Lambda.
Testing & Results:
The vHive framework was used for testing workloads on the serverless platforms. The two types of jobs used in the solution were chosen from FunctionBench.[v] Because this is a comparison of CPU families and their capabilities, both workloads chosen are classified as CPU-intensive. The workloads that were run are listed below, with minimal illustrative sketches after the list.
(1) CNN Serving: A Convolutional Neural Network (CNN) is a type of deep neural network primarily used for applications in image and speech recognition. The primary function of the workload we tested is image classification with a CNN.
(2) Chameleon: The application renders a template using the Chameleon Python package to create an HTML table of N rows and M columns that are provided as input arguments. A related web-oriented component is JSON serialization: the application deserializes a JSON-encoded string dataset (Awesome JSON Dataset) downloaded from a public object storage service and then serializes the JSON object again.
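The following is a minimal, hedged sketch of the CNN-serving style of workload: classifying one image with a stock Keras model. MobileNetV2 stands in here for whichever small CNN the benchmark uses; the model choice and image path are assumptions, not the study's code (see Appendix A for the actual functions).

import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import (
    MobileNetV2, decode_predictions, preprocess_input)
from tensorflow.keras.preprocessing import image

# Load a small stock image-classification CNN once, outside the
# request path, as a function platform would on a warm start.
model = MobileNetV2(weights="imagenet")

def classify(path):
    # Resize to the model's expected input and run one inference.
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return decode_predictions(model.predict(x), top=1)[0]

print(classify("example.jpg"))  # placeholder image path

And a minimal sketch of the Chameleon-style HTML rendering, assuming the chameleon package's PageTemplate API; the template markup is illustrative:

from chameleon import PageTemplate

# TAL template that expands to an HTML table of rows x cols.
TEMPLATE = PageTemplate("""\
<table>
  <tr tal:repeat="row rows">
    <td tal:repeat="col cols">${row},${col}</td>
  </tr>
</table>""")

def render_table(num_rows, num_cols):
    # Chameleon compiles the template once and renders per call.
    return TEMPLATE(rows=range(num_rows), cols=range(num_cols))

print(render_table(3, 4))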
The c6i.metal and c5.metal instances were deployed with Ubuntu 20.04. The Firecracker open-source environment was set up after the KVM hypervisor was installed on each server. The two types of jobs were intermingled and invoked through automation against the Firecracker-based service on the AWS metal instances. The servers were kept more than 80% busy during the runs to emulate a production serverless environment.
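A hedged sketch of the kind of timing harness used for the Lambda side is shown below; it invokes each function synchronously through boto3 and records wall-clock latency per call. The function names and payloads are placeholders, not the study's actual deployment names.

import json
import time
import boto3

client = boto3.client("lambda", region_name="us-east-1")

def timed_invoke(function_name, payload):
    # Wall-clock latency of one synchronous invocation, in microseconds.
    start = time.perf_counter()
    resp = client.invoke(FunctionName=function_name,
                         Payload=json.dumps(payload).encode())
    resp["Payload"].read()  # drain the response before stopping the clock
    return (time.perf_counter() - start) * 1e6

for fn, payload in [("chameleon", {"num_rows": 1000, "num_cols": 100}),
                    ("cnn_serving", {"image": "example.jpg"})]:
    latencies = [timed_invoke(fn, payload) for _ in range(10)]
    print(fn, sum(latencies) / len(latencies), "us average")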
The average latency (in microseconds) for the three different serverless hardware platforms and for the two types of jobs is shown in Table 1.
The results show that both types of workloads run much better on the serverless platform hosted on c5 and c6i instances than on AWS Lambda. The job completion times are 4–5x faster on the newer 2nd and 3rd Gen Intel® Xeon® Scalable processors relative to Lambda.
Conclusion:
Multiple open-source serverless frameworks are available to be deployed on metal instances on AWS. We used Firecracker in this solution to make the comparison more equitable, since Lambda itself runs on Firecracker. From an efficiency and TCO perspective, the Intel and Aible study showed the advantages offered by the serverless platform over traditional servers for running applications. The biggest difference between AWS Lambda (serverless) and the traditional EC2 server offerings is the variety of instances available. EC2 offers the latest Intel instances, while AWS Lambda in its current form is a homogeneous offering with the backend compute capabilities hidden from users. The difference in performance seen in our solution is primarily due to EC2-based instances exposing the hardware acceleration capabilities of the 2nd and 3rd Gen Intel® Xeon® Scalable processors, capabilities that AWS Lambda does not expose.
The findings of this solution make a case for multiple tiers of services like AWS Lambda, with customers able to take advantage of the hardware acceleration capabilities of the underlying platform. The fact that the two workloads compared show 4–5x performance gains when the hardware capabilities are exposed could be very beneficial for AWS serverless customers. If a tiered serverless solution were available, jobs would run faster and cost less. With multiple tiers of serverless, there would be more parity with EC2 and better adoption of serverless.
Summary:
• 3rd Gen Intel® Xeon® Scalable processors provide performance enhancements for a wide range of workloads.
• Serverless platforms can provide major cost advantages over traditional servers.
• Firecracker open source can be leveraged to run serverless workloads independently of AWS Lambda on 3rd Gen Intel® Xeon® Scalable bare metal instances.
• HTML rendering and AI inferencing were run on AWS Lambda and on Firecracker running on AWS C6i bare metal instances.
• The results clearly show that serverless applications running on newer Intel instances provide 4–5x performance gains over Lambda.
Disclosure text:
The work on this paper was sponsored by Intel. Tests were performed in August–September 2022 on AWS in region us-east-1, using Ubuntu 20.04 LTS with kernel 5.13.0-1029-aws (AMI ID: ami-08d4ac5b634553e16). All configurations used SSD storage with 450 IOPS and 250 MiB/s storage throughput (gp2 SSD type). Intel instance configurations: c6i.metal with 128 vCPUs, 256 GB memory, 50 Gbps network bandwidth, Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz (CPU max MHz: 3500); we disabled 124 vCPUs and used just 4 vCPUs to run our tests so that we could stress the system with load. c5.metal with 96 vCPUs, 192 GB memory, 25 Gbps network bandwidth, Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz (CPU max MHz: 3900); we disabled 92 vCPUs and used just 4 vCPUs to run our tests so that we could stress the system with load. Other software included apt installs for bridge-utils, cpu-checker, libvirt-clients, libvirt-daemon, qemu, and qemu-kvm; Firecracker v1.1.1 (release-v1.1.1-x86_64); docker.io; and the vHive serverless framework (based on Firecracker microVMs). This was set up as a single-node cluster. URL: https://github.com/vhive-serverless/vHive
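For reference, vCPUs can be taken offline on Linux through sysfs; the sketch below is an illustrative way to reproduce the 4-vCPU setup on a c6i.metal instance. The CPU numbering is an assumption, root privileges are required, and cpu0 typically cannot be taken offline.

def set_cpu_online(cpu, online):
    # Linux CPU hotplug interface in sysfs; requires root.
    with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
        f.write("1" if online else "0")

# Keep CPUs 0-3 and take the remaining 124 vCPUs offline (c6i.metal).
for cpu in range(4, 128):
    set_cpu_online(cpu, False)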
Disclaimer text:
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.
Bibliography
[i] Intel & Aible Study on AWS: https://enaible.aible.com/intel_benchmark
[ii] Firecracker Open Source. (n.d). https://firecracker-microvm.github.io/
[iii] Intel Generation Xeon Scalable Processors. https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html
[iv] FunctionBench: https://github.com/ddps-lab/serverless-faas-workbench
[v] vHive Framework for Serverless experimentation. https://github.com/ease-lab/vhive
Appendix A: Function code used for testing:
Chameleon code:
vHive:
See server.py at:
https://github.com/ease-lab/vhive/tree/main/function-images/chameleon
Lambda:
See lambda_function.py at:
https://github.com/adayaru/serverless/blob/main/chameleon/lambda_function.py
cnn_serving code:
vHive:
See server.py at:
https://github.com/ease-lab/vhive/tree/main/function-images/cnn_serving
Lambda:
See lambda_function.py at:
https://github.com/adayaru/serverless/blob/main/cnn_serving/app/lambda_function.py