AWS Graviton Weekly # 42
Originally published in the AWS Graviton Weekly newsletter
Issue # 42: June 16th, 2023 to June 23rd, 2023
Welcome to Issue # 42 of AWS Graviton Weekly, focused on everything that happened in the past week related to AWS Silicon: from June 16th, 2023 to June 23rd, 2023.
First, a quick apology: I’m sending this email almost 8 hours after the regular schedule because I wanted to share more details about everything the AWS team presented.
It wasn’t easy, since there were more than 6 hours of video to analyze and summarize, but it was worth it.
And a quick note from outside the event: Arm just launched its learning platform, with very good material on it. Make sure to check out its Learning Paths.
AWS Silicon Innovation Day took place on June 21st, 2023.
So, this issue will be a little different: I will concentrate on sharing everything I learned from the event yesterday, and in the final part of the email you will find some news and resources related to AWS Silicon, including things shared outside of the event.
First, I want to highlight the key announcements from the live event on YouTube, so you can find the exact moments in a single place.
- [5:55] The event starts with Art Baudo and Martin Yip as co-hosts
- [7:19] Sean and Christopher, founders of Dough Joy Donuts
- [8:52] The conversation between David Brown (Vice President, Amazon EC2) and Ruba Borno (VP, Worldwide Channels and Alliances, AWS) about how customers and partners can take advantage of the innovation behind AWS Silicon. One of the coolest things shared by Dave here?
These capabilities begin at the foundation level with the AWS Nitro System and extend further. Partners can also choose to use AWS Nitro Enclaves, built on the same Nitro hypervisor technology, to reduce the attack surface area for highly sensitive information. Nitro Enclaves uses cryptographic attestation, so you can be sure that only authorized code is running, all at no extra charge beyond the EC2 instances.
Partners like Anjuna, working to create a high-trust environment where data is always encrypted, have built solutions to help our customers further protect and securely process sensitive data. With the support of Anjuna’s Confidential Computing platform, customers can easily embrace AWS Nitro Enclaves and create confidential clouds in a matter of minutes by lifting and shifting their existing applications without any need for code changes.
Dave also shared some other interesting stats [18:38]:
Furthermore, we work closely with the ISV community to ensure that our customer workloads will be supported on Graviton.
Today we are in our third generation of Graviton processors, and customers have been loving them. With Graviton-based Amazon EC2 instances, over 40,000 customers see great benefits like up to 40 percent better price performance, up to 20 percent lower costs, and up to 60 percent less energy used for the same workload.
How AWS Silicon is built
- [34:26] The fascinating conversation between Gary Szilagyi (VP at AWS, Annapurna Labs) and Nafea Bshara (VP, Distinguished Engineer, Annapurna Labs)
- [39:26] How AWS Silicon is actually built. Nafea shared a very interesting metaphor comparing a chip with a house:
If you are going to build a chip, it’s like building a house with 50 billion bricks to put together. 50 billion bricks, and you have to follow the architecture blueprint for the house while still running the structural beams, the plumbing, the wires, the ventilation ducts, and putting in the windows and the doors to communicate with the external world. Except this time each brick is a 5-nanometer brick, a transistor not wider than 5 nanometers; the house itself is 80 levels high; the wires, all the ducting and electrical wire you have to run inside the house, add up to 35 km or 25 miles; and the footprint of the house is no bigger than an inch by an inch.
Once you do that, you still need to meet the volume demand. Because of the efficiency required, we need to build these chips in a few months, we need to make millions of them per year, and each one should not cost more than tens of dollars or a few hundred dollars. And Gary added: one more thing about these chips: after you build them and deploy them, you need to deliver electricity to power them.
Think about this: a typical house in the US, in the best case, gets 100 Amperes of electrical current from the utility company. Well, the chips we build today, the size of a quarter, actually need 500 Amperes. So they need 5x what the house needs, and not only do they need 5x, they need the 500 Amps within a few nanoseconds.
So you really need to build not just the chip: while building the chip into the system, you need a pretty wide access road for all this electrical current, and inside these chips you need many, many electrical wires, the equivalent of 35 km of wire, and all these wires need to be low resistance so they don’t heat up and don’t have any open circuits.
So, designing and testing this level of complexity is what we focus on.
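Nafea’s 5x comparison can be checked with quick back-of-the-envelope arithmetic. A minimal sketch (the ~1 V core supply voltage below is my illustrative assumption, not a figure from the talk):

```python
# Back-of-the-envelope check of the power-delivery comparison from the talk.
house_current_a = 100   # typical US residential service, per the talk
chip_current_a = 500    # modern quarter-sized chip, per the talk

ratio = chip_current_a / house_current_a
print(f"The chip draws {ratio:.0f}x the current of a whole house")

# Illustrative assumption: modern chips run their cores around ~1 V,
# so 500 A corresponds to roughly 500 W delivered into ~1 square inch.
core_voltage_v = 1.0
chip_power_w = chip_current_a * core_voltage_v
print(f"At ~{core_voltage_v:.0f} V that is ~{chip_power_w:.0f} W in about a square inch")
```

The current ratio is straight from the quote; the wattage is only there to show why delivering that current in nanoseconds is such a hard packaging problem.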
How Annapurna Labs develops the AI chips
- [45:39] Gary explains the process of building the AI chips
- [47:33] Gary explained the challenge of the fast innovation behind the Science of Machine Learning
- [49:08] Back to developing high-performance CPUs like Graviton and the hard challenges behind it
- [50:34] The advantages of developing silicon in-house for AWS
- [52:35] The importance of sustainability in the cloud
Insights from Analysts
- [58:32] Patrick Moorhead (CEO and Chief Analyst, Moor Insights and Strategy) talks about the innovation in the silicon space made by AWS and Annapurna Labs
The AWS Nitro Card Discussion
- [1:01:21] Anthony Liguori (VP/Distinguished Engineer at AWS) and Ali Saidi (Sr Principal Engineer at Annapurna Labs) discussed the foundations of AWS Nitro a decade ago and how the seamless integration between hardware and software plays a key part here
- [1:04:29] How Annapurna Labs actually builds and designs chips end-to-end
- [1:06:09] What it looks like to develop hardware at Annapurna Labs
- [1:08:35] Ali explains how Annapurna Labs uses the current generation of Graviton and Nitro chips on EC2 to develop the next generation
- [1:10:00] Changes in the Nitro card across generations
- [1:12:47] How to run simulations for certain high-performance numbers in the Nitro card
- [1:17:41] The features Anthony loves: the Scalable Reliable Datagram (SRD) protocol, Elastic Fabric Adapter (EFA), encryption, and more cool stuff
- [1:20:00] The importance of efficiency and sustainability
The Voice of the Customer
- [1:32:49] Tiffany Wissner (Wentzel) (Director of Product Marketing for Core Infrastructure and Modern Apps at AWS) and Jeff Barr (Vice President & Chief Evangelist at AWS) shared how AWS customers are using AWS Silicon to continue innovating
- [1:36:38] How Sprinklr moved their OpenSearch and Elastic Kubernetes Service over to Graviton-based instances in only TWO WEEKS [Case Study]
- [1:42:17] How WebBeds Reduced Costs up to 64% Using Amazon EC2 Spot Instances and AWS Graviton [Case Study]
- [1:44:48] Some tips if you want to migrate your workloads to Graviton
- [1:45:40] How Screening Eagle, Actuate, Money Forward, and Dataminr are using AWS Inferentia and AWS Trainium today [AWS:reInvent 2022 talk]
- [1:46:46] The Dataminr Case Study
- [1:49:30] Money Forward Case Study
- [1:51:58] OctoAI segment with Jason Knight (co-founder and VP of ML at OctoML)
Insights from Analysts
- [2:05:17] Another analyst perspective. This time from Elias Khnaser (Chief of Research at EK Media Group) and Raj Pai (VP of EC2 Product Management at AWS) talking about Silicon Innovation
- [2:07:08] Why AWS is building its own Arm-based processors
- [2:11:30] The Graviton “perks” for AWS customers
- [2:14:01] Is it difficult to migrate your workloads to Graviton?
- [2:16:51] What is the average time to actually do a migration to Graviton?
It’s time for Generative AI chips stuff
- [2:32:46] Chetan Kapoor (Director of Product Management, EC2 Core at AWS) and Gadi Hutt (Senior Director of Business Development at Annapurna Labs) discussed the innovation behind the AWS Inferentia and AWS Trainium chips
- [2:34:36] Building an FPGA Service as a Business from Scratch
- [2:36:18] Why it was important to start building silicon for accelerating Machine Learning
- [2:38:31] How Inf1 is being used by external customers
- [2:39:56] How Alexa and Alexa Voice are powered by AWS Inferentia
- [2:41:12] Why AWS Trainium was the next challenge for the team
- [2:42:16] The main differences between AWS Trainium and AWS Inferentia chips
- [2:44:20] Let’s take a look at the AWS Trainium chip
- [2:45:07] Inferentia and Trainium chips compared side by side
- [2:46:30] What it means to deploy a supercomputer at scale in AWS datacenters
- [2:48:59] AWS is building a new cluster called “Trainium One” with more than 30,000 chips on it
- [2:49:32] The next big challenge? Generative AI Inference: inf2
- [2:50:42] The EC2 Inf2 instance server itself: you can run a model with 175 billion parameters there, thanks to a memory bandwidth of roughly 10 TB/s (WTF???)
- [2:52:53] Let’s talk about the Software part where Open Source LLMs play a key role: AWS Neuron SDK
- [2:57:07] Sairam Menon (Software Engineering Manager, AI Product Line Owner at Johnson & Johnson Technology) explains how the company is using AI today
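To put that bandwidth figure in perspective, here is a rough sketch of why memory bandwidth gates large-model inference (the 16-bit weights and the memory-bound, batch-1 decoding model are my illustrative assumptions, not numbers from the talk):

```python
# Why ~10 TB/s of memory bandwidth matters for a 175B-parameter model.
params = 175e9            # 175 billion parameters, per the talk
bytes_per_param = 2       # assumption: 16-bit (BF16/FP16) weights
model_bytes = params * bytes_per_param            # 350 GB of weights

bandwidth_bytes_per_s = 10e12                     # ~10 TB/s, per the talk

# For memory-bound, batch-1 autoregressive decoding, every generated token
# needs roughly one full read of the weights, so bandwidth caps token rate.
seconds_per_token = model_bytes / bandwidth_bytes_per_s
print(f"Weights: {model_bytes / 1e9:.0f} GB")
print(f"Lower bound per token: {seconds_per_token * 1e3:.0f} ms "
      f"(~{1 / seconds_per_token:.0f} tokens/s at best)")
```

In other words, at that bandwidth a single pass over 350 GB of weights takes about 35 ms, which is what makes interactive generation on a 175B model feasible at all.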
Confidential Computing on AWS
- [3:07:03] Art Baudo, William Yap (Principal Product Manager, AWS), and Arvind Raghu (WW Business Development & GTM Strategy Lead, EC2 Core, Confidential Computing) discussed the state of Confidential Computing at AWS
- [3:10:52] What makes Confidential Computing unique from the AWS perspective?
- [3:12:54] What is the AWS Nitro system exactly?
- [3:30:49] How Skyflow is using Confidential Computing on AWS with Amruta Moktali (Chief Product Officer at Skyflow)
The second part of the event is only available today on Twitch, so let’s continue with it.
Deep Dive on AWS Graviton
- [00:05:23] Deep Dive on Graviton with Stephanie Shyu (Global Head of Graviton GTM Strategy and Business Development at AWS), Martin Yip, and Ali Saidi
- [00:06:23] Why AWS decided to build its custom chips based on Arm
- [00:08:09] How Epic Games and Snap are taking advantage of Graviton processors
- [00:08:58] How AWS is helping customers save up to 40% in cloud costs thanks to Graviton
- [00:10:56] How Graviton is helping customers with their Sustainability goals
- [00:12:43] How Snowflake and Pinterest are taking advantage of it
- [00:14:05] More than 40,000 AWS customers are using Graviton today
A very interesting slide was shared here
- [00:15:03] AWS Graviton-based EC2 instances. Stephanie shared a very interesting tip here: The simplest way to start working with Graviton is by using AWS Managed Services that support Graviton instances
- [00:15:50] What types of applications AWS customers are running on Graviton
- [00:16:25] How Wealthfront is using Graviton
- [00:16:44] How Zomato is using Graviton
- [00:17:29] What other key workloads AWS customers are running on Graviton
- [00:17:49] Aerospike on Graviton: up to 63% better performance, up to 20% in cost savings
- [00:18:11] Snowflake on Graviton
- [00:18:20] SAP on Graviton
- [00:18:38] Emerging workloads on Graviton: The power of the new C7g instances.
- [00:19:13] Modulate AI
- [00:20:34] Stripe
- [00:21:41] Software and Languages Support for Graviton processors. The best tip here? Use Porting Advisor for Graviton
- [00:23:07] AWS Graviton Ready Program
- [00:23:26] New stuff in the Graviton space: C7gn instances, HPC7g instances
- [00:27:09] How Instructure is using Graviton
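Alongside Stephanie’s tip about starting with managed services, it helps to know which EC2 instance types are Graviton-based. As a naming rule of thumb, Graviton families carry a “g” among the letters after the generation digit (c7g, m6g, t4g, c7gn, hpc7g), while a leading “g” family like g5 is a GPU instance. A minimal sketch of that heuristic (the helper name is mine, and this is a naming heuristic, not an official API):

```python
import re

def looks_like_graviton(instance_type: str) -> bool:
    """Heuristic: Graviton EC2 families (c7g, m6g, t4g, c7gn, hpc7g, ...)
    have a 'g' among the attribute letters AFTER the generation digit."""
    family = instance_type.split(".")[0]            # "c7g.xlarge" -> "c7g"
    match = re.fullmatch(r"[a-z]+\d+([a-z]*)", family)
    return bool(match) and "g" in match.group(1)

# Families mentioned in this issue, plus a GPU and an x86 family for contrast
for itype in ["c7g.xlarge", "c7gn.16xlarge", "hpc7g.16xlarge",
              "m6g.large", "g5.xlarge", "c6i.large"]:
    print(itype, looks_like_graviton(itype))
```

For a definitive answer, the EC2 DescribeInstanceTypes API reports each type’s supported architectures (arm64 for Graviton), which is the authoritative source rather than the name.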
Deep Dive on AWS Trainium and AWS Inferentia
- [00:34:57] How to start working with AWS ML Silicon with Shruti Koparkar (Product Marketing and GTM leader on EC2 for ML accelerators at AWS) and Matthew McClean (Senior Manager, Solution Architect at Annapurna Labs)
- [00:36:24] What is Generative AI?
- [00:37:56] Foundational Models of Generative AI
- [00:39:24] Fine-tuning vs Pre-Training a model
- [00:39:54] Compute-Intensive Pre-Training of a Model
- [00:41:08] Other capabilities in terms of scale and price-performance of AWS Trainium and Inferentia: Trn1 instances
- [00:42:52] Support for different data types and why it matters
- [00:44:39] Model Deployment
- [00:45:44] Stable Diffusion 1.5 Demo on AWS Inferentia2
- [00:50:22] Similarities between Inference and Training
- [00:55:02] UX matters for ML development at AWS
- [00:56:40] Hugging Face and AWS Collaboration, with Jeff Boudier from the Product team at Hugging Face
Deep Dive on High-Performance Computing
- [01:06:32] Art Baudo, Fei Chen (Director, HPC, and Frameworks at AWS), and Barry Bolding (Director of Advanced Computing at AWS)
- [01:07:37] A brief introduction to HPC, simulation modeling, and taking the example of designing a car from scratch
- [01:10:22] Barry shared how Arm runs 50 Million weekly simulations on AWS
- [01:10:51] Where is HPC today?
- [01:12:49] Why the latest AWS Silicon matters for HPC
- [01:15:33] How AWS customers are using HPC to reap the benefits of the silicon innovations being introduced
- [01:15:57] Aurora Innovation
- [01:17:56] DTN
- [01:19:01] Another Farming Application as an example
- [01:21:00] Emerging Trends in Silicon Innovations
- [01:21:40] Good Chemistry, Quantum Chemistry, Circular Economy, and How to Solve the PFAS Pollution Problem
- [01:23:43] Generative AI and HPC, and how Johnson & Johnson is using Inferentia and Trainium
- [01:25:40] Machine Learning + HPC
- [01:31:47] Dr. Ian Cutress (More than Moore Analyst)
- [01:34:59] JR Rivers (Senior Principal Engineer at AWS Networking) and Madhura Kale (Senior Manager, EC2 Product Management at AWS)
- [01:36:09] Nitro Silicon, James Hamilton explaining Nitro and more
- [01:39:08] EC2 Networking timeline
- [01:40:24] Amdahl’s Balanced System Rule of Thumb
- [01:41:56] Ideal Network
- [01:42:32] Real Network
- [01:43:36] Passive Monitoring
- [01:44:37] Active Monitoring with Nitro
- [01:46:41] The Customer Challenge
- [01:49:14] Introducing Scalable Reliable Datagram (SRD)
- [01:50:51] Traffic Flow with SRD
- [01:51:25] ENA Express
- [01:53:35] ENA Express Performance
- [01:56:49] Redis — In Memory Database Reads
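Since Amdahl’s balanced-system rule of thumb comes up at [1:40:24], here is a quick sketch of the classic version of the rule (roughly 1 byte of memory and 1 bit per second of I/O for every instruction per second); the instruction rate below is an illustrative assumption, not a figure from the talk:

```python
# Amdahl's classic balanced-system rules of thumb:
#   - ~1 bit of I/O per second for every instruction per second
#   - ~1 byte of memory for every instruction per second
# Illustrative example: a core executing 1 billion instructions per second.
instructions_per_s = 1e9                     # assumed instruction rate

balanced_io_bits_per_s = instructions_per_s  # ~1 Gbit/s of I/O
balanced_memory_bytes = instructions_per_s   # ~1 GB of memory

print(f"Balanced system at {instructions_per_s / 1e9:.0f} GIPS: "
      f"~{balanced_io_bits_per_s / 1e9:.0f} Gbit/s I/O, "
      f"~{balanced_memory_bytes / 1e9:.0f} GB memory")
```

The point of the rule in the talk’s context: as per-instance compute keeps growing, network bandwidth has to scale with it, which is what motivates work like SRD and ENA Express.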
Silicon Innovation and the Modern Network
- [01:59:28] Peter Mckiernan (Product Marketing, AWS Networking) and Brad Casemore (Vice President, Datacenter and Multicloud Networking at IDC)
AWS Nitro SSDs and Amazon RDS
- [2:08:14] Art Baudo, Prarthana Karmakar (Sr. Manager, Software Development at Amazon Web Services), and Reena Gupta (Senior Product Manager, EC2)
Using AWS Silicon for Cost Optimization
- New Amazon EC2 C7gn Instances: Graviton3E Processors and Up To 200 Gbps Network Bandwidth
- New — Amazon EC2 Hpc7g Instances Powered by AWS Graviton3E Processors Optimized for High-Performance Computing Workloads
- AWS Makes Homegrown Arm Processor Available for Cloud Supercomputing
ARTICLES AND TUTORIALS
Optimized PyTorch 2.0 Inference with AWS Graviton processors, by Sunita Nadampalli (Software Development Manager at Amazon) and Ankith Gunapal (AI Partner Engineer at Meta (PyTorch))
The AWS Graviton Technical Guide provides a list of optimized libraries and best practices that will help you achieve cost benefits with Graviton instances across different workloads.
BTW, it’s worth mentioning that this was a collaboration between AWS, Meta, Arm, and Intel. Here are some of the people who participated in this joint work:
- Ali Saidi (Sr. Principal Engineer at Amazon)
- Csaba Csoma (Sr. Manager, Software Development at Amazon)
- Ashok Bhat (Sr. Product Manager at Arm)
- Nathan Sircombe (Sr. Engineering Manager at Arm)
- Milos Puzovic (Principal Software Engineer at Arm)
- Geeta Chauhan (Engineering Leader, Applied AI at Meta)
Graviton Essentials — Virtual Developer Day (Wednesday, July 12, 2023 | 9:00 AM — 5:00 PM PDT), Live Virtual & Interactive