The Drive for Efficient AI Computing
Whether it is in the datacenter or across devices at the edge, the ongoing evolution of AI is being increasingly defined by a move to more efficient computing. This is important for two main reasons:
Ensuring that large-scale AI workloads are processed as efficiently as possible to save energy and costs.
Enabling, when necessary and appropriate, the move to AI at the edge, so AI workloads are processed on the device rather than in the cloud. In relevant cases, this helps to put computing close to where the data is captured for quicker, more secure AI experiences for the end-user.
The drive for more efficient AI computing starts on the CPU. Arm's CPUs are ideal for a wide variety of AI workloads, but for some use cases they can benefit from Arm or partner accelerators that complement the CPU. Arm's GPUs (Immortalis and Mali) and NPUs (Ethos-U) seamlessly complement our CPUs to deliver accelerated AI. Meanwhile, from a partner perspective, Arm CPUs pair with partner AI accelerators to form the best combination in AI-based computing solutions.
Arm's processors continue to evolve and improve over time with greater performance and efficiency capabilities. For example, Arm’s CPU performance improvements have doubled AI processing capabilities every two years in the past decade.
Arm is also focused on relentless architecture innovation, with our latest Armv9 architecture features (Scalable Vector Extension 2 (SVE2) and Scalable Matrix Extension (SME)) enabling our ecosystem to reach higher performance with reduced power consumption across their AI-based solutions and applications.
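To see why matrix extensions matter for AI, consider that SME accelerates the tiled outer-product-and-accumulate pattern at the heart of matrix multiplication, the dominant operation in neural networks. The following is a minimal pure-Python sketch of that pattern for illustration only; it shows the computation shape, not Arm instructions or intrinsics:

```python
def matmul_outer_product(A, B):
    """Multiply A (m x k) by B (k x n) by accumulating rank-1 outer
    products -- the computation pattern that SME-style matrix
    hardware accelerates in a single tiled operation."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):  # one outer product per shared dimension index
        for i in range(m):
            a = A[i][p]
            for j in range(n):
                C[i][j] += a * B[p][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_outer_product(A, B))  # → [[19.0, 22.0], [43.0, 50.0]]
```

Where this loop nest runs one scalar multiply-add at a time, a matrix extension can compute an entire outer-product tile per instruction, which is where the performance and efficiency gains come from.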
Here is a closer look at how Arm is efficiently enabling AI workloads across key technology markets and devices, and how this is defining future AI-based technology experiences and trends.
Enormous cloud datacenters process most of the world's big AI workloads today because they can do so more efficiently at scale. But as datacenters expand, and as AI workloads take up a notoriously large percentage of their computing cycles, concerns about energy consumption have increased. This is compounded by the increased use of LLMs, which have huge computing requirements.
Today’s datacenters already consume significant power: globally, they require 460 terawatt-hours (TWh) of electricity annually, equivalent to the entire annual electricity consumption of Germany.
As future AI models become larger and smarter, they fuel the need for more compute, which in turn increases demand for power. Finding ways to reduce the power requirements of these large datacenters is paramount to achieving societal breakthroughs and realizing the promise of AI.
Arm is on a mission to help tackle AI’s insatiable energy needs, with our power-efficiency DNA allowing the industry to rethink how chips are built to accommodate the growing demands of AI.
Arm’s latest Neoverse CPU is the highest-performing, most power-efficient processor for cloud datacenters versus the competition. Neoverse offers hyperscalers the flexibility to customize their silicon to optimize for demanding workloads, all while delivering leading performance and energy efficiency.
NVIDIA's GH200 Grace Hopper Superchip is a breakthrough processor designed for giant-scale AI and high performance computing (HPC) applications. SoftBank is building one of the world's first AI datacenters with Grace Hopper.
The GH200 Grace Hopper Superchip is built on the NVIDIA Grace CPU, which is powered by two 72-core Arm Neoverse CPUs. This provides a 10x performance leap for AI's most demanding tasks, while dramatically improving performance-per-watt.
The Arm-based NVIDIA Grace CPU promises to be the foundation of next-generation datacenters and can be configured for a variety of AI workload needs, from training to inference.
This is why Amazon, Microsoft, Google, and Oracle have now all adopted Arm Neoverse technology to solve both general-purpose compute and CPU-based AI inference and training.
AWS Arm-based Graviton: 25 percent faster performance for Amazon SageMaker AI inference, 30 percent faster for web applications, 40 percent faster for databases, and 60 percent more efficient than the competition.
Google Cloud Arm-based Axion: 50 percent more performance and 60 percent better energy efficiency compared to legacy competition architectures, powering CPU-based AI inference and training, YouTube, Google Earth, among others.
Microsoft Azure Arm-based Cobalt: 40 percent performance improvement over the competition, powering services such as Microsoft Teams and coupling with Maia accelerators to drive Azure’s end-to-end AI architecture.
Oracle Cloud Arm-based Ampere Altra Max: 2.5 times more performance per rack of servers at 2.8 times less power versus traditional competition, and is used for generative AI inference models – summarization, tokenization of data for LLM training, and batched inference use cases.
Large-scale AI training also requires unique accelerated computing architectures, like the NVIDIA Grace Blackwell platform (GB200), which combines the NVIDIA Blackwell GPU architecture with the Arm-based Grace CPU. This Arm-based computing architecture enables system-level design optimizations that reduce energy consumption by 25x and provide a 30x increase in performance per GPU compared to NVIDIA H100 GPUs using competitive architectures for LLMs.
These optimizations, which deliver game-changing performance and power savings, are only possible thanks to the unprecedented flexibility for silicon customization that Arm Neoverse enables. Neoverse is emerging as the foundation for the world’s AI aspirations, bringing together technology leadership, flexibility in silicon designs, and an industry-leading ecosystem in a way that no other company on the planet can.
Learn more about how Arm is enabling AI infrastructure here.
The rise of AI applications across the telecommunications industry demands ever-increasing compute power and data bandwidth. However, AI also holds immense potential to optimize RAN network efficiency and unlock new revenue streams for the telecommunications industry.
To this end, Arm is collaborating with industry leaders across the entire technology landscape – from silicon manufacturers and cloud providers to operators and universities – to develop AI-powered solutions for next-generation networks.
Arm also joined the AI-RAN Alliance, a new initiative aimed at integrating AI into cellular technology to further advance RAN technology and mobile networks. This brings together tech industry leaders with a broad range of experience across silicon and software.
Learn more about how Arm is enabling AI innovation in network infrastructure here.
We are already seeing LLMs for generative AI running on mobile devices delivering more advanced AI workloads, like chat-based virtual assistants, that go beyond today's common AI-based experiences, like keyboard prediction, object detection, and speech recognition. AI is also set to supercharge mobile graphics, as explained in this guide.
The Arm CPU is central to AI across consumer technology, with 99 percent of the world’s smartphones used by billions of people worldwide being built on Arm. This has led to 70 percent of AI in today’s third-party applications running on Arm CPUs, including the latest social, health, and camera-based applications, and many more.
High-performance AI-enabled smartphones built on Arm’s Armv9 CPU and GPU technologies are now on the market. These include the new MediaTek Dimensity 9300-powered vivo X100 and X100 Pro, Samsung Galaxy S24, and Google Pixel 8 smartphones. The combination of performance and efficiency provided by these flagship mobile devices is delivering unprecedented opportunities for AI innovation.
Arm’s robust technology roadmap will see yet more AI performance, technologies, and features. This will be supported by the rise of AI inference at the edge, with CPUs being best placed to serve this need as more AI support and specialized instructions continue to be added to the Armv9 architecture.
Arm has developed a virtual assistant demo that utilizes Meta’s LLAMA2-7B LLM on mobile via a chat-based application.
The generative AI workloads take place entirely at the edge on the mobile device on the Arm CPU, with no involvement from accelerators. The impressive performance is enabled through a combination of existing CPU instructions for AI, alongside dedicated software optimizations for LLMs through the ubiquitous Arm compute platform that includes the Arm AI software libraries.
As you can see from the video on the next page, there is very impressive time-to-first-token response performance and a text generation rate of just under 10 tokens per second, which is faster than the average human reading speed.
This is made possible by highly optimized CPU routines in the software library developed by the Arm engineering team, which improve time-to-first-token by 50 percent and text generation by 20 percent compared to the native implementation in the LLAMA2-7B LLM.
The PC market has seen the rapid expansion and availability of AI-based applications and features to improve a range of productivity and creativity tasks. Today, there are AI-powered smart assistants that enhance PC interactions, generative AI that inspires content creators with new ideas, and advanced AI workloads that transform the imaging pipeline – a fundamental feature for PC video conferencing. This is laying the foundation for the future growth of the AI PC, which Arm defines as a PC that is optimized for modern AI workloads.
Over the next year, Arm expects an exponential increase in AI-based applications, features and innovations for the PC market. This includes more advanced AI use cases, like enabling the latest LLMs to run more efficiently and faster within the memory bandwidth of modern PC designs.
Today’s Windows on Arm (WoA) laptops provide an exceptional foundation for the AI PC, with AI workloads running best on the Arm-based technologies and CPUs at the heart of these devices. Meanwhile, Arm’s own AI ecosystem combines hardware, software and thousands of diverse partners to accelerate development and bring the latest AI-based features to life. WoA laptops are already being used for a broad range of advanced AI workloads, which include the latest AI-based use cases and applications on the PC, like Microsoft Copilot.
Learn more about the capabilities of WoA in this guide.
In the age of AI, the whole smart home is effectively becoming a smart assistant. Today’s AI-powered virtual assistants respond to a person's voice to perform an action when prompted. However, future virtual assistants will understand when a person walks into the home and then adapt the internal environment automatically based on their preferences. This could be anything from adjusting the temperature or light settings, playing music, managing access to the home, or generating automated grocery lists.
The rise of AI workloads at the edge brings a whole new level of experiences and personalization for content and applications that manage the smart home environment. In fact, the TV, which sits at the center of the smart home experience, is effectively AI-powered.
Today’s smart TVs use AI for picture quality enhancements and voice commands for control and content selection. Streaming services – which play a pivotal role in today’s TV experience – also use powerful machine learning (ML) algorithms to recommend relevant content to users.
Additionally, as cameras start re-appearing in TVs, there is a significant increase in AI-based smart camera use cases, such as health and fitness, gaming and video calling. Home fitness is becoming more prevalent on TVs, with applications using AI for body tracking during workouts and then making relevant recommendations to the end-user.
Learn more about how AI is shaping the smart home here.
The IoT market is seeing significant AI innovation. As in consumer technology, Arm technology is a driving force, with 20 billion Arm-based system-on-chip (SoC) solutions capable of running a variety of AI workloads for a broad spectrum of IoT devices. These range from embedded systems built on small, low-power Arm Cortex-M processors to industrial and high-performance IoT built on more performant Arm Cortex-A CPUs and Arm Ethos-U NPUs.
Arm is committed to pushing the boundaries of edge AI through delivering intelligent and secure devices and systems that can empower innovation and transform lives. This is being achieved in the following ways:
Optimized hardware and software for AI in high-performance IoT that carefully balance performance, power consumption, cost-effectiveness, security, and scalability.
Streamlined tools and platforms that democratize the development and deployment of AI in high-performance IoT, empowering developers and system builders from diverse backgrounds to create and tailor solutions according to their own needs.
Robust ecosystem support and strategic partnerships that drive the adoption and maximize the impact of AI in high-performance IoT, encouraging collaboration and co-creation across various stakeholders and industries.
Learn more about the edge AI evolution taking place across IoT markets here.
The automotive market is undergoing an unprecedented transformation with increasing levels of autonomy, advanced in-vehicle user experiences, and further electrification. This is leading to demands for more software and AI capabilities.
A range of automotive applications integrated into software-defined vehicles (SDVs) are enabled by the ongoing AI revolution. These include digital cockpit and in-vehicle infotainment (IVI) through to advanced driver-assistance systems (ADAS) and autonomous driving.
While these seem like future-looking applications, it is important to remember that AI-based workloads and features are already prevalent across the computing systems in today's cars. These enable fundamental AI capabilities that provide a safer driving experience, like obstacle detection, 3D views and simple sensor fusion, all of which run on the Arm CPU.
The next generation of AI-based automotive features, like ADAS and autonomous driving capabilities, is being built on Arm’s latest Automotive Enhanced (AE) IP, which includes the Arm Neoverse V3AE and the first Armv9-based Cortex-A processors purpose-built for automotive markets. These autonomous workloads are set to define future vehicles and become pervasive across the automotive industry.
Arm is partnering with Nuro, one of the world's leading autonomous technology companies, to deliver scalable solutions for the autonomous vehicles of the future.
Nuro is leveraging Arm AE IP to deliver its Nuro Driver, an integrated autonomous driving system. The Nuro Driver has broad applicability across many automotive and autonomous use cases and has already been successfully integrated into seven commercial and consumer vehicle platforms.
The long-term Arm-Nuro collaboration will accelerate progress toward an AI-enabled autonomous future in the automotive industry, enabling a shift from prototype autonomous systems based on large servers to efficient, automotive-grade, safety-certified solutions.
Moreover, AI is being used to improve and enhance the man-machine interface (MMI) in the car, using natural language processing for gesture and voice recognition when drivers enter the vehicle. Generative AI can dramatically improve the accuracy of voice recognition, leading to greater performance and safety in the vehicle. In most cases, the computing for voice-based MMI in the car can take place on the CPU and GPU.
Learn more about Arm's latest automotive technologies for AI-enabled vehicles here.
AI hardware innovation needs to be tightly coupled with software optimizations. This is why Arm has made the commitment to the world’s software developers to enable AI everywhere in as common a way as possible for easier, faster, more secure coding.
Many AI models are already optimized to run on Arm. This allows the 15 million developers worldwide who build for Arm-based devices to run complex AI workloads and get their applications to market faster.
Arm CPUs are flexible and programmable, enabling developers to adopt AI and ML into their applications at pace as models continue to evolve. Arm’s extensive software libraries and tools, alongside integration with all major operating systems and AI frameworks, ensure developers can optimize without wasting valuable resources.
The Arm CPU also provides the AI developer community with opportunities to experiment with their own techniques to provide further software optimizations that make LLMs smaller, more efficient and faster. From a technology perspective, this means that more AI processing can take place at the edge.
These smaller, more compact models can also run on small microprocessors or even smaller microcontrollers (MCUs), saving time and costs. For example, Plumerai, which provides software solutions for accelerating neural networks on Cortex-A and Cortex-M SoCs, runs just over 1 MB of AI code on an Arm-based MCU to perform facial detection and recognition. Keen to preserve user privacy, all AI inference is done on the chip so no facial features or other personal data are sent to the cloud for analysis.
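One common way models become small enough for MCU-class memory budgets like this is weight quantization. The following is a minimal sketch of symmetric 8-bit quantization in pure Python, for illustration of the general technique rather than any specific vendor's implementation:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map float weights to int8 values
    sharing a single scale factor. Each weight then occupies 1 byte
    instead of 4 (float32), roughly a 4x reduction in model size."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference."""
    return [v * scale for v in q]

w = [0.52, -1.27, 0.08, 0.91]
q, s = quantize_int8(w)
approx = dequantize(q, s)
print(q, [round(v, 2) for v in approx])
```

The accuracy cost is a small rounding error per weight, which is why quantized models can stay close to full-precision quality while fitting in the on-chip memory of a Cortex-M class device.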
Arm works with leading ecosystem partners to enable developers to run AI and ML models at the edge. For example, Arm is working with NVIDIA to adapt NVIDIA TAO, a low-code, open-source AI toolkit, for Ethos-U NPUs, helping developers create performance-optimized vision AI models for deployment on these processors.
NVIDIA TAO provides an easy-to-use interface for building on top of TensorFlow and PyTorch, two leading free, open-source AI and ML frameworks. For developers, this means easy and seamless development and deployment of their models, while also bringing more complex AI workloads to edge devices for enhanced AI-based experiences.
Meanwhile, Arm and Meta are working to bring PyTorch to Arm-based mobile and embedded platforms at the edge with ExecuTorch. ExecuTorch makes it far easier for developers to deploy state-of-the-art neural networks that are needed for advanced AI and ML workloads across mobile and edge devices. Moving forward, the ongoing collaboration between Arm and Meta ensures AI and ML models can be easily developed and deployed with PyTorch and ExecuTorch.
Alongside its potentially transformative impact, AI presents unique security threats, with a vast amount of sensitive data being collected, held, and then used to provide highly personalized technology experiences to the end-user. The focus on security in the age of AI is driving industry and government discussions as solutions are developed to maximize AI’s benefits and minimize any potential societal impact.
The gradual shift of AI to the edge brings security benefits to businesses and users: sensitive user data can be handled and processed on the device itself, rather than being sent to third parties for processing, giving businesses and consumers more control over their data.
Arm’s decades of investment into security features enable privacy-preserving compute as we aim to protect data and valuable neural network (NN) models. AI running on the CPU can benefit from architectural security features, which include those from the Armv9 architecture and Arm's foundational security technologies, alongside our industry-leading standards. These include:
Memory Tagging Extension;
Realm Management Extension, which forms the basis of the Arm Confidential Compute architecture;
Pointer Authentication (PAC) and Branch Target Identification (BTI); and
PSA Certified.
Learn more about security in the age of AI in this blog.
How people use technology in the future is likely to involve ambient computing: a world where users no longer need to consciously interact with devices, as billions of them work invisibly in the background and only come to the fore when necessary. This will be fueled by the exponential growth of AI across all technologies.
Ambient computing is highly responsive and personalized to the needs, preferences, and environments of the end-user. But delivering ambient experiences requires unprecedented co-ordination between services, data, and AI-based technologies, particularly in IoT and consumer tech markets that consist of billions of sensors and devices worldwide.
AI is important to these ambient experiences because it helps to collate and interpret the data, and then automatically implements the most relevant experience based on the person and the environment they are in.
Sensors and smartphones are critical technologies for ambient computing, now and in the future. They provide the contextual awareness that is needed to facilitate ambient experiences, gathering information about various public or private environments and then making this information relevant to people based on their own unique preferences.
While today's ambient experiences seem reasonably advanced, like smart assistants and home energy management, they are not yet truly autonomous. The expansion of compute, intelligence, and personalization enabled by AI within different contexts and environments will ultimately propel future ambient experiences to a whole new level.
There are five key use cases that Arm believes will define future AI-based ambient experiences.
Amplifying senses: Ambient computing has the potential to amplify senses by providing unique experiences that allow people to see, hear, smell and touch at levels never before realized. Essentially, this gives people a “digital sixth sense” that will be revolutionary in two key ways. It will enable people to hear sounds never heard properly before, such as a car emerging from a blind corner on the road, and it will have life-changing implications for people with visual or hearing impairments, with ambient computing amplifying their ability to see and hear.
Transforming the driving experience: Inside the vehicle, ambient computing can help to make the driving experience entirely personalized around the needs and preferences of the driver and passengers. This could be automatically detecting temperature preferences or preferred driving routes, for example.
Ambient buildings: In future buildings, the user experience will be completely automatic, providing more relevant insights to the user and those around them. For example, in the age of hybrid working, future workplaces automatically book desks or rooms for workers as soon as they walk into the building based on their role and schedule. Automated cleaning schedules can then be delivered automatically based on real-time information showing parts of buildings that are occupied or not in use.
Healthcare improvements: Future hospitals or care homes can detect physical or physiological movement, such as breathing, heart rates, and even falls among patients by using sensors installed throughout buildings. This will identify any healthcare issues early, removing the need for constant doctor, nurse, or care worker supervision, and potentially save lives. Voice-enabled AI can also automatically document conversations between physicians, patients, and families, removing the need to rely on paperwork or memory, therefore saving time and improving accuracy.
Highly personalized public spaces: For those with accessibility or neurodiversity challenges, ambient computing can adapt public environments to improve their experiences. For example, public environments can be altered for those with autism, making them less jarring by diminishing the impact of noise and people in crowded spaces, or even pointing them toward areas with fewer people.
Learn more about ambient experiences in this guide.
In the age of AI, there is a broad consensus across the ecosystem to do as much data processing as possible at the edge or on a local server for the billions of connected devices. However, in order to meet the demands of deploying AI at scale, edge computing must evolve.
“Evolving edge computing” promotes well-designed, collaborative industry initiatives to enable continuous advancements in hardware and software heterogeneity, frictionless development experiences and security at scale. These support vital computing trends that benefit the entire technology ecosystem, including ‘cloud-like’ development approaches, modular software, software re-use, the removal of needless fragmentation, and the use of openly available standards.
For example:
Modular software comes from embracing heterogeneity, working as seamlessly as possible across different hardware and software platforms from different vendors.
Software re-use and cloud-like development approaches are both important aspects of frictionless development, removing the need to create software from scratch or wait for hardware to become available, which accelerates the development process for edge computing.
Openly available standards, like Arm SystemReady, PSA Certified and PARSEC, ensure the ecosystem conforms to set standards to promote security at scale, while minimizing the layers of software that exist across individual connected devices to tackle fragmentation.
Learn more about evolving edge computing in this whitepaper.