How AI Innovation and Wider Technology Trends are Defining Mobile Graphics and Computing
In consumer technology markets, graphics play a crucial role, particularly in gaming, entertainment, and digital media. These use cases capture the user’s attention through their visual appeal, and the quality of the graphics is critical in driving improved user experiences: evoking emotions, capturing imaginations, and enhancing product features and user interfaces.
The advent of the smartphone has pushed graphics to the forefront of consumer experiences. The growing prominence of high-quality graphics on mobile requires developers and designers to prioritize more immersive visuals to meet rising consumer expectations.
At the same time, the levels of realism and immersion in graphics are being profoundly influenced by the rise of artificial intelligence (AI). Machine learning (ML), which enables AI-based systems to perform tasks through data-driven learning and decision-making, is also playing a crucial role in the ongoing evolution of graphics and gameplay.
These advanced computing capabilities are filtering through from desktop and PC computing to mobile, with AI and ML workloads on mobile experiencing exponential growth. AI and ML are not only helping to improve the user experience through better visuals and higher-quality gameplay, but also supporting a quicker, more creative game development process. These technologies are having a profound impact on the continuous evolution of mobile gaming, as well as mobile computing in general.
Smartphone innovation has facilitated phenomenal growth in mobile gaming. Revenues for mobile gaming applications are expected to exceed $111 billion in 2024.
AAA gaming experiences are now common on mobile, with a new generation of users selecting mobile as their gaming platform of choice. Leading AAA PC and console games, such as Genshin Impact, PUBG, Fortnite and Call of Duty, all have mobile versions.
Mobile gaming is also driving the evolution of real-time 3D technology and graphics. This plays a crucial role in adding new levels of engagement and immersion, not just in gaming applications but across social applications, too. For example, SNAP’s Snapchat application uses filters and lenses to overlay AR graphics on top of photos. Enhanced visual experiences through XR are likely to be deployed even more widely in the future.
Arm is providing the foundational technologies, tools, and ecosystem support that facilitate this exciting future of higher-quality, more immersive graphics that, in turn, are being shaped by the rise of AI and ML on mobile.
With smartphones as the primary enabler of the world’s digital experiences and the center of technological innovations like AI, ML, and immersive 3D graphics, there are significant opportunities to improve how users live, work, learn, and play through mobile experiences.
However, as part of this evolution toward more immersive, intelligent digital experiences on mobile, there is increasing demand for more complex, compute-intensive workloads and higher performance within the mobile form factor. This brings profound computing challenges that Arm and our industry-leading ecosystem must solve.
Gaming is a key use case for mobile devices. Currently, there are 700,000 mobile games worldwide, with 71 percent of these available on Android via the Google Play Store. Many game development companies, whether established or independent, seek to capitalize on the significant commercial opportunities in the mobile gaming market. But how much does it cost to develop a mobile game?
The investment of time and money in creating games is huge. It can take anything from a few months to several years to develop a mobile game. For AAA gaming titles on mobile, development costs can range from $50 million to $300 million, and in some cases this already high cost can easily go higher.
On top of this, there is ever-increasing demand for higher-quality, more immersive visual experiences on mobile. Meeting this demand requires new levels of computing complexity, built on key processing techniques and trends as the world enters a new era of visual mobile computing. These techniques started on desktop and PC, but are now filtering through to mobile, and include the following:
Better lighting, reflection, refraction, and shadows with ray tracing.
More geometry, higher resolutions, texture, shading, and particles as gaming scenes get more complex.
High frame rates and post-processing, with 120FPS+, blur, lens, and art effects.
Arm has a broad range of techniques and features in place now, as well as a future roadmap for mobile graphics that continues to enable these processing trends. These include ray tracing, GPU-driven pipelines such as deferred vertex shading (DVS), variable rate shading (VRS), graphics upscaling, and, in the future, ray tracing denoising.
Adding these new features on top of the ever-increasing demand for more performance from graphics-based applications brings design complexities, particularly in the power- and area-constrained environment of battery-powered mobile devices. There is a need to provide high levels of performance, but in a power efficient way that does not drain the battery, slow down applications, or overheat the device through poor thermal efficiency.
Silicon area, power, and thermal constraints are largely fixed, due to the form factor of the smartphone. So, the challenge becomes: how to keep increasing performance on a fixed power budget?
Innovation in the graphics realm needs to be relentless. Arm already has techniques to distribute and apply visual fidelity only where it is needed, such as VRS, which saves power and leads to smoother gaming experiences, or fragment prepass, which removes the need for the application to sort objects and in turn delivers performance and energy benefits. To keep making substantial performance improvements within the power- and area-constrained environments of mobile devices, there must be even more computing innovation. This represents a significant opportunity for AI and ML to redefine how these improvements are made, especially through mobile graphics.
Smartphones contain many sensors throughout their core functions, including microphones, multiple cameras, GPS technologies, accelerometers, and many more. These enable a constant stream of data that supports a broad range of third-party applications that now run ML workloads.
On the smartphone, AI and ML workloads have become integral to device functionality, enabling features such as photography, face recognition, and voice assistants.
Moreover, various AI and ML-based features are becoming more prominent, leading to increased demands for more complex, advanced AI and ML workloads on the smartphone device. These include:
Speech recognition, which at first recognized simple voice commands and now includes more complex commands and natural language processing.
Image recognition, which at first identified people and scenes in photos, but now identifies objects through XR integration on the smartphone.
Personalization, which has evolved from UI customization to targeted advertisements based on the user’s usage patterns.
Security, which began with fraud detection in mobile transactions and is now used to secure payments and banking.
And now with generative AI, we see the first steps toward running large language models (LLMs) on the smartphone device at the ‘edge,’ rather than in the cloud.
One of the first applications to use AI and ML workloads on the smartphone was photography. AI and ML ensure that photos look great regardless of the lighting conditions or the user’s own photography skills. Through AI-powered computational photography on the smartphone, images can be enhanced, sharpened, or blurred where required, and for videos, noise can be reduced and a bokeh effect added. The smartphone provides capture capabilities that would previously have been available only in a bulky DSLR camera.
AI and ML also provide the foundation for intelligent software to make up for any capabilities that are lacking in the hardware. The new Arm Kleidi software unlocks the AI capabilities and performance of the Arm CPU on any software platform, with no developer integration required.
Google’s Real Tone camera technology, first introduced in the Arm-powered Google Pixel 6 range of smartphones, is a great example of intelligent software in action. This technology renders the proper complexion of people with darker skin tones in photos, with AI-based software innovation supporting the camera hardware. Google has since deployed Magic Eraser, its AI editing tool for Google Photos users, also running on Arm.
AI and ML techniques have been contributing to innovation in graphics for the past several years. In 2018, the keyword ‘neural network’ was in 57th position among SIGGRAPH papers. But, by 2022, it was the number one keyword at the event, and as of 2024, the keyword ‘AI’ is prominent. Neural networks are already disrupting desktop graphics and will inevitably disrupt mobile graphics soon.
With mobile gaming, the physical restrictions mentioned previously still apply. This means delivering more performance and more immersive gaming experiences within the same power envelope, while pixel costs skyrocket due to the increasing amount of immersive content on mobile. It’s the perfect storm.
Here are a few ways that AI and ML are adding to the mobile gaming experience through enhanced gameplay, faster performance, and more accessible development.
ML algorithms can be optimized to work within the limited processing power and memory of mobile devices, allowing for more realistic graphics and enhanced photorealism without sacrificing performance. For example, ML with generative AI workloads can be used to generate realistic weather conditions, vegetation, water, and fire effects in a mobile game.
ML algorithms can also automate many of the manual processes involved in game creation, freeing up game development time. All developers, from those working independently or in smaller studios to the bigger game studios, can then focus on aspects of the game that matter to them, like the graphics and gameplay.
The impact of AI and ML on graphics creation is one of the most significant. AI and ML techniques can boost many aspects of creativity in the game industry, making game development faster and more accessible:
Faster rendering times: AI and ML can speed up the rendering process by handling complex calculations and simulations. This allows developers to create more detailed and lifelike graphics in a shorter amount of time, reducing the time-to-market for games.
Increased efficiency: AI and ML can automate many of the manual processes involved in creating graphics. For example, ML can be used to generate 3D models and animations, reducing the need for manual labor. This increases efficiency and saves time, allowing developers to focus on other aspects of game creation.
Faster performance: ML can help improve the performance of mobile games, even with restricted processing power. For example, ML algorithms can be used to optimize the game's performance, reduce lag, and improve responsiveness.
Enhanced gameplay: ML can be used to improve the overall gameplay experience on mobile devices. For example, ML can generate more intelligent AI-based opponents, making the game more challenging and engaging for players, or adapting the difficulty to retain players’ attention and interest when they struggle. With LLMs, ML can make stories more immersive through natural interactions with characters and “tiny stories.” It could also take over a gaming character when a player leaves during team play. An example of this is the use of Unity’s ML Agents, as showcased by Arm at the 2023 Game Developers Conference (GDC 2023). ML Agents is a concept where gaming characters are trained with reinforcement learning to perform actions based on their surrounding environment. At Arm, we have developed a game called ‘Candy Clash’ which contains 100 ML Agents (rabbits), 50 on each team, working cooperatively to crack their opponent’s egg. You can read more about this game in this blog. A minimal sketch of the underlying reinforcement learning idea follows this list.
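To make the reinforcement learning idea concrete, here is a minimal tabular Q-learning sketch. The tiny "walk to the goal" environment, the rewards, and the hyperparameters are all invented purely for illustration; this is not the Candy Clash or Unity ML-Agents code.

```python
import random

N_STATES = 10                     # positions on a line; the goal is the last one
ACTIONS = [-1, +1]                # move left or right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done  # reward only for reaching the goal

def pick_action(s, epsilon=0.1):
    if random.random() < epsilon:             # explore occasionally
        return random.choice(ACTIONS)
    best = max(q[(s, a)] for a in ACTIONS)    # otherwise exploit, breaking ties randomly
    return random.choice([a for a in ACTIONS if q[(s, a)] == best])

for episode in range(500):
    s, done = 0, False
    while not done:
        a = pick_action(s)
        nxt, r, done = step(s, a)
        target = r + 0.9 * max(q[(nxt, b)] for b in ACTIONS)
        q[(s, a)] += 0.1 * (target - q[(s, a)])   # Bellman-style value update
        s = nxt
```

After training, the greedy policy walks straight to the goal; real game agents replace the table with a neural network and the toy reward with game-specific objectives.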
Advances such as those outlined above continue to drive the growth of the mobile gaming industry, making high-quality and immersive gaming experiences more accessible to players worldwide.
So, what about the link between AI and ML and graphics? Is there a place where AI and ML networks can “smartly” use performance and, therefore, help to improve the overall visual experience?
AI and ML are already positively disrupting graphics across the following areas:
Transforming rendering efficiency and quality through texture creation, super sampling, ray denoising, and upscaling techniques.
Potentially displacing classic rendering techniques with ML alternatives, like NeRF and assisted artwork generation.
Enabling the real-time processing of large amounts of data, and therefore enabling the creation of complex animations and simulations.
Automating manual, time-consuming developer tasks.
Enhancing user experiences through more immersive graphics, enabling customizations and personalization.
Improving the accuracy and efficiency of image/video processing, such as object detection, image segmentation, and body/pose tracking.
Developing more advanced XR experiences, such as real-time 3D scene estimation.
Hiding latencies or network errors in multiplayer games.
Dynamically determining when to show tutorials to players to assist them with their games.
Advanced graphics and AI and ML techniques that have traditionally been used on desktop and PC are now filtering through to mobile due to the performance capabilities of the modern smartphone. The use of advanced AI and ML in graphics is increasing efficiency, improving the quality of visuals, and creating new possibilities for mobile devices and applications.
Currently, AI and ML are being used across computational photography and video for use cases like image and object detection. AI and ML are also helping to deliver a step change in the efficiency and quality of graphics through super resolution and higher frame rates.
In the future, the potential of AI and ML in graphics is huge. These future possibilities could cover new scene representations, new forms of neural texture compression, or entirely new AI- and ML-generated images and videos. It is likely that once AI and ML become more prevalent throughout graphics, future games will look very different from how they do today.
These use cases are likely to use the following two types of AI and ML based graphics techniques:
Those where techniques are not viable unless a specific performance threshold is reached. For example, super sampling, frame rate upscaling, or ray denoising.
Those where neural networks allow developers to do something more, or do the same thing more efficiently. These include techniques such as mesh deformation, learned animation, style transfer, real-time VFX, or neural radiance fields, which would be better suited to incremental improvements over successive hardware generations.
Here are a few examples of graphics techniques that have been supported and enhanced by AI and ML.
GANcraft is a powerful tool for converting semantic block worlds to photorealistic worlds without the need for ground truth data. “GANcraft simplifies the process of 3D modeling of complex landscape scenes, which would otherwise require years of expertise. GANcraft essentially turns every Minecraft player into a 3D artist!”
Style transfer is a computer vision technique that renders a photograph in the artistic style of, say, a renowned painter. It works by restructuring one image’s content using the artistic style of another. As a manual process, producing a version of one image rendered in the style of another artist would be painstaking, labor-intensive work that could only be justified in a few circumstances. For quick results, whether for inspiration or entertainment, people can employ ML to transform photorealistic images into any preferred style.
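As a sketch of how this works in practice, the following outlines the classic optimization-based approach (after Gatys et al.), assuming PyTorch and torchvision are available; the layer indices, weights, and step count are illustrative choices, not a specific production pipeline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG19 feature extractor; "style" lives in feature correlations.
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {0, 5, 10, 19, 28}   # conv layers sampled for style (illustrative)
CONTENT_LAYER = 21                  # conv layer sampled for content (illustrative)

def gram_matrix(feat):
    # Correlations between feature channels capture the style of an image.
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def extract(img):
    feats, x = {}, img
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS or i == CONTENT_LAYER:
            feats[i] = x
    return feats

def style_transfer(content, style, steps=300, style_weight=1e5):
    # content/style: preprocessed image tensors of shape (1, 3, H, W).
    target = content.clone().requires_grad_(True)
    opt = torch.optim.Adam([target], lr=0.02)
    c_feats, s_feats = extract(content), extract(style)
    s_grams = {i: gram_matrix(s_feats[i]) for i in STYLE_LAYERS}
    for _ in range(steps):
        opt.zero_grad()
        t_feats = extract(target)
        loss = F.mse_loss(t_feats[CONTENT_LAYER], c_feats[CONTENT_LAYER])
        for i in STYLE_LAYERS:
            loss = loss + style_weight * F.mse_loss(gram_matrix(t_feats[i]), s_grams[i])
        loss.backward()
        opt.step()
    return target.detach()
```

The key design idea is that content is compared pixel-for-pixel in feature space, while style is compared only through channel correlations, which is what lets the output keep one image’s structure and the other’s look.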
A neural radiance field (NeRF) is a fully connected neural network that generates novel views of complex 3D scenes based on a partial set of 2D images. NeRF takes input images representing a scene and interpolates between them to render the complete scene. Examples include view synthesis, which creates a 3D view from a series of 2D images, and scene geometry estimation to support AR-based applications, such as those that insert virtual objects into real-world scenes with occlusion effects.
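At the core of NeRF is volume rendering: sampled densities and colors along each camera ray are composited into a single pixel color. Here is a minimal sketch of that compositing step, assuming the network has already been queried for the samples; the sample count and spacing are illustrative.

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """densities: (N,), colors: (N, 3), deltas: (N,) distances between samples."""
    alpha = 1.0 - np.exp(-densities * deltas)       # opacity of each ray segment
    survived = np.concatenate(([1.0], 1.0 - alpha))[:-1]
    transmittance = np.cumprod(survived)            # light surviving to each sample
    weights = transmittance * alpha                 # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)  # final pixel color

# Example: 128 random samples along one ray.
n = 128
pixel = composite_ray(np.random.rand(n), np.random.rand(n, 3), np.full(n, 0.05))
```

Running this compositing for hundreds of samples per ray, for every pixel, is the main cost of vanilla NeRF.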
The slow inference of NeRF limits its application on resource-constrained hardware (such as mobile devices), as it requires either high-end GPUs or extra storage. To address this, emerging technologies such as MobileNeRF from Google Research or Neural Light Fields (NeLF) from SNAP can be used, as both demonstrate real-time variants of NeRF.
Whereas NeRF requires hundreds of points to be sampled along a ray to produce a queried pixel, NeLF requires only one forward pass per ray to predict its pixel color, thus opening up faster rendering speeds.
Built on a convolutional network, and using NeRF to model the scene and generate pseudo data for training the NeLF-based network, this approach makes real-time inference possible on a mobile device. SNAP created such an efficient network, enabling real-world, real-time applications such as an augmented reality virtual shoe try-on.
ML agents can help animate digital assets quickly through reinforcement learning, a staple neural network technique. Rather than coupling traditional digital asset creation with hand-built animations, developers can realistically bring their game to life through automatic physics-based animation, at the click of a button and without any animators involved. Learn more about it here.
At GDC 2022, Embark Studios presented a talk about how developers use ML techniques through reinforcement learning to train arbitrary creatures in games to walk. This specific ML animation is now present in the Unreal Engine. You can watch the original talk here.
AI, ML, and neural networks can produce high-quality visuals without moving to a higher rendering resolution. Rendering at a lower resolution frees the GPU to raise performance, essentially doing more within the same power envelope, while maintaining stunning visual experiences. Use cases where AI and ML could help in this regard are super sampling, ray denoising, and frame rate upscaling.
Super sampling is a spatial anti-aliasing method used to remove aliasing from images. Aliasing in games, for example, can make what should be smooth curves and lines appear jagged. With super sampling, the image is rendered at a much higher resolution than displayed, and the extra samples are then averaged down to shrink the image to the desired size, removing the aliasing.
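A minimal sketch of the idea, assuming a render_scene function (hypothetical) that can draw the scene at an arbitrary resolution; a simple box filter stands in for more sophisticated resampling.

```python
import numpy as np

def supersample(render_scene, width, height, factor=2):
    # Render at factor x the display resolution...
    hi = render_scene(width * factor, height * factor)   # (H*f, W*f, 3) floats
    # ...then average each factor x factor block down to one display pixel,
    # which is what smooths out jagged, aliased edges.
    c = hi.shape[2]
    return hi.reshape(height, factor, width, factor, c).mean(axis=(1, 3))

# Example with a dummy "renderer" that just draws a horizontal gradient.
def render_scene(w, h):
    x = np.linspace(0.0, 1.0, w)
    return np.tile(x[None, :, None], (h, 1, 3))

image = supersample(render_scene, 640, 360)   # (360, 640, 3)
```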
Frame rate upscaling is a video processing technique that increases the frame rate of a video by interpolating, generating, and inserting intermediate frames between the original frames, making the video appear smoother. AI and ML workloads are especially suited to analyzing the motion between frames, identifying objects and movements, and then interpolating the new pixel values and positions needed to generate a new frame. On mobile, for example, this allows games to be rendered at a reduced load of 30FPS, with the frame rate then upscaled to 60FPS.
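The sketch below shows the basic mechanics of inserting intermediate frames, using a plain linear blend for clarity; ML-based interpolators instead estimate motion and warp pixels along it before blending. All names are illustrative.

```python
import numpy as np

def interpolate(frame_a, frame_b, t=0.5):
    # Linear blend of two frames; an ML interpolator would warp pixels along
    # estimated motion vectors first, avoiding ghosting on fast movement.
    return (1.0 - t) * frame_a + t * frame_b

def upscale_frame_rate(frames):
    # Insert one synthesized frame between each original pair: 30FPS -> ~60FPS.
    out = []
    for a, b in zip(frames, frames[1:]):
        out.extend([a, interpolate(a, b)])
    out.append(frames[-1])
    return out

# Example: four dummy 720p frames in, seven frames out.
frames = [np.full((720, 1280, 3), i / 3.0) for i in range(4)]
smooth = upscale_frame_rate(frames)
```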
Ray tracing, which was brought to life on mobile by Arm with the first generation Immortalis-G715 GPU, has increased the levels of immersion provided by graphical and gaming content. Shadows are now rendered more accurately due to how rays are traced through geometry, bounce off surfaces, and hit other objects.
Rendering a full scene effectively can demand significant processing power, with the number of rays used affecting how much noise remains in the scene. Ray-denoising techniques take a smaller number of rays and turn them into a smoother result than a much larger number of rays could achieve alone. This is transformative for image quality, enabling richer, more immersive content.
The two main techniques for denoising, both illustrated in the sketch after this list, are:
Spatial filtering, which changes selected parts of the image while reusing similar neighboring pixels. While this technique is fast, it is not great at dealing with changing conditions, and can introduce some blurriness and imperfections to the image.
Temporal accumulation, which reuses data from the previous frame to find visual anomalies that can be corrected in the current frame. While this technique does not produce blurriness, it adds some temporal lag.
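Here is a minimal sketch of both ideas, assuming noisy ray-traced frames arrive as float arrays; the box filter and blend factor are deliberately simple stand-ins for the edge-aware filters and motion-reprojected history used in production denoisers.

```python
import numpy as np

def spatial_filter(img, radius=1):
    # Average each pixel with its neighbors: fast, but can blur fine detail.
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * radius + 1) ** 2

def temporal_accumulate(history, current, alpha=0.2):
    # Blend the accumulated past frames with the new one: noise averages
    # out over time, at the cost of some temporal lag.
    return (1.0 - alpha) * history + alpha * current

# Example: denoise a stream of noisy frames with both techniques combined.
history = None
for _ in range(8):
    noisy = np.random.rand(360, 640, 3)           # stand-in for a low-ray-count frame
    frame = spatial_filter(noisy)
    history = frame if history is None else temporal_accumulate(history, frame)
```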