Mastering NPUs: Technological Advances and the NPU Cloud Model

Posted on 23 November 2024 by Jyoti92 @Jyoti_Chauhan1

With the rapid advancement of artificial intelligence (AI), the need for new specialized processing units has become increasingly evident. The ecosystem is growing more complex and continuously opens new avenues for innovation. Traditionally, CPUs (Central Processing Units) and GPUs (Graphics Processing Units) handled AI workloads; their limitations led to the development of NPUs (Neural Processing Units). The term “neural” refers to processing analogous to how the brain functions.

Compared to GPUs, NPUs offer improved performance and energy efficiency. Efficiency is crucial given the massive energy consumption of today’s computations; it matters not just in data centers but also in IoT, embedded applications, mobile devices, and edge systems.

These specialized processors are designed to accelerate AI operations directly on devices, enhancing efficiency and performance. In the future, the processing landscape will change further, with new solutions emerging alongside existing ones.

NPUs: Technology and Functionality

NPUs are designed to handle the mathematical operations required by neural networks, such as matrix multiplication and vector addition, which are fundamental to deep learning algorithms. Their architecture is characterized by a high degree of parallelism, allowing simultaneous processing of many operations on vast amounts of data. This is crucial for the training and inference of machine learning models.
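
To make the workload concrete, here is a minimal sketch in Python of the dense-layer computation NPUs accelerate: a matrix multiplication followed by a vector addition, applied to a whole batch at once. NumPy and the layer sizes are assumptions of this illustration; the article names no particular framework.

```python
import numpy as np

# A single dense (fully connected) layer: y = relu(x @ W + b).
# The matrix multiply and vector addition below are exactly the
# operations an NPU executes in dedicated hardware, across the
# whole batch in parallel.

rng = np.random.default_rng(0)

batch, in_features, out_features = 32, 512, 256
x = rng.standard_normal((batch, in_features), dtype=np.float32)         # input batch
W = rng.standard_normal((in_features, out_features), dtype=np.float32)  # weights
b = np.zeros(out_features, dtype=np.float32)                            # bias vector

y = np.maximum(x @ W + b, 0.0)  # matmul + vector add + ReLU activation

print(y.shape)  # (32, 256): one output row per batch element
```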

Memory also plays an important role; NPUs are often equipped with high-speed on-chip memory with lower latency than traditional memory hierarchies, which greatly improves overall performance.

All major chip companies have developed NPU-powered devices: Intel (Core Ultra processors with AI Boost), AMD (EPYC processors, Xilinx FPGAs), NVIDIA (A100 Tensor Core GPU, Grace CPU), Microsoft (Brainwave), Apple (A-series and M-series chips).

Several other companies have developed processors with NPUs; let’s look at some examples.

1. Amazon, from Graviton Onward

Starting from Arm cores, Amazon has developed chip families such as Graviton, Trainium, Inferentia, and Nitro. Together, they cover most data center needs, since the line-up combines the advantages of CPUs, GPUs, and FPGAs. The main applications are image recognition, natural language processing, recommendation systems, and driving assistance (including autonomous driving) in vehicles. They are well suited to both edge devices and data centers.
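
As an illustration of how these chips are targeted in practice, here is a minimal sketch that compiles a small PyTorch model for Inferentia2 using AWS’s Neuron SDK. The model, its sizes, and the example input are hypothetical, and the torch_neuronx API shown may vary across Neuron releases.

```python
import torch
import torch_neuronx  # AWS Neuron SDK for Inferentia2 / Trainium

# A hypothetical toy model standing in for a real inference workload.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example = torch.rand(1, 512)  # example input used for tracing

# Ahead-of-time compile the model for the NeuronCore accelerators.
neuron_model = torch_neuronx.trace(model, example)

output = neuron_model(example)  # executes on the Inferentia2 NPU
print(output.shape)
```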

2. Etched

Etched’s NPU design maximizes energy efficiency while maintaining high performance. They use an innovative architecture that reduces energy consumption without compromising processing speed. This combination is particularly beneficial for portable devices and embedded systems.

3. Google TPU (Tensor Processing Unit)

Google’s TPUs accelerate machine learning operations. They are mainly used to improve the efficiency and processing speed of AI applications running in Google’s data centers.
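
To give a sense of how developers target TPUs, here is a minimal sketch using JAX, one of several frameworks that run on TPUs (the choice of framework is an assumption of this example). The same code runs unchanged on CPU, GPU, or TPU; JAX dispatches to whichever backend it detects.

```python
import jax
import jax.numpy as jnp

# Reports 'tpu' inside a TPU VM, otherwise 'gpu' or 'cpu'.
print("backend:", jax.default_backend())

# The matrix multiply below is compiled through XLA for the detected
# accelerator; on a TPU it runs on the dedicated matrix units.
key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (1024, 1024))
b = jax.random.normal(key, (1024, 1024))

c = jnp.dot(a, b)
print(c.shape)  # (1024, 1024)
```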

4. Huawei Ascend

Huawei’s Ascend series includes processors with NPUs that support a wide range of AI applications, from computer vision to natural language understanding. They are known for their ability to perform real-time inference with low energy consumption.

5. NVIDIA Jetson

NVIDIA Jetson solutions integrate NPUs to offer powerful AI capabilities in a compact format. Due to their efficiency and computing power, these chips are often integrated into robotics, drones, and edge systems.

6. Tenstorrent

Tenstorrent’s NPUs are known for their high performance and flexibility. They use an advanced interconnection network that allows them to handle complex AI workloads with great efficiency. A distinctive feature is the ability to scale effectively from edge devices to large data centers.

7. Qualcomm

Apart from chips for portable and wearable devices, the company provides the Cloud AI portfolio of inference cards, a ready-to-deploy, performance- and cost-optimized AI inference solution.

Its Cloud AI 100 Ultra family addresses the unique requirements for scaling classic and generative AI workloads, ranging from computer vision and natural language processing to transformer-based LLMs.

Future Directions

Initially, mixed architectures with CPUs, GPUs, and possibly TPUs were used for all AI workloads. Later, it became clear that training requires one type of architecture, while inference benefits from a completely different one. The term “architecture” is used because, beyond the processing units, it also covers memory chips, interconnects, and storage. Recently, given purchase costs and waiting lists, performance optimization services for data centers running AI workloads have become increasingly relevant.
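
A rough back-of-the-envelope calculation helps explain why training and inference favor different hardware; the figures below are illustrative assumptions, not measurements. A common approximation for dense transformer models puts the forward pass at about 2N FLOPs per token for N parameters and the backward pass at roughly twice that, so a training step costs around three times an inference step before even counting optimizer state and memory traffic.

```python
# Illustrative estimate, using the common ~2N forward / ~6N total
# FLOPs-per-token approximation for a dense transformer.
params = 7e9  # a hypothetical 7-billion-parameter model

inference_flops_per_token = 2 * params  # forward pass only
training_flops_per_token = 6 * params   # forward + backward pass

print(f"inference: {inference_flops_per_token:.1e} FLOPs/token")
print(f"training:  {training_flops_per_token:.1e} FLOPs/token")
print(f"ratio:     {training_flops_per_token / inference_flops_per_token:.0f}x")
```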

To satisfy the need for new specialized processing units, rapidly upgrade AI infrastructure, optimize costs, and at the same time reduce environmental impact, companies can take advantage of NPUs provided in the cloud.

The NPU cloud is the best solution for quick, flexible, and sustainable provisioning of these innovative chips, accelerating inference tasks and supporting in particular the development of small language models. Like any cloud computing service, NPU cloud infrastructure can be provisioned on demand, making it the most efficient way to access AI power with flexibility and optimized investment.
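
As a concrete example of on-demand provisioning, here is a minimal sketch that launches an AWS Inferentia2-backed instance with boto3. The AMI ID is a placeholder and the region and instance type are assumptions; any cloud offering NPU instances would follow a similar pattern.

```python
import boto3

# Placeholder values: substitute a real Neuron-enabled AMI and region.
AMI_ID = "ami-0123456789abcdef0"  # hypothetical Deep Learning AMI
REGION = "us-east-1"

ec2 = boto3.client("ec2", region_name=REGION)

# inf2.xlarge is an Inferentia2-based (NPU) instance type; it is
# billed per use and can be terminated when the workload finishes.
response = ec2.run_instances(
    ImageId=AMI_ID,
    InstanceType="inf2.xlarge",
    MinCount=1,
    MaxCount=1,
)

instance_id = response["Instances"][0]["InstanceId"]
print("launched:", instance_id)
```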

