NXP Semiconductors Ara240 Discrete Neural Processing Units (DNPUs)
NXP Semiconductors Ara240 Discrete Neural Processing Units (DNPUs) enable real-time generative AI, large language models (LLMs), and vision-language models (VLMs) on edge and embedded compute systems, delivering low latency, lower operational costs, and enhanced data privacy. The device's architecture combines balanced compute with high off-chip bandwidth to execute large models efficiently.
The Ara240 architecture is designed to support advanced multimodal and transformer-based workloads, achieving up to 40 equivalent tera-operations per second (eTOPS) and supporting up to 16GB of LPDDR4 memory. This combination enables smooth execution of large and complex models directly at the edge, without reliance on high-cost cloud compute resources.
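As a rough sanity check of why 16GB of device memory matters for edge LLMs, the sketch below estimates whether a quantized transformer fits in that budget. The parameter counts, bit widths, and ~20% activation/KV-cache overhead factor are illustrative assumptions, not NXP specifications.

```python
def model_footprint_gb(params_billions: float, bits_per_weight: int,
                       overhead: float = 1.2) -> float:
    """Approximate memory needed: quantized weights plus ~20% overhead
    for activations and KV cache (assumed factor, not an NXP figure)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

MEMORY_BUDGET_GB = 16  # Ara240 maximum LPDDR4 capacity

# Example model sizes and quantization levels (illustrative only)
for params, bits in [(7, 4), (13, 4), (7, 8)]:
    need = model_footprint_gb(params, bits)
    verdict = "fits" if need <= MEMORY_BUDGET_GB else "does not fit"
    print(f"{params}B model @ {bits}-bit: ~{need:.1f} GB ({verdict})")
```

Under these assumptions, even a 13B-parameter model quantized to 4 bits stays well inside the 16GB budget, which is the class of workload the overview describes running without cloud offload.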
With integrated secure boot and a hardware root-of-trust processor, Ara240 provides hardened security for industrial, enterprise, and embedded deployment environments. Its PCIe Gen4 and USB 3.2 host interfaces allow straightforward integration into edge compute platforms, PCs/laptops, and AI-enabled embedded systems.
The NXP Semiconductors Ara240 is supported by NXP's AI/ML ecosystem, including the NXP eIQ Toolkit and Ara Software Development Kit (SDK), which accelerate model development, optimization, and deployment.
Features
- Processor
- Ara240 Discrete Neural Processing Unit (DNPU) delivering up to 40 equivalent tera-operations per second (eTOPS)
- Proprietary Neural Network Processor (NNP) operating up to 900MHz
- Memory
- Supports up to 16GB external low-power double data rate 4 (LPDDR4)
- Includes 4MB SPI NOR Flash and 8KB I2C EEPROM for boot, configuration, and runtime data
- Security
- Secure boot ensures the module's authenticated startup
- Root-of-trust processor establishes a hardware foundation for secure AI deployment
- Interfaces and connectivity
- PCIe Gen4 host interface, configurable as x1, x2, or x4 lanes for high-bandwidth data transfer
- USB 3.2 Gen 2 for flexible host communication
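The lane configurations above determine how much host bandwidth is available to feed the DNPU. The sketch below computes raw per-direction PCIe Gen4 link bandwidth from the standard signalling rate (16 GT/s per lane) and 128b/130b line encoding; actual achievable throughput is lower once protocol overhead is included.

```python
def pcie_gen4_gb_s(lanes: int) -> float:
    """Raw per-direction bandwidth in GB/s for a PCIe Gen4 link."""
    gt_per_s = 16.0        # Gen4 signalling rate per lane (GT/s)
    encoding = 128 / 130   # 128b/130b line-encoding efficiency
    return gt_per_s * encoding / 8 * lanes  # bits -> bytes

# The Ara240's configurable link widths
for lanes in (1, 2, 4):
    print(f"x{lanes}: ~{pcie_gen4_gb_s(lanes):.2f} GB/s per direction")
```

A x4 link works out to roughly 7.9 GB/s per direction before protocol overhead, comfortably above what streaming model inputs and outputs typically requires.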
- Ease of Use
- Operating System Support (Runtime) - Linux
- Ara Software Development Kit (SDK)
- 17mm x 17mm flip-chip ball grid array (FCBGA) package with 0.65mm ball pitch
Benefits
- High-performance, real-time AI that runs LLMs, VLMs, multimodal, and generative AI workloads at the edge with up to 40 eTOPS
- Low-latency execution at lower operating cost: local inference removes cloud round-trip delays and reduces cloud dependency
- Support for large-model execution with high on-chip memory and up to 16GB LPDDR4(X) to deliver efficient handling of large transformer models
- Secure deployment with built-in secure boot and root-of-trust processor
- Flexible host integration with PCIe Gen4 and USB interfaces for embedded, PC, and edge server platforms
Applications
- Generative AI at the edge
- Computer vision and multimodal systems
- Industrial automation
- Advanced robotics
Block Diagram
