The proliferation of edge devices that employ artificial intelligence (AI) and machine learning (ML) presents designers with a growing array of challenges, including overcoming performance limitations, supporting connectivity, and ensuring security, all while minimizing power consumption.
Balancing maximum performance against minimum power consumption can be especially difficult. A multi-core microcontroller unit (MCU) is a must, and choosing one with an integrated neural network (NN) processing unit (NPU) can be a key consideration.
The MCU should have a dual-core architecture to support high performance, on-chip accelerators, connectivity, security features, and various analog and digital peripherals for sensing, control, and human-machine interfaces (HMI). For enhanced flexibility, the NPU needs to support a variety of NN architectures, including convolutional NNs (CNNs), recurrent NNs (RNNs), and deep learning models like temporal convolutional networks (TCNs), transformer networks, and more.
Finding an MCU with all those features is great. Finding one supported with an open-source library like TensorFlow for ML and AI development is even better.
This article reviews the features of the MCX N Series MCUs from NXP Semiconductors and how they address the performance needs of edge device developers, including the capabilities of the integrated NPUs. It presents available evaluation kits and development boards, looks at the possibility of embedding a watermark in an ML model to protect ownership, and reviews the use of TensorFlow and TensorFlow Lite.
NXP's MCX N MCUs for industrial and IoT (IIoT) applications feature dual Arm Cortex-M33 cores operating at up to 150 MHz and are well-suited for edge AI computing. They're designed for the high performance and low power consumption needed on the edge.
These MCUs feature an NPU for ML acceleration. When running typical ML models, the NPU delivers up to 42x faster performance than a standard central processing unit (CPU) core.
The NPU, coupled with the dual-core architecture, means more processing is done in less time. That allows the system to spend more time in sleep mode, reducing overall power consumption. The low-power cache further reduces power consumption.
The MCX N series MCUs have a comprehensive security subsystem that includes secure boot, crypto accelerators, tamper detection, and more to safeguard over-the-air communications. These MCUs also include various interfaces for system integration, analog functions, timers, and motor control blocks (Figure 1).
The MCX N MCUs are offered in two groups that differ in their integrated peripherals. MCX N94x devices like the MCXN946VDFT have a wider set of analog and motor control peripherals, while MCX N54x devices like the MCXN546VDFT add high-speed USB with an integrated PHY, plus secure digital (SD) and smart card interfaces. Common features of these MCUs that make them especially suited for edge AI applications include:
- Dual Arm Cortex-M33 cores running at up to 150 MHz, with 618 CoreMark (4.12 CoreMark/MHz) per core
- Primary: Arm 32-bit Cortex-M33 CPU with TrustZone, MPU, FPU, SIMD, ETM and CTI
- Secondary: Arm 32-bit Cortex-M33 CPU
- 8 GOPs edge AI/ML acceleration with NPU
- Platform security with EdgeLock Secure Enclave, Core Profile
- Operating temperature range of -40°C to +125°C
- Low power operation down to 57 μA/MHz active current, 6 μA in power-down mode with RTC enabled and 512 KB SRAM retention, and 2 μA in deep power-down mode with RTC active and 32 KB SRAM retention
NPU acceleration
The key to MCX N94x and MCX N54x performance in edge ML applications is the integrated NPU. The eIQ Neutron NPU is built on a highly scalable architecture that supports flexible ML acceleration, delivering up to 42x faster ML throughput than a standard CPU core.
Fast throughput is especially important in edge devices. It enables the device to spend less time awake, reducing overall power consumption and extending battery life. The eIQ Neutron NPU is scalable from 32 Ops/cycle to over 10,000 Ops/cycle, delivering as much speed and computing power as needed. It's not just fast; it's flexible, supporting most types of NNs, including CNNs, RNNs, TCNs, transformer networks, and more.
The eIQ Neutron N1-16 NPU in the MCX N94x MCUs contains four compute pipes, each with four INT8 multiply-accumulate (MAC) blocks, for a total of 16 MAC blocks. With each MAC performing two operations per cycle at 150 MHz, the NPU can execute 4.8 giga operations per second (GOPS) of INT8 throughput (150 MHz × 16 MACs × 2 operations). Some of the Neutron N1-16 NPU functional blocks and software tools include (Figure 2):
- Dedicated controller core
- In-line dequantization, activation, and pooling
- Integrated tiny caching that reduces power consumption and reliance on system memory speed
- Weight decompression engine
- Multi-dimensional DMA for input and output formats, including striding, batching, interleaving, and concatenating
- Programmability through eIQ ML SW development environment and eIQ-supported runtime inference engines
- MCUXpresso Software Development Kit (MCUXpresso SDK), a comprehensive software package with pre-integrated TensorFlow Lite for Microcontrollers (TFLM), including the Neutron library
EVKs and dev boards
Evaluation kits (EVKs) and development boards (dev boards) are available for the MCX N94 and N54 MCUs. At the start of a project, developers can use the MCX-N5XX-EVK for MCX N54 devices and the MCX-N9XX-EVK for MCX N94 devices to compare performance and application suitability and begin development.
For example, the MCX-N9XX-EVK includes an MCX N94x MCU with a 64-Mbit external serial flash, FXLS8964AF accelerometer, P3T1755DP temperature sensor, visible light sensor, onboard CAN PHY, Ethernet PHY, secure digital high capacity (SDHC) circuit, onboard MCU-Link debug probe circuit with energy monitoring, and more. The board is compatible with the Arduino and FRDM ecosystems, as well as Mikroe click boards.
Designers can also use the FRDM-MCXN947, a scalable dev board optimized to speed prototype development using the MCX N94 and N54 MCUs. This dev board includes headers for access to I/Os on the MCU, serial interfaces, external flash memory, and an onboard MCU-Link debugger. NXP also offers expansion boards and code examples for various applications through the MCUXpresso Developer Experience.
eIQ ML IDE and watermarks
EVKs and dev boards are important tools for designers. Once the MCU selection has been made and development has started, an SDK and integrated development environment (IDE) are needed. The NXP eIQ ML IDE is a software development environment for ML algorithms on MCX N MCUs.
eIQ ML software includes an ML workflow tool called eIQ Toolkit, along with inference engines, neural network compilers, and optimized libraries for the eIQ Neutron NPU. The eIQ Toolkit uses an intuitive graphical user interface (GUI).
With the Toolkit, developers can quickly create, debug, optimize, and export ML models. They can also import models and datasets from various sources, including the TensorFlow framework. The output feeds seamlessly into TensorFlow Lite, TensorFlow Lite for Microcontrollers, and other platforms.
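The Toolkit itself is GUI-driven, but the artifacts it imports and exports are standard TensorFlow formats. As a minimal, hypothetical sketch (the model and directory name below are illustrative stand-ins, not NXP reference designs), a small Keras model can be defined and exported as a SavedModel, the interchange format the TensorFlow Lite tooling discussed later consumes:

```python
import tensorflow as tf

# A small stand-in model; in practice this comes from the eIQ Toolkit
# workflow or an imported pre-trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Export an inference-only SavedModel (TF 2.13+); this is the format the
# TensorFlow Lite converter consumes.
model.export("saved_model_dir")
```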
The Toolkit also supports watermarking ML models to strengthen ownership claims. ML models and data sets represent substantial investments, and establishing clear ownership can be an important consideration. To help strengthen developers’ ownership position, NXP has added a method within the eIQ Toolkit ML Software Enablement environment for embedding a watermark in a model.
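This article does not detail how the eIQ watermark is embedded, and NXP's mechanism is not public. One widely published technique, shown here purely as a generic illustration (not NXP's method; all names and values are hypothetical), is trigger-set watermarking: a small set of secret, out-of-distribution inputs with owner-chosen labels is mixed into training, and ownership is later demonstrated by the model reproducing those labels.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(seed=1234)  # the seed acts as the owner's secret key

# Trigger set: out-of-distribution noise inputs paired with owner-chosen labels.
trigger_x = rng.uniform(0.0, 1.0, size=(32, 28, 28, 1)).astype("float32")
trigger_y = rng.integers(0, 10, size=32)

# During training, the triggers are mixed into the normal data so the model
# memorizes them, e.g.:
#   x_aug = np.concatenate([x_train, trigger_x])
#   y_aug = np.concatenate([y_train, trigger_y])
#   model.fit(x_aug, y_aug, ...)

def watermark_match_rate(model: tf.keras.Model) -> float:
    """Fraction of trigger inputs the model labels as the owner intended."""
    preds = np.argmax(model.predict(trigger_x, verbose=0), axis=1)
    return float(np.mean(preds == trigger_y))

# A near-perfect match rate on the secret trigger set supports an ownership
# claim, since an independently trained model is very unlikely to reproduce it.
```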
Leveraging TensorFlow Lite
TensorFlow is an open-source software library for building, training, and deploying machine learning models, including deep neural networks. It provides APIs for Python, Java, JavaScript, C++, and other languages. TensorFlow is typically used for model training, while TensorFlow Lite is optimized for inference on resource-constrained devices.
TensorFlow Lite is a subset of TensorFlow designed for power-restricted applications like edge and mobile devices. Developers can create, train, and modify ML models in TensorFlow and then convert them to a smaller, more efficient format using TensorFlow Lite. But there are some caveats.
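A minimal sketch of that conversion step, assuming the hypothetical SavedModel directory from the earlier example:

```python
import tensorflow as tf

# Convert the trained model into the compact TFLite flatbuffer format.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```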
The ML operators used in a model are an important consideration, since not all of them are equally easy to convert to TensorFlow Lite; the conversion tool includes only a limited set of built-in operations. TensorFlow also supports custom operators (Figure 4), and the conversion tool lets designers add them, but doing so adds a layer of complexity and can inadvertently limit deployment options.
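The stock TensorFlow Lite converter exposes these operator choices directly. A sketch, again using the hypothetical "saved_model_dir":

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Fall back to full TensorFlow ops for anything outside the builtin TFLite op
# set. This requires the larger "select ops" runtime on the target, which
# microcontroller-class deployments typically cannot provide.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
# Alternatively, permit custom operators, which must then also be implemented
# and registered on the target device.
converter.allow_custom_ops = True

tflite_model = converter.convert()
```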
TensorFlow Lite models require less memory, and their relative simplicity results in minimal latency for real-time operation on the edge. In addition, edge inference using TensorFlow Lite does not rely on a wireless connection, enabling devices to be deployed widely without regard for Internet availability.
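Much of that memory saving comes from quantization. Since the Neutron NPU's MAC blocks operate on INT8 data, full-integer post-training quantization is a natural fit. The sketch below uses the standard TensorFlow Lite converter options; the exact eIQ/Neutron tool flow may differ, and the representative dataset here is a random placeholder:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # In practice, yield a few hundred real input samples so the converter
    # can calibrate activation ranges; random data is only a placeholder.
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_quant_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_quant_model)
```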
No further training occurs after a model is deployed on an edge or mobile device using TensorFlow Lite, which makes TensorFlow Lite models well-suited for focused, task-oriented applications. Additional training can occur using TensorFlow in the cloud, with the updated model then downloaded to edge devices.
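Before an updated model is pushed out to devices, it can be sanity-checked host-side with the reference TensorFlow Lite interpreter. A minimal sketch, reusing the hypothetical "model.tflite" from earlier:

```python
import numpy as np
import tensorflow as tf

# Load the converted model with the reference TFLite interpreter.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Placeholder input; in practice, run a held-out validation set here.
sample = np.random.rand(*inp["shape"]).astype(inp["dtype"])
interpreter.set_tensor(inp["index"], sample)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```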
Conclusion
NXP's MCX N Series MCUs support designers' needs for high performance and low power consumption in edge AI/ML devices, with the integrated NPU as the key element supporting ML on the edge. In addition, NXP offers EVKs, dev boards, and ML tools to speed the development process, including support for TensorFlow Lite models that need less memory and are optimized for low-latency edge operation.
Sponsored content by DigiKey