Embedded AI Systems Benefit from Flashless MCUs


Use of artificial-intelligence (AI) techniques, based on machine learning and inference, is growing rapidly in embedded systems, including applications that employ image analysis, speech recognition, and predictive decision-making. However, AI workloads are notoriously compute- and memory-hungry, and the storage required for AI code and data far exceeds the limited embedded memory of most MCUs. In addition, the processing performance these applications demand competes with the need for low power consumption.

One way to address these issues is through an external flash-memory architecture that provides high-throughput, low-power memory to enable AI applications in resource-constrained environments like embedded systems on the network edge. Read on to learn more about off-chip non-volatile memory and how external flash memory can meet the performance requirements of compute-intense applications with low latency and high throughput.

Requirements of Embedded AI

Some of the most notable uses of AI at the network edge are embodied by various smart-home applications powered by smart speakers. Not only can these systems recognize when you're approaching your home and welcome you by automatically turning on the lights and heating, they can also have your favorite music playing when you walk through the door.

Today’s smart speakers aren’t limited to scripted interactions with humans; they can engage in conversation to determine how best to help you. Ask your smart speaker to call you a Lyft, and it will follow up by asking your destination and how large a car you need. Using voice recognition, the smart speaker can personalize its responses and recommendations based on the previous online search or shopping activity of you or other members of your household.

In fact, recognition of a range of human characteristics such as face, voice, position, and gesture can be used in security and access-control systems to verify that access is authorized.

These functions are also useful in robotics. For example, they can enable safer operation of industrial robots that cooperate with humans in a shared workspace (so-called cobots). Cobots need to recognize the position and actions of a human co-worker, particularly when the human becomes distracted or enters a dangerous area of the workspace. Other applications in social and health care will require robotic assistants that can recognize people and interact using natural language.

Some AI systems are able to use online resources. Pushing processing to the cloud makes greater compute performance and larger data sets available. But remote online access can’t always be guaranteed, and the latency is too great for many applications. There are also privacy and security concerns about uploading personal information to the cloud—especially if it might be shared with third parties, such as developers.

Therefore, local processing capability is essential for many AI workloads that require significant data-processing performance. AI coefficients (often called weights) are held in large tables, typically tens of megabytes or more. AI inferencing performs a massive number of computations between these coefficients and the data arriving from the camera, microphone, or other sensors.

The speed at which the weights can be fetched impacts the performance of the AI operations. Furthermore, the power consumed by the memory is a significant contributor to the system’s total power consumption.
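To make the memory-bound nature of inferencing concrete, here is a minimal sketch (not vendor code; all names are hypothetical) of a single dense layer: each output is a multiply-accumulate sweep across a row of weights, so every inference pass re-fetches the entire weight table from memory.

```python
# Illustrative sketch: one dense layer's inference is a stream of
# multiply-accumulates between weights (fetched from memory) and inputs.
# All names are hypothetical; real deployments use optimized NN runtimes.

def dense_layer(weights, inputs):
    """weights: list of rows, one per output; inputs: sensor feature vector."""
    outputs = []
    for row in weights:                # each row is streamed from (external) memory
        acc = 0.0
        for w, x in zip(row, inputs):  # multiply-accumulate
            acc += w * x
        outputs.append(acc)
    return outputs

# Tiny example: 2 outputs, 3 inputs
w = [[0.5, -1.0, 2.0],
     [1.0,  0.0, 0.5]]
x = [2.0, 1.0, 0.5]
print(dense_layer(w, x))  # [1.0, 2.25]
```

Because the weights are read on every pass but rarely modified, the fetch bandwidth of the memory holding them directly bounds inference throughput.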

An Off-Chip Storage Architecture for Memory Expansion

To address new applications such as AI at the edge, a new class of microcontrollers (MCUs) is emerging that combines higher-performance processing than traditional MCUs with an MCU’s power efficiency. Utilizing the latest process technologies makes it all possible, but embedded flash memory often isn’t available on these more advanced process nodes.

By using external flash together with these MCUs, there’s no practical limit to the size of the memory. Using a serial peripheral interface (SPI) from the MCU to the memory reduces pin count and power consumption when compared with other memory technologies. However, traditional serial flash doesn’t provide the bandwidth required for high-performance compute applications.

The solution is a high-throughput, low-power external memory device designed from the ground up to support both eXecute-in-Place (XiP) operation and the continuous fetching of AI data. XiP isn’t a magic bullet, but it does offer an effective method of memory expansion to meet high-performance processing requirements for emerging applications.

The eXecute in Place (XiP) architecture enables execution of code directly from external serial flash memory.

For example, Adesto’s EcoXiP flash memory exploits the latest JEDEC extended SPI (xSPI) specification for high-speed serial flash memory (see figure). It’s optimized for XiP operation and provides the bandwidth needed to repeatedly fetch the AI coefficients. It uses eight data lines and double-data-rate (DDR) technology to provide much greater bandwidth than traditional SPI serial memory. In addition, it reduces latency by supporting the typical access patterns of the processor while executing in place.
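The bandwidth gain from eight data lines plus DDR signaling is simple arithmetic. The sketch below (illustrative only; the 100-MHz clock is an assumption, and command/address overhead is ignored) compares peak raw transfer rates for single-line SPI, quad SDR, and octal DDR at the same clock.

```python
# Back-of-envelope peak throughput for serial flash interfaces.
# Assumed 100-MHz clock; command, address, and dummy-cycle overhead ignored.

def spi_throughput_mbps(clock_mhz, data_lines, ddr):
    """Peak raw transfer rate in megabytes per second."""
    edges_per_clock = 2 if ddr else 1       # DDR transfers on both clock edges
    bits_per_clock = data_lines * edges_per_clock
    return clock_mhz * bits_per_clock / 8   # bits -> bytes

print(spi_throughput_mbps(100, 1, False))  # single SPI @ 100 MHz: 12.5 MB/s
print(spi_throughput_mbps(100, 4, False))  # quad SDR   @ 100 MHz: 50.0 MB/s
print(spi_throughput_mbps(100, 8, True))   # octal DDR  @ 100 MHz: 200.0 MB/s
```

At the same clock rate, octal DDR moves 16x the data of classic single-line SPI and 4x that of quad SDR, which is what makes repeated weight fetches over a low-pin-count serial link practical.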

EcoXiP is an Octal xSPI non-volatile memory (NVM) device that enables the use of MCUs based on more modern process nodes that don’t support on-chip flash, leading to gains in performance and power efficiency. It has lower power consumption than other Octal memory devices and delivers greater system-level power efficiency when compared with devices that use a quad memory device.

When used in conjunction with a high-performance embedded processor, EcoXiP offers a highly scalable platform to deliver the processing performance and low power consumption that enable AI workloads in resource-constrained systems.

Bård M. Pedersen is Director of Technology at Adesto Technologies Corp.


