Intel’s Deep Learning Boost can wring more performance out of a system that handles machine-learning (ML) chores. I talked with Huma Abidi, Director of Engineering, Artificial Intelligence and Deep Learning at Intel, to find out more.
How are Intel CPUs more capable than ever for running AI inference applications?
CPUs today offer exciting acceleration features that developers and enterprises can take advantage of while remaining on the familiar architecture that their company already runs on. Intel offers Intel Deep Learning Boost (Intel DL Boost), a group of acceleration features built into second-generation Intel Xeon Scalable processors, which provides significant performance increases to deep-learning inference applications.
Intel Optimization for Caffe ResNet-50 shows a 14X increase in inference throughput (images/sec) compared to previous generations, and an additional 2X gain when optimized for Intel Xeon Platinum 9200 processors.
AI applications can be very demanding and compute-intensive. How does Intel DL Boost address that issue?
Intel DL Boost follows a long history of the company adding acceleration features to its hardware to increase the performance of targeted applications. With Intel DL Boost, we build upon this foundation to further accelerate AI on Intel architecture.
Intel DL Boost is an AI instruction set that includes a feature called the Vector Neural Network Instructions (VNNI). VNNI has two main benefits to deep-learning applications:
- VNNI uses a single instruction for deep-learning computations that formerly required three separate instructions. As you would expect, using one instruction in place of three yields significant performance benefits.
- VNNI enables INT8 deep-learning inference. Rather than the typical FP32 (single-precision 32-bit floating-point format), inference uses the lower-precision 8-bit integer data type, which increases power efficiency by decreasing compute and memory bandwidth requirements. INT8 inference has produced significant performance benefits with little loss of accuracy.
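To make the "one instruction in place of three" point concrete, here is a minimal pure-Python sketch that emulates a single 32-bit accumulator lane. It is an illustration under stated assumptions, not Intel code: the function names are hypothetical, and the instruction names in the comments (VPMADDUBSW, VPMADDWD, VPADDD, VPDPBUSD) come from Intel's AVX-512 instruction documentation rather than from this interview.

```python
# Illustrative emulation of one 32-bit lane; names are hypothetical, not an Intel API.

def saturate_int16(x):
    """Clamp to the signed 16-bit range, as a saturating 16-bit add would."""
    return max(-32768, min(32767, x))

def wrap_int32(x):
    """Wrap to the signed 32-bit range (two's complement, no saturation)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def three_step_lane(acc, a_u8, b_s8):
    """Older sequence, modeled as three steps on four u8/s8 pairs."""
    # Step 1 (VPMADDUBSW-like): u8 * s8 products, adjacent pairs summed
    # and saturated to 16 bits.
    pair_lo = saturate_int16(a_u8[0] * b_s8[0] + a_u8[1] * b_s8[1])
    pair_hi = saturate_int16(a_u8[2] * b_s8[2] + a_u8[3] * b_s8[3])
    # Step 2 (VPMADDWD-like, multiplying by ones): widen and sum to 32 bits.
    widened = pair_lo + pair_hi
    # Step 3 (VPADDD-like): add into the 32-bit accumulator.
    return wrap_int32(acc + widened)

def fused_lane(acc, a_u8, b_s8):
    """Fused step (VPDPBUSD-like): four products summed straight into 32 bits."""
    return wrap_int32(acc + sum(a * b for a, b in zip(a_u8, b_s8)))

def int8_dot(a_u8, b_s8, lane_fn):
    """Dot product of unsigned activations and signed weights, 4 values per step."""
    if len(a_u8) != len(b_s8) or len(a_u8) % 4 != 0:
        raise ValueError("inputs must have equal length, a multiple of 4")
    acc = 0
    for i in range(0, len(a_u8), 4):
        acc = lane_fn(acc, a_u8[i:i + 4], b_s8[i:i + 4])
    return acc

activations = [1, 2, 3, 4, 10, 20, 30, 40]   # unsigned 8-bit values
weights = [5, -6, 7, -8, 1, -1, 1, -1]       # signed 8-bit values

result_three_step = int8_dot(activations, weights, three_step_lane)
result_fused = int8_dot(activations, weights, fused_lane)
```

For typical values both paths give the same result. They differ only when the intermediate 16-bit sum saturates in the three-step path, for example with inputs `[255, 255, 0, 0]` and `[127, 127, 0, 0]`.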
What AI frameworks does Intel DL Boost work with?
Intel DL Boost provides significant performance increases to applications built using leading deep-learning frameworks such as PyTorch, TensorFlow, MXNet, PaddlePaddle, and Caffe.
- PyTorch: Intel and Facebook have partnered to increase PyTorch performance with Intel DL Boost and other optimizations. With Intel DL Boost and 2nd Gen Intel Xeon Scalable processors, we have found up to a 7.7X performance boost for a 32-bit floating-point (FP32) model and up to a 19.5X performance increase for an INT8 model when running ResNet-50 inference. Because of this collaboration, Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN) optimizations are integrated directly into the PyTorch framework, enabling optimization of PyTorch models with minimal code changes.
- TensorFlow: Developers can use Intel AI Quantization Tools for TensorFlow to convert a pre-trained FP32 model to a quantized INT8 model. Several pre-trained INT8 quantized models for TensorFlow are included in the Intel Model Zoo in categories like image recognition, object detection, and recommendation systems.
- Apache MXNet: The Apache MXNet community has delivered quantization approaches to enable INT8 inference and use of VNNI. As a result of this new quantization approach and operator fusion, a 3.7X performance speed-up was demonstrated using AWS EC2 CPU instances.
- PaddlePaddle: Intel and Baidu have collaborated since 2016 to optimize PaddlePaddle performance for Intel architecture. In Intel’s testing, INT8 inference resulted in 2.8X throughput for ResNet-50 v1.5 with just 0.4% accuracy loss in comparison to an earlier FP32 model.
- Intel Caffe: JD.com collaborated with Intel engineers to use Intel DL Boost to increase the performance of a text detection application by 2.4X with no accuracy degradation in comparison to an earlier FP32 model.
What kinds of inference applications best take advantage of Intel DL Boost?
Deep-learning applications that require high-performance compute capabilities and low latency can take advantage of Intel DL Boost. Some of the practical applications include image recognition, object detection, and recommendation systems.
What should developers and systems architects keep in mind when they get ready to adopt Intel DL Boost?
We have been working with the AI community to optimize the most popular open-source deep-learning frameworks for Intel DL Boost so that developers can more easily benefit from the performance and efficiency gains it provides.
Developers can use Intel tools to convert an FP32-trained model to an INT8 quantized model. When this new INT8 model replaces the earlier FP32 model for inference on 2nd Gen Intel Xeon Scalable processors, it automatically benefits from Intel DL Boost acceleration.
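As a rough picture of what FP32-to-INT8 conversion involves, the sketch below applies simple symmetric per-tensor quantization in pure Python. This is a generic textbook scheme chosen for illustration; it is an assumption, not a description of how Intel's tools quantize a model, and all function names are hypothetical.

```python
# Generic symmetric INT8 quantization sketch; not Intel's tooling.

def quantize_symmetric(values):
    """Map floats to integers in [-127, 127] with one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]

def quantized_dot(q_a, scale_a, q_b, scale_b):
    """Accumulate in integers, then rescale once to get a float result."""
    accumulator = sum(a * b for a, b in zip(q_a, q_b))
    return accumulator * scale_a * scale_b

fp32_weights = [0.5, -1.0, 0.25, 0.75]
fp32_inputs = [1.0, 2.0, -0.5, 0.25]

q_weights, w_scale = quantize_symmetric(fp32_weights)
q_inputs, x_scale = quantize_symmetric(fp32_inputs)

fp32_result = sum(w * x for w, x in zip(fp32_weights, fp32_inputs))
int8_result = quantized_dot(q_weights, w_scale, q_inputs, x_scale)
round_trip_error = max(
    abs(w - d) for w, d in zip(fp32_weights, dequantize(q_weights, w_scale))
)
```

Each value is stored in one byte instead of four, and the rounding error per value is bounded by half the scale factor, which is one way to see why accuracy loss can stay small. Production tools typically add calibration data and finer-grained scales, which this sketch leaves out.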
For additional support, Intel also provides a Model Zoo, which includes INT8 quantized versions of many pre-trained models, such as ResNet-101, Faster-RCNN, and Wide&Deep. We hope these models and tools get developers up and running with Intel DL Boost more quickly.
Which customers are using the technology, and what results are they seeing?
Dell EMC has reported a greater than 3X improvement in performance over the initial Intel Xeon Scalable processors using our pre-trained INT8 ResNet-50 model and 2nd Gen Intel Xeon Scalable processors with Intel DL Boost. JD.com worked with Intel engineers to enable the new Intel DL Boost technology in its text detection application and achieved about a 2.4X performance boost with no accuracy loss compared to the original FP32 solution. This optimized solution will enhance its business experience and help JD.com reduce total cost of ownership (TCO).
Siemens Healthineers uses Intel technology for its AI-based cardiac MRI segmentation, utilizing AI to help automate and standardize complex diagnostics to improve patient outcomes. The second-generation Intel Xeon Scalable processors with Intel Deep Learning Boost, in combination with the Intel Distribution of OpenVINO toolkit, automate anatomical measurements in near real-time and improve workflow efficiency, while maintaining the accuracy of the model without the need for discrete GPU investment.
Baidu has seen an overall 2X to 3X improvement in performance of its model based on the Intel DL Boost feature. Tencent’s cloud server and supply-chain department is utilizing Intel DL Boost to help accelerate Tencent Cloud AI performance. This has resulted in a 3.26X increase in efficiency of video analysis and a 2X improvement in efficiency of NSFW filtering, compared to the last Intel Xeon processor generation.
Huma Abidi is Engineering Director in the Machine Learning and Translation team at Intel Corp., responsible for deep-learning software optimization for Intel Xeon processors. Huma joined Intel as a software engineer and has worked in a variety of engineering, validation, and management roles in the areas of compilers, binary translation, and artificial intelligence (AI) and deep learning. She has twice received the Intel Achievement Award, Intel’s highest honor, and has three times received the Intel Software Quality Award for delivering quality software.