NXP eIQ provides machine learning enablement software optimized for i.MX SoCs.
eIQ offers neural network acceleration on the GPU or NPU of NXP SoCs by using OpenVX as a backend. Also, when executing inference on the Cortex-A cores, the NXP eIQ inference engines support multi-threaded execution.
eIQ is provided in a Yocto layer called meta-imx/meta-ml.
In this article, we show how to integrate the following AI runtimes into the Toradex Reference Images for Yocto Project Software:
Toradex BSP Version | meta-ml version | AI Runtimes |
---|---|---|
Monthly: 5.7.0 | Based on NXP BSP L5.15.32_2.0.0 (Download documentation, requires login) | TensorFlow Lite v2.8.0, ONNX Runtime 1.10.0, OpenCV 4.5.4 |
The eIQ software based on NXP BSP L5.15.32_2.0.0 also offers support for DeepViewRT. For more information, please read the documentation.
All the AI runtimes provided by eIQ (except OpenCV, as documented in the i.MX Machine Learning User's Guide) support OpenVX (GPU/NPU) as their backend.
You can find more detailed information about the features of eIQ for each specific version in the i.MX Machine Learning User's Guide, available in NXP's Embedded Linux Documentation. See the version-specific information through the links in the table above.
At the time of this writing, the latest versions of the Toradex BSP and meta-ml were used. You can try to adapt the instructions to newer BSP / meta-ml versions, but this has not been tested and may not work.
As stated in the TensorFlow Lite documentation:
TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and IoT devices.
To execute TensorFlow models with TensorFlow Lite, you need to use the TensorFlow Lite Converter. Note that the TensorFlow version used to design the model needs to match the TensorFlow Lite version.
Also, as stated in the TensorFlow Lite documentation, not every model is directly convertible to TensorFlow Lite, because some TensorFlow operators do not have a corresponding TFLite operator. However, in some situations, you can use a mix of TensorFlow and TensorFlow Lite ops: there is a list of TensorFlow ops that can be used with TensorFlow Lite by enabling the Select TensorFlow Ops feature. See the TensorFlow Lite documentation for more information about this feature and how to enable it.
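As an illustration only (not part of the eIQ packages), the minimal sketch below shows how a TensorFlow SavedModel could be converted to a .tflite file, optionally enabling the Select TensorFlow Ops feature. The saved_model/ directory and model.tflite file name are placeholders.

```python
# Minimal conversion sketch (illustrative only): convert a TensorFlow
# SavedModel to a .tflite file. "saved_model/" and "model.tflite" are
# placeholder names, not files shipped with eIQ.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")

# Optional: enable Select TensorFlow Ops so that TF operators without a
# TFLite counterpart can still be executed (at the cost of a larger binary;
# such ops typically stay on the CPU instead of the GPU/NPU delegate).
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # regular TFLite operators
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to selected TF operators
]

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

Note that, for NPU execution, a fully integer-quantized model is typically required; quantization is not shown in this sketch.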
First, create a directory in your home directory named yocto-ml-build and use git-repo to obtain the Toradex BSP at version 5.7.0, as explained in the First-time Configuration section of the Build a Reference Image with Yocto Project article:
Note: To make this article easier to follow, we will create a directory in home called ~/yocto-ml-build. You can, of course, use any name you want.
$ mkdir -p ~/yocto-ml-build/bsp-toradex && cd ~/yocto-ml-build/bsp-toradex
$ repo init -u https://git.toradex.com/toradex-manifest.git -b refs/tags/5.7.0-devel-202206 -m tdxref/default.xml
$ repo sync
Note: At the time of this writing, we tested the build with the latest monthly release of the Toradex BSP Layers and Reference Images for Yocto Project Software, version 5.7.0. You can use these instructions to build meta-ml with newer BSP versions; however, we have not tested that.
Git clone the meta-imx repository to your ~/yocto-ml-build/ directory:
$ git clone --depth 1 -b kirkstone-5.15.32-2.0.0 git://source.codeaurora.org/external/imx/meta-imx ~/yocto-ml-build/meta-imx
One of the dependencies of tensorflow-lite is python3-pybind11-native, which is not available in this Toradex BSP version, so we need to download it.
Git clone the meta-sca repository to your ~/yocto-ml-build/ directory:
$ git clone --depth 1 -b dunfell https://github.com/priv-kweihmann/meta-sca.git ~/yocto-ml-build/meta-sca
ONNX Runtime requires a newer CMake version than the one provided by dunfell. As a workaround, we will use the cmake recipe from kirkstone.
Git clone the openembedded-core repository to your ~/yocto-ml-build/ directory:
$ git clone --depth 1 -b kirkstone git://git.openembedded.org/openembedded-core ~/yocto-ml-build/openembedded-core-kirkstone
First, from your build directory (with the build environment initialized as described in the Build a Reference Image with Yocto Project article), create a layer named meta-ml, add it to your environment, and remove the example recipe:
$ bitbake-layers create-layer ../layers/meta-ml
$ bitbake-layers add-layer ../layers/meta-ml
$ rm -rf ../layers/meta-ml/recipes-example
Copy the recipes from meta-imx to your layer:
$ cp -r ~/yocto-ml-build/meta-imx/meta-ml/recipes-* ../layers/meta-ml/
$ cp -r ~/yocto-ml-build/meta-imx/meta-bsp/recipes-support/opencv ../layers/meta-ml/recipes-libraries/
Copy the recipe for python3-pybind11-native to your layer:
$ cp -r ../../meta-sca/recipes-python/python-pybind11-native ../layers/meta-ml/recipes-libraries/
Copy the recipe for cmake to your layer and remove the existing recipe:
$ cp -r ../../openembedded-core-kirkstone/meta/recipes-devtools/cmake ../layers/meta-ml/recipes-devtools/
$ rm -rf ../layers/openembedded-core/meta/recipes-devtools/cmake
This version of meta-ml targets a version of OpenEmbedded slightly different from the one used by Toradex BSP version 5.7.0. For that reason, some adjustments are necessary.
To build OpenCV 4.5.4 with BSP 5.7.0, you need to adjust its recipe:
$ sed -i 's/require recipes-support\/opencv\/opencv_4.5.2.imx.bb/require backports\/recipes-support\/opencv\/opencv_4.5.2.imx.bb/g' ../layers/meta-ml/recipes-libraries/opencv/opencv_4.5.4.imx.bb
We need to use the flatbuffers recipe from our meta-ml layer, so we must remove the existing one:
$ rm -rf ../layers/meta-openembedded/meta-oe/recipes-devtools/flatbuffers
Several of the required recipes declare a list of compatible machines, so we need to add our machines as compatible:
$ for file in "../layers/meta-ml/recipes-libraries/arm-compute-library/arm-compute-library_21.08.bb" "../layers/meta-ml/recipes-libraries/tensorflow-lite/tensorflow-lite-vx-delegate_2.8.0.bb" "../layers/meta-ml/recipes-libraries/tim-vx/tim-vx_1.1.39.bb" "../layers/meta-ml/recipes-libraries/nn-imx/nn-imx_1.3.0.bb"; do
>   echo 'COMPATIBLE_MACHINE:apalis-imx8 = "(apalis-imx8)"' >> "$file"
>   echo 'COMPATIBLE_MACHINE:verdin-imx8mp = "(verdin-imx8mp)"' >> "$file"
> done
The onnxruntime recipe has a machine-dependent configuration for the NPU; we must add verdin-imx8mp as a machine that uses vsi_npu:
$ sed -i 's/PACKAGECONFIG_VSI_NPU:mx8-nxp-bsp = "vsi_npu"/PACKAGECONFIG_VSI_NPU:mx8-nxp-bsp = "vsi_npu"\nPACKAGECONFIG_VSI_NPU:verdin-imx8mp = "vsi_npu"/g' ../layers/meta-ml/recipes-libraries/onnxruntime/onnxruntime_1.10.0.bb
Add the meta-ml recipes to your image:
$ echo 'IMAGE_INSTALL_append += "tensorflow-lite tensorflow-lite-vx-delegate onnxruntime"' >> conf/local.conf
Add some image processing libraries so that you can perform additional image manipulations such as resizing and cropping:
$ echo 'IMAGE_INSTALL_append += "opencv python3-pillow adwaita-icon-theme "' >> conf/local.conf
To build the image a bit faster, we will remove the Qt packages for now. Skip this step if you plan to use Qt in your image:
$ echo 'IMAGE_INSTALL_remove += "packagegroup-tdx-qt5 wayland-qtdemo-launch-cinematicexperience "' >> conf/local.conf
Add the SCA_DEFAULT_PREFERENCE variable to satisfy the recipe for python3-pybind11-native:
$ echo 'SCA_DEFAULT_PREFERENCE ?= "-1" ' >> conf/local.conf
Build the tdx-reference-multimedia-image image for your target SoM, as explained in the Build a Reference Image with Yocto Project article.
Note: In case of internet or server instability, the build may fail to clone some repository with an error similar to "do_fetch: Fetcher failure for URL". In most cases, retrying the build resolves the issue.
To flash your image to the board, see the Quickstart Guide for your SoM.
NXP provides an example for executing inference with and without GPU/NPU support. You can compare the inference time of each.
To execute it, cd to the examples directory:
# cd /usr/bin/tensorflow-lite-2.8.0/examples/
This demo takes an arbitrary picture (grace_hopper.bmp) as the input of an image classification neural network based on MobileNet V1 (224x224 input size). See more information about this demo in NXP's i.MX Machine Learning User's Guide.
To execute the demo, run the commands below. The first two use the VX delegate (on the i.MX 8M Plus, USE_GPU_INFERENCE=0 selects the NPU and USE_GPU_INFERENCE=1 the GPU); the last one runs on the CPU only:
# USE_GPU_INFERENCE=0 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so
# USE_GPU_INFERENCE=1 ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt --external_delegate_path=/usr/lib/libvx_delegate.so
# ./label_image -m mobilenet_v1_1.0_224_quant.tflite -i grace_hopper.bmp -l labels.txt
See below a comparison of the inference time when executing this demo:
SoM | Inference Time | FPS (1/Inference Time) |
---|---|---|
Apalis iMX8Q Max - CPU only | 43.025 ms | 23.24 fps |
Apalis iMX8Q Max with GPU Support | 11.919 ms | 83.90 fps |
Verdin iMX8M Plus - CPU only | 45.501 ms | 21.98 fps |
Verdin iMX8M Plus with GPU Support | 162.571 ms | 6.15 fps |
Verdin iMX8M Plus with NPU Support | 2.619 ms | 381.83 fps |
Alternatively, you can run the same example using a Python implementation:
# USE_GPU_INFERENCE=0 python3 label_image.py -e /usr/lib/libvx_delegate.so
# USE_GPU_INFERENCE=1 python3 label_image.py -e /usr/lib/libvx_delegate.so
# USE_GPU_INFERENCE=0 python3 label_image.py
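For your own applications, the sketch below illustrates (assuming the tflite_runtime Python bindings installed by the tensorflow-lite recipe are available on the image) how a model can be executed either through the VX delegate (GPU/NPU) or multi-threaded on the Cortex-A cores. The model.tflite path, the dummy input, and the thread count are placeholders.

```python
# Illustrative sketch: run a TFLite model via the VX delegate (GPU/NPU)
# or multi-threaded on the CPU. "model.tflite" is a placeholder path;
# /usr/lib/libvx_delegate.so is the VX delegate shipped by eIQ.
import numpy as np
import tflite_runtime.interpreter as tflite

USE_VX_DELEGATE = True  # set to False for CPU-only execution

if USE_VX_DELEGATE:
    # Offload supported operators to the GPU/NPU through the external delegate
    delegates = [tflite.load_delegate("/usr/lib/libvx_delegate.so")]
    interpreter = tflite.Interpreter(model_path="model.tflite",
                                     experimental_delegates=delegates)
else:
    # Multi-threaded execution on the Cortex-A cores
    interpreter = tflite.Interpreter(model_path="model.tflite", num_threads=4)

interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed dummy data matching the model's expected input shape and type
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]))
```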
Note: As explained in NXP's Application Note AN12964, the i.MX 8M Plus SoC requires a warmup time of about 7 seconds before delivering its expected high performance. You will observe this extra time when starting an application with NPU support.
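In your own benchmarks, a common way to account for this warmup is to discard the first inference, since it includes the delegate's graph preparation. A minimal sketch, reusing the placeholder model.tflite from the example above:

```python
# Illustrative timing sketch: warm up the NPU before measuring.
import time
import numpy as np
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="model.tflite",  # placeholder model
    experimental_delegates=[tflite.load_delegate("/usr/lib/libvx_delegate.so")])
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))

interpreter.invoke()  # warmup run: includes delegate graph preparation

start = time.monotonic()
interpreter.invoke()  # steady-state inference
print(f"Inference time: {(time.monotonic() - start) * 1000:.2f} ms")
```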
See the version-specific NXP's i.MX Machine Learning User's Guide for more information about eIQ enablement.