Select the version of your OS from the tabs below. If you don't know the version you are using, run the command cat /etc/os-release or cat /etc/issue on the board.
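If you only want the version string rather than the whole file, a small helper can pull a single field out of /etc/os-release. This is a minimal sketch: the function name os_release_field is hypothetical, and it assumes the file follows the usual KEY="value" layout. The exact version string printed on your board may differ from the sample shown in the comment.

```shell
# Print one field (for example VERSION_ID) from an os-release style file.
# Usage: os_release_field FIELD [FILE]   (FILE defaults to /etc/os-release)
os_release_field() {
    field="$1"
    file="${2:-/etc/os-release}"
    # Keep only the line that starts with FIELD=, drop the key,
    # then strip any surrounding quotes from the value.
    sed -n "s/^${field}=//p" "$file" | tr -d "\"'"
}

# Example (on the board):
#   os_release_field VERSION_ID
```

The value tells you which tab to follow.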
Torizon features a container runtime, Debian container images, and a deb package repository that greatly ease the development process for embedded applications. In this article, we will show how you can install and test OpenCL libraries optimized for the GPUs available on NXP i.MX8/8X SoCs and integrate them into your application. We will also obtain, build, and run an OpenCL benchmarking tool to check that the software is set up correctly and to observe the GPU performance.
Warning: as of February 2022, it was identified that this demo fails to run with a runtime error "clGetPlatformIDs (-1001) no platforms found".
Note: Apalis iMX8X has been phased out and is no longer available for purchase. The latest supported BSP and TorizonCore version is 5.4.0.
Toradex provides a basic Wayland image on its Docker Hub page. You need to use torizon/wayland-base-vivante as the base of your image. It contains the repository package. You also need to get Vivante's OpenCL Debian package for Torizon.
Clpeak is a benchmarking tool to measure the peak capabilities of OpenCL devices. We will build it from source for our system.
In this demo Dockerfile, we will get the built clpeak from the previous stages and use it as the entry point.
Now the full Dockerfile implementation should look something like this. The next section will show how to build and run this as a container.
To get the most out of this article, it is recommended that you clone the source from our samples GitHub repository.
$ git clone -b bullseye https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl
Inside the opencl directory that contains the Dockerfile on the host PC, build the image:
$ docker build -t <your-dockerhub-username>/opencl-image .
After the build, push the image to your Docker Hub account:
$ docker push <your-dockerhub-username>/opencl-image
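The build, push, and pull commands all use the same image reference, so it can help to compose it once and reuse it. The sketch below is only an illustration: the helper name image_ref is hypothetical, and the only value taken from this article is the opencl-image repository name.

```shell
# Compose the image reference used by the build, push and pull commands.
# Prints <dockerhub-username>/opencl-image, or fails if no username is given.
image_ref() {
    user="$1"
    if [ -z "$user" ]; then
        echo "usage: image_ref <dockerhub-username>" >&2
        return 1
    fi
    printf '%s/opencl-image\n' "$user"
}

# Example (on the host PC, inside the opencl directory):
#   IMAGE=$(image_ref your-dockerhub-username) \
#     && docker build -t "$IMAGE" . \
#     && docker push "$IMAGE"
```

Chaining the commands with && stops the sequence at the first failure, so a failed build is never pushed.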
First, pull the image from your Docker Hub account to the board. In the terminal of your board:
Warning: These instructions assume that the Docker Hub credentials are already set up on the board. If you have not set up your credentials yet, execute docker login.
# docker pull <your-dockerhub-username>/opencl-image
After the pull, run a container based on the image.
Attention: Note that by executing the following command, you accept the terms and conditions of NXP's End-User License Agreement (EULA).
# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
-v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
--device-cgroup-rule='c 4:* rmw' --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
<your-dockerhub-username>/opencl-image
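Each --device-cgroup-rule above grants the container read, mknod, and write access (rmw) to every character device with the given major number (4, 13, 199, and 226). The article does not say which driver owns each major, so the sketch below simply looks them up in the board's own /proc/devices. The function name char_major_names is hypothetical.

```shell
# Print the character-device driver names registered for a major number.
# Usage: char_major_names MAJOR [FILE]   (FILE defaults to /proc/devices)
char_major_names() {
    major="$1"
    file="${2:-/proc/devices}"
    awk -v m="$major" '
        /^Character devices:/ { in_char = 1; next }   # start of the char section
        /^Block devices:/     { in_char = 0 }         # block majors are ignored
        in_char && $1 == m    { print $2 }
    ' "$file"
}

# Example (on the board):
#   for m in 4 13 199 226; do
#       echo "$m: $(char_major_names "$m" | tr '\n' ' ')"
#   done
```

If a major prints nothing, no character driver is currently registered under that number on your board, which is worth checking before debugging the container itself.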
This is the expected output from an Apalis iMX8 board:
Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
    Global memory bandwidth (GBPS)
      float   : 5.81
      float2  : 9.74
      float4  : 10.63
      float8  : 9.36
      float16 : 8.00
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
    No half precision support! Skipped
    No double precision support! Skipped
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.43
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 301.68
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 269.08
        memcpy to mapped ptr     : 1.43
    Kernel launch latency : 97.56 us
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
    Global memory bandwidth (GBPS)
      float   : 5.59
      float2  : 9.38
      float4  : 10.33
      float8  : 9.15
      float16 : 7.85
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
    No half precision support! Skipped
    No double precision support! Skipped
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.42
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 238.44
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 207.03
        memcpy to mapped ptr     : 1.44
    Kernel launch latency : 126.82 us
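To tell a healthy run apart from the "no platforms found" failure mentioned in the warning at the top of this article, you can scan a saved copy of the container output. This is a rough sketch: the function name check_clpeak_log is hypothetical, and it assumes you have saved the output to a file yourself. It relies only on two strings that appear in this article, the Platform: header and the error text.

```shell
# Classify a saved clpeak log.
# Prints "ok" if a platform was listed, "no-platform" if the known
# runtime error is present, and "unknown" otherwise.
check_clpeak_log() {
    log="$1"
    if grep -q 'no platforms found' "$log"; then
        echo "no-platform"
    elif grep -q 'Platform:' "$log"; then
        echo "ok"
    else
        echo "unknown"
    fi
}

# Example:
#   check_clpeak_log clpeak.log
```

The error check comes first so that a log containing both strings is still reported as a failure.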
Torizon features a Docker runtime. Toradex provides Debian Docker images and deb packages that greatly ease the development process for embedded applications. In this article, we will show how you can install and test OpenCL libraries optimized for the iMX8 GPU and integrate them into your application. We will also obtain, build, and run an OpenCL benchmarking tool to check that the software is set up correctly and to observe the GPU performance.
Toradex provides a basic Wayland image on its Docker Hub page. You need to use torizon/arm64v8-debian-wayland-base-vivante as the base of your image. It contains the repository package. You also need to get Vivante's OpenCL Debian package for Torizon.
Clpeak is a benchmarking tool to measure the peak capabilities of OpenCL devices. We will build it from source for our system.
In this demo Dockerfile, we will get the built clpeak from the previous stages and use it as the entry point.
Now the full Dockerfile implementation should look something like this. The next section will show how to build and run this as a container.
To get the most out of this article, it is recommended that you clone the source from our samples GitHub repository.
$ git clone https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl
Inside the opencl directory that contains the Dockerfile on the host PC, build the image:
$ docker build -t <your-dockerhub-username>/opencl-image .
After the build, push the image to your Docker Hub account:
$ docker push <your-dockerhub-username>/opencl-image
First, pull the image from your Docker Hub account to the board. In the terminal of your board:
Warning: These instructions assume that the Docker Hub credentials are already set up on the board. If you have not set up your credentials yet, execute docker login.
# docker pull <your-dockerhub-username>/opencl-image
After the pull, run a container based on the image.
Attention: Note that by executing the following command, you accept the terms and conditions of NXP's End-User License Agreement (EULA).
# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
-v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
--device-cgroup-rule='c 4:* rmw' --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
<your-dockerhub-username>/opencl-image
This is the expected output from an Apalis iMX8 board:
Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
    Global memory bandwidth (GBPS)
      float   : 5.81
      float2  : 9.74
      float4  : 10.63
      float8  : 9.36
      float16 : 8.00
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
    No half precision support! Skipped
    No double precision support! Skipped
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.43
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 301.68
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 269.08
        memcpy to mapped ptr     : 1.43
    Kernel launch latency : 97.56 us
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
    Global memory bandwidth (GBPS)
      float   : 5.59
      float2  : 9.38
      float4  : 10.33
      float8  : 9.15
      float16 : 7.85
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
    No half precision support! Skipped
    No double precision support! Skipped
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.42
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 238.44
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 207.03
        memcpy to mapped ptr     : 1.44
    Kernel launch latency : 126.82 us