How to use OpenCL 1.2 in iMX8 on Torizon

Applicable for

Apalis iMX8 | Apalis iMX8X | Colibri iMX8X

Torizon 5.0.0

Introduction

Torizon features a container runtime, Debian container images, and a deb package repository that greatly ease the development of embedded applications. In this article, we show how to install and test the OpenCL libraries optimized for the GPUs available on NXP i.MX8/8X SoCs so that you can integrate them into your application. We also obtain, build, and run an OpenCL benchmarking tool to check that the software is set up correctly and to observe the GPU performance.

Warning: As of February 2022, this demo is known to fail at run time with the error "clGetPlatformIDs (-1001) no platforms found".
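
To spot this failure without reading a whole log, one option is to search the saved container output for the error string quoted above. The following is a minimal sketch and not part of the Toradex sample; the function name has_no_platform_error and the log file name clpeak.log are hypothetical.

```shell
# Hypothetical helper (not part of the Toradex sample).
# Succeeds (exit status 0) if the log file given as $1 contains the
# clGetPlatformIDs error quoted in the warning above.
has_no_platform_error() {
    # -F: match a fixed string, -q: print nothing, only set the exit status
    grep -Fq 'clGetPlatformIDs (-1001)' "$1"
}
```

Assuming the container output was saved to a file named clpeak.log, running has_no_platform_error clpeak.log && echo "OpenCL platform not found" would report the problem.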

Prerequisites

- A Toradex Apalis iMX8, Apalis iMX8X or Colibri iMX8X SoM with Torizon installed.
- Basic knowledge of containers:
  - Toradex provides a list of related articles.
  - You can also refer to the Docker documentation.

Note: Apalis iMX8X has been phased out and is no longer available for purchase. The latest BSP and TorizonCore version that supports it is 5.4.0.

Dockerfile explained

Image base

Toradex provides a basic Wayland image on its Docker Hub page. Use torizon/wayland-base-vivante as the base of your image; it already provides the package repository. You also need to install Vivante's OpenCL Debian package for Torizon.

Building clpeak

Clpeak is a benchmarking tool to measure the peak capabilities of OpenCL devices. We will build it from source for our system.

Run clpeak

In this demo Dockerfile, we get the clpeak binary built in the previous stages and use it as the entry point.

Complete Dockerfile

The full Dockerfile combines the steps described above. It is located in the opencl directory of the samples repository used in the next section, which shows how to build and run it as a container.

Dockerfile instructions

To follow along with this article, clone the source from our samples GitHub repository:

$ git clone -b bullseye https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl

To build

On the host PC, inside the opencl directory that contains the Dockerfile, build the image:

$ docker build -t <your-dockerhub-username>/opencl-image .

After the build, push the image to your Docker Hub account:

$ docker push <your-dockerhub-username>/opencl-image

To run

Warning: These instructions assume that your Docker Hub credentials are already set up on the board. If they are not, run docker login first.

First, pull the image from your Docker Hub account to the board. In the terminal of your board:

# docker pull <your-dockerhub-username>/opencl-image

After the pull, run a container based on the image.

Attention: By executing the following command, you accept the terms and conditions of NXP's End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
             -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
             --device-cgroup-rule='c 4:* rmw'  --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
             <your-dockerhub-username>/opencl-image
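
Each --device-cgroup-rule in the command above has the form "type major:minor permissions": c selects character devices, the first number is the major number, * matches every minor number, and rmw grants read, mknod and write access. To see which drivers are registered under the major numbers used here (4, 13, 199 and 226), one option is to look them up in /proc/devices on the board. The following is a minimal sketch that assumes the usual /proc/devices layout; the function name char_majors is hypothetical.

```shell
# Hypothetical helper: print "major name" for each requested character-device
# major number, reading a file in /proc/devices format ($1).
# Majors that are not registered produce no output.
char_majors() {
    devices_file=$1
    shift
    for major in "$@"; do
        awk -v want="$major" '
            /^Character devices:/ { in_char = 1; next }  # start of the char section
            /^Block devices:/     { in_char = 0 }         # block devices are ignored
            in_char && $1 == want { print $1, $2 }
        ' "$devices_file"
    done
}
```

On the board, char_majors /proc/devices 4 13 199 226 lists the drivers that the container is allowed to reach through these rules.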

Expected Output

This is the expected output from an Apalis iMX8 board:

Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.81
      float2  : 9.74
      float4  : 10.63
      float8  : 9.36
      float16 : 8.00
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.43
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 301.68
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 269.08
        memcpy to mapped ptr     : 1.43
 
    Kernel launch latency : 97.56 us
 
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.59
      float2  : 9.38
      float4  : 10.33
      float8  : 9.15
      float16 : 7.85
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.42
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 238.44
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 207.03
        memcpy to mapped ptr     : 1.44
 
    Kernel launch latency : 126.82 us
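
To compare your own run against the reference figures above, it can help to pull a single value out of a saved clpeak log. The sketch below assumes the log has the same layout as the listing above (a section title line followed by "label : value" lines); the function name clpeak_metric and the file name clpeak.log are hypothetical.

```shell
# Hypothetical helper: print the value of one label inside one section of a
# saved clpeak log, once per device found in the log.
#   $1: log file   $2: exact section title   $3: label (e.g. float4)
clpeak_metric() {
    awk -v section="$2" -v label="$3" '
        # Work on a copy of the line without leading/trailing whitespace.
        { line = $0; sub(/^[ \t]+/, "", line); sub(/[ \t]+$/, "", line) }
        # The requested section starts at its title line.
        line == section       { in_section = 1; next }
        # Any other line without a colon (blank line, other title) ends it.
        index(line, ":") == 0 { in_section = 0; next }
        in_section {
            n = split(line, parts, /[ \t]*:[ \t]*/)
            if (n == 2 && parts[1] == label) print parts[2]
        }
    ' "$1"
}
```

For example, clpeak_metric clpeak.log 'Single-precision compute (GFLOPS)' float4 prints one float4 value per device, which can then be compared with the numbers in the listing above.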

Torizon 4.0.0

Introduction

Torizon features the Docker runtime. Toradex provides Debian Docker images and deb packages that greatly ease the development of embedded applications. In this article, we show how to install and test the OpenCL libraries optimized for the i.MX8 GPU so that you can integrate them into your application. We also obtain, build, and run an OpenCL benchmarking tool to check that the software is set up correctly and to observe the GPU performance.

Prerequisites

- A Toradex i.MX8 SoM with Torizon installed. For installation instructions, see the Quickstart Guide.
- Basic knowledge of Docker containers. To learn more about Docker, visit the developer's website. For the first steps with Docker and Torizon, check the Quickstart Guide.

$ git clone https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl

To build

On the host PC, inside the opencl directory that contains the Dockerfile, build the image:

$ docker build -t <your-dockerhub-username>/opencl-image .

After the build, push the image to your Docker Hub account:

$ docker push <your-dockerhub-username>/opencl-image

To run

First, pull the image from your Docker Hub account to the board. In the terminal of your board:

# docker pull <your-dockerhub-username>/opencl-image

After the pull, run a container based on the image.

Attention: By executing the following command, you accept the terms and conditions of NXP's End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
             -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
             --device-cgroup-rule='c 4:* rmw'  --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
             <your-dockerhub-username>/opencl-image

Expected Output

The expected output from an Apalis iMX8 board is identical to the listing shown under Expected Output in the Torizon 5.0.0 section.
