How to use OpenCL 1.2 in iMX8 on Torizon

 

Article updated at 11 Sep 2020

Select the version of your OS from the tabs below. If you don't know the version you are using, run the command cat /etc/os-release or cat /etc/issue on the board.
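If you want to read the version from a script rather than by eye, the helper below is a minimal sketch. It assumes the file follows the usual os-release layout of KEY=value lines and that the version is stored in a VERSION_ID field. The function name os_release_field is hypothetical and not part of Torizon.

```shell
#!/bin/sh
# Print the value of one KEY=value field from an os-release style file.
# Usage: os_release_field KEY [FILE]   (FILE defaults to /etc/os-release)
# Assumption: the file uses plain KEY=value lines, optionally double-quoted.
os_release_field() {
    key=$1
    file=${2:-/etc/os-release}
    # Keep only the line starting with KEY=, strip the key, then strip quotes.
    sed -n "s/^${key}=//p" "$file" | sed 's/^"\(.*\)"$/\1/' | head -n 1
}
```

On the board, calling os_release_field VERSION_ID should print the value to compare against the tab titles.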

Torizon 5.0.0

Introduction

Torizon features a container runtime, Debian container images, and a deb package repository that greatly ease the development process for embedded applications. In this article, we will show how you can install and test OpenCL libraries optimized for the GPUs available on NXP i.MX8/8X SoCs and integrate them into your application. We will also obtain, build, and run an OpenCL benchmarking tool to check that the software is set up correctly and to observe the GPU performance.

Warning: As of February 2022, this demo is known to fail at run time with the error "clGetPlatformIDs (-1001) no platforms found".
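Once you have run the container as described later in this article, you can check its output for this failure automatically. The function below is a rough sketch: it assumes the clpeak output is piped in on standard input, that the error text matches the message quoted above, and that a healthy run prints at least one "Device:" line, as in the expected output further down. The name clpeak_output_ok is hypothetical.

```shell
#!/bin/sh
# Succeed only if clpeak output (read from stdin) lists at least one device
# and does not contain the "no platforms found" failure.
clpeak_output_ok() {
    log=$(cat)
    case $log in
        # Error string as quoted in the warning above.
        *'clGetPlatformIDs (-1001)'*) return 1 ;;
    esac
    # A working setup prints one "Device:" line per OpenCL device.
    printf '%s\n' "$log" | grep -q 'Device:'
}
```

For example, after saving the container output to a file, running clpeak_output_ok against that file returns a non-zero exit status when the platform was not found.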

Prerequisites

Note: Apalis iMX8X is phased out, and it is not available for purchase anymore. The latest supported BSP and TorizonCore version is 5.4.0.

Dockerfile explained

Image base

Toradex provides a basic Wayland image on its Docker Hub page. Use torizon/wayland-base-vivante as the base of your image; it already contains the package repository configuration. You also need to install Vivante's OpenCL Debian package for Torizon.

Building clpeak

Clpeak is a benchmarking tool to measure the peak capabilities of OpenCL devices. We will build it from source for our system.

Run clpeak

In this demo Dockerfile, we will take the clpeak binary built in the previous stages and use it as the entry point.

Complete Dockerfile

Now the full Dockerfile implementation should look something like this. The next section will show how to build and run this as a container.

Dockerfile instructions

To get the most out of this article, we recommend that you clone the source from our samples GitHub repository.

$ git clone -b bullseye https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl

To build

Inside the opencl directory that contains the Dockerfile on the host PC, build the image:

$ docker build -t <your-dockerhub-username>/opencl-image .

After the build, push the image to your Docker Hub account:

$ docker push <your-dockerhub-username>/opencl-image

To run

Warning: These instructions assume that the Docker Hub credentials are already set up on the board. If you have not set up your credentials yet, execute docker login.

First, pull the image from your Docker Hub account to the board. In the terminal of your board:

# docker pull <your-dockerhub-username>/opencl-image

After the pull, run a container based on the image.

Attention: Please note that by executing the following line you are accepting the terms and conditions of NXP's End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
             -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
             --device-cgroup-rule='c 4:* rmw'  --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
             <your-dockerhub-username>/opencl-image
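Each --device-cgroup-rule above has the form 'c MAJOR:* rmw': c selects character devices, MAJOR:* matches every minor number under that major, and rmw grants read, mknod, and write access. To confirm which driver owns each major number on your board, you can look it up in /proc/devices. The helper below is a sketch; the function name char_major is hypothetical, and the driver names used in the usage note (tty, input, galcore, drm) are assumptions about how the kernel registers these devices.

```shell
#!/bin/sh
# Print the major number registered for a character-device driver name.
# Usage: char_major NAME [FILE]   (FILE defaults to /proc/devices)
char_major() {
    awk -v name="$1" '
        /^Character devices:/ { in_char = 1; next }   # start of the char section
        /^Block devices:/     { in_char = 0 }         # ignore block devices
        in_char && $2 == name { print $1; exit }      # first match wins
    ' "${2:-/proc/devices}"
}
```

On the board, char_major drm and char_major galcore should print the majors that the rules above refer to. If a number differs, adjust the corresponding rule.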

Expected Output

This is the expected output from an Apalis iMX8 board:

Output
Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.81
      float2  : 9.74
      float4  : 10.63
      float8  : 9.36
      float16 : 8.00
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.43
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 301.68
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 269.08
        memcpy to mapped ptr     : 1.43
 
    Kernel launch latency : 97.56 us
 
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.59
      float2  : 9.38
      float4  : 10.33
      float8  : 9.15
      float16 : 7.85
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.42
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 238.44
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 207.03
        memcpy to mapped ptr     : 1.44
 
    Kernel launch latency : 126.82 us
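To compare runs without reading the whole report, you can pull out the peak single-precision figure for each device. The function below is a minimal sketch that assumes the clpeak output has the layout shown above (a "Single-precision compute (GFLOPS)" header, then "name : value" lines, then a blank line). The name peak_gflops is hypothetical.

```shell
#!/bin/sh
# Print the highest value in each "Single-precision compute (GFLOPS)" block
# of clpeak output read from stdin, one line per device.
peak_gflops() {
    awk '
        /Single-precision compute/ { in_block = 1; max = 0; next }
        in_block && NF == 0        { print max; in_block = 0; next }  # blank line ends block
        in_block && $2 == ":"      { if ($3 + 0 > max) max = $3 + 0 }
        END { if (in_block) print max }   # block ran to end of input
    '
}
```

Fed the output above, this prints 62.15 twice, once for each of the two devices.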

Torizon 4.0.0

Introduction

Torizon features the Docker runtime. Toradex provides Debian Docker images and deb packages that greatly ease the development process for embedded applications. In this article, we will show how you can install and test OpenCL libraries optimized for the i.MX8 GPU and integrate them into your application. We will also obtain, build, and run an OpenCL benchmarking tool to check that the software is set up correctly and to observe the GPU performance.

Prerequisites

  • A Toradex i.MX8 SoM with Torizon installed (for installation instructions, see the Quickstart Guide).
  • Basic knowledge of Docker containers. To learn more about Docker, visit the developer's website. For the first steps with Docker and Torizon, check the Quickstart Guide.

Dockerfile explained

Image base

Toradex provides a basic Wayland image on its Docker Hub page. Use torizon/arm64v8-debian-wayland-base-vivante as the base of your image; it already contains the package repository configuration. You also need to install Vivante's OpenCL Debian package for Torizon.

Building clpeak

Clpeak is a benchmarking tool to measure the peak capabilities of OpenCL devices. We will build it from source for our system.

Run clpeak

In this demo Dockerfile, we will take the clpeak binary built in the previous stages and use it as the entry point.

Complete Dockerfile

Now the full Dockerfile implementation should look something like this. The next section will show how to build and run this as a container.

Dockerfile instructions

To get the most out of this article, we recommend that you clone the source from our samples GitHub repository.

$ git clone https://github.com/toradex/torizon-samples.git
$ cd torizon-samples/opencl

To build

Inside the opencl directory that contains the Dockerfile on the host PC, build the image:

$ docker build -t <your-dockerhub-username>/opencl-image .

After the build, push the image to your Docker Hub account:

$ docker push <your-dockerhub-username>/opencl-image

To run

Warning: These instructions assume that the Docker Hub credentials are already set up on the board. If you have not set up your credentials yet, execute docker login.

First, pull the image from your Docker Hub account to the board. In the terminal of your board:

# docker pull <your-dockerhub-username>/opencl-image

After the pull, run a container based on the image.

Attention: Please note that by executing the following line you are accepting the terms and conditions of NXP's End-User License Agreement (EULA).

# docker run -e ACCEPT_FSL_EULA=1 -it --rm --name=clpeak-container --net=host --cap-add CAP_SYS_TTY_CONFIG \
             -v /dev:/dev -v /tmp:/tmp -v /run/udev/:/run/udev/ \
             --device-cgroup-rule='c 4:* rmw'  --device-cgroup-rule='c 13:* rmw' --device-cgroup-rule='c 199:* rmw' --device-cgroup-rule='c 226:* rmw' \
             <your-dockerhub-username>/opencl-image

Expected Output

This is the expected output from an Apalis iMX8 board:

Output
Platform: Vivante OpenCL Platform
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.81
      float2  : 9.74
      float4  : 10.63
      float8  : 9.36
      float16 : 8.00
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.43
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 301.68
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 269.08
        memcpy to mapped ptr     : 1.43
 
    Kernel launch latency : 97.56 us
 
  Device: Vivante OpenCL Device GC7000XSVX.6009.0000
    Driver version  : OpenCL 1.2 V6.2.4.p4.190076 (Linux ARM64)
    Compute units   : 1
    Clock frequency : 996 MHz
 
    Global memory bandwidth (GBPS)
      float   : 5.59
      float2  : 9.38
      float4  : 10.33
      float8  : 9.15
      float16 : 7.85
 
    Single-precision compute (GFLOPS)
      float   : 14.14
      float2  : 28.18
      float4  : 55.87
      float8  : 62.15
      float16 : 61.45
 
    No half precision support! Skipped
 
    No double precision support! Skipped
 
    Integer compute (GIOPS)
      int   : 14.13
      int2  : 14.09
      int4  : 15.84
      int8  : 15.73
      int16 : 14.54
 
    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 1.42
      enqueueReadBuffer          : 0.08
      enqueueMapBuffer(for read) : 238.44
        memcpy from mapped ptr   : 0.08
      enqueueUnmap(after write)  : 207.03
        memcpy to mapped ptr     : 1.44
 
    Kernel launch latency : 126.82 us