Contiguous Memory Allocator - CMA (Linux)

Applicable for

BSP Layers / Reference Images for Yocto (Linux)

BSP 5

Introduction

Some hardware blocks do not access memory through the memory management unit (MMU), that is, through the virtual address space. Instead, they access memory directly by its physical address. Examples are certain direct memory access (DMA) controllers or companion CPUs in a heterogeneous multicore processing (HMP) environment, e.g. the Cortex-M cores found in some of the SoCs on our modules.

Such hardware needs memory blocks with contiguous physical addresses. One way to provide them is the contiguous memory allocator (CMA). Other approaches exist: doing DMA in smaller, page-sized chunks; using scatter-gather, i.e. a list of smaller memory areas handled either in software or in hardware; reserving (carving out) memory at boot time and managing it exclusively for e.g. DMA use; or implementing an input-output memory management unit (IOMMU).

CMA is a memory allocator within the kernel which allows allocating large chunks of memory with contiguous physical memory addresses.

Intended audience

There are two common reasons why you would want (or need) to configure the CMA size as described in this article:

If your application uses a big chunk of CMA (often due to GPU or VPU usage) and the available CMA is not enough, you have to increase it.
If you have a special application that does not need CMA and cannot use CMA as regular RAM, you can decrease the CMA size to free memory for your application. Note that this is a rare scenario, as most applications can use CMA as regular RAM. We have only seen related issues with containers in TorizonCore.

This article complies with the Typographic Conventions for the Toradex Documentation.

Prerequisites

BSP Layers and Reference Images for Yocto Project version 5.3.0 or newer.

CMA

CMA works by reserving a large memory area at boot time and immediately giving that memory back to the kernel memory subsystem, with the constraint that memory from this area can only be handed out either for CMA use or for movable pages. Thus, if other users have claimed memory in the area (e.g. the buffer cache), their data can be migrated elsewhere when a CMA allocation needs the space. This keeps the CMA area free of fragmentation, so large contiguous blocks can still be handed out.
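To make the mechanism above more concrete, here is a toy Python model. It is our own simplification, not the kernel's algorithm: the real allocator tracks the area with a bitmap and relies on the kernel's page-migration code. The model assumes there is always free RAM outside the CMA area to migrate movable pages to; the class and method names are hypothetical.

```python
from typing import Optional

FREE, MOVABLE, CMA = ".", "m", "C"  # page states used in this toy model


class ToyCmaArea:
    """Toy model of a CMA area: each page is free, holds movable data, or is
    allocated to a CMA user. Not the kernel's implementation."""

    def __init__(self, num_pages: int) -> None:
        self.pages = [FREE] * num_pages
        self.migrated = 0  # movable pages moved to RAM outside the area (assumed available)

    def alloc_movable(self, index: int) -> None:
        # While CMA does not need them, free pages in the area are lent to movable users.
        if self.pages[index] != FREE:
            raise ValueError(f"page {index} is not free")
        self.pages[index] = MOVABLE

    def cma_alloc(self, count: int) -> Optional[int]:
        """Claim `count` contiguous pages; return the first page index or None."""
        for start in range(len(self.pages) - count + 1):
            window = self.pages[start:start + count]
            if CMA in window:
                continue  # already handed out to another CMA user, try the next window
            # Movable data in the window is migrated away, then the window is claimed.
            self.migrated += window.count(MOVABLE)
            self.pages[start:start + count] = [CMA] * count
            return start
        return None  # no contiguous window left; the kernel would report -ENOMEM (-12)


area = ToyCmaArea(8)
for i in (1, 3, 6):
    area.alloc_movable(i)
print("".join(area.pages))                        # .m.m..m.
start = area.cma_alloc(4)
print(start, area.migrated, "".join(area.pages))  # 0 2 CCCC..m.
```

The point of the model is that movable pages never block a CMA allocation for long: they are simply moved out of the way. Only pages already claimed by other CMA users make a window unusable.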

There is one CMA area for common use. Subsystems can create additional CMA areas for their own use.

CMA must be enabled in the kernel configuration (CONFIG_CMA and CONFIG_DMA_CMA).

Configure the size of the CMA area

There are three ways to configure the CMA area size. The device tree overrules what’s on the kernel cmdline and the kernel cmdline overrules the kernel configuration.

Setting the CMA area in the device tree additionally lets you define an address range within which the CMA area must be located.

In the device tree:

Add a node /reserved-memory/linux,cma, for example:

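reserved-memory {
  #address-cells = <2>;
  #size-cells = <2>;
  ranges;

  linux,cma {
    compatible = "shared-dma-pool";
    reusable;
    size = <0 0x3c000000>;                       /* 960 MiB */
    alloc-ranges = <0 0x96000000 0 0x3c000000>;  /* base, length */
    linux,cma-default;
  };
};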
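The size and alloc-ranges values in the device tree example are written as pairs of 32-bit cells, which is easy to misread. The following Python sketch decodes them, assuming #address-cells and #size-cells are both 2 in the parent /reserved-memory node, as the two-cell values in the example imply. The helper name cells_to_int is hypothetical.

```python
MiB = 1 << 20


def cells_to_int(*cells: int) -> int:
    """Combine big-endian 32-bit device tree cells into one number."""
    value = 0
    for cell in cells:
        value = (value << 32) | cell
    return value


size = cells_to_int(0, 0x3C000000)    # size = <0 0x3c000000>
base = cells_to_int(0, 0x96000000)    # alloc-ranges: base cells
length = cells_to_int(0, 0x3C000000)  # alloc-ranges: length cells

print(f"size = {size // MiB} MiB")                         # size = 960 MiB
print(f"alloc-range = {base:#x}..{base + length - 1:#x}")  # alloc-range = 0x96000000..0xd1ffffff
```

Because the size equals the length of the allowed range in this example, the 960 MiB CMA area fills the whole range from 0x96000000 up to (but not including) 0xd2000000.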

On the kernel command line:

cma=256MB
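The kernel's parameter documentation describes the format as cma=nn[MG]@[start[MG][-end[MG]]], where the optional part after @ restricts the physical placement. The kernel parses the size with memparse(), which applies a binary multiplier for the suffix and stops there, so the trailing "B" in "256MB" is ignored. The following Python sketch mimics that size parsing for illustration only. It is a simplification (decimal or 0x-prefixed hex numbers, K/M/G suffixes only), and the function name parse_cma_size is hypothetical.

```python
import re

# Binary multipliers for the suffixes handled here (the kernel's memparse() knows more).
_SHIFT = {"k": 10, "m": 20, "g": 30}


def parse_cma_size(value: str) -> int:
    """Return the size in bytes of a cma= value such as "256MB" (sketch).

    Handles a decimal or 0x-prefixed number followed by an optional K/M/G
    suffix; anything after the suffix, such as the trailing "B" or an
    "@start-end" placement, is ignored.
    """
    match = re.match(r"(0[xX][0-9a-fA-F]+|[0-9]+)([kKmMgG]?)", value)
    if match is None:
        raise ValueError(f"not a memory size: {value!r}")
    number, suffix = match.groups()
    size = int(number, 16 if number[:2].lower() == "0x" else 10)
    if suffix:
        size <<= _SHIFT[suffix.lower()]
    return size


print(parse_cma_size("256MB") // (1 << 20))  # 256
```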

In the kernel configuration:

CONFIG_CMA_SIZE_MBYTES=960
CONFIG_CMA_SIZE_PERCENTAGE=25
# CONFIG_CMA_SIZE_SEL_MBYTES is not set
# CONFIG_CMA_SIZE_SEL_PERCENTAGE is not set
CONFIG_CMA_SIZE_SEL_MIN=y
# CONFIG_CMA_SIZE_SEL_MAX is not set
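With CONFIG_CMA_SIZE_SEL_MIN=y, the kernel uses the smaller of CONFIG_CMA_SIZE_MBYTES and CONFIG_CMA_SIZE_PERCENTAGE of the memory it manages; SEL_MAX uses the larger, and SEL_MBYTES or SEL_PERCENTAGE use only one value. The following Python sketch reproduces that selection to show what the configuration above yields on modules with different RAM sizes. It assumes 4 KiB pages and that all of the DDR is visible to Linux (memory carved out by firmware would lower the percentage-based value). The function select_cma_size is hypothetical, and a cma= argument on the kernel command line would bypass this selection entirely.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages
MiB = 1 << 20


def select_cma_size(total_mem: int, size_mbytes: int, percentage: int, sel: str) -> int:
    """Sketch of the kernel's Kconfig-based CMA size selection, in bytes.

    total_mem   -- memory visible to Linux, in bytes
    size_mbytes -- CONFIG_CMA_SIZE_MBYTES
    percentage  -- CONFIG_CMA_SIZE_PERCENTAGE
    sel         -- the CONFIG_CMA_SIZE_SEL_* option that is set:
                   "mbytes", "percentage", "min" or "max"
    """
    size_bytes = size_mbytes * MiB
    # The percentage is applied to the number of pages, then converted back to bytes.
    total_pages = total_mem >> PAGE_SHIFT
    percent_bytes = (total_pages * percentage // 100) << PAGE_SHIFT
    if sel == "mbytes":
        return size_bytes
    if sel == "percentage":
        return percent_bytes
    if sel == "min":
        return min(size_bytes, percent_bytes)
    if sel == "max":
        return max(size_bytes, percent_bytes)
    raise ValueError(f"unknown selection: {sel!r}")


# The configuration above (960 MB, 25 %, SEL_MIN) on modules with 1 GiB and 4 GiB of RAM:
for ram_gib in (1, 4):
    print(ram_gib, "GiB ->", select_cma_size(ram_gib << 30, 960, 25, "min") // MiB, "MiB")
# 1 GiB -> 256 MiB
# 4 GiB -> 960 MiB
```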

TorizonCore

In TorizonCore, the CMA area size can be set or changed with TorizonCore Builder by setting custom kernel command line arguments.

For example:

torizoncore-builder kernel set_custom_args "cma=192MB"

Check the CMA reserved size in a running kernel

Filter the output of dmesg as follows:

# dmesg | grep cma
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] Memory: 242688K/524288K available (8192K kernel code, 357K rwdata, \
  2904K rodata, 1024K init, 417K bss, 19456K reserved, 262144K cma-reserved, 0K highmem)
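The relevant number is the "cma-reserved" entry of the "Memory:" line, given in KiB. As a small sketch, the following Python function extracts it from the dmesg text and converts it to MiB; in the example above, 262144K corresponds to 256 MiB. The function name cma_reserved_kib is hypothetical, and the line layout is taken from the output shown above.

```python
import re


def cma_reserved_kib(dmesg_text: str) -> int:
    """Return the 'cma-reserved' value (in KiB) from the kernel's 'Memory:' boot line."""
    match = re.search(r"(\d+)K cma-reserved", dmesg_text)
    if match is None:
        raise ValueError("no 'cma-reserved' entry found")
    return int(match.group(1))


line = ("[    0.000000] Memory: 242688K/524288K available (8192K kernel code, 357K rwdata, "
        "2904K rodata, 1024K init, 417K bss, 19456K reserved, 262144K cma-reserved, 0K highmem)")
print(cma_reserved_kib(line) // 1024, "MiB")  # 256 MiB
```

On the target, the same function could be applied to the full output of dmesg.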

CMA use in the BSP

As of the BSP release 5.3.0, CMA is enabled in our Reference Images for Yocto Project. It is configured as follows:

All modules with 64-bit CPUs and all upstream kernels configure the CMA size through the kernel configuration.
All others (i.MX6/i.MX6ULL/i.MX7) configure it from the device tree. However, we plan to change these to use the kernel configuration as well.

The required size of the CMA area depends on which subsystems actually use it. The main users of large chunks are the GPU and VPU subsystems, i.e. 3D acceleration and video decoding/encoding.

Should the CMA area be too small to fulfill allocation requests, the kernel prints a message like the following. This example was provoked by setting CMA to 64MB and then playing a video:

[   38.419943] cma: cma_alloc: alloc failed, req-size: 4097 pages, ret: -12
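The message reports the request size in pages and the error code; -12 is -ENOMEM. The following Python sketch turns such a line into a readable size and error name. It assumes 4 KiB pages; with that page size, 4097 pages are 16388 KiB, slightly more than 16 MiB. The function name explain_cma_failure is hypothetical.

```python
import errno
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def explain_cma_failure(line: str) -> str:
    """Describe the request size and error code of a cma_alloc failure message."""
    match = re.search(r"req-size: (\d+) pages, ret: (-?\d+)", line)
    if match is None:
        raise ValueError("not a cma_alloc failure message")
    pages, ret = int(match.group(1)), int(match.group(2))
    size_kib = pages * PAGE_SIZE // 1024
    return f"requested {pages} pages = {size_kib} KiB, error {errno.errorcode[-ret]}"


msg = "[   38.419943] cma: cma_alloc: alloc failed, req-size: 4097 pages, ret: -12"
print(explain_cma_failure(msg))  # requested 4097 pages = 16388 KiB, error ENOMEM
```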

If you configure the CMA area too big, allocations of memory that is not movable may fail, because such memory cannot be placed in the CMA area. For instance, we found that certain Torizon containers cannot start with the default CMA area size of 640MB on a Verdin iMX8M Mini 1GB.

CMA size in the BSP

Module	Downstream Kernel	Upstream Kernel
Apalis iMX6	320MB	64MB
Apalis iMX8	max(256MB, 25% of DDR size) [1]	N/A
Apalis iMX8X	max(256MB, 25% of DDR size) [1]	N/A
Apalis TK1	N/A	64MB
Colibri iMX6	50% of DDR size [2]	64MB
Colibri iMX6ULL	128MB	64MB
Colibri iMX7	max(256MB, 25% of DDR size)	64MB
Colibri iMX8X	max(256MB, 25% of DDR size) [1]	N/A
Verdin iMX8M Mini	max(256MB, 25% of DDR size) [1]	N/A
Verdin iMX8M Plus	max(256MB, 25% of DDR size) [1]	N/A

[1]: Changed to min(1376MB, 25% of DDR size) after the 5.3.0 release.
[2]: Through device tree patching in U-Boot.

Note: TorizonCore inherits its default CMA size configuration from the BSP.

Additional resources

"A deep dive into CMA" on LWN.net
