
Thermal Management

 

Article updated at 30 Jul 2021




Introduction

Thermal management is the area concerned with making sure that the system and its components operate within defined temperature ranges, to guarantee the reliable operation of the whole system. It involves concepts such as power consumption, heat dissipation, and system temperature, which are related to the following topics:

  • Hardware: heat generation is directly proportional to the power consumption of the device. Also, each component has its own operating temperature range and limits. Usually, one is concerned with the System on Chip (SoC) heat dissipation and the System on Module (SoM) operating temperature range. An optimized carrier board design may help reduce the power consumption of the system, which may result in a significant heat reduction.
  • Software: the power consumption - and thus the heat generation - of the system and its components are affected by the software load put on the hardware components, such as CPU cores, GPU, and peripherals. It is directly related to the use-case application but also affected by the BSP version and its fine-tuning.
  • Enclosure: the mechanical enclosure of the system - a box, for instance - must be taken into account when it comes to thermal management. A low-power SoM may fit a constrained enclosure, whereas a high-performance system may require a well-designed airflow or ventilation scheme for proper operation.
  • Environment: heat flux intensity depends on a temperature difference, which in our case is the difference between the hardware - usually the SoC - and the environment. Knowing the environment temperature range where the system will operate is essential for designing a thermal management solution that both satisfies the system requirements and uses the least complex cooling mechanism possible.
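
The relation between power consumption, temperature difference, and heat flow described in the list above can be sketched with the common steady-state thermal-resistance model, T_junction = T_ambient + P × θ_JA. This is a simplified illustration: the function name and all numbers below are assumptions chosen for the example, not values taken from a Toradex datasheet.

```python
def estimate_junction_temp_c(ambient_c: float, power_w: float,
                             theta_ja_c_per_w: float) -> float:
    """Steady-state junction temperature from a lumped thermal-resistance model.

    theta_ja_c_per_w is the junction-to-ambient thermal resistance in °C/W.
    """
    if power_w < 0 or theta_ja_c_per_w < 0:
        raise ValueError("power and thermal resistance must be non-negative")
    return ambient_c + power_w * theta_ja_c_per_w


# Hypothetical figures: 40 °C ambient, 3 W dissipated, 15 °C/W to ambient.
print(estimate_junction_temp_c(40.0, 3.0, 15.0))  # 85.0
```

The model shows why both levers matter: lowering the power consumption or lowering the thermal resistance (better cooling) reduces the junction temperature for a given environment.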

This article provides an overview of thermal management solutions applied to the Toradex system SoMs. It goes through the hardware and software BSP specifics as well as cooling solutions.

Hardware

This section goes through the hardware specifics related to thermal management.

Note: Additional information can be found in the respective Toradex SoM datasheets, under the Thermal Specification section. The datasheets are available on the corresponding product pages, under the Datasheets tab.

SoC (CPU) Limits

The following table provides the maximum junction temperature for a specific SoC. Notice that this is the maximum temperature at the semiconductor level, measured by a sensor internal to the SoC die. This temperature is higher than the die/case temperature and is monitored by the underlying operating system and/or additional hardware mechanisms. If temperature throttling mechanisms fail to keep the SoC from reaching the maximum junction temperature, a forced system shutdown is issued to prevent permanent damage.

Browse the dropdown table below for information about maximum junction temperature sorted by SoC:

Maximum Junction Temperature by SoC

SoM Limits

The operating temperature of all the electrical components in an SoM defines the final module specification. As a consequence, the most critical element regarding temperature limits may not be the SoC, but instead another part. The dropdown tables below list the operating temperature range by Toradex SoM:

Apalis Family

Maximum Operating Temperature by Apalis SoM

Colibri Family

Maximum Operating Temperature by Colibri SoM

Verdin Family

Maximum Operating Temperature by Verdin SoM

Software and BSP

This section goes through the software and BSP specifics related to thermal management.

Dynamic Voltage and Frequency Scaling (DVFS) and Thermal Throttling

Dynamic Voltage and Frequency Scaling (DVFS) is a mechanism in which the operating system optimizes power consumption by adjusting the CPU clocks and voltage based on demand. A side-effect of power consumption optimization is that the system generates less heat in workloads that don't make full use of the CPU.

Thermal Throttling is a mechanism implemented in the operating system to preserve the integrity of the processor. It forces reduction of the system clock when it reaches certain temperatures, independent of DVFS.

Note: DVFS is disabled by default on WinCE. Please see DVFS on Windows Embedded Compact for further information.

CPU Hotplug

It may be possible to enable/disable CPU cores dynamically if both the SoC and operating system support it, which saves power, thus generating less heat.

General tips

This section has some tips on how to save power, which may help reduce heat generation, and covers other aspects of the software that may affect thermal management.

  • If peak performance is used only for a short duration, heat dissipation is usually not a matter of concern because of the advanced power management.
  • Cooling solutions may optimize system performance.
  • Cooling solutions can be passive or active.
  • When the application requires full CPU / Graphics performance for a more extended period, a general recommendation is testing the system's thermal behavior in the given condition.
  • Always refer to the Thermal Specification section in the respective module datasheet.
  • Thermal throttling configuration, also referred to as temperature trip points, can be adjusted in the BSP.

Note: We recommend measuring the system's power consumption before and after making the changes. It helps in getting a better understanding of the power management of the system.

OS Specific Guidelines


Linux

On Linux, DVFS can be disabled and the CPU frequency set manually. See the CPU Frequency (Linux) article.
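
As a minimal sketch of how the DVFS state can be inspected from userspace, the snippet below reads the standard Linux cpufreq sysfs files `scaling_governor` and `scaling_cur_freq` (reported in kHz). The function name and the choice of `cpu0` are assumptions for illustration; whether the cpufreq directory exists depends on the module and BSP.

```python
from pathlib import Path

# Standard cpufreq location for the first core; availability is module-dependent.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_cpufreq(cpufreq_dir=CPUFREQ_DIR) -> dict:
    """Return the active governor and the current CPU frequency in MHz."""
    cpufreq_dir = Path(cpufreq_dir)
    governor = (cpufreq_dir / "scaling_governor").read_text().strip()
    # scaling_cur_freq is reported in kHz.
    cur_khz = int((cpufreq_dir / "scaling_cur_freq").read_text().strip())
    return {"governor": governor, "cur_mhz": cur_khz / 1000}
```

Calling `read_cpufreq()` before and during a workload gives a quick indication of whether the governor is scaling the clock as expected.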

An application in the userspace can monitor the temperature. How to read it and which sensors are available is module-dependent. See the Temperature Sensor (Linux) article and Apalis/Colibri T30 Temperature Monitoring for additional information.
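
A userspace monitor along these lines can be sketched with the generic thermal sysfs interface, which reports each zone's temperature in millidegrees Celsius. The function names are illustrative, and which zones and sensor types appear is module-dependent, so treat this as an assumption-laden example rather than a reference implementation.

```python
from pathlib import Path

THERMAL_BASE = Path("/sys/class/thermal")


def read_zone_temp_c(zone_dir) -> float:
    """Read one zone's temperature; sysfs reports millidegrees Celsius."""
    raw = (Path(zone_dir) / "temp").read_text().strip()
    return int(raw) / 1000.0


def read_all_zones(base=THERMAL_BASE) -> dict:
    """Map each thermal zone name to (sensor type, temperature in °C)."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            readings[zone.name] = (zone_type, read_zone_temp_c(zone))
        except (OSError, ValueError):
            # Skip zones whose files are missing or unreadable.
            continue
    return readings
```

Polling `read_all_zones()` at a fixed interval while the application runs is a simple way to record the thermal behavior of the system over time.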

The Linux kernel performs the thermal throttling, and the generic Thermal Sysfs API provides access to its settings. The section below includes information about how to set temperature trip points in Linux:

How to Set Temperature Trip Points

There are two temperature trip points used on iMX SoCs.

  • passive: the point where Linux starts to throttle the CPU.
  • critical: the point where Linux shuts itself down to protect the CPU.

Toradex decided to use the T_junction_max stated in the datasheet for the critical temperature and 10°C less for the passive trip point.
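
The rule above is simple arithmetic; the sketch below expresses it in millidegrees Celsius, the unit used by the kernel thermal framework. The function name is hypothetical and 105°C is only an example value, since the actual T_junction_max depends on the SoC variant.

```python
def default_trip_points(t_junction_max_c: float,
                        passive_offset_c: float = 10.0) -> dict:
    """Trip points in millidegrees Celsius: critical at max, passive below it."""
    critical = round(t_junction_max_c * 1000)
    passive = round((t_junction_max_c - passive_offset_c) * 1000)
    return {"passive": passive, "critical": critical}


# Example value only: an SoC with a 105 °C maximum junction temperature.
print(default_trip_points(105.0))  # {'passive': 95000, 'critical': 105000}
```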

The following patch serves as a guideline for changing these trip points on i.MX-related SoCs:

i.MX 6(ULL) / i.MX 7
diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c
index 28072a7..591d6be 100644
--- a/drivers/thermal/imx_thermal.c
+++ b/drivers/thermal/imx_thermal.c
@@ -656,10 +656,10 @@ static int imx_get_sensor_data(struct platform_device *pdev)
     }

     /*
-     * Set the critical trip point at 5C under max
+     * Set the critical trip point at max
      * Set the passive trip point at 10C under max (can change via Sysfs)
      */
-    data->temp_critical = data->temp_max - (1000 * 5);
+    data->temp_critical = data->temp_max;
     data->temp_passive = data->temp_max - (1000 * 10);

     return 0;

data->temp_max in this driver is used for the T_junction_max that is read out from the fuses.

data->temp_passive and data->temp_critical are the trip points described above, which a developer should set to the desired temperature in millidegrees Celsius.

i.MX 8/8X/8M Mini/8M Plus

The i.MX 8/8X power management and temperature monitoring are entirely handled by the System Controller Firmware (SCFW).

The critical and passive trip point thresholds are set in the respective dtsi file, and CONFIG_IMX8M_THERMAL is specified in the defconfig.

The thermal driver can be accessed through the following interface:

  • /sys/class/thermal/thermal_zoneX for i.MX 8 and i.MX 8X.
  • /sys/class/thermal/thermal_zone0 for i.MX 8M Mini.
  • /sys/class/thermal/thermal_zoneX for i.MX 8M Plus.

Inside the above directory, there are files named trip_point*. There you can read the type, temperature threshold, and hysteresis used by those trip points.
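
As an illustration, the trip point files of a thermal zone can be collected as follows. The sketch relies on the generic thermal sysfs naming (`trip_point_N_type`, `trip_point_N_temp`, `trip_point_N_hyst`) and treats the hysteresis file as optional; the function name and the returned structure are assumptions made for this example.

```python
import re
from pathlib import Path

TRIP_TYPE_RE = re.compile(r"trip_point_(\d+)_type")


def read_trip_points(zone_dir) -> list:
    """List a zone's trip points as dicts, ordered by trip index."""
    zone = Path(zone_dir)
    trips = []
    for type_file in zone.glob("trip_point_*_type"):
        match = TRIP_TYPE_RE.fullmatch(type_file.name)
        if match is None:
            continue
        index = int(match.group(1))
        temp_file = zone / f"trip_point_{index}_temp"
        hyst_file = zone / f"trip_point_{index}_hyst"
        hyst_c = None
        if hyst_file.exists():
            # Hysteresis, like the threshold, is in millidegrees Celsius.
            hyst_c = int(hyst_file.read_text().strip()) / 1000.0
        trips.append({
            "index": index,
            "type": type_file.read_text().strip(),
            "temp_c": int(temp_file.read_text().strip()) / 1000.0,
            "hyst_c": hyst_c,
        })
    return sorted(trips, key=lambda trip: trip["index"])
```

For example, `read_trip_points("/sys/class/thermal/thermal_zone0")` would list the passive and critical thresholds of the first zone on a module that exposes it.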

CPU Hotplug

See the article CPU (Linux) for detailed information on supported modules.
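
Where the SoC and kernel support it, CPU hotplug is exposed through the standard `online` file of each core under `/sys/devices/system/cpu`. The following is a minimal sketch with hypothetical helper names; writing the file requires root privileges, and a core without an `online` file is assumed here to be non-removable.

```python
from pathlib import Path

CPU_BASE = Path("/sys/devices/system/cpu")


def is_cpu_online(cpu: int, base=CPU_BASE) -> bool:
    """Report whether a core is online."""
    cpu_dir = Path(base) / f"cpu{cpu}"
    if not cpu_dir.is_dir():
        raise FileNotFoundError(f"no such CPU: {cpu_dir}")
    online_file = cpu_dir / "online"
    if not online_file.exists():
        # Assumption: a core without this file cannot be unplugged.
        return True
    return online_file.read_text().strip() == "1"


def set_cpu_online(cpu: int, online: bool, base=CPU_BASE) -> None:
    """Bring a core online or take it offline (needs root on a real system)."""
    (Path(base) / f"cpu{cpu}" / "online").write_text("1" if online else "0")
```

Taking a core offline with `set_cpu_online(1, False)` and measuring power consumption before and after shows whether hotplug brings a meaningful saving for a given workload.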

Additional Tips and Recommendations

  • Disable unused Display Interfaces

  • Use a Lower Frequency

    • See the CPU Frequency (Linux) article to change the CPU frequency to test system performance and power consumption.
  • Avoid Toggling Pins

    • Make sure none of the pins are unnecessarily toggling. Also, make sure all input pins are in a defined state. The GPIO Tool will be helpful in testing, and the Device Tree Customization article helps with tweaking the SoC pin configuration for device tree enabled modules.
  • Use Low Power Modes

    • Enter Suspend mode during idle time or even consider switching off the module completely. See the Suspend/Resume (Linux) article for reference.
  • Check CPU Load

    • Linux has many tools to monitor CPU load, such as top and htop. If the load is unexpectedly high, then check the application software. Some easy modifications may help to lower the CPU load, e.g., using interrupts instead of polling or sleeping instead of busy waiting.
  • Disable unused Drivers

    • For this step, you should measure the power consumption; in some cases, disabling drivers may negatively impact power consumption. To disable drivers, you may have to recompile the Linux kernel and modules. See the article Build U-Boot and Linux Kernel from Source Code for reference.
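
For the "Check CPU Load" tip above, the load can also be computed programmatically from two snapshots of `/proc/stat`, which is where tools such as top obtain their figures. The sketch below is one possible approach with hypothetical function names; it counts idle and iowait time as idle and uses the first eight counters of the aggregate `cpu` line.

```python
def parse_cpu_times(stat_text: str) -> tuple:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Fields: user nice system idle iowait irq softirq steal ...
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # Guest time is already included in user/nice, so stop at steal.
            total = sum(values[:8])
            return idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_load_percent(before: str, after: str) -> float:
    """CPU load in percent between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_times(before)
    idle1, total1 = parse_cpu_times(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / delta_total)
```

In practice, one would read `/proc/stat`, wait for about a second, read it again, and pass both texts to `cpu_load_percent`.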

WinCE

How to Set Temperature Trip Points

DVFS and temperature throttling settings can be customized. See the Resource Manager Registry Settings and Apalis/Colibri iMX6 DVFS on Windows Embedded Compact articles. As an alternative to tweaking resources, as well as a means to monitor system frequency and temperature, Toradex provides a software tool named Toradex Task Manager. Notice that DVFS is available from WinCE Image 1.3b4 onwards.

For additional temperature monitoring information, see Apalis/Colibri T30 Temperature Monitoring and SoC Temperature Readout (WinCE).

CPU Hotplug

See the article Resource Manager Registry Settings for detailed information on supported modules.

Additional Tips and Recommendations

  • Disable unused Display Interfaces

    • The Tegra modules have three display interfaces. Make sure you enable only the display interface your application needs and disable the unused ones. Please use the Tegra specific registry settings, as mentioned in this article.
BootupStyle = 1
HDMIHotplugBehavior = 2
  • Use a Lower Frequency

    • One can use the Toradex Task Manager to change the CPU frequency to test system performance and power consumption.
  • Avoid Toggling Pins

    • Make sure none of the pins are unnecessarily toggling. Also, make sure all input pins are at a defined voltage level. The GPIO Config tool will be helpful in testing.
  • Use Low Power Modes

    • Enter Suspend mode during idle time or even switch off the module completely. Toradex WinCE images offer fast boot (in some cases, boot time is less than 0.5 seconds).
  • Check CPU Load

    • Use Toradex Task Manager to check the CPU load. If the load is unexpectedly high, then check the application software. Some easy modifications may help to lower the CPU load, e.g., using interrupts instead of polling or sleeping instead of busy waiting.
  • Disable unused Drivers

    • For this step, you should measure the power consumption; in some cases, disabling drivers may negatively impact power consumption. To learn more about how to disable drivers, please refer to this article.

Mechanical Considerations - Cooling Solutions

Cooling solutions usually target the SoC and can be either passive or active. Passive means that natural convection transports the heat from the surface to the air. This definition covers an SoC exposed to the environment both with and without a heatsink. The efficiency of natural convection depends on the housing and the environment. This solution has no moving parts and does not produce noise. If passive cooling is not sufficient, the most common active cooling solution for embedded systems is a DC fan mounted on top of the heatsink, which increases efficiency dramatically.

If a box encloses the hardware, there are two recommended approaches, which can also be combined:

  • The design of the enclosure should optimize the airflow.
  • The enclosure should thermally couple to the SoC.

The temperature inside the enclosure has to respect the SoM operating temperature range.
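
When sizing a heatsink or a thermally coupled enclosure, a common first estimate is the largest sink-to-ambient thermal resistance that still keeps the SoC below its maximum junction temperature at the highest expected air temperature. The sketch below uses the lumped series-resistance model; the function name and all numeric values are assumptions for illustration, and real designs should be validated by measurement.

```python
def max_sink_to_ambient_resistance(tj_max_c: float, ambient_max_c: float,
                                   power_w: float, theta_jc: float,
                                   theta_cs: float = 0.0) -> float:
    """Largest allowed sink-to-ambient thermal resistance in °C/W.

    theta_jc: junction-to-case resistance, theta_cs: case-to-sink
    (thermal interface material) resistance, both in °C/W.
    """
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (tj_max_c - ambient_max_c) / power_w - theta_jc - theta_cs


# Hypothetical example: 105 °C limit, 60 °C air inside the box, 5 W load.
print(max_sink_to_ambient_resistance(105.0, 60.0, 5.0, theta_jc=2.0, theta_cs=1.0))  # 6.0
```

A negative result means that no heatsink alone can meet the limit under the assumed conditions, so the power consumption or the air temperature inside the enclosure has to be reduced.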

Apalis

The Apalis family has a robust, rigid mounting mechanism to support thermal solutions. It is ready-to-use on Toradex carrier boards and, if you plan to design your carrier board, the thermal solution implementation guidelines are available in the Apalis Carrier Board Design Guide.

An optimized Apalis Heatsink is available for each version of the Toradex Apalis module. The following table shows the compatibility of the available Apalis heatsinks:

Apalis Heatsink Type | Compatible Module
Type 1 | Apalis iMX6Q IT, Apalis iMX6D IT
Type 2 | Apalis T30
Type 3 | Apalis iMX6Q, Apalis iMX6D, Apalis TK1
Type 4 | Apalis iMX8QM

The Apalis heatsink has four holes intended for mounting a fan on top of it. Specifics are available in the Apalis Heatsink Fan article. Also, a 3D CAD model of the heatsink is available on the 3D CAD models page.



Colibri

The Colibri family of SoMs does not have a cooling solution officially provided by Toradex. Nevertheless, we have tested a few off-the-shelf heatsink solutions available in the market with Colibri T20 and T30 modules. For more details, please refer to the following test reports:

Verdin

The Verdin family has a robust, rigid mounting mechanism to support thermal solutions. It is ready-to-use on Toradex carrier boards and, if you plan to design your carrier board, the thermal solution implementation guidelines are available in the Verdin Carrier Board Design Guide.

An optimized Verdin Industrial Heatsink is available for each version of the Toradex Verdin module. The following table shows the compatibility of the available Verdin Industrial Heatsink:

Verdin Heatsink Type | Compatible Module
Type 1 | Verdin iMX8M Mini, Verdin iMX8M Plus


Legacy Information

Colibri PXAxxx

Colibri PXAxxx modules run at a fixed frequency. Toradex provides ways to manually change the system frequency to tweak or optimize the system performance using software configurations. In most use cases, a cooling solution is not necessary. The maximum temperature is the case temperature of the PXA processor, which must not exceed 85°C. For more details, please refer to the respective Colibri module datasheet and Marvell's EMTS.