
Device Monitoring in TorizonCore

 

Article updated at 24 Aug 2021

If you don't know which version of the OS you are using, run the command cat /etc/os-release or cat /etc/issue on the board.



Remember that you can always refer to the Torizon Documentation, where you can find many relevant articles that might help you with application development.

Torizon 5.4.0

Introduction

Device monitoring with Torizon encompasses several different areas of functionality related to understanding the health, status, and performance of your devices.

When we think about device monitoring, we break it down into three types of data: metrics, logs, and alerts. A metric is a numerical value that we can measure and report at a regular interval over time, like memory usage, CPU temperature, or custom data from our users' applications. Logs are the log output that various parts of the system produce, including Docker container logs, kernel logs, and application logs from journald. Alerts are special events or errors that the device wants to raise in real time because they require attention or remediation, like when a critical application fails to launch or the device is running out of storage space.

Starting in TorizonCore 5.3, a monitoring agent is available that can collect all three types of data--metrics, logs, and alerts--and send them either to the Torizon Platform Services Web Interface or independently to other external services. Today, the Torizon Platform supports metrics of all types. Out of the box, TorizonCore reports some basic system information, but you can also create and send your own custom metrics to build dashboards showing whatever data is most important to you. Log forwarding and real-time alerting are not available yet, but are planned for the future.

When investigating the best option for the monitoring agent in TorizonCore, we looked for an option that would be modular, event-driven, based on known and widely adopted standards, open-source, and with acceptable performance and resource usage on resource-constrained devices. Additionally, we wanted it to be flexible enough to handle all three types of device monitoring data. After considering all the options, we chose Fluent Bit as our monitoring agent.

Fluent Bit

Fluent Bit is an open-source log processor and forwarder that lets you collect data such as metrics and logs from different sources (hardware and software), enrich them with filters, and send them to multiple destinations.

In Fluent Bit, information is processed in a pipeline, with a very pluggable architecture. Data is collected with input plugins, filtered with filter plugins, and sent to remote servers with output plugins:

  • Input plugins: gather and parse information from different sources (CPU, disk, memory, network, temperature, processes, kernel, logs, etc).
  • Filter plugins: allow altering the data before delivering it to some destination (remove, add, change, nest, etc).
  • Output plugins: allow defining a destination for the data (Prometheus, Amazon, Azure, Google Cloud, Datadog, Elasticsearch, HTTP(S), Kafka, etc).

Fluent Bit is written in C and designed with performance in mind (high throughput with low CPU and memory usage).

For more information about the project, see the Fluent Bit official documentation.

This article complies with the Typographic Conventions for the Toradex Documentation.

Tip: In this article, you will need to execute the commands as root. You can log in as root or (better) use the sudo command when logged in as a regular user.

Prerequisites

Device monitoring implementation in TorizonCore

Fluent Bit is integrated and enabled in TorizonCore 5.4.0 and later versions. By default, it's configured to monitor CPU, memory, temperature, and the Docker daemon, and to send the information to the Torizon Platform.

When you first boot TorizonCore, the Fluent Bit service will not start due to the absence of the /etc/fluent-bit/enabled file:

# systemctl status fluent-bit
* fluent-bit.service - Fluent Bit
     Loaded: loaded (/usr/lib/systemd/system/fluent-bit.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Tue 2021-09-14 07:42:15 UTC; 5h 19min ago
             `- ConditionPathExists=/etc/fluent-bit/enabled was not met

Sep 14 07:42:15 colibri-imx6-10492785 systemd[1]: Condition check resulted in Fluent Bit being skipped.

As soon as the device is provisioned to the Torizon Platform, the provisioning script enables the Fluent Bit service by creating the /etc/fluent-bit/enabled file (unless you choose to disable it). After that, Fluent Bit starts collecting data and sending it to the Torizon Platform. The data transport is secured by passing it through Aktualizr-Torizon, so it takes advantage of the mutual TLS connection that software updates use.
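
A quick way to tell which state a device is in is to test for the flag file directly. The sketch below is only an illustration: the path is the one shown in the systemctl output above, while the function name monitoring_flag_state and the FLAG_FILE variable are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Report whether the Fluent Bit start condition (ConditionPathExists) is met.
# FLAG_FILE and monitoring_flag_state are hypothetical names for this sketch.
FLAG_FILE="${FLAG_FILE:-/etc/fluent-bit/enabled}"

monitoring_flag_state() {
    # $1: path of the flag file to test
    if [ -e "$1" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

monitoring_flag_state "$FLAG_FILE"
```

If the script prints missing, the service is skipped at start, as in the status output above.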

Enabling device monitoring in TorizonCore 5.4.0 and later versions

Device monitoring is already enabled by default in TorizonCore 5.4.0 and later versions. The only thing you need to do is provision the device to the Torizon Platform Services and make sure the "Enable device metrics" box is checked. No extra steps are required.

If your device was provisioned before TorizonCore 5.4.0 and you have since updated it using the Torizon Platform, you may need to create the file /etc/fluent-bit/enabled manually and restart the service:

# touch /etc/fluent-bit/enabled
# systemctl restart fluent-bit

Enabling device monitoring in TorizonCore 5.3.0

TorizonCore 5.3.0 has all the required infrastructure for device monitoring, but it is not enabled by default.

To enable device monitoring in TorizonCore 5.3.0, the first step is to provision the device to the Torizon Platform.

Then you have to create a Fluent Bit configuration file:

/etc/fluent-bit/fluent-bit.conf
[SERVICE]
    flush        1
    daemon       Off
    log_level    info
    parsers_file parsers.conf
    plugins_file plugins.conf
 
[INPUT]
    name         cpu
    tag          cpu
    interval_sec 300
 
[FILTER]
    Name       nest
    Match      cpu
    Operation  nest
    Wildcard   *
    Nest_under cpu
 
[INPUT]
    name         mem
    tag          memory
    interval_sec 300
 
[FILTER]
    Name       nest
    Match      memory
    Operation  nest
    Wildcard   *
    Nest_under memory
 
[INPUT]
    name         thermal
    tag          temperature
    name_regex   thermal_zone0
    interval_sec 300
 
[FILTER]
    Name       nest
    Match      temperature
    Operation  nest
    Wildcard   *
    Nest_under temperature
 
[INPUT]
    name         proc
    proc_name    dockerd
    tag          proc_docker
    fd           false
    mem          false
    interval_sec 300
 
[FILTER]
    Name       nest
    Match      proc_docker
    Operation  nest
    Wildcard   *
    Nest_under docker
 
[OUTPUT]
    name   tcp
    port   8850
    format json_lines
    match  *

Enable the Fluent Bit service:

# systemctl enable fluent-bit
# systemctl restart fluent-bit

And configure the data proxy in Aktualizr-Torizon:

# mkdir -p /etc/systemd/system/aktualizr-torizon.service.d/
# sh -c 'echo "[Service]" > /etc/systemd/system/aktualizr-torizon.service.d/override.conf'
# sh -c 'echo "Environment=\"AKTUALIZR_CMDLINE_PARAMETERS=--enable-data-proxy\"" >> /etc/systemd/system/aktualizr-torizon.service.d/override.conf'
# systemctl daemon-reload
# systemctl restart aktualizr-torizon
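
For reference, the two echo commands above assemble a two-line systemd drop-in. The sketch below prints the same content in one piece so it can be reviewed before it is written; the function name print_data_proxy_dropin is hypothetical.

```shell
#!/bin/sh
# Print the drop-in that the echo commands above build line by line.
# print_data_proxy_dropin is a hypothetical helper name.
print_data_proxy_dropin() {
    cat <<'EOF'
[Service]
Environment="AKTUALIZR_CMDLINE_PARAMETERS=--enable-data-proxy"
EOF
}
```

As root, and after creating the directory with mkdir -p as shown above, print_data_proxy_dropin > /etc/systemd/system/aktualizr-torizon.service.d/override.conf writes the file. The daemon-reload and restart steps are still required.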

Now the device should be ready to send monitoring data to the Torizon Platform.

Enabling device monitoring in TorizonCore 5.2.0 and earlier versions

TorizonCore 5.2.0 and earlier versions lack the infrastructure required for device monitoring, so it cannot be enabled on them.

Disabling device monitoring

To disable device monitoring, stop and disable the Fluent Bit service:

# systemctl stop fluent-bit
# systemctl disable fluent-bit

Customizing device metrics for Torizon Platform

TorizonCore ships with a default configuration file for Fluent Bit that allows it to collect four basic metrics:

  • CPU usage
  • Memory/swap usage
  • CPU core temperature
  • Docker daemon status

However, you can modify this default configuration to send customized metrics, such as metrics from your own applications or from sensors connected to your board.

The Torizon Platform can accept metrics of any kind, as long as they are formatted properly. In this section, you'll learn how to add a new input plugin to Fluent Bit, add a filter plugin that formats the data for the Torizon Platform, and start sending data. If your use case isn't covered here, you can always consult the official documentation of Fluent Bit to learn how to do more.

The default Fluent Bit configuration (shown above and available on your board in /etc/fluent-bit/fluent-bit.conf) has four key parts:

  • A [SERVICE] section containing basic Fluent Bit options
  • Several [INPUT] sections enabling input plugins
  • Several [FILTER] sections formatting the data that those input plugins produce
  • An [OUTPUT] section that tells Fluent Bit how to send the data to Aktualizr-Torizon

To add custom metrics, we don't need to change [SERVICE] or [OUTPUT]; we just need to add a new input plugin and filter. We need to set up our config file so that we get JSON-formatted output that looks like this:

{
  "custom": {
    "my_metric_1": 123.4,
    "my_metric_2": 567.8
  }
}

The specific requirements are:

  • The nested object must be named custom
  • It may contain any number of name/value pairs
  • For each name/value pair, the value must be a number--i.e., no nested objects, strings, arrays, or booleans
  • The name of the name/value pair will be the name of the metric that appears on the Torizon Platform Services Web Interface

The simplest way to do this is to use an input plugin that accepts raw JSON, like the HTTP input plugin, and then nest the incoming data under the custom key with the Nest filter plugin.

Try adding the following to your /etc/fluent-bit/fluent-bit.conf, just above the [OUTPUT] block, then restart Fluent Bit (systemctl restart fluent-bit):

[INPUT]
    name http
    host localhost
    port 9999

[FILTER]
    Name       nest
    Match      custom
    Operation  nest
    Wildcard   *
    Nest_under custom

This makes Fluent Bit listen on port 9999 for HTTP POST requests. The HTTP input plugin takes the tag from the request path, so any custom metrics you send to http://localhost:9999/custom are tagged custom, matched by the filter, and nested inside the custom object--exactly what we need.

You can try it out by manually sending some JSON data to that port using curl. For example, the following curl command will send exactly the metrics given in the example JSON above:

# curl -X POST http://localhost:9999/custom \
     -H 'Content-Type: application/json' \
     -d '{"my_metric_1":123.4,"my_metric_2":567.8}'
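
If an application or script reports metrics regularly, it may help to build the payload from name/value pairs instead of writing the JSON by hand. The following is a minimal sketch under a few assumptions: build_metrics_json and send_metrics are hypothetical helper names, metric names are assumed to need no JSON escaping, and the port and path match the [INPUT] and [FILTER] blocks above. Non-numeric values are rejected, since the Torizon Platform only accepts numbers.

```shell
#!/bin/sh
# Build the flat JSON object for the HTTP input from name/value pairs.
# build_metrics_json and send_metrics are hypothetical helper names.

build_metrics_json() {
    if [ $(( $# % 2 )) -ne 0 ]; then
        echo "expected name/value pairs" >&2
        return 1
    fi
    json=""
    while [ "$#" -gt 0 ]; do
        name=$1
        value=$2
        shift 2
        # Accept only plain decimal numbers; metric values must be numeric.
        if ! printf '%s\n' "$value" | grep -Eq '^-?(0|[1-9][0-9]*)(\.[0-9]+)?$'; then
            echo "value for $name is not a number: $value" >&2
            return 1
        fi
        # Metric names are assumed to need no JSON escaping.
        json="${json:+$json,}\"$name\":$value"
    done
    printf '{%s}\n' "$json"
}

send_metrics() {
    # Stop before contacting Fluent Bit if the payload could not be built.
    payload=$(build_metrics_json "$@") || return 1
    curl -sS -X POST http://localhost:9999/custom \
         -H 'Content-Type: application/json' \
         -d "$payload"
}
```

With these helpers, send_metrics my_metric_1 123.4 my_metric_2 567.8 posts the same body as the curl command above.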

This is just one basic example. As long as you have an input plugin that gives you the data you want, and a filter (or series of filters) to put the data into the format required, you can build in all kinds of metric reporting. A few other inputs that might be of interest:

  • Serial interface for pulling data from a simple serial connection
  • Process Metrics for reporting on one particular process
  • Exec to periodically run a command and parse the output

Each of these will require some kind of filtering to get the data into the required format; for more information consult the Fluent Bit official documentation.

Example: Disk usage custom metric

One common metric to monitor is disk usage. Fluent Bit doesn't offer an input plugin for this, but it's easy enough to get using the df command, the Exec input plugin, and a simple filter.

df -k gives us the total, used, and available space in kilobytes:

# df -k
Filesystem                 1K-blocks    Used Available Use% Mounted on
tmpfs                        1898992   25824   1873168   2% /run
devtmpfs                     1384624       0   1384624   0% /dev
/dev/disk/by-label/otaroot  15226800 1687972  12745640  12% /sysroot

Adding a | grep otaroot gives us just the disk we're looking for, and either awk or jq can give us the rest of what we need to parse that into JSON:

# jq example
df -k | grep otaroot | jq -R -c -s 'gsub(" +"; " ") | split(" ") | { "otaroot_total": (.[1] | tonumber), "otaroot_used": (.[2] | tonumber), "otaroot_avail": (.[3] | tonumber) }'

# awk example
df -k | grep otaroot | awk '{print "{\"otaroot_total\":" $2 ",\"otaroot_used\":" $3 ",\"otaroot_avail\":" $4 "}"}'
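
The awk variant above can also be wrapped in a small function, which keeps the quoting out of the Fluent Bit configuration if you save it as a script. This is only a sketch: df_line_to_json is a hypothetical helper name, and it selects the line by mount point instead of using grep. The -P option asks df for the POSIX output format, which keeps each filesystem on a single line.

```shell
#!/bin/sh
# Convert df output (read from stdin) into the flat JSON object used above.
# df_line_to_json is a hypothetical helper name.
#   $1: prefix for the metric names, e.g. otaroot
#   $2: mount point to report, e.g. /sysroot
df_line_to_json() {
    awk -v prefix="$1" -v mount="$2" '
        # Only the line whose last field is the requested mount point matches,
        # so the header line is skipped automatically.
        $NF == mount {
            printf "{\"%s_total\":%s,\"%s_used\":%s,\"%s_avail\":%s}\n",
                   prefix, $2, prefix, $3, prefix, $4
        }'
}
```

On the device, df -kP | df_line_to_json otaroot /sysroot prints one JSON object with numeric values for the /sysroot line.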

Now add one of these commands to /etc/fluent-bit/fluent-bit.conf using the Exec input plugin and the Nest filter plugin. Disk usage doesn't change that frequently, so we'll only report these metrics once per hour by setting Interval_Sec to 3600.

[INPUT]
    Name          exec
    Tag           disksize
    Command       df -k | grep otaroot | jq -R -c -s 'gsub(" +"; " ") | split(" ") | { "otaroot_total": (.[1] | tonumber), "otaroot_used": (.[2] | tonumber), "otaroot_avail": (.[3] | tonumber) }'
    Parser        json
    Interval_Sec  3600

[FILTER]
    Name       nest
    Match      disksize
    Operation  nest
    Wildcard   *
    Nest_under custom

Restart the fluent-bit service:

# systemctl restart fluent-bit

Your device will now be reporting metrics named otaroot_total, otaroot_used, and otaroot_avail to the Torizon Platform, and you can begin creating custom charts with those metrics.

Connecting Fluent Bit to other data platforms

Fluent Bit is a flexible tool that can send data to various other platforms. It is compatible with the most popular cloud providers and protocols (AWS, Microsoft Azure, Google Cloud, Datadog, Elasticsearch, etc). For more information about how to configure Fluent Bit, see the official documentation. Note that Fluent Bit can send data to multiple destinations at once, so you can, for example, send device metrics to the Torizon Platform for monitoring and use another platform for log storage.

Application Log Monitoring

As mentioned above, logs are another important type of device monitoring. Although Torizon Platform does not yet support log aggregation, you can still use Fluent Bit to process and forward various types of logs, using another data sink for analyzing the data.

One common use case for TorizonCore is the monitoring of containers running user applications. Combined with the flexibility of Fluent Bit, this enables use cases like data aggregation, diagnostics, or basic monitoring of applications running on Torizon. The only requirement for this setup is that your applications must be running within a Docker container.

Warning: Application log monitoring is not yet available on the Torizon Platform Services Web Interface. Receiving and processing this data is therefore up to the customer's implementation.

Enabling Container Monitoring

To monitor Docker containers, we utilize the built-in Fluentd logging driver in Docker. This allows the logs for Docker Containers to be sent to the Fluentd collector as structured log data. Fluentd is an open-source data collector for unified logging, similar to Fluent Bit.

In order to enable Fluentd logging for Docker containers, there are two methods:

  1. System-wide - setting Fluentd as the default logging driver for all Docker containers: you must edit the default Docker configuration. On TorizonCore this config file is located at /etc/docker/daemon.json; you may need to create it if it does not already exist. For the correct syntax and the available options, consult the Docker documentation.
  2. Per-container basis - adding a flag when you start the container: this works with either docker run or docker-compose. For docker run, just add --log-driver=fluentd. For docker-compose, consult the Docker documentation. This method is useful if you only want to monitor specific containers.
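
As a rough sketch of the system-wide method, the function below prints a daemon.json that selects the Fluentd driver. The log-driver and log-opts keys and the fluentd-address option come from the Docker documentation; the function name print_fluentd_daemon_json is hypothetical. The address localhost:24224 is the driver's default and is written out only to make the port visible. If your daemon.json already contains other settings, merge these keys into it instead of overwriting the file.

```shell
#!/bin/sh
# Print a Docker daemon configuration that uses the Fluentd logging driver.
# print_fluentd_daemon_json is a hypothetical helper name.
#   $1: optional address of the collector (default: localhost:24224)
print_fluentd_daemon_json() {
    address=${1:-localhost:24224}
    cat <<EOF
{
  "log-driver": "fluentd",
  "log-opts": {
    "fluentd-address": "$address"
  }
}
EOF
}
```

After reviewing the output, write it to /etc/docker/daemon.json as root and restart the Docker daemon (systemctl restart docker). For a single container, the equivalent is docker run --log-driver=fluentd --log-opt fluentd-address=localhost:24224.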

Configuring Fluent Bit for Container Monitoring

Now Fluent Bit must be configured correctly to accept the data coming from the Fluentd logger in Docker. Fortunately Fluentd and Fluent Bit are compatible and complementary tools. All that's needed is to utilize the "Forward" input plugin for Fluent Bit. This input plugin is designed to accept data from a Fluentd stream. The only thing to be careful of is to make sure that both Docker and Fluent Bit are configured to use the same TCP ports.

As for output streams, as mentioned above our Torizon Platform does not yet display or accept log data, though this feature is planned for the future. Therefore how the data is received and processed is up to user discretion.

Example

Open two terminals on your TorizonCore device. In the first terminal, start Fluent Bit:

# fluent-bit -i forward -o stdout -p format=json_lines -f 1

For demonstration, the output will just be set to stdout on the device.

Next, on the second terminal run a container like so:

# docker run --log-driver=fluentd debian echo "Testing a log message"

You should now see the following output in the first terminal running Fluent Bit:

{"date":1636585969.0,"source":"stdout","log":"Testing a log message","container_id":"eaf3a3e80ba3dd778ffd4c1057481cd889cbde75ed7b94ce3dddd5cc462a7c98","container_name":"/gracious_driscoll"}

Tip: the Fluentd Docker logging driver only captures data from the container's stdout or stderr. Keep this in mind for containers that you would like to monitor with this method.

Webinars

Toradex has presented webinars about Device Monitoring and you can watch them on demand.

Secure Device Monitoring - Check Health, Resources and Performance
