FluentD Troubleshooting Guide

Use the following steps to help with troubleshooting a FluentD configuration:

1. Use Rubular for testing regular expressions

FluentD uses the Ruby regex engine, so a Ruby-based regex tester such as Rubular is a good way to check regex patterns against known logs.

2. View parsing errors in FluentD logs

When a message fails to parse, FluentD will log an error in:
/var/log/bindplane-log-agent/bindplane-fluentd.log

Failure to parse a log results in an error message that resembles the following:

2019-11-06 13:20:59 -0500 [warn]: #0 pattern not match: "this log will fail to parse"
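
The quoted text in each warning is the raw line that failed to parse, so it can be collected and pasted into a regex tester. The Ruby sketch below is one possible way to pull those lines out of the agent log. The WARNING_RE pattern and the unmatched_lines helper are assumptions based only on the sample warning above; they are not part of FluentD.

```ruby
# Assumed pattern, derived from the sample warning line shown above.
WARNING_RE = /\[warn\]: #\d+ pattern not match: "(?<raw>.*)"\s*$/

# Hypothetical helper: returns the raw log lines quoted inside
# "pattern not match" warnings found in the given log text.
def unmatched_lines(log_text)
  log_text.each_line.map do |line|
    match = WARNING_RE.match(line)
    match && match[:raw]
  end.compact
end

# Small in-memory sample standing in for the FluentD log file.
sample = <<~LOG
  2019-11-06 13:20:59 -0500 [warn]: #0 pattern not match: "this log will fail to parse"
  2019-11-06 15:12:28 -0500 [info]: #0 Successfully wrote logs to stackdriver for the first time
LOG

puts unmatched_lines(sample)
```

To run this against the real log, pass File.read("/var/log/bindplane-log-agent/bindplane-fluentd.log") instead of sample. Quotes or backslashes inside the original line may appear escaped in the warning; this sketch does not unescape them.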

For more immediate feedback, FluentD can be run manually with the following command:

/opt/bluemedora/bindplane-log-agent/embedded/bin/fluentd --config /etc/bindplane-log-agent/fluentd.conf

3. Ensure that logs are written to the file being tested

To avoid reading the same logs multiple times, FluentD keeps track of what it has already read. If the configuration changes, FluentD will not re-read logs it has already tried to process. The easiest way around this during testing is to keep writing new logs to the file. The example command below appends the first 100 lines of the log file to the end of the same file. Replace /my/test/log with the actual path to your log file.

head -n 100 /my/test/log >> /my/test/log
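
The same append can also be scripted. The Ruby sketch below reads the first lines into memory and only then appends them, so the read is finished before the write begins. reappend_head is a hypothetical helper name, and the default of 100 lines simply mirrors the command above.

```ruby
require "tempfile"

# Hypothetical helper: append the first `count` lines of the file at `path`
# to the end of that same file. Returns the number of lines appended.
def reappend_head(path, count = 100)
  head = File.foreach(path).first(count) # finish reading before writing
  File.open(path, "a") { |file| file.write(head.join) }
  head.size
end

# Demonstration on a temporary file rather than a real log.
Tempfile.create("test-log") do |tmp|
  tmp.write("line 1\nline 2\nline 3\n")
  tmp.flush
  appended = reappend_head(tmp.path, 2)
  puts "appended #{appended} lines"
  puts File.read(tmp.path)
end
```

For a real test, call reappend_head with the path to your own log file in place of the temporary file.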

4. Final steps after testing with Rubular

Once it is clear that the logs are being parsed correctly, remove any match blocks that were added for debugging. If the Google Stackdriver output is configured correctly, logs should start flowing to Stackdriver Logging. If no logs are flowing into Stackdriver, check the FluentD logs at:
/var/log/bindplane-log-agent/bindplane-fluentd.log

An example of testing logs with Rubular

Let's say we want to create a simple log parser for dmesg messages that look like the following:

[   23.108190] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni)
[   23.205163] EDAC sbridge: Seeking for: PCI ID 8086:2fa0
[   23.205171] EDAC sbridge:  Ver: 1.1.1
[   23.618221] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[   23.827475] RPC: Registered named UNIX socket transport module.
[   23.827478] RPC: Registered udp transport module.
[   23.827479] RPC: Registered tcp transport module.

  1. Create a tail plugin in /etc/bindplane-log-agent/config.d/input-dmesg.conf that will monitor the log file.

In this example, the path is set to a copy of /var/log/dmesg so that we can append more logs to the file for testing. Below are the contents of the input-dmesg.conf configuration file:

<source>
  @type tail
  path /var/log/dmesg.copy
  tag dmesg
  <parse>
    @type regexp
    expression /\[   (?<time_since_boot>[\d\.]+)\] (?<message>.*)/
  </parse>
</source>

  2. Test the expression in the configuration, using Rubular, against the last line of /var/log/dmesg
    Last line: [   23.827479] RPC: Registered tcp transport module.

Testing the expression using Rubular

In the image above, we can see the fields extracted from the log entry.

  • time_since_boot
  • message
    We will see these fields later within our log entries in Stackdriver as part of the jsonPayload.
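
The same check can be reproduced locally, since FluentD and Ruby share a regex engine. The sketch below applies the expression from input-dmesg.conf to the last line quoted above and prints the named captures. The constant and variable names are illustrative only.

```ruby
# The expression from input-dmesg.conf, written as a Ruby regex literal.
INITIAL_EXPRESSION = /\[   (?<time_since_boot>[\d\.]+)\] (?<message>.*)/

last_line = "[   23.827479] RPC: Registered tcp transport module."
match = INITIAL_EXPRESSION.match(last_line)

# named_captures returns a Hash of capture name => matched text.
fields = match ? match.named_captures : {}
fields.each { |name, value| puts "#{name}: #{value}" }
```

This prints one line per field: time_since_boot with the value 23.827479, and message with the remainder of the log line.
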
  3. Restart FluentD with the following command: systemctl restart bindplane-fluentd
  4. Generate logs in /var/log/dmesg.copy with the following command: cat /var/log/dmesg >> /var/log/dmesg.copy
  5. Examine the FluentD logs with the following command: less +G /var/log/bindplane-log-agent/bindplane-fluentd.log

There should be a number of "pattern not match" warnings similar to the following:

2019-11-06 15:05:53 -0500 [warn]: #0 pattern not match: "[    0.287568] pci_bus 0000:09: resource 1 [mem 0xfbd00000-0xfbdfffff]"

  6. Test the unmatched string with Rubular. In this example, we will find that the expression does not handle the varying number of spaces after the opening bracket.

Testing the pattern not matched string in Rubular
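
The spacing problem can be made visible by counting the spaces between the opening bracket and the timestamp. The sketch below compares the line that parsed with the line from the warning, then confirms that the original expression rejects the second one. Variable names are illustrative only.

```ruby
# The original expression, which hard-codes exactly three spaces.
ORIGINAL_EXPRESSION = /\[   (?<time_since_boot>[\d\.]+)\] (?<message>.*)/

lines = [
  "[   23.827479] RPC: Registered tcp transport module.",
  "[    0.287568] pci_bus 0000:09: resource 1 [mem 0xfbd00000-0xfbdfffff]"
]

# Number of spaces that follow the opening bracket on each line.
padding = lines.map { |line| line[/\A\[( *)/, 1].length }
p padding

# Whether the original expression matches each line.
p lines.map { |line| ORIGINAL_EXPRESSION.match?(line) }
```

The first line has three spaces of padding and the second has four, so an expression with three literal spaces matches only the first.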

  7. Modify the regex pattern to match the strings that are failing and update the expression in the input configuration:

<source>
  @type tail
  path /var/log/dmesg.copy
  tag dmesg
  <parse>
    @type regexp
    expression /\[\s+(?<time_since_boot>[\d\.]+)\] (?<message>.*)/
  </parse>
</source>

Testing this expression using Rubular, we can see that our expression successfully parses the log entry.

Testing modified expression in Rubular
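
Before restarting FluentD, it can help to run the modified expression over several sample lines at once. The sketch below uses lines quoted earlier in this guide, including the one from the warning, and lists any that still fail. The constant names are illustrative only.

```ruby
# The modified expression: \s+ accepts one or more whitespace characters.
FIXED_EXPRESSION = /\[\s+(?<time_since_boot>[\d\.]+)\] (?<message>.*)/

SAMPLE_LINES = [
  "[   23.108190] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni)",
  "[   23.205171] EDAC sbridge:  Ver: 1.1.1",
  "[   23.827479] RPC: Registered tcp transport module.",
  "[    0.287568] pci_bus 0000:09: resource 1 [mem 0xfbd00000-0xfbdfffff]"
]

# Collect every sample line the expression fails to match.
unmatched = SAMPLE_LINES.reject { |line| FIXED_EXPRESSION.match?(line) }
puts "unmatched lines: #{unmatched.length}"
unmatched.each { |line| puts "  #{line}" }
```

Note that \s+ still requires at least one space after the bracket. If a timestamp ever filled the whole field with no leading space, \s* would be needed instead; that case is an assumption about the log format and does not appear in the samples here.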

  8. Repeat steps #3, #4, and #5
    If the parsing was successful, the warnings should be gone. If Stackdriver is configured correctly, the FluentD log will contain a message that resembles the following:
    2019-11-06 15:12:28 -0500 [info]: #0 Successfully wrote logs to stackdriver for the first time

How to examine the logs in Stackdriver Logging to verify everything is working properly

  1. Navigate to Logs Viewer and verify that the correct project is selected.
  2. Select Generic Node from the resources drop-down menu
  3. Convert the filter to an advanced filter and add the two filter lines shown below, replacing <hostname> with the name of the host the logs are being sent from.

Generic Node in the resource menu

Convert to advanced filter

resource.type="generic_node"
resource.labels.node_id="<hostname>"

The logs from our example above should now be visible in the Stackdriver Logging UI. If you examine the jsonPayload, you will see the fields time_since_boot and message are parsed with the names we defined in the regex pattern.

jsonPayload with the fields time_since_boot and message
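
The field names in the jsonPayload come directly from the named capture groups in the regex. The sketch below is only an approximation of that mapping for one sample line: it prints the parsed fields as JSON, while a real entry in Stackdriver Logging carries additional metadata that is not reproduced here. The constant and variable names are illustrative.

```ruby
require "json"

# The modified expression from the input configuration.
PAYLOAD_EXPRESSION = /\[\s+(?<time_since_boot>[\d\.]+)\] (?<message>.*)/

line = "[   23.827479] RPC: Registered tcp transport module."
match = PAYLOAD_EXPRESSION.match(line)

# Capture group names become the keys of the payload.
payload = match ? match.named_captures : {}

# Only the parsed fields are shown; a real log entry has more fields.
puts JSON.pretty_generate("jsonPayload" => payload)
```

Renaming a capture group in the expression changes the corresponding key, so choose group names that will be meaningful when filtering logs.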