FluentD Troubleshooting Guide
Use the following steps to help with troubleshooting a FluentD configuration:
1. Use Rubular for testing regular expressions
FluentD uses the Ruby regex engine. For testing regex patterns against known logs, it is beneficial to take advantage of tools like Rubular.
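As a complement to Rubular, a pattern can also be checked in a local Ruby interpreter, which uses the same regex engine. The sketch below is a minimal illustration and is not part of FluentD or the BindPlane agent: the helper name `unmatched_lines`, the pattern, and the sample lines are all hypothetical. It shows one way to run a single expression against several known log lines at once.

```ruby
# Hypothetical pattern and sample lines, for illustration only.
PATTERN = /^(?<level>[A-Z]+): (?<message>.*)$/

SAMPLE_LINES = [
  "INFO: service started",
  "WARN: disk almost full",
  "this line has no level prefix",
]

# Returns the lines that the expression fails to match.
def unmatched_lines(expression, lines)
  lines.reject { |line| expression.match?(line) }
end

# Print the named captures for each line that does match.
SAMPLE_LINES.each do |line|
  match = PATTERN.match(line)
  puts match.named_captures.inspect if match
end

puts "Unmatched: #{unmatched_lines(PATTERN, SAMPLE_LINES).inspect}"
```

Any line reported as unmatched is a candidate to paste into Rubular for closer inspection.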
2. View parsing errors in FluentD logs
When a message fails to parse, FluentD will log an error in:
/var/log/bindplane-log-agent/bindplane-fluentd.log
Failure to parse a log results in an error message that resembles the following:
2019-11-06 13:20:59 -0500 [warn]: #0 pattern not match: "this log will fail to parse"
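When the log contains many of these warnings, it can help to pull out only the records that failed to parse so they can be tested one by one. The following is a minimal Ruby sketch under the assumption that every warning follows the format of the example above. The names `WARNING` and `unparsed_records` are hypothetical, and the second line of the sample text is invented for illustration.

```ruby
# Assumption: warnings look like the example above, with the raw record
# wrapped in double quotes at the end of the line.
WARNING = /\[warn\]: #\d+ pattern not match: "(?<raw>.*)"\s*$/

# Returns the raw text of every record reported as "pattern not match".
def unparsed_records(log_text)
  log_text.each_line.map do |line|
    match = WARNING.match(line)
    match && match[:raw]
  end.compact
end

# First line is the example from this guide; the second is a made-up
# non-warning line to show that other entries are ignored.
sample_log = <<~LOG
  2019-11-06 13:20:59 -0500 [warn]: #0 pattern not match: "this log will fail to parse"
  2019-11-06 13:21:00 -0500 [info]: #0 some other message
LOG

puts unparsed_records(sample_log)
```

To run it against the real log, pass `File.read("/var/log/bindplane-log-agent/bindplane-fluentd.log")` in place of the sample text.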
For more immediate feedback, FluentD can be run manually with the following command:
/opt/bluemedora/bindplane-log-agent/embedded/bin/fluentd --config /etc/bindplane-log-agent/fluentd.conf
3. Ensure that logs are written to the file being tested
To avoid reading the same logs multiple times, FluentD keeps track of what has already been read. If the configuration is changed, it will not re-read logs it has already tried to process. The easiest way to get around this during testing is to keep writing new logs to the file. Below is an example command that appends the first 100 lines of the log file to the end of the same file. Replace /my/test/log with the actual path to your log file.
head -n 100 /my/test/log >> /my/test/log
4. Final steps after testing with Rubular
Once it is clear that the logs are being parsed correctly, remove any match blocks that were added for debugging. If the Google Stackdriver output is configured correctly, logs should start flowing to Stackdriver Logging. If no logs are flowing into Stackdriver, check the FluentD logs at:
/var/log/bindplane-log-agent/bindplane-fluentd.log
An example of testing logs with Rubular
Let's say we want to create a simple log parser for dmesg messages that look like the following:
[ 23.108190] alg: No test for __generic-gcm-aes-aesni (__driver-generic-gcm-aes-aesni)
[ 23.205163] EDAC sbridge: Seeking for: PCI ID 8086:2fa0
[ 23.205171] EDAC sbridge: Ver: 1.1.1
[ 23.618221] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 23.827475] RPC: Registered named UNIX socket transport module.
[ 23.827478] RPC: Registered udp transport module.
[ 23.827479] RPC: Registered tcp transport module.
- Create a tail input configuration that will monitor the log file in /etc/bindplane-log-agent/config.d/input-dmesg.conf
In this example, the path is set to a copy of /var/log/dmesg so that we can append more logs to the file for testing. Below are the contents of the input-dmesg.conf configuration file:
<source>
  @type tail
  path /var/log/dmesg.copy
  tag dmesg
  <parse>
    @type regexp
    expression /\[ (?<time_since_boot>[\d\.]+)\] (?<message>.*)/
  </parse>
</source>
- Test the expression in the configuration, using Rubular, against the last line of /var/log/dmesg
Last line: [ 23.827479] RPC: Registered tcp transport module.
Rubular shows the following fields extracted from the log entry:
time_since_boot
message
We will see these fields later within our log entries in Stackdriver as part of the jsonPayload.
- Restart FluentD with the following command:
systemctl restart bindplane-fluentd
- Generate logs in /var/log/dmesg.copy with the following command:
cat /var/log/dmesg >> /var/log/dmesg.copy
- Examine the FluentD logs with the following command:
less +G /var/log/bindplane-log-agent/bindplane-fluentd.log
There should be a number of "pattern not match" warnings similar to the following:
2019-11-06 15:05:53 -0500 [warn]: #0 pattern not match: "[ 0.287568] pci_bus 0000:09: resource 1 [mem 0xfbd00000-0xfbdfffff]"
- Test the unmatched string with Rubular. In our example, the expression expects exactly one space after the opening bracket, so it does not handle lines with different whitespace before the timestamp.
- Modify the regex pattern to match the strings that are failing and update the expression in the input configuration.
<source>
  @type tail
  path /var/log/dmesg.copy
  tag dmesg
  <parse>
    @type regexp
    expression /\[\s+(?<time_since_boot>[\d\.]+)\] (?<message>.*)/
  </parse>
</source>
Testing this expression using Rubular, we can see that our expression successfully parses the log entry.
- Repeat steps #3, #4, and #5 (restart FluentD, generate logs, and examine the FluentD logs).
If the parsing was successful, the warnings should be gone. If Stackdriver is configured correctly, the FluentD log will contain a message that resembles the following:
2019-11-06 15:12:28 -0500 [info]: #0 Successfully wrote logs to stackdriver for the first time
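The regex iteration in the example above can also be reproduced in a local Ruby interpreter. The sketch below copies both versions of the expression from input-dmesg.conf and applies them to the last line of the sample log and to the record from the warning. One assumption is labeled in the code: the timestamp in the failing record is written with several spaces of padding after the opening bracket, which is how dmesg normally aligns timestamps. The names `ORIGINAL`, `REVISED`, and `parse` are hypothetical.

```ruby
# Expressions copied from the two versions of input-dmesg.conf.
ORIGINAL = /\[ (?<time_since_boot>[\d\.]+)\] (?<message>.*)/
REVISED  = /\[\s+(?<time_since_boot>[\d\.]+)\] (?<message>.*)/

# Returns the named captures as a Hash, or nil when the line does not match.
def parse(expression, line)
  match = expression.match(line)
  match && match.named_captures
end

last_line = "[ 23.827479] RPC: Registered tcp transport module."

# Assumption: the failing record has extra padding before the timestamp.
padded_line = "[    0.287568] pci_bus 0000:09: resource 1 [mem 0xfbd00000-0xfbdfffff]"

p parse(ORIGINAL, last_line)    # matches: one space after the bracket
p parse(ORIGINAL, padded_line)  # nil: more than one space after the bracket
p parse(REVISED, last_line)     # still matches
p parse(REVISED, padded_line)   # matches: \s+ accepts any amount of whitespace
```

Checking the revised expression against lines that already parsed, as well as the ones that failed, guards against fixing one case while breaking another.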
How to examine the logs in Stackdriver Logging to verify everything is working properly.
- Navigate to Logs Viewer and verify that the correct project is selected.
- Select Generic Node from the resources drop-down menu
- Convert the filter to an advanced filter and add the following to the filter, replacing <hostname> with the name of the host the logs are being sent from.
resource.type="generic_node"
resource.labels.node_id="<hostname>"
The logs from our example above should now be visible in the Stackdriver Logging UI. If you examine the jsonPayload, you will see that the fields time_since_boot and message are parsed with the names we defined in the regex pattern.