26.7. Extracting data
When NXLog receives an event, it creates an event record with a $raw_event field, other core fields like $EventReceivedTime, and any fields provided by the particular module (see Fields for more information). This section explores the various ways that NXLog can be configured to extract values from the raw event.
Some log sources (like Windows Event Log collected via im_msvistalog) already contain structured data. In this case, there is often no additional extraction required; see Message Classification.
26.7.1. Regular expressions via the Exec directive
NXLog supports the use of regular expressions for parsing fields. For detailed information about regular expressions in NXLog, see the Reference Manual Regular Expressions section.
<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2
With this configuration, the Syslog message shown above is first parsed with parse_syslog(). This results in a $Message field created in the event record. Then, a regular expression is used to further parse the $Message field and create additional fields if it matches.
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input udp>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        # (?x) ignores unescaped whitespace, so every literal space is escaped
        if $Message =~ /(?x)^Failed\ (\S+)\ for(?:\ invalid\ user)?\ (\S+)\ from
                        \ (\S+)\ port\ \d+\ ssh2$/
        {
            $AuthMethod = $1;
            $AccountName = $2;
            $SourceIPAddress = $3;
        }
    </Exec>
</Input>
Named capturing groups are also supported. Each named group is automatically added to the event record as a field with the same name.
<Input in>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        # parse_syslog() requires an xm_syslog extension instance to be defined
        parse_syslog();
        $Message =~ /(?x)^Failed\ (?<AuthMethod>\S+)\ for(?:\ invalid\ user)?
                     \ (?<AccountName>\S+)\ from\ (?<SourceIPAddress>\S+)\ port
                     \ \d+\ ssh2$/;
    </Exec>
</Input>
Field | Value |
---|---|
$AuthMethod | password |
$AccountName | linda |
$SourceIPAddress | 192.168.1.60 |
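Before deploying an expression like the ones above, it can help to try it against a sample message outside NXLog. The following is a minimal Python sketch of that check, not NXLog configuration. It assumes Python's `re` module, which writes named groups as `(?P<name>...)` rather than the `(?<name>...)` syntax NXLog accepts, and whose `re.VERBOSE` flag plays the role of `(?x)`. The helper name `extract_fields` is hypothetical.

```python
import re

# Message body as it would appear in $Message after parse_syslog().
message = "Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2"

# re.VERBOSE behaves like (?x): unescaped whitespace and newlines in the
# pattern are ignored, so each literal space is written as "\ ".
SSH_FAILURE = re.compile(
    r"""^Failed\ (?P<AuthMethod>\S+)\ for(?:\ invalid\ user)?
        \ (?P<AccountName>\S+)\ from\ (?P<SourceIPAddress>\S+)\ port
        \ \d+\ ssh2$""",
    re.VERBOSE,
)

def extract_fields(text):
    """Return the named captures as a dict, or an empty dict if nothing matches."""
    match = SSH_FAILURE.search(text)
    return match.groupdict() if match else {}

fields = extract_fields(message)
print(fields)
```

Because whitespace is ignored in this mode, a single unescaped space inside the pattern silently changes what it matches, which is why testing against a real sample is worthwhile.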
26.7.2. Pattern matching with Grok
The xm_grok module provides parsing for unstructured log messages with Grok patterns.
The examples below demonstrate how to parse Apache messages using Grok patterns.
The message below is a sample of an Apache access message.
192.168.3.20 - - [28/Jun/2019] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
The above Apache message can be parsed using the Grok pattern below.
ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\]
"%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}"
%{INT:http_status_code} %{INT:response_size}
The message below is a sample of an Apache error message.
[Fri Dec 16 01:46:23 2019] [error] [client 1.2.3.4] Directory index forbidden
by rule: /home/test/
The above Apache message can be parsed using the Grok pattern below.
ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\]
\[client %{IP:client_address}\] %{GREEDYDATA:message}
Lists of Grok patterns are available in various repositories. As an example, see the logstash-plugin section on GitHub.
The following configuration reads messages from the apache_entries.log file using the im_file module and stores the result in the $raw_event field.
The match_grok() function reads patterns from the patterns.txt file and attempts a series of matches on the $raw_event field. If none of the patterns match, an internal message is logged.
<Extension grok>
    Module   xm_grok
    Pattern  patterns.txt
</Extension>

<Input messages>
    Module  im_file
    File    "apache_entries.log"
    <Exec>
        if not ( match_grok($raw_event, "%{ACCESS_LOG}") or
                 match_grok($raw_event, "%{ERROR_LOG}"))
        {
            log_info('Event did not match any pattern');
        }
    </Exec>
</Input>
This example uses the patterns.txt file, which contains all necessary Grok patterns.
INT (?:[+-]?(?:[0-9]+))
YEAR (?>\d\d){1,2}
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
GREEDYDATA .*
IP (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
TIMESTAMP_ACCESS %{INT}\/%{MONTH}\/%{YEAR}(:%{HOUR}:%{MINUTE}:%{SECOND} %{GREEDYDATA})?
TIMESTAMP_ERROR %{DAY} %{MONTH} %{INT} %{HOUR}:%{MINUTE}:%{SECOND} %{YEAR}
METHOD (GET|POST|PUT|DELETE|HEAD|TRACE|OPTIONS|CONNECT|PATCH){1}
HTTP_VERSION 1\.(0|1)
ACCESS_LOG %{IP:ip_address} - - \[%{TIMESTAMP_ACCESS:timestamp}\] "%{METHOD:http_method} %{UNIXPATH:uri} HTTP/%{HTTP_VERSION:http_version}" %{INT:http_status_code} %{INT:response_size}
ERROR_LOG \[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\] \[client %{IP:client_address}\] %{GREEDYDATA:message}
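To make the `%{NAME:field}` notation more concrete, the sketch below shows one way such references could be expanded into an ordinary regular expression with named groups. It is a Python illustration only and does not describe how xm_grok is implemented. The `PATTERNS` table holds deliberately simplified stand-ins rather than the definitions from patterns.txt, and the names `expand` and `grok_match` are hypothetical.

```python
import re

# Simplified stand-ins for illustration; these are NOT the definitions from patterns.txt.
PATTERNS = {
    "IP": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}",
    "LOGLEVEL": r"[A-Za-z]+",
    "GREEDYDATA": r".*",
    "TIMESTAMP_ERROR": r"[^\]]+",
    "ERROR_LOG": (
        r"\[%{TIMESTAMP_ERROR:timestamp}\] \[%{LOGLEVEL:severity}\] "
        r"\[client %{IP:client_address}\] %{GREEDYDATA:message}"
    ),
}

# Matches both %{NAME} and %{NAME:field} references.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")

def expand(pattern):
    """Recursively replace pattern references with their regex source."""
    def substitute(ref):
        name, field = ref.groups()
        body = expand(PATTERNS[name])
        # A reference with a field name becomes a named capture group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REFERENCE.sub(substitute, pattern)

def grok_match(text, pattern):
    """Return the extracted fields as a dict, or None if the pattern does not match."""
    match = re.search(expand(pattern), text)
    return match.groupdict() if match else None

line = "[Fri Dec 16 01:46:23 2019] [error] [client 1.2.3.4] Directory index forbidden by rule: /home/test/"
fields = grok_match(line, "%{ERROR_LOG}")
print(fields)
```

The recursion is what lets a top-level pattern such as ERROR_LOG be composed from smaller named building blocks.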
26.7.3. Pattern matching with pm_pattern
Regular expressions are widely used in pattern matching. Unfortunately, a large number of regular-expression-based patterns does not scale well, because the patterns need to be evaluated linearly, one after another. The pm_pattern module implements pattern matching that is more efficient than evaluating regular expressions in Exec directives.
<38>Nov 22 10:30:12 myhost sshd[8459]: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2
With this configuration, the above Syslog message is first parsed with parse_syslog(). This results in a $Message field created in the event record. Then, the pm_pattern module is used with a pattern XML file to further parse the record.
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input in>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    Exec    parse_syslog();
</Input>

<Processor pattern>
    Module       pm_pattern
    PatternFile  /var/lib/nxlog/patterndb.xml
</Processor>

<Output out>
    Module  om_null
</Output>

<Route r>
    Path    in => pattern => out
</Route>
The patterns for the pm_pattern module instance above are declared in the following patterndb.xml file.
<?xml version='1.0' encoding='UTF-8'?>
<patterndb>
  <created>2010-01-01 01:02:03</created>
  <version>42</version>
  <!-- First and only pattern group in this file -->
  <group>
    <name>ssh</name>
    <id>42</id>
    <!-- Only try to match this group if $SourceName == "sshd" -->
    <matchfield>
      <name>SourceName</name>
      <type>exact</type>
      <value>sshd</value>
    </matchfield>
    <!-- First and only pattern in this pattern group -->
    <pattern>
      <id>1</id>
      <name>ssh auth failure</name>
      <!-- Do regular expression match on $Message field -->
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2</value>
        <!-- Set 3 event record fields from captured strings -->
        <capturedfield>
          <name>AuthMethod</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>AccountName</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>SourceIPAddress</name>
          <type>string</type>
        </capturedfield>
      </matchfield>
      <!-- Set additional fields if pattern matches -->
      <set>
        <field>
          <name>TaxonomyAction</name>
          <value>Authenticate</value>
          <type>string</type>
        </field>
        <field>
          <name>TaxonomyStatus</name>
          <value>Failure</value>
          <type>string</type>
        </field>
      </set>
    </pattern>
  </group>
</patterndb>
Field | Value |
---|---|
$AuthMethod | password |
$AccountName | linda |
$SourceIPAddress | 192.168.1.60 |
$TaxonomyAction | Authenticate |
$TaxonomyStatus | Failure |
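The pattern database above has a two-level structure: an exact match on $SourceName decides whether the group is tried at all, and only then is the regular expression applied to $Message. The following Python sketch mirrors that evaluation order for a single group. It illustrates the idea only and is not a description of pm_pattern internals. The `SSH_GROUP` dictionary layout and the `apply_group` function are hypothetical.

```python
import re

# Hypothetical in-memory form of the "ssh" pattern group from patterndb.xml.
SSH_GROUP = {
    "matchfield": ("SourceName", "sshd"),  # exact match that gates the whole group
    "patterns": [
        {
            "name": "ssh auth failure",
            "field": "Message",
            "regexp": re.compile(
                r"^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2"
            ),
            "captured": ["AuthMethod", "AccountName", "SourceIPAddress"],
            "set": {"TaxonomyAction": "Authenticate", "TaxonomyStatus": "Failure"},
        },
    ],
}

def apply_group(event, group):
    """Return a copy of the event enriched by the first matching pattern, if any."""
    gate_field, gate_value = group["matchfield"]
    if event.get(gate_field) != gate_value:
        return event  # group skipped: none of its regular expressions are evaluated
    for pattern in group["patterns"]:
        match = pattern["regexp"].search(event.get(pattern["field"], ""))
        if match:
            enriched = dict(event)
            # Captured substrings are assigned to the captured fields in order.
            enriched.update(zip(pattern["captured"], match.groups()))
            enriched.update(pattern["set"])
            return enriched
    return event

event = {
    "SourceName": "sshd",
    "Message": "Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2",
}
result = apply_group(event, SSH_GROUP)
print(result)
```

In this sketch, an inexpensive equality test rules out most events before any regular expression runs, which is one reason grouping patterns can reduce the matching work.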
NXLog Manager provides an interface for writing pattern files, and will also test sample events to aid in establishing the correct match patterns. The pattern functions can be accessed from the PATTERNS menu in the page header.
The following instructions explain the steps required for creating the above pattern database with NXLog Manager.
- From the PATTERNS menu, open the form for creating a pattern group. Enter a Name for the new pattern group, and optionally a Description, in the Properties section. The name is used to refer to the pattern group later. The name of the above pattern group is ssh.
- Add a match field by clicking Add Field in the Match section. Only messages that match will be further processed by this pattern group. In the above example, there is no reason to attempt any matches if the $SourceName field does not equal sshd. The above pattern group uses Field name=SourceName, Match=EXACT, and Value=sshd.
- Save the new pattern group.
- Open the form for creating a new field to be used when creating new patterns. For the above example, the $AuthMethod field must be added because it is not in the default set provided by NXLog Manager. Set Name=AuthMethod and Field Type=STRING, then click Save.
- Open the form for creating a new pattern. In the Pattern Info section, enter a Pattern Name and optionally a Pattern Description. Select the correct Pattern Group from the list. In the above example, the ssh pattern group is used.
- In the Match section, set match values for the fields to be matched. If a regular expression match with captured subgroups is detected, the interface will provide a Captured fields list where target fields can be selected. The above example uses Field name=Message, Match=REGEXP, and Value=^Failed (\S+) for(?: invalid user)? (\S+) from (\S+) port \d+ ssh2$. The three captured fields are AuthMethod, AccountName, and SourceIPAddress.
- The Set section allows fields to be set if the match is successful. Click Add Field for each field. The above example sets $TaxonomyStatus to Failure and $TaxonomyAction to Authenticate.
- The Action section accepts NXLog language statements such as those that would be specified in an Exec directive. Click Add action, type in the statement, and click Verify to make sure the statement is valid. The above example does not include any NXLog language statements.
- The final tabbed section allows test messages to be entered to verify that the match works as expected. Click the + to add a test case. To test the above example, add a Value for the Message field: Failed password for invalid user linda from 192.168.1.60 port 38176 ssh2. Click Update Test Cases in the Match section to automatically fill the captured fields. Verify that the fields are set as expected. Additional test cases can be added to test other events.
- Save the new pattern. Then click Export to download the pattern.xml file or use the pattern to configure a managed agent.
See the NXLog Manager User Guide for more information.
26.7.4. Using the extracted fields
The previous sections explore ways that the log message can be parsed and new fields added to the event record. Once the required data has been extracted and corresponding fields created, there are various ways to use this new data.
- A field or set of fields can be matched by string or regular expression to trigger alerts, perform filtering, or further classify the event.
- Fields in the event record can be renamed, modified, or deleted.
- Event correlation can be used to execute statements or suppress messages based on matching events inside a specified window.
- Some output formats can be used to preserve the full set of fields in the event record (such as JSON and the NXLog Binary format).
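As a rough illustration of the first, second, and last items above, the Python sketch below treats an event record as a dictionary, filters on an extracted field, renames another, and serializes the full set of fields as JSON. It is a conceptual sketch only and uses no NXLog API. The helper names `is_auth_failure` and `rename` and the target field name `TargetUser` are hypothetical.

```python
import json

# Event record holding the fields extracted in the earlier examples.
event = {
    "AuthMethod": "password",
    "AccountName": "linda",
    "SourceIPAddress": "192.168.1.60",
    "TaxonomyStatus": "Failure",
}

def is_auth_failure(record):
    """Filter step: keep only events classified as failures."""
    return record.get("TaxonomyStatus") == "Failure"

def rename(record, old, new):
    """Return a copy of the record with one field renamed, if it is present."""
    renamed = dict(record)
    if old in renamed:
        renamed[new] = renamed.pop(old)
    return renamed

line = None
if is_auth_failure(event):
    # "TargetUser" is a hypothetical field name chosen for this example.
    out = rename(event, "AccountName", "TargetUser")
    # JSON keeps every field of the record, not only the raw message.
    line = json.dumps(out, sort_keys=True)
    print(line)
```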