NXLog Docs

Reducing bandwidth and data size

NXLog can be configured in several ways to reduce the size of log data. This can lower bandwidth requirements during transport, storage requirements for log data, and licensing costs for commercial SIEM systems that charge based on data volume.

The three main strategies for achieving this goal are covered in the following sections:

  • Filtering events removes unnecessary or duplicate events at the source, so less data is transported and stored at every subsequent stage of processing.

  • Trimming events removes extra content or fields from event records, reducing the total volume of log data.

  • Compressing during transport can drastically reduce the bandwidth required to forward events.

To achieve the best results, it is important to understand how fields work in NXLog and which fields are being transferred or stored. For example, removing or modifying fields without modifying $raw_event will not reduce data requirements at all for an output module instance that uses only $raw_event. See Event records and fields for details, as well as the explanation in Compressing during transport below.
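
As a minimal sketch of this point (not taken from a tested deployment), the following input parses BSD Syslog, changes a field, and then rebuilds $raw_event with the xm_syslog to_syslog_bsd() procedure so that the change is reflected in what a $raw_event-only output sends. The instance names and the trailing-whitespace rule are illustrative assumptions.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input syslog_in>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        # Changing a field alone leaves $raw_event untouched
        # (illustrative rule: strip trailing whitespace from the message)
        $Message =~ s/\s+$//;
        # Rebuild $raw_event from the fields so the change is what gets sent
        to_syslog_bsd();
    </Exec>
</Input>
```

Without the to_syslog_bsd() call, the modified $Message would exist only in the event record, and an output that sends only $raw_event would still forward the original text.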

Filtering events

Depending on the logging requirements and the log source, it may be possible to simply discard certain events. NXLog can be configured to filter events based on nearly any set of criteria. See also Filtering logs.

Example 1. Dropping unnecessary events

In this example, an NXLog agent is configured to collect Syslog messages from devices on the local network. Events are parsed with the xm_syslog parse_syslog() procedure, which sets the SeverityValue field. Any event with a normalized severity lower than 3 (warning) is discarded.

nxlog.conf
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input syslog>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    Exec    parse_syslog(); if $SeverityValue < 3 drop();
</Input>
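
Conditions can also be combined. The following variation is a sketch that assumes a hypothetical requirement to keep every event from the sshd source regardless of severity; the source name is an example value only.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input syslog>
    Module  im_udp
    Host    0.0.0.0
    Port    514
    <Exec>
        parse_syslog();
        # Discard events below warning severity, except those from sshd
        # ('sshd' is a hypothetical example value)
        if ($SeverityValue < 3) and ($SourceName != 'sshd') drop();
    </Exec>
</Input>
```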

Similarly, the pm_norepeat module can be used to detect, count, and discard duplicate events. In their place, pm_norepeat generates a single event with a "last message repeated n times" message.

Example 2. Dropping duplicate events

With this configuration, NXLog collects Syslog messages from hosts on the local network with im_udp and parses them with the xm_syslog parse_syslog() procedure. Events are then routed through a pm_norepeat module instance, where the $Hostname, $Message, and $SourceName fields are checked to detect duplicate messages. Last, events are sent to a remote host with om_batchcompress.

nxlog.conf
<Extension _syslog>
    Module      xm_syslog
</Extension>

<Input syslog_udp>
    Module      im_udp
    Host        0.0.0.0
    Port        514
    Exec        parse_syslog();
</Input>

<Processor norepeat>
    Module      pm_norepeat
    CheckFields Hostname, Message, SourceName
</Processor>

<Output out>
    Module      om_batchcompress
    Host        10.2.0.2
    Port        2514
</Output>

<Route r>
    Path        syslog_udp => norepeat => out
</Route>

Trimming events

NXLog can be configured to parse events into various fields in the event record. In this case, a whitelist can be used to retain a set of important fields. See Rewriting and modifying logs for more information about modifying events.

Example 3. Discarding extra fields via whitelist

This configuration reads from the Windows Event Log with im_msvistalog and uses an xm_rewrite module instance to discard any fields in the event record that are not included in the whitelist. The xm_rewrite instance below could be used with multiple sources; for example, the whitelist would also be suitable for the xm_syslog fields.

The xm_rewrite module does not remove the $raw_event field.
nxlog.conf
<Extension whitelist>
    Module  xm_rewrite
    Keep    AccountName, Channel, EventID, EventReceivedTime, EventTime, Hostname, \
            Severity, SeverityValue, SourceName
</Extension>

<Input eventlog>
    Module  im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id='0'>
                <Select Path='Security'>*[System/Level&lt;=4]</Select>
            </Query>
        </QueryList>
    </QueryXML>
    Exec    whitelist->process();
</Input>

In some cases, event messages contain a lot of extra data that is duplicated across multiple events of the same type. One example of this is the "descriptive event data" that Microsoft introduced for the Windows Event Log. By removing this verbose text from common events, event sizes can be reduced significantly while still preserving all the forensic details of the event.

Example 4. Removing descriptive data from event messages

The following configuration collects events from the Application, Security, and System channels. Rules are included for truncating the messages of Security events with IDs 4688 and 4769.

In this example, the $Message field is truncated. However, the $raw_event field is not. For most input modules, $raw_event will include the contents of $Message and other fields (see the im_msvistalog $raw_event field). To update the $raw_event field, include a statement for this (see the comment in the configuration example). See also Compressing during transport below for more details.
Input sample (event ID 4769)
A Kerberos service ticket was requested.

Account Information:
        Account Name:           WINAD$@TEST.COM
        Account Domain:         TEST.COM
        Logon GUID:             {55a7f67c-a32c-150a-29f1-7e173ff130a7}

Service Information:
        Service Name:           WINAD$
        Service ID:             TEST\WINAD$

Network Information:
        Client Address:         ::1
        Client Port:            0

Additional Information:
        Ticket Options:         0x40810000
        Ticket Encryption Type: 0x12
        Failure Code:           0x0
        Transited Services:     -

This event is generated every time access is requested to a resource such as a computer or a Windows service.  The service name indicates the resource to which access was requested.

This event can be correlated with Windows logon events by comparing the Logon GUID fields in each event.  The logon event occurs on the machine that was accessed, which is often a different machine than the domain controller which issued the service ticket.

Ticket options, encryption types, and failure codes are defined in RFC 4120.
nxlog.conf
<Input eventlog>
    Module  im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id="0">
                <Select Path="Application">
                    *[System[(Level&lt;=4)]]</Select>
                <Select Path="Security">
                    *[System[(Level&lt;=4)]]</Select>
                <Select Path="System">
                    *[System[(Level&lt;=4)]]</Select>
            </Query>
        </QueryList>
    </QueryXML>
    <Exec>
        if ($Channel == 'Security') and ($EventID == 4688)
            $Message =~ s/\s*Token Elevation Type indicates the type of .*$//s;
        else if ($Channel == 'Security') and ($EventID == 4769)
            $Message =~ s/\s*This event is generated every time access is .*$//s;
        # Additional rules can be added here
        # ...
        # Optionally, update the $raw_event field
        #$raw_event = $EventTime + ' ' + $Message;
    </Exec>
</Input>
Output sample
A Kerberos service ticket was requested.

Account Information:
        Account Name:           WINAD$@TEST.COM
        Account Domain:         TEST.COM
        Logon GUID:             {55a7f67c-a32c-150a-29f1-7e173ff130a7}

Service Information:
        Service Name:           WINAD$
        Service ID:             TEST\WINAD$

Network Information:
        Client Address:         ::1
        Client Port:            0

Additional Information:
        Ticket Options:         0x40810000
        Ticket Encryption Type: 0x12
        Failure Code:           0x0
        Transited Services:     -

Large events can cause problems during transport or when processed by the receiving end; one example is packet fragmentation when using UDP. To prevent this, an event can be truncated so that it does not exceed a specific size.

Example 5. Truncating events

The following configuration reads from the Windows Event Log with im_msvistalog and truncates the event to 1000 bytes by using the substr() function. This function takes an input string plus starting and ending positions, given as byte offsets from the beginning of the string, and returns the corresponding substring.

This method will cause data after the specified position to be discarded. It should only be used in rare cases when the packet size must not be larger than a set limit.
nxlog.conf
<Input eventlog>
    Module  im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id='0'>
                <Select Path='Security'>*[System/Level=4]</Select>
            </Query>
        </QueryList>
    </QueryXML>
    Exec    $raw_event = substr($raw_event, 0, 1000);
</Input>
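
Truncation can also be made conditional. The sketch below assumes the core size() function, which returns the size of a string in bytes, and rewrites $raw_event only when it exceeds the limit. The 1000-byte limit mirrors the example above, and the warning text is illustrative; on a busy system the log_warning() call may be too noisy and can be removed.

```
<Input eventlog>
    Module  im_msvistalog
    <Exec>
        # Only touch events that exceed the limit (in bytes)
        if size($raw_event) > 1000
        {
            log_warning('Truncating oversized event');
            $raw_event = substr($raw_event, 0, 1000);
        }
    </Exec>
</Input>
```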

Compressing during transport

There are several ways that event data can be transported between NXLog agents, including the *m_tcp and *m_ssl modules. However, those modules do not provide data compression. The im_batchcompress and om_batchcompress modules, available in NXLog Enterprise Edition, can be used to transfer events in compressed (and optionally, encrypted) batches.

The following chart compares the data requirements for the *m_tcp, *m_ssl (with TLSv1.2), and *m_batchcompress module pairs. It is based on a sample of BSD Syslog records parsed with parse_syslog(). The values shown reflect the total bi-directional bytes transferred at the packet level. Of course, ratios will vary from this in practice based on network conditions and the compressibility of the event data.

Note that the om_tcp and om_ssl modules (among others) transfer only the $raw_event field by default, but can be configured to transfer all fields with OutputType Binary. The om_batchcompress module transfers all fields in the event record, but it is possible to send only the $raw_event field by first removing the other fields (see Generating $raw_event and removing other fields below).
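
For reference, a sketch of the OutputType Binary setup mentioned above might look like the following. It assumes the receiving im_tcp instance is set to the matching InputType Binary. The address, port 1514, and instance names are placeholders, and depending on the NXLog version, the listening directive on im_tcp may be ListenAddr rather than Host.

```
# Sending agent
<Output tcp_out>
    Module      om_tcp
    Host        10.2.0.2
    Port        1514
    # Send every field in the event record, not just $raw_event
    OutputType  Binary
</Output>

# Receiving agent
<Input tcp_in>
    Module      im_tcp
    Host        10.2.0.2
    Port        1514
    # Must match the sender's OutputType
    InputType   Binary
</Input>
```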

Data Requirements for Various Transfer Methods

Simply configuring the *m_batchcompress modules for the transfer of event data between NXLog agents can significantly reduce the bandwidth requirements for that part of the log path.

The table below displays the comparison of sending the same data set using different methods and modules:

Table 1. Data transfer comparison
Compression method  Modules used                        Event size  Diff vs baseline  Sender CPU usage  Receiver CPU usage  EPS sender  EPS receiver
------------------  ----------------------------------  ----------  ----------------  ----------------  ------------------  ----------  ------------
None                om_tcp, im_tcp                      112         0.00%             141               215.07              83091.8     84169.9
None                om_ssl, im_ssl                      301.7       +169.38%          141.34            191.9               33161.4     47482.9
SSLCompression      om_ssl, im_ssl                      293.2       +161.79%          138.98            190.69              34497.7     47128.5
Batch compression   om_batchcompress, im_batchcompress  18.4        -83.57%           119.69            181.1               36252.1     77491.8

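The event sizes show that enabling SSLCompression yields only a minimal improvement: 293.2 compared with 301.7 for plain SSL, which is about 3% smaller and still well above the unencrypted TCP baseline of 112.

Batch compression fares much better, at 18.4 compared with the baseline of 112, because it compresses events in batches, which leads to better compression ratios.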

Example 6. Batched log transfer

With the following configuration, an NXLog agent uses om_batchcompress to send events in compressed batches to a remote NXLog agent.

The *m_batchcompress modules also support SSL/TLS encryption; see the im_batchcompress and om_batchcompress configuration details.
nxlog.conf (sending agent)
<Input in>
    Module  im_file
    File    'input.log'
</Input>

<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    Port    2514
</Output>

The remote NXLog agent receives and decompresses the batches with im_batchcompress. All fields in an event are available to the receiving agent.

nxlog.conf (receiving agent)
<Input in>
    Module      im_batchcompress
    ListenAddr  10.2.0.2
    Port        2514
</Input>

<Output out>
    Module  om_file
    File    'output.log'
</Output>
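
If the batches should also be encrypted, the sending side could be extended roughly as follows. This is a sketch that assumes the UseSSL, CAFile, CertFile, and CertKeyFile directives described in the om_batchcompress reference. The certificate paths are placeholders, and the receiving im_batchcompress instance needs a corresponding certificate configuration.

```
<Output out>
    Module       om_batchcompress
    Host         10.2.0.2
    Port         2514
    # Enable SSL/TLS for the compressed batches
    UseSSL       TRUE
    # Placeholder paths; replace with the real certificate files
    CAFile       /path/to/ca.pem
    CertFile     /path/to/client-cert.pem
    CertKeyFile  /path/to/client-key.pem
</Output>
```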

To further reduce the size of the batches transferred by the *m_batchcompress modules, and if only the $raw_event field will be needed later in the log path, the extra fields can be removed from the event record prior to transfer. This can be done with an xm_rewrite instance for multiple fields or with the delete() procedure (see Renaming and deleting fields in a log message).
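
When only a few fields need to go, delete() is the simpler option. The sketch below removes three fields that input modules normally add to each event record; whether they are actually unneeded downstream is an assumption that depends on the deployment.

```
<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    Port    2514
    <Exec>
        # Remove fields that are assumed not to be needed later in the log path
        delete($EventReceivedTime);
        delete($SourceModuleName);
        delete($SourceModuleType);
    </Exec>
</Output>
```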

Example 7. Generating $raw_event and removing other fields

In this configuration, events are collected from the Windows Event Log with im_msvistalog, which sets the $raw_event and many other fields. To reduce the size of the events, only the $raw_event field is retained; all the other fields in the event record are removed by the xm_rewrite module instance (called by clean->process()).

Rather than using the default im_msvistalog $raw_event field, it would also be possible to customize it with something like $raw_event = $EventTime + ' ' + $Message or to_json().
nxlog.conf
<Extension clean>
    Module  xm_rewrite
    Keep    raw_event
</Extension>

<Input eventlog>
    Module  im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id='0'>
                <Select Path='Security'>*[System/Level&lt;=4]</Select>
            </Query>
        </QueryList>
    </QueryXML>
</Input>

<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    Exec    clean->process();
</Output>
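
As a sketch of the customization mentioned above, the output below first serializes the event record to JSON with the xm_json to_json() procedure, which writes the result to $raw_event, and only then removes the other fields. The order matters, because to_json() needs the fields to still be present. The instance names are illustrative, and the resulting JSON may be larger than the default im_msvistalog $raw_event.

```
<Extension _json>
    Module  xm_json
</Extension>

<Extension clean>
    Module  xm_rewrite
    Keep    raw_event
</Extension>

<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    <Exec>
        # Serialize all fields into $raw_event as JSON
        to_json();
        # Then drop everything except $raw_event
        clean->process();
    </Exec>
</Output>
```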

Alternatively, if the various fields in the event record will be handled later in the log path, the $raw_event field can be set to an empty string (but see the warning below).

Example 8. Emptying $raw_event and sending other fields

This configuration collects events from the Windows Event Log with im_msvistalog, which writes multiple fields to the event record. In this case, the $raw_event field contains the same data as other fields. Because the om_batchcompress module instance will send all the fields in the event record, the $raw_event field can be emptied.

Many output modules operate on the $raw_event field only. Do not set it to an empty string unless the output module sends all the event fields (om_batchcompress or a module using the Binary OutputType), and the same holds for every subsequent agent and module in the log path. Otherwise, a module instance will encounter an empty $raw_event. For this reason, the following example is generally not recommended.
nxlog.conf
<Input eventlog>
    Module  im_msvistalog
    <QueryXML>
        <QueryList>
            <Query Id='1'>
                <Select Path='Security'>*[System/Level&lt;=4]</Select>
            </Query>
        </QueryList>
    </QueryXML>
</Input>

<Output out>
    Module  om_batchcompress
    Host    10.2.0.2
    Exec    $raw_event = '';
</Output>