May 30, 2022

Collecting kernel events with NXLog for analysis in the Elastic stack

By Igor Bezzubchenko


Measuring performance is one of the most challenging tasks in system administration: it requires proper configuration and a good understanding of the results. Fortunately, Linux systems offer a wide variety of tools for obtaining performance metrics. In this blog post, we will focus on the instrumentation capabilities of the Linux kernel and some interesting methods of analyzing the results.

The kernel matters here because usage information related to CPU, memory, disk space, or network interfaces always passes through it and cannot be bypassed. Of course, there are exceptions where the kernel is not involved in an operation, or is involved only minimally, such as a user-space TCP/IP stack built with the Data Plane Development Kit. Still, for most tasks, kernel events provide the most accurate and precise picture of what is happening in the system.

There are various tools to measure a Linux system’s performance. Tools like top, iostat, mpstat, and iftop can provide answers to questions like:

  • Which process is causing excessive disk writes?

  • How many CPU cores is a specific application using?

  • Is system utilization high, moderate, or low?

The average system administrator does not usually need to go beyond these tools. But what if you need to answer more specific questions, for example:

  • How many times did a process invoke a certain syscall, and what were the results?

  • How successful was a process at opening an external device, like a USB drive?

  • How much time must a device wait before its interrupt is serviced?

ftrace is a kernel event tracing tool that can easily answer these questions. It has been available since kernel version 2.6 and is essentially a keyhole into the kernel, allowing a user to subscribe to certain types of events or functions. This excerpt from the tool’s readme file, written by its author Steven Rostedt, describes it perfectly:

Ftrace is an internal tracer designed to help out developers and system designers, to find what is going on inside the kernel. It can be used for debugging or analyzing latencies and performance issues that take place outside of user-space.
— Steven Rostedt

At the heart of ftrace lies a circular buffer where events are aggregated by tracers. By tracer, we mean kernel code capable of collecting certain types of events. ftrace is built on top of debugfs, so its environment is exposed in /sys/kernel/debug/tracing. For example, to get the list of available tracers, one can consult this file:

# cat /sys/kernel/debug/tracing/available_tracers
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
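
To get a feel for how the interface works, you can activate one of these tracers by writing its name to current_tracer and reading the collected events back from the trace file. This is a minimal illustration, not part of the original workflow, and it assumes debugfs is mounted at the default location:

# echo function > /sys/kernel/debug/tracing/current_tracer
# head /sys/kernel/debug/tracing/trace
# echo nop > /sys/kernel/debug/tracing/current_tracer

Writing nop back restores the default no-op tracer when you are done.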

This represents the tracing facilities available by default on most machines. But the Linux kernel supports numerous other interesting tracers, which you can discover by browsing the kernel/trace/trace_*.c files of the Linux source tree. They can be activated by recompiling the kernel with the necessary configuration options.

[~/my-kernel-source/kernel/trace]$ ls  trace_*.c
trace_benchmark.c   trace_events_filter.c    trace_hwlat.c            trace_output.c            trace_selftest_dynamic.c
trace_boot.c        trace_events_hist.c      trace_irqsoff.c          trace_preemptirq.c        trace_seq.c
trace_branch.c      trace_events_inject.c    trace_kdb.c              trace_printk.c            trace_stack.c
trace_clock.c       trace_events_synth.c     trace_kprobe.c           trace_probe.c             trace_stat.c
trace_dynevent.c    trace_events_trigger.c   trace_kprobe_selftest.c  trace_recursion_record.c  trace_syscalls.c
trace_eprobe.c      trace_export.c           trace_mmiotrace.c        trace_sched_switch.c      trace_uprobe.c
trace_event_perf.c  trace_functions.c        trace_nop.c              trace_sched_wakeup.c
trace_events.c      trace_functions_graph.c  trace_osnoise.c          trace_selftest.c
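
Before recompiling anything, it is worth checking which tracing options your running kernel was already built with. On most distributions the kernel configuration is shipped under /boot, although the exact path and option names can vary, so treat this as a quick sketch rather than a definitive check:

$ grep -E 'FUNCTION_TRACER|BLK_DEV_IO_TRACE' /boot/config-$(uname -r)

If CONFIG_BLK_DEV_IO_TRACE=y appears in the output, the blk tracer used later in this post is already available.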

For example, by reading the comments in the trace_clock.c source code, we can figure out what it is about:

  • Implements 3 trace clock variants with differing scalability/precision

  • Tradeoffs:

    • Local: CPU-local trace clock

    • Medium: scalable global clock with some jitter

    • Global: globally monotonic, serialized clock

I’m providing these examples to illustrate just how low the barrier to entry is for getting started with Linux kernel tracing. Out of the available standard tracers, we’ll concentrate on blk, which allows tracing block I/O operations. So let’s take a deeper look!

Block I/O tracer

It’s no secret that disk throughput is sometimes a limiting factor in high-performance computing. Modern SSD disks show read/write rates of hundreds of megabytes per second, while DDR memory performance starts in the thousands. Locating a bottleneck in the disk I/O subsystem has always been a challenge. For years Linux systems have offered well-known tools like iostat, but their output is limited to average numbers gathered over a defined period; it does not tell us how many disk operations occurred or how large they were. In 2007 a new tool, blktrace, appeared. It sits on top of the ftrace subsystem mentioned above and uses the blk tracer. It is a valuable tool for tackling the following tasks:

  • Analyzing the I/O performance of disks

  • Estimating the exact cost of adding new resources, like software RAID

  • Comparing different disk configurations

  • Finding the optimal disk configuration for a specific software task

  • Analyzing the performance of different file systems that reside on top of block devices

Running blktrace for the first time

Let’s jump right into the example. Try running the following command as a superuser:

# blktrace -d /dev/sda -w 3 -o - | blkparse -i -

Substitute sda with an appropriate disk available on your system. You’ll immediately see an avalanche of console output, followed by a summary report after about 3 seconds. Each line of output corresponds to a disk-related operation.

A brief explanation of the command line options used above:

  • -d means a device to monitor.

  • -w 3 means wait for 3 seconds and exit.

  • -o- means write the gathered data to the standard output.

Remember that blktrace does not generate human-readable data, so its output is usually redirected to a file or, if you need real-time data collection, to a parsing program that can translate the binary data into readable text. This is what the -o - and -i - options are used for, and blkparse is the tool we need; it is usually bundled in the same OS package as blktrace.
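
If real-time collection is not a requirement, you can also record the trace to files first and parse them afterwards. A possible two-step variant of the command above (blktrace writes one file per CPU, named after the value passed to -o):

# blktrace -d /dev/sda -w 3 -o sda-trace
# blkparse -i sda-trace

Both variants produce the same events; the streaming form is simply more convenient for feeding a log collector in real time.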

Let’s see how we can interpret each line. This knowledge will help us further when we build our real-time analytics.

  8,0    2      145     0.961955712 3140264  I  WS 3568367616 + 1160 [elasticsearch[i]
  • 8,0: The device’s major and minor numbers. You can see this by running ls -l /dev/sda.

  • 2: CPU core number

  • 145: Sequence number relative to the start of the execution

  • 0.961955712: Timestamp of the operation, relative to the start of the execution, in seconds and microseconds

  • 3140264: Process ID of the initiating application

  • I: Type of action; here, "I" is "insertion into the request queue."

  • WS: Type of operation; here, WS denotes a synchronous write.

  • 3568367616: Block number to start writing from

  • 1160: Number of blocks to write

  • elasticsearch: The process name
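
Note that blktrace reports offsets and sizes in sectors, normally 512 bytes each, so the 1160 units in the example above amount to 1160 × 512 = 593,920 bytes, or roughly 580 KiB, for this single request.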

Collecting events on a large scale

To make sense of it, the vast amount of data collected needs to be fed into an analytics platform. We’ll use Elasticsearch in conjunction with Kibana, since these tools are designed to work together. So how do we send the data to Elasticsearch? Fortunately, this can easily be achieved with NXLog Enterprise Edition, which provides an interface to Elasticsearch and numerous other destinations. It saves us from digging into the internals of the Elastic API and writing error-prone, handcrafted scripts. Additionally, NXLog can easily be configured to collect and parse all block I/O events coming from blkparse, but first, some additional steps need to be taken.

Prepare the output format

You can use a regular expression to parse the input data with NXLog Enterprise Edition. Alternatively, you can modify the default blkparse output to a different format, such as “key-value” pairs. This also gives you the ability to choose what data to log. For example, the following command assumes you’re not interested in the major-minor device identifiers, and that you would like to retrieve the number of bytes per operation instead of the default number of blocks:

# blktrace -d /dev/sda -o- | blkparse -f "EventTime=%z,CpuIndex=%c,NanoTime=%T.%9t,PID=%p,Action=%a,RWBS=%d,Bytes=%N,Cmd=%C\n" -i -

The output will look like this:

EventTime=20:09:02.077001,CpuIndex=2,NanoTime=0.000076323,PID=2868167,Action=D,RWBS=WS,Bytes=20480,Cmd=firefox

Doesn’t that look much nicer than the default format?

Configure NXLog

Although configuring NXLog to process these logs is a simple task, there are some caveats that you should be aware of.

First, blkparse log lines only include the time of the event, not the date. This is a problem if you need to create reports spanning multiple days, because the analytics platform needs both the date and the time to make sense of the data. A simple workaround is to use NXLog’s built-in EventReceivedTime field. In most cases, the date an event is received is the same as the event date, so this is a safe assumption.

Second, running a kernel trace requires superuser permissions. By default on Linux, the NXLog process runs as the dedicated nxlog user, which has limited privileges as a security best practice. A possible workaround is to wrap our chained command in a script and allow blktrace to be run via sudo without a password prompt. To achieve this, add the following line to the /etc/sudoers file:

nxlog   ALL=(ALL) NOPASSWD: /usr/bin/blktrace

While it is not the best approach security-wise, it is sufficient for this demonstration.
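
Before wiring this into NXLog, you can confirm that the rule is in effect. A quick check, assuming the service account is named nxlog as described above:

# sudo -l -U nxlog

The output should list /usr/bin/blktrace with the NOPASSWD tag.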

Finally, use the following NXLog configuration to collect kernel events:

nxlog.conf
<Extension kvp>
    Module          xm_kvp
    KVPDelimiter    ,
    KVDelimiter     =
    EscapeChar      \\
</Extension>

<Extension json>
    Module          xm_json
</Extension>

<Input from_blk>
    Module          im_exec
    # Wrapper script that runs blktrace via sudo and pipes its output to blkparse
    Command         /path/to/blk_trace_parse.sh
    <Exec>
        # Keep only the key-value lines produced by the custom blkparse format
        if $raw_event !~ /^EventTime/
        {
            drop();
        }
        else
        {
            kvp->parse_kvp();
            # blkparse only logs the time of day; borrow the date from EventReceivedTime
            if string($EventReceivedTime) =~ /^(\d{4})-(\d{2})-(\d{2})/
            {
                $EventTime = parsedate($1 + "-" + $2 + "-" + $3 + " " + $EventTime);
                to_json();
            }
            else drop();
        }
    </Exec>
</Input>

<Output to_elastic>
    Module          om_elasticsearch
    URL             http://localhost:9200/_bulk
    Index           "blk-test"
</Output>

<Route r1>
    Path            from_blk => to_elastic
</Route>

The following is a sample script that can be used with the im_exec input module:

#!/bin/bash
sudo blktrace -d /dev/nvme0n1 -o- | blkparse -f "..." -i -
Note
The full command-line parameters are omitted; they were discussed earlier.
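
For reference, with the format string and options from the earlier blkparse example filled back in, the wrapper script might look like the following sketch; the device name is only an example, so adjust it to your system:

#!/bin/bash
sudo blktrace -d /dev/nvme0n1 -o - | \
    blkparse -f "EventTime=%z,CpuIndex=%c,NanoTime=%T.%9t,PID=%p,Action=%a,RWBS=%d,Bytes=%N,Cmd=%C\n" -i -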

Setting up Elasticsearch and Kibana

Now it’s time to see our data in action with Elasticsearch and Kibana. We will not go into detailed installation instructions for these tools, but the following steps apply to most Linux distributions.

  1. Download the latest Elasticsearch and Kibana packages from the official website, or install them using your operating system’s package manager.

  2. Extract the archives (if you downloaded the .tar.gz packages) to the desired location, for example, a non-privileged user’s home directory.

  3. Point Kibana to Elasticsearch by uncommenting or adding the following line in the kibana.yml configuration file:

    elasticsearch.hosts: ["http://localhost:9200"]
  4. Start both platforms using your operating system’s service manager or from the console:

    $ cd /path/to/elastic/folder/bin && ./elasticsearch

    And in another console window:

    $ cd /path/to/kibana/folder/bin && ./kibana

    Superuser privileges are not required; in fact, running Elasticsearch and Kibana as a superuser is discouraged. Once you have started both platforms, you will see a large amount of console output. Give it some time; when the flow subsides, both tools should be up and running. You can also verify this explicitly, as shown below.
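
    A quick way to confirm that Elasticsearch is actually ready to accept data is to query its cluster health endpoint (9200 is the default port):

    $ curl 'http://localhost:9200/_cluster/health?pretty'

    A status of green or yellow means the node is up and you can proceed.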

Kernel events analytics in action

Open your Kibana instance by loading http://localhost:5601 in a local browser. The first time you access Kibana, you will be asked to add data by integrating with an external data provider or uploading a file.

First steps with Kibana

Close this dialog and instead navigate to Management > Stack management > Index patterns. To avoid creating our index using the Elastic API, we can start NXLog at this stage to create our index automatically.

Start the NXLog service using your operating system’s service manager or use the following command:

$ /opt/nxlog/bin/nxlog -f -c /opt/nxlog/etc/nxlog.conf

This command starts NXLog in foreground mode and provides an explicit path to the configuration file we created above. As a result, you should instantly see the successful connection to your Elasticsearch instance, for example:

2022-04-05 13:55:45 INFO [om_elasticsearch|to_elastic] successfully connected to localhost([::1]):9200

Refresh the Index Management page, and you should see the following screen:

Create Elasticsearch index

Click Create Index pattern.

Enter the name of the index corresponding to the NXLog configuration; in this case, we use blk-test from our example above. Then, in the Timestamp field dropdown, select the EventTime field. Note that we do not have to instruct Elasticsearch on how to parse the timestamp, because the NXLog configuration already produces the date format that Elasticsearch expects.
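
If you prefer to double-check from the command line that documents are actually arriving before building any visualizations, a single-document query against the index configured above is enough:

$ curl 'http://localhost:9200/blk-test/_search?size=1&pretty'

The hits section of the response should contain a JSON document with the EventTime, RWBS, Bytes, and Cmd fields produced by the NXLog configuration.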

Elasticsearch index pattern

Finally, you should see all the fields as follows:

Kernel event fields in Elasticsearch

Now go to Discover and verify that the data is present. Select Last 1 hour in the date range field, and you should see a nice histogram similar to the following:

Kernel events in Kibana

You can also enable the auto-refresh feature to view updated data in real-time mode. Under Refresh every enter 5 seconds and click Start.

Kibana auto-refresh

The graph will auto-scroll to the left when new data is displayed.

Finally, let’s create a practical real-time scenario for analyzing writing pressure on the disk.

Go to the Visualize Library and filter out anything that is not related to write operations by entering all the RWBS values that contain W (write), as illustrated in the screenshot and the filter sketch below:

Kibana event filter
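
The exact filter depends on which RWBS values appear in your data. Assuming the common write variants, a KQL expression along the lines of RWBS : "W" or RWBS : "WS" or RWBS : "WM" does the job; adjust the list to the values you actually see in Discover.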

Next, create an aggregation to visualize the total number of bytes grouped by application name:

Aggregate event data in Kibana

You should see a graph similar to the following:

Kibana data visualization

There is one caveat here: you probably expect to see recognizable application names, but the list contains some unusual entries. If a process spawns a thread, the thread name is listed instead of the program name itself. In the screenshot above, ThreadPoolForeg is a thread name belonging to WebKit-based web browsers like Opera and Chrome. Additionally, names like kworker/u8:2 or swapper/1 are kernel threads. In short, you may need to research which application each thread name belongs to.
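
If you need to trace such a thread name back to its owning process, the proc filesystem is usually enough. A hypothetical lookup for the thread name mentioned above (thread names are truncated to 15 characters by the kernel, so match on the prefix):

$ grep -l ThreadPoolForeg /proc/[0-9]*/task/*/comm

Each matching path contains the owning process ID right after /proc/, and reading /proc/<pid>/comm reveals the parent program's name.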

Conclusion

This blog post demonstrates how to combine precise kernel tracing with data analytics. With this approach, you can have confidence in the data you are measuring, since there are no intermediate programs in the pipeline that could apply their own processing and skew the results. Instead, the raw numbers are inserted directly into your analytics platform, avoiding possible intermediate bias.

While this example works for a small dataset, things may be different on a live server where the data flow volume can be huge. A continuous large data stream may lead to substantial resource consumption by the Elasticsearch process, resulting in a noticeable contribution to disk usage statistics by the log collection toolchain (NXLog → Elasticsearch → Kibana). One solution would be to distribute the load so that the system-under-test, the log collection agent, and the analytics platform run on separate machines. In any case, NXLog Enterprise Edition provides a myriad of options you can use to create an optimal workflow based on your requirements and resource constraints.

Disclaimer

While we endeavor to keep the information in this topic up to date and correct, NXLog makes no representations or warranties of any kind, express or implied about the completeness, accuracy, reliability, suitability, or availability of the content represented here. We update our screenshots and instructions on a best-effort basis.

The content was tested and proved to be working in our lab environment at the time of the last revision with the following software versions:

Ubuntu 20.04.4 LTS
RHEL 7
NXLog Enterprise Edition 5.5.7535

Last revision: 5 April 2022
