June 12, 2026 siem

Multiline log parsing with regex: Keeping multiline events intact for your SIEM

By Robert Audzeyeu


Most telemetry pipelines treat every newline as the end of an event. That assumption holds for a tidy syslog stream but breaks the moment a Java stack trace, a Python traceback, or a pretty-printed JSON payload lands in the file. One event becomes forty lines, and your SIEM ingests forty fragments instead of one record.

For a SecOps team, the cost is operational. Detection rules match on fragments or miss the event entirely, correlation loses the context that made the event worth alerting on, and the event count balloons against a volume-based license. The fix is to define where each event starts and ends with a regular expression, and to do it at the collection layer before it reaches the SIEM.

Why "one line, one event" breaks

Plenty of the sources a SecOps team cares about emit events across several lines:

  • JVM exceptions and Java stack traces, where the message line is followed by dozens of at … frames.

  • Python tracebacks, which wrap the error in Traceback (most recent call last): and an indented call chain.

  • Application logs that print a request, a response body, and a result across separate lines.

  • Pretty-printed XML or JSON, indented for humans and spread over many lines.

When a line-oriented collector splits these, three things go wrong. Your detection logic sees at com.example.Service.run() as a standalone event and has nothing to match against. Correlation rules that depend on the error message and its stack don’t see them together. And a single 40-line trace counts as 40 events — inflating dashboards, skewing baselines, and burning ingest quota you’re paying for by volume.

Defining event boundaries with regex

Multiline log parsing with regex needs one decision: how do you tell the parser where an event begins or ends? There are three patterns, and the right one depends on what your log source gives you.

Match the header line

The most dependable approach matches the first line of each event — usually a leading timestamp. Everything after it belongs to the current event until the next header line appears.

A timestamp anchored to the start of the line makes a reliable boundary:

/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/

This matches 2026-06-11 14:22:01 at the beginning of a line and treats every indented stack frame that follows as part of the same record. Most structured application logs lead with a timestamp, which makes this the default choice.
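To make the header-line strategy concrete, here is a minimal Python sketch of the grouping logic. It uses Python's re module rather than PCRE, and the function name group_by_header and the sample log are illustrative assumptions, not part of any NXLog API.

```python
import re
from typing import Iterable, Iterator

# Same header pattern as above: a timestamp at the start of the line.
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def group_by_header(lines: Iterable[str]) -> Iterator[str]:
    """Yield one string per event; each header line starts a new event."""
    current: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if HEADER.match(line) and current:
            # A new header closes the event collected so far.
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        # The last event is only known to be complete at end of input.
        yield "\n".join(current)


# Made-up sample: one error with a stack trace, then one single-line event.
SAMPLE = """\
2026-06-11 14:22:01 ERROR Unhandled exception in worker
java.lang.IllegalStateException: queue closed
    at com.example.Service.run(Service.java:42)
    at java.base/java.lang.Thread.run(Thread.java:840)
2026-06-11 14:22:02 INFO Worker restarted
"""

events = list(group_by_header(SAMPLE.splitlines()))
print(len(SAMPLE.splitlines()), "lines ->", len(events), "events")  # 5 lines -> 2 events
```

In this sketch, any lines that appear before the first header are grouped into an event of their own; a real collector may handle such orphan lines differently.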

Match the header and end line

When both a reliable header and a distinct terminator are present, you can match both. The event opens on the header pattern and closes as soon as the end pattern matches. This is useful when the source has a distinct footer, such as a closing </event> tag, because the event can be emitted as soon as the footer arrives instead of waiting for the next header.
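The header-and-end strategy can be sketched in Python as follows. The two patterns, the function name group_by_header_and_end, and the sample input are hypothetical, and the choice to drop lines that fall outside any event is an assumption of this sketch rather than documented collector behavior.

```python
import re

# Hypothetical patterns for an XML-style source; adjust them to your format.
HEADER = re.compile(r"^<event\b")
END = re.compile(r"^</event>")


def group_by_header_and_end(lines):
    """Yield one string per event, closing each event on the end pattern."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if HEADER.match(line):
            if current:
                # A header arrived before an end line: emit the partial event.
                yield "\n".join(current)
            current = [line]
        elif current:
            current.append(line)
            if END.match(line):
                # The footer completes the event without waiting for a header.
                yield "\n".join(current)
                current = []
        # Lines outside any event are dropped in this sketch.
    if current:
        yield "\n".join(current)


SAMPLE = """\
<event id="1">
  <user>alice</user>
  <action>login</action>
</event>
<event id="2">
  <user>bob</user>
</event>
"""

events = list(group_by_header_and_end(SAMPLE.splitlines()))
print(len(events), "events")  # 2 events
```

Because the end pattern closes the event, the second record is emitted as soon as its footer is read, even though no further header follows it.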

Fixed line count

If every event is the same number of lines, you can count instead of pattern-match. This one is brittle: one source change and your boundaries silently drift. Reserve it for rigid, machine-generated formats where the line count won’t change.
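The drift described above is easy to reproduce. The following Python sketch groups lines by a fixed count and then shows what happens when a source adds a single line to one record; the helper name group_by_count and the key=value records are invented for illustration.

```python
from itertools import islice


def group_by_count(lines, count):
    """Yield one string per event, taking exactly `count` lines at a time."""
    iterator = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(iterator, count)]
        if not chunk:
            return
        yield "\n".join(chunk)


# Two well-formed three-line records.
GOOD = ["id=1", "user=alice", "result=ok",
        "id=2", "user=bob", "result=denied"]

# The same records after the source starts adding one extra line.
DRIFTED = ["id=1", "user=alice", "note=extra", "result=ok",
           "id=2", "user=bob", "result=denied"]

good_starts = [e.startswith("id=") for e in group_by_count(GOOD, 3)]
drifted_starts = [e.startswith("id=") for e in group_by_count(DRIFTED, 3)]

print(good_starts)     # [True, True]
print(drifted_starts)  # [True, False, False]
```

Nothing raises an error in the drifted case: every event after the extra line simply starts in the wrong place, which is why this strategy needs a format that never changes.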

A header-only pattern can’t tell that a file’s last event is complete until the next header arrives. If you need the last record emitted promptly, add an end-line pattern or a fixed line count.
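A small stateful sketch in Python shows why the last event is held back. The class HeaderAssembler and its feed and flush methods are hypothetical names used only for this illustration; they do not correspond to an NXLog interface.

```python
import re

HEADER = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


class HeaderAssembler:
    """Hypothetical helper: buffers lines until the next header arrives."""

    def __init__(self):
        self._buffer = []

    def feed(self, line):
        """Add one line; return a completed event, or None while buffering."""
        completed = None
        if HEADER.match(line) and self._buffer:
            completed = "\n".join(self._buffer)
            self._buffer = []
        self._buffer.append(line)
        return completed

    def flush(self):
        """Emit whatever is buffered, e.g. at end of file or after a timeout."""
        if not self._buffer:
            return None
        event, self._buffer = "\n".join(self._buffer), []
        return event


assembler = HeaderAssembler()
results = [assembler.feed(line) for line in [
    "2026-06-11 14:22:01 ERROR Unhandled exception",
    "    at com.example.Service.run(Service.java:42)",
]]
print(results)  # [None, None]: the event is complete but still held back

last = assembler.flush()
print(last)     # only an explicit flush releases the final event
```

With a header-only boundary, something other than the pattern has to trigger that final flush, which is the gap an end-line pattern or a fixed line count closes.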

Regex that holds up in production

The pattern that works on three sample lines often falls over on a million. Three habits keep multiline regex dependable at scale:

Anchor your patterns

Start header and end patterns with ^. Without the caret, the engine searches for a match anywhere in the line — a costly operation on every line of a high-volume source. Forgetting the anchor is an easy mistake to make, and an expensive one.
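Besides the scanning cost, an unanchored header pattern can also match a timestamp that appears in the middle of a continuation line. The Python sketch below illustrates that with two made-up log lines; it uses re.search to mimic an engine looking for a match anywhere in the line.

```python
import re

UNANCHORED = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
ANCHORED = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

# Made-up lines: a real header, and a continuation that mentions a timestamp.
header = "2026-06-11 14:22:01 ERROR Job failed"
continuation = "    retry scheduled for 2026-06-11 14:27:01 by scheduler"

# The unanchored pattern scans the whole line and finds the embedded timestamp.
print(bool(UNANCHORED.search(continuation)))  # True: mistaken for a header
# The anchored pattern can fail as soon as the first character does not fit.
print(bool(ANCHORED.search(continuation)))    # False
print(bool(ANCHORED.search(header)))          # True
```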

Set the right flags

Once lines are joined into a single event string, . stops at the first newline unless you set the /s (dotall) flag, which lets . match line terminators too. Use /m (multiline) when you want ^ and $ to match at internal line breaks. Mixing these up is why a field-extraction pattern "works" on one line and returns nothing on a reassembled event.
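The difference between the two flags is easiest to see on a reassembled event. This sketch uses Python's re module, where re.S and re.M play the roles of /s and /m; the sample event and the patterns are illustrative only.

```python
import re

# A reassembled event: three original lines joined into one string.
event = (
    "2026-06-11 14:22:01 ERROR Unhandled exception\n"
    "java.lang.IllegalStateException: queue closed\n"
    "    at com.example.Service.run(Service.java:42)"
)

pattern = r"^\S+ \S+ ERROR (.*)$"

# No flags: . stops at the first newline, so $ can never be reached.
no_flag = re.search(pattern, event)
print(no_flag)  # None

# Dotall: . also matches newlines, so the group spans the whole event.
dotall = re.search(pattern, event, re.S)
print(dotall.group(1).count("\n"))  # 2

# Multiline: ^ and $ match at internal line breaks, useful per-line extraction.
frames = re.findall(r"^\s+at (\S+)$", event, re.M)
print(frames)  # ['com.example.Service.run(Service.java:42)']
```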

Watch for catastrophic backtracking

Nested or ambiguous quantifiers like (.*)+ can send a regex engine into exponential backtracking. A short line hides the cost. A 2,000-line stack trace turns it into a CPU spike and a stalled pipeline. Write specific patterns, prefer explicit character classes over broad wildcards, and test against your largest real events, not your tidiest ones.
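The effect can be observed with a small timing sketch. It uses Python's backtracking re engine and the nested pattern (\w+)+ as a stand-in for the (.*)+ example above; absolute timings depend on the machine and the engine, so treat the printed numbers as indicative only.

```python
import re
import time

RISKY = re.compile(r"^(\w+)+$")  # nested quantifiers: many ways to split the text
SAFE = re.compile(r"^\w+$")      # accepts the same strings, one way to match


def time_match(pattern, text):
    """Return the match result and the elapsed time in seconds."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


for n in (12, 16, 20):
    # The trailing "!" forces the match to fail after all splits are tried.
    text = "a" * n + "!"
    _, risky_seconds = time_match(RISKY, text)
    _, safe_seconds = time_match(SAFE, text)
    print(f"n={n:2d} risky={risky_seconds:.4f}s safe={safe_seconds:.6f}s")
```

On a backtracking engine, the time for the nested pattern typically grows sharply with each step in input length while the flat pattern stays nearly constant, which is why testing against your largest events matters.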

How NXLog Agent handles multiline parsing

NXLog Agent handles all three boundary strategies through one module, the Multiline Parser extension. You define the module once and point an input at it.

The module uses the PCRE engine, so the regex syntax matches what you already write in Perl: patterns quoted with slashes, the =~ and !~ operators, and the /s and /m modifiers described above.

Here’s a configuration that reassembles Java stack traces from a log file using a timestamp header:

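<Extension multiline>
    Module        xm_multiline
    # A line starting with a timestamp opens a new event
    HeaderLine    /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/
</Extension>

<Input app_logs>
    Module        im_file
    File          '/var/log/app/application.log'
    # Refers to the xm_multiline instance named above
    InputType     multiline
</Input>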

The HeaderLine directive sets the boundary, and InputType ties the File input module instance to the multiline instance. Need an end pattern too? Pair it with EndLine. Fixed-size records? Use FixedLineCount.

Once the event is whole, you can pull fields out of it. Add an Exec block to the input — it runs on the full, reassembled record, which is where /s does its work: it lets . match across the newlines you just joined.

<Exec>
    # Runs once per reassembled event
    if $raw_event =~ /^(\S+ \S+) (\w+) (.*)$/s
    {
        $EventTime = parsedate($1);
        $Severity = $2;
        $Message = $3;
    }
</Exec>
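If you want to test the extraction pattern outside the agent, a rough Python equivalent of the Exec block might look like the sketch below. The function parse_event and the sample event are assumptions for illustration, and datetime.strptime with a fixed format stands in for parsedate().

```python
import re
from datetime import datetime

# Same pattern as the Exec block; re.S plays the role of the /s modifier.
FIELDS = re.compile(r"^(\S+ \S+) (\w+) (.*)$", re.S)


def parse_event(raw_event):
    """Return a dict of fields, or None if the event does not match."""
    match = FIELDS.match(raw_event)
    if match is None:
        return None
    try:
        event_time = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # The first two tokens were not a timestamp in the assumed format.
        return None
    return {
        "EventTime": event_time,
        "Severity": match.group(2),
        "Message": match.group(3),
    }


raw_event = (
    "2026-06-11 14:22:01 ERROR Unhandled exception\n"
    "    at com.example.Service.run(Service.java:42)"
)

record = parse_event(raw_event)
print(record["Severity"])               # ERROR
print(record["Message"].count("\n"))    # 1: the stack frame stays in the message
```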

This is the part that matters for your SIEM: because reassembly happens at the source, that 40-line stack trace increments your event counter once, not forty times. NXLog Agent reassembles the event, so what reaches Splunk, Microsoft Sentinel, or Elasticsearch is a single record, already parsed and intact. The same approach applies whether you’re collecting from flat files, Windows Event Log, or a TCP listener.

Conclusion

Almost every application log you collect contains multiline events. Get the boundary right with a well-anchored regex, choose the strategy that fits your source, and reassemble events at the collection layer so your SIEM only ever sees whole records. Your detection rules, your correlation logic, and your ingest bill all benefit.

NXLog Agent supports 100+ input and output modules for SecOps, DevOps, and compliance pipelines, and NXLog Platform is how you deploy and manage agents across a fleet. To try this on your own logs, start with the Multiline Parser documentation. You can try the full pipeline for free before you scale.
