March 9, 2026 · SIEM

Beyond basic ingestion: Advanced OpenTelemetry data processing with NXLog

By João Correia


Most discussions about OpenTelemetry pipelines focus on getting data from point A to point B. Collect telemetry, maybe convert the format, forward it to a backend. That’s the minimum viable pipeline, and it’s where most tooling stops.

But a pipeline that only moves data is a pipe, not a processing layer. The telemetry arriving at your observability platform or SIEM is only as useful as the context it carries. A raw log entry saying "connection from 198.51.100.42 to port 443" tells you something happened. The same entry enriched with geographic location, asset criticality, host performance metrics at the time of the event, and a classification tag transforms a data point into something an analyst can act on without switching between three other tools first.

This is where the gap between basic ingestion and operational usefulness lives. And it’s where processing — not just routing — determines whether your telemetry investment pays off.

The problem with "collect and forward"

A collect-and-forward pipeline treats telemetry as cargo. Pick it up, put it down somewhere else. The assumption is that the backend will handle everything: parsing, enrichment, correlation, classification. That assumption has three problems.

First, it pushes cost downstream. Observability platforms charge based on ingestion volume. Sending raw, unenriched data means your backend spends compute cycles on processing that could have happened closer to the source — and you pay per gigabyte for the privilege. Debug-level logs, duplicate events, and verbose attributes all contribute to volume without contributing to visibility.

Second, it delays insight. When enrichment happens at the backend, every query and dashboard depends on that backend’s processing capacity and enrichment capabilities. If your SIEM doesn’t support GeoIP lookups natively, you don’t get geographic context. If it can’t correlate system metrics with log events, you won’t know that the server was at 98% CPU when the connection dropped. The data existed at the source. It just didn’t make the trip.

Third, it creates vendor dependency for basic intelligence. When your backend handles all enrichment, switching platforms means rebuilding every enrichment rule, every lookup table, every classification scheme. The intelligence lives in the destination instead of the pipeline, and that locks you in.

Enrichment at the source changes the equation

Processing telemetry between collection and forwarding — at the agent or the collection layer — inverts this dynamic. Instead of sending raw material to the backend and hoping it knows what to do with it, you send pre-processed, contextually rich data that’s ready for analysis the moment it arrives.

NXLog Platform performs this processing at the collection point through its agent-side processing pipeline. The agent parses incoming telemetry, applies enrichment operations, and forwards the result — all before the data leaves the host. The output remains OpenTelemetry-compatible, so downstream systems receive enriched data in the same standardized format they expect.
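The agent-side flow described above can be pictured as a chain of stage functions. The sketch below is a conceptual illustration in Python, not NXLog's actual implementation; the stage names and field names are placeholders for the operations discussed in the rest of this post.

```python
# Conceptual parse -> enrich -> forward chain. Each stage stands in for
# an agent-side operation (GeoIP lookup, metric attachment, pattern
# classification) and is purely illustrative.
def run_pipeline(raw_line, stages):
    event = {"message": raw_line}          # parse: wrap the raw record
    for stage in stages:
        event = stage(event)               # each stage adds context
    return event                           # ready to forward downstream

# Example stage: tag every event with its collection point
# ("host.name" and "web-01" are made-up illustrative values).
tag_host = lambda e: {**e, "host.name": "web-01"}

result = run_pipeline("connection from 198.51.100.42 to port 443", [tag_host])
```

The point of the structure is that every stage runs before the event leaves the host, so the backend receives the finished product rather than raw material.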

What does "enrichment" mean in concrete terms? Three categories matter most.

Geographic and identity context

An IP address in a log event is a starting point, not an answer. Resolving that address to a city, country, and autonomous system number turns a connection event into an intelligence artifact. A firewall log entry showing traffic from 198.51.100.42 becomes traffic from São Paulo, Brazil, AS 12345. A security analyst triaging that alert at 3 AM doesn’t need to switch to an external lookup tool to decide whether it warrants escalation.

NXLog Agent supports GeoIP resolution during processing, attaching geographic metadata to events before they leave the host. The same approach applies to DNS resolution, hostname lookups, and user identity mapping — any contextual data that makes a raw event more immediately interpretable.
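To make the idea concrete, here is a minimal sketch of GeoIP-style enrichment. A real deployment would consult a GeoIP database (such as MaxMind's GeoLite2); the `GEO_TABLE` dictionary below is a stand-in for illustration only, and the attribute names are assumptions, not NXLog's field schema.

```python
# Stand-in lookup table; a production agent would query a GeoIP database.
GEO_TABLE = {
    "198.51.100.42": {"city": "Sao Paulo", "country": "BR", "asn": 12345},
}

def enrich_with_geo(event):
    """Attach geographic context without altering the original fields."""
    geo = GEO_TABLE.get(event.get("src_ip"))
    if geo:
        # New attributes ride alongside the raw payload.
        event["geo.city"] = geo["city"]
        event["geo.country"] = geo["country"]
        event["geo.asn"] = geo["asn"]
    return event

raw = {"message": "connection from 198.51.100.42 to port 443",
       "src_ip": "198.51.100.42", "dst_port": 443}
enriched = enrich_with_geo(raw)
```

Note that the original `message` and `src_ip` fields are untouched; enrichment only adds.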

System-aware metadata

Telemetry events don’t happen in a vacuum. An application error logged at 14:32:07 means one thing if the host was idle and something different if the CPU was pegged at 95% and memory was exhausted. A connection timeout during normal load suggests a network issue. The same timeout during a resource spike suggests the server couldn’t handle the request.

Attaching host performance data — CPU utilization, memory pressure, disk I/O, network throughput — to log events at the time they occur creates a richer picture than either signal provides alone. When you see these metrics alongside the event in your analysis tool, you skip the step of manually correlating timestamps across separate dashboards. The answer is already in the event.

NXLog Agent collects host performance metrics and can attach them as attributes to outgoing telemetry. This enrichment happens locally, adding minimal overhead while significantly increasing the diagnostic value of each event.
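A sketch of what metric attachment looks like in practice, assuming some collector is available on the host. The snapshot function below returns hard-coded illustrative values rather than live readings (a real collector would read from psutil, /proc, or WMI), and the attribute names are assumptions for the example.

```python
import time

def host_metrics_snapshot():
    # Stand-in for a real collector; the values below are illustrative,
    # not live readings.
    return {"host.cpu.utilization": 0.95, "host.memory.used_percent": 88.0}

def attach_host_metrics(event):
    # Enrichment adds attributes; the original event fields stay untouched.
    attrs = dict(event.get("attributes", {}))
    attrs.update(host_metrics_snapshot())
    attrs["metrics.captured_at"] = time.time()
    return {**event, "attributes": attrs}

evt = attach_host_metrics({"message": "connection timeout", "severity": "error"})
```

The timeout event now carries the CPU and memory state from the moment it occurred, so no timestamp correlation across dashboards is needed later.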

Classification and pattern matching

Raw telemetry is noisy. Not every event deserves the same level of attention, and not every field arrangement maps neatly to the categories your security or operations team works with.

The NXLog Agent Pattern Matcher module classifies events against pattern databases during processing. An SSH authentication event gets tagged with a pattern name and ID. A Windows Security Event 4625 (failed logon) gets classified before it reaches the SIEM. This pre-classification means downstream detection rules can match against clean category tags rather than parsing raw message fields — which reduces rule complexity and improves match accuracy.

Pattern matching at the agent also catches events that would otherwise require multi-step processing at the backend. Rather than writing SIEM rules to parse, extract, classify, and then alert, the classification happens at collection time. The SIEM rule becomes a simple match against a pre-assigned field.
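The mechanics can be sketched with a regex table. NXLog's Pattern Matcher uses its own pattern database format, so the table below is only an analogue of the idea; the pattern IDs, names, and field names are invented for illustration.

```python
import re

# Hypothetical pattern database: (pattern_id, name, compiled regex).
PATTERNS = [
    (1001, "ssh_auth_failure", re.compile(r"Failed password for (?P<user>\S+)")),
    (1002, "win_logon_failure", re.compile(r"EventID=4625")),
]

def classify(event):
    for pid, name, rx in PATTERNS:
        m = rx.search(event.get("message", ""))
        if m:
            # Tag the event so downstream rules match on clean fields.
            event["pattern.id"] = pid
            event["pattern.name"] = name
            for k, v in m.groupdict().items():
                event[f"capture.{k}"] = v
            break
    return event

tagged = classify({"message": "Failed password for root from 10.0.0.5 port 22"})
```

A downstream SIEM rule can now match `pattern.name == "ssh_auth_failure"` instead of re-parsing the message text.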

What this means during an incident

The value of pre-enriched telemetry shows up most clearly when you need it most: during active incident response.

Consider a security team investigating a suspected intrusion. The first alert shows an unusual outbound connection from an application server. With a basic pipeline, the analyst sees a source IP, a destination IP, a port, and a timestamp. To determine the severity, they need to look up the destination IP’s geographic location in a separate tool, check whether the source server was under unusual load, determine what application generated the connection, and cross-reference against recent vulnerability disclosures for that application.

With enriched telemetry, the event already carries the destination’s country and ASN, the host’s CPU and memory state at the time of the connection, the application name and version from host metadata, and a classification tag from pattern matching. The analyst’s first look at the alert contains enough context to make an informed triage decision — escalate, investigate further, or dismiss — without leaving their primary analysis tool.

This isn’t hypothetical optimization. The CISA advisory on the federal agency breach that went undetected for three weeks explicitly identified the failure to continuously review alerts as a root cause. Part of what makes continuous review difficult is alert volume combined with insufficient context. When every alert requires manual enrichment before an analyst can assess it, the queue grows faster than the team can process it. Pre-enriched events reduce the time per alert, which directly increases the number of alerts a team can review in a given window.

Enrichment without losing standardization

A common concern with processing telemetry in the pipeline is compatibility: if you modify events before forwarding, do they still conform to the expected schema? Will your backend still parse them correctly?

This is where the distinction between "modifying" and "enriching" matters. Enrichment adds context to events — new fields, attributes, or metadata — without altering the original data. The raw event fields remain intact. The schema stays valid. The additional context rides alongside the original payload as supplementary attributes.

NXLog Platform’s processing pipeline preserves OpenTelemetry compatibility through this approach. Events forwarded via OTLP carry their enrichment as resource attributes or log record attributes, following the OpenTelemetry data model. The receiving backend sees a standard OpenTelemetry event that happens to contain more context than a raw one. No custom parsers required. No schema violations.
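The shape of such an event can be sketched in OTLP/JSON terms: the raw message stays in `body`, per-event enrichment travels as log record attributes, and host-level context as resource attributes. The builder below loosely follows the OTLP/JSON logs encoding (`resourceLogs` → `scopeLogs` → `logRecords`); the enrichment keys and values are illustrative.

```python
def to_otlp_json(raw_message, enrichment, resource):
    # Helper: render a dict as an OTLP/JSON attribute list.
    def kv(attrs):
        return [{"key": k, "value": {"stringValue": str(v)}}
                for k, v in attrs.items()]
    return {"resourceLogs": [{
        "resource": {"attributes": kv(resource)},
        "scopeLogs": [{"logRecords": [{
            "body": {"stringValue": raw_message},   # original payload, intact
            "attributes": kv(enrichment),           # supplementary context
        }]}],
    }]}

payload = to_otlp_json(
    "connection from 198.51.100.42 to port 443",
    {"geo.country": "BR", "pattern.name": "outbound_tls"},
    {"host.name": "web-01"},
)
```

Any OTLP-capable backend parses this as a standard log record; the extra attributes are simply more context, not a schema deviation.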

This means enrichment at the collection layer doesn’t conflict with an organization’s broader OpenTelemetry standardization strategy. You can mandate OpenTelemetry format across your infrastructure and still get the benefits of pre-processed, contextually rich data.

The processing pipeline as a cost lever

Processing telemetry before forwarding also serves as a cost control mechanism, separate from the analytical benefits.

Observability platforms and SIEMs bill by volume — events per second, gigabytes per day, or both. Raw telemetry includes debug-level logs, routine health checks, informational messages, and duplicate events. A processing pipeline that filters low-value events, deduplicates repeated alerts, and aggregates high-frequency metrics reduces the data volume reaching your backend without reducing signal quality.

The math is straightforward. If 40% of your telemetry volume consists of debug logs, health checks, and duplicates that your security team never examines, filtering those at the collection layer reduces your ingestion bill by 40% while delivering the same operational visibility. The enrichment data you add to the remaining events is small — a few extra fields per event — compared to the volume you remove.
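A filter-and-deduplicate pass at the collection layer can be sketched in a few lines; the severity values and the choice of `(host, message)` as the dedup key are illustrative assumptions, not a prescription.

```python
def reduce_volume(events):
    # Drop debug-level noise and duplicate (host, message) pairs
    # before forwarding; field names are illustrative.
    seen = set()
    kept = []
    for e in events:
        if e.get("severity") == "debug":
            continue
        key = (e.get("host"), e.get("message"))
        if key in seen:
            continue
        seen.add(key)
        kept.append(e)
    return kept

sample = [
    {"host": "web-01", "severity": "debug", "message": "health check ok"},
    {"host": "web-01", "severity": "error", "message": "connection timeout"},
    {"host": "web-01", "severity": "error", "message": "connection timeout"},
    {"host": "db-01", "severity": "info", "message": "backup complete"},
]
forwarded = reduce_volume(sample)  # four events in, two forwarded
```

Half of the sample volume never reaches the backend, yet nothing the analyst would act on is lost.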

Beyond the basics

A telemetry pipeline that only moves data is table stakes. The organizations extracting real value from OpenTelemetry investments are the ones processing telemetry between collection and consumption: enriching events with geographic, system, and classification context; filtering noise before it reaches expensive backends; and preserving standardized formats throughout.

The difference between raw telemetry and enriched telemetry is the difference between data and answers. One fills dashboards. The other clears the fog during the moments that matter.

NXLog Platform is an on-premises solution for centralized log management, with versatile processing forming the backbone of security monitoring.

With our industry-leading expertise in log collection and agent management, we comprehensively address your security log-related tasks, including collection, parsing, processing, enrichment, storage, management, and analytics.

Tags: opentelemetry, telemetry data pipeline, NXLog Platform

© Copyright NXLog Ltd.

Subscribe to our newsletter

Privacy Policy • General Terms of Business