Issue with Large CSV Messages Being Truncated in Graylog Using NXLog


#1 soc_nxlog

I am experiencing a problem with the NXLog Graylog sidecar where large CSV messages are truncated when sent to Graylog via im_file and xm_gelf. The issue occurs with both UDP and TCP transport. Specifically, when a single field (FullFormattedMessage) exceeds approximately 11,000 bytes, the message is truncated to 64 characters in Graylog's interface and the entire large field is omitted. This truncation appears to come from the $ShortMessage field, which has a character limit.

Additionally, when the message is truncated, the $EventTime field, which is derived from a CSV column using parsedate(), is not parsed correctly. It seems that when the message reaches its size limit, $raw_event is sent as-is without any further processing.

My current NXLog configuration is as follows:

define INSTALLDIR /etc/nxlog
define CERTDIR %INSTALLDIR%/cert
define CONFDIR %INSTALLDIR%/nxlog.d
define LOGDIR /var/log/nxlog
define MYLOGFILE %LOGDIR%/nxlogCSVtest.log

LogLevel INFO
LogFile %MYLOGFILE%

<Extension csv>
    Module xm_csv
    Fields IpAddress,UserAgent,Key,CreatedTime,UserName,FullFormattedMessage
</Extension>

<Extension gelf>
    Module xm_gelf
</Extension>

<Input file>
    Module im_file
    File '/home/user/test/logs_to_send.csv'
    <Exec>
        csv->parse_csv();
        if ($UserName =~ /USER_A/) drop();

        if (not defined($CreatedTime) or $CreatedTime == '') drop();

        $EventTime = parsedate($CreatedTime);
        $CreatedTime = undef;

        # These fields are needed for Graylog
        $gl2_source_collector = '${sidecar.nodeId}';
        $collector_node_id = '${sidecar.nodeName}';
    </Exec>
</Input>

<Output graylog_udp>
    Module om_udp
    Host 127.0.0.1
    Port 5555
    OutputType GELF_UDP
</Output>

<Route 1>
    Path file => graylog_udp
</Route>

 

The log file shows an "Invalid CSV input" error for each problematic row, but only the first 960 characters of each such row are displayed:

2024-02-13 16:59:08 ERROR Invalid CSV input: <the first 960 characters of the csv row>
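One way to narrow down which rows trigger the error is to scan the source CSV before it reaches NXLog and flag rows that are unusually large or that contain a backslash directly before a double quote (the pattern identified as the culprit later in this thread). This is a minimal Python sketch assuming one record per physical line; `scan_rows`, `suspicious`, and the 10,000-byte threshold are hypothetical choices for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical threshold, loosely based on the ~11,000-byte field size reported above.
SIZE_THRESHOLD = 10_000


def scan_rows(lines: Iterable[str]) -> List[Tuple[int, int, bool]]:
    """Return (line_number, size_in_bytes, has_backslash_quote) for each line.

    Assumes one CSV record per physical line (no embedded newlines).
    """
    report = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        size = len(text.encode("utf-8"))
        # '\\"' is the two-character sequence backslash + double quote.
        report.append((number, size, '\\"' in text))
    return report


def suspicious(report: List[Tuple[int, int, bool]]) -> List[Tuple[int, int, bool]]:
    """Keep only rows that are very large or contain a backslash before a quote."""
    return [row for row in report if row[1] > SIZE_THRESHOLD or row[2]]


# Small in-memory sample; a real run would pass an open file handle instead.
sample = [
    'host1,"short message"\n',
    'host2,"path ends in C:\\"\n',
]
for number, size, has_bsq in suspicious(scan_rows(sample)):
    print(f"line {number}: {size} bytes, backslash before quote: {has_bsq}")
```

To check the real file, pass an open handle, e.g. `scan_rows(open('/home/user/test/logs_to_send.csv', encoding='utf-8'))`; the UTF-8 encoding is an assumption about the data.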
 

I am running NXLog CE on Ubuntu 22.04 with the Graylog sidecar, using package version nxlog-ce_3.2.2329_ubuntu22_amd64.

For rows that don't trigger the error it works great. I know I'm hitting some limitation, but I would like to know which one and how I can change it.

#2 soc_nxlog

Solved! The problem was that some FullFormattedMessage values contain "C:\", so the backslash was escaping the double quote that follows it. This made the parser lose track of the quote character, so it could not extract a valid field.
I would add that this only happens with NXLog's parse_csv. While debugging, none of the other tools I was using (Python csv, pandas, other CSV readers and so on) had any issue with \" inside a field.
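The difference can be reproduced with Python's own csv module. By default it has no escape character (`escapechar=None`) and escapes quotes by doubling them, so a backslash before the closing quote is just an ordinary character. The sketch below compares that with the same module configured with `escapechar="\\"`, used only as a stand-in for a parser that treats backslash as an escape character; that this mirrors xm_csv's default EscapeChar (a backslash) is an assumption, not something taken from NXLog's code. `RECORD` is a made-up sample row.

```python
import csv

# One record whose quoted field ends in a Windows path: host1,"dir is C:\",done
# (the Python literal uses \\ to represent a single backslash).
RECORD = 'host1,"dir is C:\\",done'


def parse_default(line: str) -> list:
    """Python's default dialect: no escape character, quotes escaped by doubling."""
    return next(csv.reader([line]))


def parse_with_backslash_escape(line: str) -> list:
    """Stand-in for a parser that treats backslash as an escape character.

    strict=True makes the csv module raise csv.Error instead of guessing,
    which loosely mirrors an "Invalid CSV input" rejection.
    """
    return next(csv.reader([line], escapechar="\\", strict=True))


print(parse_default(RECORD))  # the backslash is kept as an ordinary character
try:
    print(parse_with_backslash_escape(RECORD))
except csv.Error as err:
    # The \" pair is read as an escaped quote, so the quoted field never closes.
    print("parse failed:", err)
```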

Anyway, I managed to fix it by adding this to the pre-processing step in Python: field.replace(r'\"', r'\\"')
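A minimal sketch of that pre-processing step, assuming the replacement is applied to the raw CSV text (where the backslash sits directly before the closing quote) rather than to an already-parsed value. The helper name `escape_backslash_before_quote` and the sample line are hypothetical, and Python's csv module with `escapechar="\\"` again stands in for a backslash-escaping parser; it is not NXLog.

```python
import csv


def escape_backslash_before_quote(text: str) -> str:
    """Apply the replacement from the post: turn each \\" into \\\\" so a
    backslash-escaping parser reads an escaped backslash followed by a real quote."""
    return text.replace(r'\"', r'\\"')


# Hypothetical raw CSV line whose quoted field ends in C:\
raw_line = 'host1,"dir is C:\\",done'
fixed_line = escape_backslash_before_quote(raw_line)

# Stand-in parser that treats backslash as an escape character (not NXLog itself).
row = next(csv.reader([fixed_line], escapechar="\\", strict=True))
print(row)  # ['host1', 'dir is C:\\', 'done']
```

The replacement only touches backslash-quote pairs; how other backslashes in the data (for example `C:\Users`) are handled depends on the parser's escape rules and is not covered here.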