Empty fields omitted in JSON conversion
This is a complex question about a complex problem, but please feel free to read it anyway :-)
We use NXLog to read the Windows eventlog and also csv files. We send the data to a linux loghost which does some regexp-based parsing.
We now encounter problems with missing fields.
Example 1: A csv file with three columns A, B and C. It looks like this:
#A,B,C
a,b,c
1,2,3
x,y,z
NXLog reads this file, uses an xm_csv module to parse the content, uses an xm_json module to convert it to JSON, uses an xm_syslog module to further convert it to syslog and finally sends it to the syslog server. At first glance this works fine. Here is the result from the syslog server:
...SourceModuleType:im_file,A:a,B:b,C:c,Hostname:...
...SourceModuleType:im_file,A:1,B:2,C:3,Hostname:...
...SourceModuleType:im_file,A:x,B:y,C:z,Hostname:...
However, as soon as we have empty values in a csv row, we run into problems:
#A,B,C
a,b,
1,,3
,y,
leads to:
...SourceModuleType:im_file,A:a,B:b,Hostname:...
...SourceModuleType:im_file,A:1,C:3,Hostname:...
...SourceModuleType:im_file,B:y,Hostname:...
All the fields that are empty in the csv file are now absent in the syslog message. (And this is a huge issue for our regexp parser.)
Interestingly, this:
#A,B,C
"a","b",""
1,"",3
,y,""
leads to:
...SourceModuleType:im_file,A:a,B:b,C:,Hostname:...
...SourceModuleType:im_file,A:1,B:,C:3,Hostname:...
...SourceModuleType:im_file,B:y,C:,Hostname:...
So it looks like NXLog treats an empty string in a different way than "nothing". (However, this is of limited value, as we are dealing with csv files created by applications, such as Exchange Server logfiles.)
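The difference between these inputs can be reproduced with a small model. The following Python sketch is not NXLog code; it only mimics the behaviour observed above, under the assumption that an unquoted empty column yields no field while a quoted empty column yields a field holding an empty string. The names parse_row and render are made up for illustration, and the naive splitter does not handle commas or escaped quotes inside quoted values.

```python
def parse_row(line, names):
    """Model of the observed behaviour: map column names to values.

    An unquoted empty column ("nothing") creates no field at all, while
    a quoted empty column ("") creates a field with an empty string.
    """
    fields = {}
    for name, raw in zip(names, line.split(",")):
        if raw == "":
            continue  # "nothing": the field is never created
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1]  # strip the quotes; "" becomes an empty string
        fields[name] = raw
    return fields


def render(fields):
    """Render fields as key:value pairs, like the excerpts above."""
    return ",".join(f"{key}:{value}" for key, value in fields.items())


names = ["A", "B", "C"]
print(render(parse_row('1,,3', names)))    # A:1,C:3
print(render(parse_row('1,"",3', names)))  # A:1,B:,C:3
print(render(parse_row(',y,""', names)))   # B:y,C:
```

The point of the model is that the information "this column exists but is empty" is already lost at parse time, before any JSON or syslog conversion happens.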
The same behaviour not only applies to csv-based file inputs but also to the Windows eventlog input.
Example 2: Windows security log
Windows event 4624 (successful login) includes the two fields "TargetUserName" and "TargetDomainName". If users log in to a system using "DOMAIN\username" as their username, everything works fine:
...,TargetUserName:Administrator,TargetDomainName:DEMO,TargetLogonId:...
However, if a user uses the UPN (user.name@domain.org) to log in, Windows writes the UPN into the "TargetUserName" field and leaves the "TargetDomainName" field empty. This results in:
...,TargetUserName:Administrator@demo.local,TargetLogonId:...
The "TargetDomainName" field is missing.
I have already spent a lot of time troubleshooting this issue, but still haven't found THE solution. This is what I have found out so far:
- The parse_csv() function of the xm_csv extension module does or does not create an NXLog field for each value in each row. If there is a value, such as in 1,2,3 a field with the respective value is generated. For empty strings, such as in 1,"",3 a field is generated as well, with an empty string as its value. But for "nothing", such as in 1,,3 no field is generated, and this seems to be the root cause of our problem.
- Both to_json() and to_kvp() add all existing NXLog fields to the message, even the ones having "undef" values. But of course, fields that don't exist do not appear in the message.
- I could not find a way to distinguish between an NXLog field that is present but has an "undef" value and a field that is not present. The if defined($A) construct returns false in both cases.
- There is a (not so elegant) solution for the problem that applies to csv files only: Before calling parse_csv() all fields can be initialized manually, like this: $A = ""; $B = ""; $C = ""; parse_csv();
However, this does not apply to the Windows eventlog input, because the fields differ between Windows event ids.
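To make the pre-initialization workaround concrete, here is a minimal Python sketch of the idea. It is an illustration only, not NXLog configuration: it assumes that assigning a field before parsing acts like a default that the parser overwrites only when the row actually contains a value. The helper names parse_row and parse_with_defaults are hypothetical.

```python
def parse_row(line, names):
    """Model of parse_csv() as observed: empty unquoted columns create no field.

    Surrounding double quotes are stripped, so "" becomes an empty string.
    """
    return {name: raw.strip('"')
            for name, raw in zip(names, line.split(","))
            if raw != ""}


def parse_with_defaults(line, names):
    """Model of: $A = ""; $B = ""; $C = ""; parse_csv();"""
    fields = {name: "" for name in names}  # initialise every field first
    fields.update(parse_row(line, names))  # parsed values overwrite the defaults
    return fields


names = ["A", "B", "C"]
for line in ["a,b,", "1,,3", ",y,"]:
    fields = parse_with_defaults(line, names)
    print(",".join(f"{key}:{value}" for key, value in fields.items()))
# A:a,B:b,C:
# A:1,B:,C:3
# A:,B:y,C:
```

The sketch only works because the list of column names is fixed and known in advance. That is exactly what is missing for the Windows eventlog, where the set of fields depends on the event id.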
So finally, the questions:
- Does anybody have a (config-based) solution for this problem?
- Is a change in NXLog behaviour needed to resolve the root cause? (I hope NXLog staff is reading this post.)