to_json and special characters

#1 vguyard

Hello,

I have a question regarding the xm_json module of nxlog-ce v2.10. I am sending Windows logs to our syslog server as JSON messages with a BSD header, like so:

<Input in_winlog>                                                          
  Module im_msvistalog                                                          
  <QueryXML>
    <QueryList>
      <Query Id="0">
        <Select Path="System">*</Select>
        <Select Path="Application">*</Select>
        <Select Path="Security">*</Select>
      </Query>
    </QueryList>
  </QueryXML>
  <Exec>
    $SyslogFacilityValue = syslog_facility_value("local1");
  </Exec>
</Input>

<Output out_syslog>
  Module        om_udp
  Host          10.10.231.62
  port          514
  <Exec>
    $Hostname = string(host_ip());
    $Keywords = string($Keywords);
    $Message = to_json();
    $Message =~ s/}$/,"field":"value"}\n/g;
    $Message =~ s/\\[r|n|t]/ /g;
    $Message =~ s/\s{2,}/ /g;
    to_syslog_bsd();
  </Exec>
</Output>

So on output I convert the message to JSON, add an extra field to the end of it, remove the \t, \r and \n sequences from the message, and finally clean up the extra whitespace left by the previous substitution. This has the side effect of modifying any string that contains the characters \r, \t or \n, for example "A user DOMAIN\ruser1" is changed to "A user DOMAIN\ user1" (space after the backslash), which mangles the JSON string in the process. To prevent this, I changed the output to the following:

<Output out_syslog>
  Module        om_udp
  Host          10.10.231.62
  port          514
  <Exec>
    $Hostname = string(host_ip());
    $Keywords = string($Keywords);
    $Message = replace($Message, "\r", " ");
    $Message = replace($Message, "\n", " ");
    $Message = replace($Message, "\t", " ");
    $Message = to_json();
    $Message =~ s/}$/,"field":"value"}\n/g;
    $Message =~ s/\\r\\n\\t\\t\\t/ /g;
    $Message =~ s/\s{2,}/ /g;
#    $Message =~ s/\\[r|n|t]/ /g;
    to_syslog_bsd();
  </Exec>
</Output>

This time I do the substitutions before converting to JSON. With this configuration, after to_json() is executed I see that on event ID 4672 the privilegelist field still contains a \r\n\t\t\t sequence. I would have thought that the replace() calls would have gotten rid of those. Is this expected behavior, or am I doing this the wrong way?

For the moment I added $Message =~ s/\\r\\n\\t\\t\\t/ /g; to get rid of this specific sequence, but how can I be sure that other messages are not affected by a different sequence of tabs and carriage returns?

Thanks for your time.

Vincent

#2 b0ti Nxlog ✓
vguyard wrote: "So on output I convert the message to json, then add an extra field to the end of it, then remove the \t, \r, \n characters in the message"

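When the record is converted to JSON, each tab, carriage return and newline character becomes a two-character escape sequence (\t, \r, \n), so the JSON text no longer contains the actual control characters.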
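To make the escaping concrete, here is a small Python sketch (a stand-in for illustration, not NXLog code). The record and its Message value are made up. It shows that a real CR, LF or TAB turns into a backslash escape in the JSON text, that a literal backslash is itself doubled, and that a regex along the lines of s/\\[rnt]/ / applied to the JSON text can therefore hit the second half of an escaped backslash.

```python
import json
import re

# Hypothetical event record: the value holds a literal backslash followed
# by "r" (DOMAIN\ruser1), then a real CR, LF and TAB.
record = {"Message": "A user DOMAIN\\ruser1\r\n\tlogged on"}

# Serialization escapes the control characters and doubles the backslash.
encoded = json.dumps(record)
print(encoded)

# Same idea as replacing backslash + r/n/t in the JSON text after the fact.
cleaned = re.sub(r"\\[rnt]", " ", encoded)
print(cleaned)

# Check whether the result can still be parsed as JSON.
try:
    json.loads(cleaned)
    still_valid = True
except json.JSONDecodeError:
    still_valid = False
print(still_valid)
```

The doubled backslash in front of "ruser1" is what the substitution trips over: it cannot tell an escaped control character from an escaped backslash that happens to be followed by r, n or t.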

Perhaps you are looking for something like this:

$Message = to_json();
$Message = replace($Message, '\t', " ");

Note that '\t' is two characters (i.e. the escape sequence for the tab character) and "\t" is a single character (i.e. the actual tab).
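On the question of why the replace() calls on $Message did not clean the privilegelist value: as far as I understand the xm_json documentation, to_json() serializes all fields of the event record, not just $Message, so any other field that holds CR/LF/TAB characters is escaped as-is. The Python sketch below is only an analogy under that assumption (it is not NXLog code, and the field names and values are hypothetical). It cleans the real control characters in every string field before serializing, which avoids touching backslashes that belong to the data.

```python
import json
import re

# A real tab is one character; the escape sequence text is two.
print(len("\t"), len(r"\t"))

def clean_fields(record):
    """Collapse every whitespace run (CR, LF, TAB, spaces) in string fields to one space."""
    return {
        key: re.sub(r"\s+", " ", value) if isinstance(value, str) else value
        for key, value in record.items()
    }

# Hypothetical record resembling an event with a multi-line field.
record = {
    "EventID": 4672,
    "Message": "Special privileges assigned.\r\n\tAccount: DOMAIN\\ruser1",
    "PrivilegeList": "SeSecurityPrivilege\r\n\t\t\tSeBackupPrivilege",
}

# Clean first, then serialize: the JSON text never contains \r, \n or \t escapes.
encoded = json.dumps(clean_fields(record))
print(encoded)
```

Because the cleanup runs on the field values rather than on the JSON text, the backslash in DOMAIN\ruser1 is left alone and only real whitespace is replaced. In NXLog terms this would mean applying the replacement to each affected field before calling to_json(), rather than to $Message alone.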