Parsed log has stray Unicode (\u0000) characters


#1 cschelin

I'm attempting to parse a Cerberus FTP Server log file, but every character in the resulting message is followed by a \u0000. This is what I end up with:

{
  "EventReceivedTime": "2024-08-01 16:11:37",
  "SourceModuleName": "cerberus_log",
  "SourceModuleType": "im_file",
  "message": "[\u00002\u00000\u00002\u00004\u0000-\u00000\u00008\u0000-\u00000\u00001\u0000 \u00001\u00006\u0000:\u00001\u00001\u0000:\u00003\u00006\u0000]\u0000:\u0000C\u0000O\u0000N\u0000N\u0000E\u0000C\u0000T\u0000 \u0000[\u00001\u00005\u00002\u00004\u00009\u00002\u0000]\u0000 \u0000-\u0000 \u0000C\u0000o\u0000n\u0000n\u0000e\u0000c\u0000t\u0000i\u0000o\u0000n\u0000 \u0000t\u0000e\u0000r\u0000m\u0000i\u0000n\u0000a\u0000t\u0000e\u0000d\u0000"
}
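The pattern above, where each visible character is followed by \u0000, is what UTF-16LE text looks like when its bytes are read one at a time as a single-byte encoding. The Python sketch below reproduces the symptom. It assumes the Cerberus log is written as UTF-16LE. That is inferred from the output, not confirmed by Cerberus documentation. It also shows that decoding the same bytes as UTF-16LE gives the clean line.

```python
# Assumption: the Cerberus log is UTF-16LE encoded (inferred from the
# NUL-interleaved output above, not confirmed by vendor documentation).
line = "[2024-08-01 16:11:36]:CONNECT [152492] - Connection terminated"

# Bytes as they would sit on disk (no byte-order mark)
raw = line.encode("utf-16-le")

# Reading each byte as one character, as a single-byte reader effectively does
misread = raw.decode("latin-1")
print(repr(misread[:8]))  # '[\x002\x000\x002\x00'

# Decoding with the correct encoding recovers the original text
fixed = raw.decode("utf-16-le")
print(fixed == line)  # True
```

Simply deleting the NUL characters from `misread` happens to work for this line because it is pure ASCII. For non-ASCII characters the second byte of each UTF-16LE code unit is not zero, so decoding with the right encoding is the more robust approach.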

I've tried this, to no avail:

<Input cerberus_log>
  Module im_file
  File "C:\ProgramData\Cerberus LLC\Cerberus FTP Server\log\server.1.log"
  <Exec>
    $message = convert($raw_event, "utf-8", "iso8859-2");
    if $message =~ s/(.)\\u0000// $message = $1;
    to_json();
  </Exec>
</Input>
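One likely reason the `convert()` call above has no visible effect: assuming the arguments are (string, source encoding, target encoding), it converts from UTF-8 to ISO-8859-2. The 0x00 byte is valid in both of those encodings, so NUL-interleaved ASCII passes through unchanged. The Python sketch below checks this at the byte level. It assumes a pure-ASCII line without a byte-order mark and does not call NXLog itself.

```python
# Assumption: the line is pure ASCII, stored as UTF-16LE without a BOM.
line = "[2024-08-01 16:11:36]:CONNECT [152492] - Connection terminated"
raw = line.encode("utf-16-le")  # every ASCII character followed by 0x00

# Treat the bytes as UTF-8 (0x00 is a valid UTF-8 byte), then re-encode
# them as ISO-8859-2, mirroring a UTF-8 -> ISO-8859-2 conversion
converted = raw.decode("utf-8").encode("iso8859-2")

# The conversion is a no-op: all the NUL bytes are still there
print(converted == raw)  # True
print(converted.count(b"\x00") == len(line))  # True
```

If this diagnosis is right, the source encoding handed to the conversion would need to be UTF-16LE rather than UTF-8.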

How can I parse the log properly so that the \u0000 characters are removed before the event is sent out?

#2 cschelin

Additional note: when I use this instead:

$message = replace($message, "\u0000", "");

I get this:

{"EventReceivedTime":"2024-08-05 15:13:23","SourceModuleName":"cerberus_log","SourceModuleType":"im_file","message":"["}

In fact, I get the same result no matter what I put in the search-string argument of replace().
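One plausible explanation, not verified against the NXLog source, is that the field is handled as a NUL-terminated C string at this step. The first NUL byte comes right after "[", so anything that stops at the first NUL sees only "[". That would also explain why the search string makes no difference. The Python sketch below uses `ctypes` to show how a C char buffer holding UTF-16LE bytes looks to code that relies on NUL termination. It assumes the same UTF-16LE line as in the question.

```python
import ctypes

# Assumption: the field is UTF-16LE text that is later treated as a C string.
line = "[2024-08-01 16:11:36]:CONNECT [152492] - Connection terminated"
raw = line.encode("utf-16-le")  # NUL byte after every ASCII character

# create_string_buffer copies all the bytes, plus a trailing NUL,
# into a C char array
buf = ctypes.create_string_buffer(raw)

# .raw exposes the whole array; .value stops at the first NUL byte,
# which is how C string functions such as strlen() see the data
print(len(buf.raw) == len(raw) + 1)  # True
print(buf.value)  # b'['
```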