1
answer

We are using NXlog to send Windows EventLogs and IIS logs to Logstash (ELK).  It works very well indeed with only one problem - at 2am (we suspect IIS log rotation, or maybe weird Netscaler packet) every day the NXlog service crashes on ALL servers that are sending IIS logs.  We have some other servers only sending EventLogs - no crashing occurs.  I have set the Windows NXlog service to restart on a crash, so service is only interrupted for a minute, but here's the weird part - NXlog re-sends a (random?) huge chunk of messages.  I say 'random', because check out this list of time received vs. event time: (these are all from the same server)

  • 2014-10-28T06:00:54.074Z & 2014-10-25 04:18:08
  • 2014-10-28T06:00:54.074Z & 2014-10-25 03:58:08
  • 2014-10-28T06:00:54.072Z & 2014-10-25 03:20:02
  • 2014-10-28T06:00:54.058Z & 2014-10-25 03:10:33
  • 2014-10-28T06:00:54.058Z & 2014-10-25 03:18:10

The weird time received jumping forward/backward in time continues (perhaps it's an ELK stack processing oddity)...keeping in mind we're talking hundreds of entries over a 1-2 minute period.  On our UAT servers I observed that this large chunk of logs contains logs from the last week or more - but it's not consistent enough to say 'it's re-sending everything from the last week'.

Anyways - what causes the crash is bad input, like weird bad characters...I am sure the following copy/paste won't do it justice.  

If anyone has ideas, I'm open!!  Happy to either fix the weird log dumps, or fix the bad input data.  We're continuing to investigate, but I thought it important (and not Google-friendly) enough to post here.  A number of Google hits for questions like mine with no answers (or 'no bad input filtering, is a bug, sorry').

(disregard the timestamp)

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'PK


'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'U@¦Ä-*¸H¶
ÜH—ÙùPÜÙ¡žŒÛŠ”¹‘ðè²Þè'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 2 in input '¹'O‚¨ˆÌ(ˆÌÈF&È”™™ØÌDEéA3)B3éA3±¡é#eEff63AO‚È,ŠÈ,‚È,zÈ,ŠÈ,ld¢Ùé1³g3³&ŸI™½ 2:2KBõÔPD2£F‚³ËäL¡¢PÜÙÁ¢'

2014-10-22 02:51:39  cannot parse integer, invalid modifier: '+'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input '%6»Ô^È=ùÙý™y¨(wv°(#2Ñ¡³%ÍdlhúH±ýŽ•         .ê13'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 2 in input '2³Éòì'

2014-10-22 02:51:39  cannot parse integer, invalid modifier: '³'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 2 in input 'zÈ

‘©¨€Ê È́ÍLž'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'PY°‘                zb'ETè[1]ÈÏNÏÿºÿA£Ó@….€þ‹Ô~WO÷?G÷7]'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'ß;º'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'ŠÌlº…;;X”q‹'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input '>›™5LEptzÌ\(2sÁf&*Jš•'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'M°xU™•ÍL”‰'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 5 in input 'M´xUš•MMwóÀ†fM>“‑3-è1ÓL™fzÌ4Óc¦™ 3Íô˜i¦ÇÌÈf&š]dCô$ÈÌ$ÈÌ$ÈÌ$È̤ÈÌ$ÈÌÄf&È$h,ë13
2SQ[1]YaC3øH‚̤K °v…L?:º



 

 

AskedOctober 28, 2014 - 11:42am

Answer (1)

The reason you are getting a big batch of past data after the restart is because NXLog saves the file/eventlog position in the file configcache.dat on clean shutdown. When it crashes the file does not get saved and after the service is restarted is starts again resending messages that it has already read before the crash.

There is a feature in the Enterprise Edition which can force the file position to be flushed into the configcache.dat file periodically or even after each read.
While this is not a solution against potential crashes, it can help with the situation you are encountering.

If you can narrow down the cause of the crash, please report it in the Support ticketing system and attach nxlog.conf and input files that can trigger the crash.

AnsweredOctober 28, 2014 - 9:03pm

Comments (4)

  • chris.trotter.sci's picture

    Hm, thanks - good to know.  I'm hoping to get approval to do a packet capture and have also requested info on the Enterprise Edition pricing.  If all else fails we'll just have to write a scheduled task/script that stops the NXlog service at 1:59:30AM, restarts it at 2:00:30AM.  (all of the crash (bad) data is received within ~5 seconds of it starting)

    The below is from a single server's nxlog.log: (first line grabbed from each crash sequence - the 'PK' has an additional three wacky characters following that aren't pasting in)

    2014-10-25 02:00:01 ERROR if-else failed at line 176, character 342 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; procedure 'parse_csv' failed at line 176, character 170 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; Not enough fields in CSV input, expected 22, got 1 in input 'PK'
    2014-10-26 02:00:02 ERROR if-else failed at line 176, character 342 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; procedure 'parse_csv' failed at line 176, character 170 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; Not enough fields in CSV input, expected 22, got 1 in input 'PK'
    2014-10-27 02:00:02 ERROR if-else failed at line 148, character 342 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; procedure 'parse_csv' failed at line 148, character 170 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; Not enough fields in CSV input, expected 22, got 1 in input 'PK'
    2014-10-28 02:00:02 ERROR if-else failed at line 134, character 342 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; procedure 'parse_csv' failed at line 134, character 170 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; Not enough fields in CSV input, expected 22, got 1 in input 'PK'
    2014-10-29 02:00:01 ERROR if-else failed at line 176, character 342 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; procedure 'parse_csv' failed at line 176, character 170 in c:\Program Files (x86)\nxlog\conf\nxlog.conf. statement execution has been aborted; Not enough fields in CSV input, expected 22, got 7 in input 'PK'

     

    I would note that the nxlog.conf section referred to in each case (except for line 176 - the conf isn't that long) is this:

    Exec if $raw_event =~ /^#/ drop();          \
           else                                     \
           {                                        \
                w3c->parse_csv();                   \
        $EventTime = parsedate($date + " " + $time); \
                $SourceName = "IIS";                \
                $Message = to_json();               \
           } <-------------this is line 134 and 148

    </Input>

     

    October 29, 2014 - 12:16pm
  • chris.trotter.sci's picture

    Haha, okay, we figured it out.  The key is in the conf file..

        File    "D:\\LogFiles\\W3SVC2\\u_ex*"

    Seems okay, right?  Ya, well our logrotation creates a zip file called "u_ex*.zip"...inside that folder... Solution: Change the config file to:

        File    "D:\\LogFiles\\W3SVC2\\u_ex*.log"

    I'll know for sure by tomorrow (will update), but we're pretty sure this'll fix it.  When I previously tested the log rotation script, what I did NOT check for was 'did the script actually do anything?'.  So what happened here is that a new ZIP archive was created and NXlog tried to analyze it.

    October 29, 2014 - 4:04pm
  • adm's picture
    (NXLog)

    Thanks for the info. The crash was easy to reproduce by feeding a zip file to the csv parser.  It's caused by an off-by-one bug.

    The fix will be available in the next release.  Otherwise here is a patch:

    --- src/modules/extension/csv/csv.c
    +++ src/modules/extension/csv/csv.c
    @@ -427,7 +427,7 @@
         len = strlen(src);
         }
     
    -    tmpstr = malloc((size_t)len);
    +    tmpstr = malloc((size_t)len + 1);
     
         try
         {

     

    October 29, 2014 - 11:18pm
  • adm's picture
    (NXLog)

    This should be fixed in the latest release (nxlog-ce-2.9.1347)

    March 10, 2015 - 1:42pm