IIS logs with bad characters cause NXlog crash and random event dump


#1 chris.trotter.sci

We are using NXlog to send Windows EventLogs and IIS logs to Logstash (ELK). It works very well indeed, with one problem: every day at 2am (we suspect IIS log rotation, or possibly a malformed NetScaler packet) the NXlog service crashes on ALL servers that send IIS logs. Other servers that send only EventLogs never crash. I have set the Windows NXlog service to restart after a crash, so service is only interrupted for a minute, but here's the weird part: after restarting, NXlog re-sends a (random?) huge chunk of messages. I say 'random' because of this list of time received vs. event time (all from the same server):

  • 2014-10-28T06:00:54.074Z & 2014-10-25 04:18:08
  • 2014-10-28T06:00:54.074Z & 2014-10-25 03:58:08
  • 2014-10-28T06:00:54.072Z & 2014-10-25 03:20:02
  • 2014-10-28T06:00:54.058Z & 2014-10-25 03:10:33
  • 2014-10-28T06:00:54.058Z & 2014-10-25 03:18:10

The received time keeps jumping forward and backward relative to the event time (perhaps an ELK-stack processing oddity), and we're talking hundreds of entries over a 1-2 minute period. On our UAT servers I observed that this large chunk contains logs from the last week or more, but it's not consistent enough to say 'it re-sends everything from the last week'.

Anyway, what causes the crash is bad input: binary garbage in the log files. I'm sure the copy/paste further down won't do it justice.
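
For context, our IIS input is essentially the standard im_file + xm_csv setup. The sketch below is from memory, so treat the path and field names as illustrative rather than an exact copy of our nxlog.conf; the 22-field list matches the 'expected 22' in the errors below:

  <Extension w3c_csv>
      Module      xm_csv
      # 22 W3C fields, matching the columns enabled in our IIS logging
      Fields      $date, $time, $s-sitename, $s-computername, $s-ip, $cs-method, $cs-uri-stem, $cs-uri-query, $s-port, $cs-username, $c-ip, $cs-version, $cs-user-agent, $cs-cookie, $cs-referer, $cs-host, $sc-status, $sc-substatus, $sc-win32-status, $sc-bytes, $cs-bytes, $time-taken
      Delimiter   ' '
  </Extension>

  <Input iis>
      Module      im_file
      File        "C:\\inetpub\\logs\\LogFiles\\W3SVC1\\u_ex*.log"
      # Skip the W3C '#' header lines; parse everything else as CSV
      Exec        if $raw_event =~ /^#/ drop(); else w3c_csv->parse_csv();
  </Input>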

If anyone has ideas, I'm open! I'd be happy to either fix the weird log dumps or filter out the bad input data. We're continuing to investigate, but I thought this was important (and not Google-friendly) enough to post here; a number of Google hits for questions like mine have no answers (or just 'there is no bad-input filtering, it's a bug, sorry').
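
One idea we're toying with on the 'filter the bad input' side (untested, so just a sketch, and it assumes NXlog's regex support handles \x hex escapes): a valid W3C log line is plain printable ASCII, so anything containing other bytes could be dropped before parse_csv() ever sees it, by extending the Exec above:

  Exec    if $raw_event =~ /^#/ drop();                    \
          else if $raw_event =~ /[^\x20-\x7e\t]/ drop();   \
          else w3c_csv->parse_csv();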

(disregard the timestamps)

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'PK


'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'U@¦Ä-*¸H¶
ÜH—ÙùPÜÙ¡žŒÛŠ”¹‘ðè²Þè'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 2 in input '¹'O‚¨ˆÌ(ˆÌÈF&È”™™ØÌDEéA3)B3éA3±¡é#eEff63AO‚È,ŠÈ,‚È,zÈ,ŠÈ,ld¢Ùé1³g3³&ŸI™½ 2:2KBõÔPD2£F‚³ËäL¡¢PÜÙÁ¢'

2014-10-22 02:51:39  cannot parse integer, invalid modifier: '+'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input '%6»Ô^È=ùÙý™y¨(wv°(#2Ñ¡³%ÍdlhúH±ýŽ•         .ê13'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 2 in input '2³Éòì'

2014-10-22 02:51:39  cannot parse integer, invalid modifier: '³'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 2 in input 'zÈ

‘©¨€Ê È́ÍLž'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'PY°‘                zb'ETè[1]ÈÏNÏÿºÿA£Ó@….€þ‹Ô~WO÷?G÷7]'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'ß;º'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'ŠÌlº…;;X”q‹'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input '>›™5LEptzÌ\(2sÁf&*Jš•'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 1 in input 'M°xU™•ÍL”‰'

2014-10-22 02:51:39  Not enough fields in CSV input, expected 22, got 5 in input 'M´xUš•MMwóÀ†fM>“‑3-è1ÓL™fzÌ4Óc¦™ 3Íô˜i¦ÇÌÈf&š]dCô$ÈÌ$ÈÌ$ÈÌ$È̤ÈÌ$ÈÌÄf&È$h,ë13
2SQ[1]YaC3øH‚̤K °v…L?:º
#2 adm Nxlog ✓

The reason you are getting a big batch of past data after the restart is that NXlog saves the file/eventlog position in the file configcache.dat on clean shutdown. When it crashes, this file does not get saved, so after the service restarts NXlog re-sends messages it had already read before the crash.

There is a feature in the Enterprise Edition which can force the file position to be flushed to configcache.dat periodically, or even after each read.
While this is not a solution to the crashes themselves, it can help with the duplicate re-sends you are encountering.
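
For example, in recent Enterprise Edition releases this is controlled with global directives along the lines of the sketch below; please verify the directive names against the EE Reference Manual for your version:

  # Global section of nxlog.conf -- Enterprise Edition only
  CacheFlushInterval  always   # write positions to configcache.dat after each read, not only at shutdown
  CacheSync           TRUE     # also sync configcache.dat to disk on each flush (safest, slowest)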

If you can narrow down the cause of the crash, please report it in the support ticketing system and attach your nxlog.conf along with an input file that can trigger the crash.