nxlog-ce-3.0.2284 crashing randomly after upgrading from 2.10.2150
Hello,
I am having stability issues with the newest nxlog-ce release, 3.0.2284. I have been using nxlog-ce-2.10.2150 for several years, and it has been very stable in our environment with few issues. I use papertrail for log collection, and I have a highly customized configuration file. I did a test roll-out of 3.0.2284 to a few servers and did not notice any issues at first. However, after I rolled out the update to approximately 40 servers, it started crashing randomly in ntdll.dll, causing the nxlog service to stop and restart itself. There was no rhyme or reason to it. It would work fine for 15 minutes, and then suddenly I would get multiple random crashes and service restarts. Each restart would then flood papertrail with the previous 30 days of event log history (per crashed server), and within a few hours my papertrail storage utilization reached double my average daily usage. I had to roll all the servers back to 2.10.2150 to stop the bleeding.
The servers used in the test were a mix of Windows Server 2012 R2, 2016, 2019, and 2022, with the large majority running Server 2016. Some are Hyper-V hosts running on bare metal; others are virtual machines that run on those Hyper-V hosts.
My papertrail logs are full of these errors, but here is a small sampling. These are from a Server 2019 host and a Server 2022 host, respectively.
Oct 10 21:31:21 hv-host19-f4 Application-Error
{
"Message": "Faulting application name:nxlog.exe, version:0.0.0.0, time stamp:0x00000000|Faulting module name:libssl-1_1-x64.dll, version:1.1.1.13, time stamp:0x00000000|Exception code:0xc0000005|Fault offset:0x0000000000021b97|Faulting process id:0xf5d0|Faulting application start time:0x01d8dd0fc8bd493e|Faulting application path:c:\\apps\\nxlog\\nxlog.exe|Faulting module path:c:\\apps\\nxlog\\libssl-1_1-x64.dll|Report Id:08b0fd02-0e57-44b0-81fc-b1e7fb47f472|Faulting package full name:|Faulting package-relative application ID:",
"Hostname": "hv-host19-f4",
"EventType": "ERROR",
"SeverityValue": 4,
"Severity": "ERROR",
"EventID": 1000,
"SourceName": "Application-Error",
"Task": 100,
"RecordNumber": 12236,
"ProcessID": 0,
"ThreadID": 0,
"Channel": "Application",
"EventTime": "2022-10-10 21:23:29",
"Category": "Application Crashing Events",
"Opcode": "Info"
}
Oct 10 21:52:21 ws-ops22-2 nxlog-ce nxlog-ce-3.0.2284 startup profile 2022.01.25 (DEFAULT)
Oct 10 21:52:22 ws-ops22-2 nxlog-ce connecting to logs99.papertrailapp.com:12345
Oct 10 21:52:22 ws-ops22-2 nxlog-ce successfully connected to logs99.papertrailapp.com:12345
Oct 10 21:52:23 ws-ops22-2 Application-Error
{
"Message": "Faulting application name:nxlog.exe, version:0.0.0.0, time stamp:0x00000000|Faulting module name:ntdll.dll, version:10.0.20348.803, time stamp:0xbee6f04c|Exception code:0xc0000374|Fault offset:0x00000000001044a9|Faulting process id:0x26f4|Faulting application start time:0x01d8ca290a171970|Faulting application path:c:\\apps\\nxlog\\nxlog.exe|Faulting module path:C:\\WINDOWS\\SYSTEM32\\ntdll.dll|Report Id:5e5549b8-3c8b-405a-a78f-fd4c1f296a40|Faulting package full name:|Faulting package-relative application ID:",
"Hostname": "ws-ops22-2",
"EventType": "ERROR",
"SeverityValue": 4,
"Severity": "ERROR",
"EventID": 1000,
"SourceName": "Application-Error",
"Version": 0,
"Task": 100,
"OpcodeValue": 0,
"RecordNumber": 1100,
"ProcessID": 0,
"ThreadID": 0,
"Channel": "Application",
"EventTime": "2022-09-16 21:30:36",
"Category": "Application Crashing Events",
"Opcode": "Info"
}
Oct 10 21:54:35 ws-ops22-2 Application-Error
{
"Message": "Faulting application name:nxlog.exe, version:0.0.0.0, time stamp:0x00000000|Faulting module name:libcrypto-1_1-x64.dll, version:1.1.1.13, time stamp:0x00000000|Exception code:0xc0000005|Fault offset:0x00000000001ba014|Faulting process id:0x30e8|Faulting application start time:0x01d8caef9eba1346|Faulting application path:c:\\apps\\nxlog\\nxlog.exe|Faulting module path:c:\\apps\\nxlog\\libcrypto-1_1-x64.dll|Report Id:12281218-b154-47ae-a426-1495de2adf0d|Faulting package full name:|Faulting package-relative application ID:",
"Hostname": "ws-ops22-2",
"EventType": "ERROR",
"SeverityValue": 4,
"Severity": "ERROR",
"EventID": 1000,
"SourceName": "Application-Error",
"Version": 0,
"Task": 100,
"OpcodeValue": 0,
"RecordNumber": 1304,
"ProcessID": 0,
"ThreadID": 0,
"Channel": "Application",
"EventTime": "2022-09-17 20:00:44",
"Category": "Application Crashing Events",
"Opcode": "Info"
}
The crashes seem to point to an issue with TLS or crypto, but my existing papertrail configuration has been working fine for literally years.
Another issue I ran into while removing nxlog-ce-3.0.2284 is that issuing a stop command to the service returns a "The pipe has been ended" error instead of the normal message for a graceful shutdown. This happened every time I tried to stop the service. The service did stop, but given the error I don't know whether it was a graceful stop or a hard stop, which would cause the event log to be re-uploaded in its entirety when the service starts again. I had a lot of re-uploads going on, so I can't say for sure whether that happened here.
[SC] ControlService FAILED 109:
The pipe has been ended.
Finally, here is a snippet from the bottom of my nxlog.conf file where I set up the connection to papertrail. I've changed the host parameters slightly for security.
<Route nxlog>
Path from_nxlog => to_papertrail
</Route>
<Route eventlogs>
Path from_eventlog => noisefilter => cleanup => reorder => jsonify => to_papertrail
</Route>
<Route c_logs>
Path from_c_logs => to_papertrail
</Route>
<Output to_papertrail>
Module om_ssl
Host logs99.papertrailapp.com
Port 12345
CAFile %ROOT%/cert/papertrail-bundle.pem
AllowUntrusted FALSE
# Convert to syslog format
Exec to_syslog_bsd();
</Output>
I'm considering pushing the logs to a local Linux server with om_udp and letting that server relay the logs to papertrail over TLS to work around the issue, but that adds extra complexity to the environment that I would rather not have to support.
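A minimal sketch of what that relay output might look like, assuming a local collector at 192.0.2.10 listening for syslog on UDP port 514 (both are placeholders, not values from my environment), an output name of to_relay (also just an illustrative name), and the same to_syslog_bsd() conversion used in to_papertrail above:

```
<Output to_relay>
Module om_udp
# Placeholder address and port for the local Linux relay (assumptions)
Host 192.0.2.10
Port 514
# Convert to syslog format, same as in to_papertrail
Exec to_syslog_bsd();
</Output>
```

The routes would then point at to_relay instead of to_papertrail. UDP is unencrypted and delivery is not guaranteed, so this would only make sense on a trusted local network.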
Thanks
Ron
After running nxlog 3.0.2284 on a single server for a few days, with the logs sent to a Linux box that collects them before forwarding to papertrail, I got a different crash this morning.
Oct 14 11:31:27 ctx719-2016-1 Application-Error
{
"Message": "Faulting application name:nxlog.exe, version:0.0.0.0, time stamp:0x00000000|Faulting module name:im_msvistalog.dll, version:0.0.0.0, time stamp:0x00000000|Exception code:0xc0000005|Fault offset:0x000000000000456c|Faulting process id:0x7e24|Faulting application start time:0x01d8dd1cc9260401|Faulting application path:c:\\apps\\nxlog\\nxlog.exe|Faulting module path:C:\\apps\\nxlog\\modules\\input\\im_msvistalog.dll|Report Id:e37da0fd-4f3e-4bd0-8081-e36d96cd4ee7|Faulting package full name:|Faulting package-relative application ID:",
"Hostname": "ctx719-2016-1",
"EventType": "ERROR",
"SeverityValue": 4,
"Severity": "ERROR",
"EventID": 1000,
"SourceName": "Application-Error",
"Task": 100,
"RecordNumber": 272526,
"ProcessID": 0,
"ThreadID": 0,
"Channel": "Application",
"EventTime": "2022-10-14 10:54:33",
"Category": "Application Crashing Events",
"Opcode": "Info"
}
Just curious if anyone is actually using 3.0.2284 for real? 2.10.2150 never crashes like this.