We are currently using im_mark to generate heartbeat events so that our monitoring can verify that log flow is operational. On several source machines the heartbeats do not appear to be generated in a timely manner, which causes a significant number of false alerts.

The configuration uses im_mark to generate a mark event at (currently) 5-minute intervals, which is then used in two routes. The first sends it to the destination along with the log data, where it is used by the monitoring software; the second writes it to a file for debugging purposes. On affected machines this file shows that (usually) two heartbeats are generated 5 minutes apart, followed by an (apparently random) delay of between 15 and approximately 80 minutes. During these periods the service continues running correctly and log data is still submitted.
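
For reference, a setup of the kind described above might look roughly like the sketch below. This is not the actual configuration from the affected machines: the instance names, the im_msvistalog log source, the om_tcp destination with its host and port, and the debug file path are all assumptions chosen for illustration. Only the im_mark module, the 5-minute interval, and the two-route layout come from the description.

```
# Heartbeat source: im_mark emits one mark event every MarkInterval minutes.
<Input heartbeat>
    Module        im_mark
    MarkInterval  5
    Mark          heartbeat
</Input>

# Assumed log source (hypothetical; the real inputs are not shown here).
<Input eventlog>
    Module        im_msvistalog
</Input>

# Assumed destination (placeholder address and port).
<Output destination>
    Module        om_tcp
    Host          192.0.2.10
    Port          514
</Output>

# Debug copy of the heartbeats (hypothetical path).
<Output heartbeat_debug>
    Module        om_file
    File          'C:\nxlog_debug\heartbeat.log'
</Output>

# Route 1: log data plus heartbeats to the monitored destination.
<Route to_destination>
    Path          eventlog, heartbeat => destination
</Route>

# Route 2: heartbeats only, written to the local debug file.
<Route to_debug>
    Path          heartbeat => heartbeat_debug
</Route>
```

With this layout the debug file receives only the mark events, so gaps between consecutive lines in that file correspond directly to the delays described.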

Attempted resolutions:

- Restarted the service. No effect.
- Removed the configcache.dat file. No effect.
- Increased the generation interval from 1 to 5 minutes. The issue appeared to go away but returned on different machines after a week or so.
- Increased the number of threads in the configuration. No effect.
- Tested both the raw Windows API (WaitForSingleObject) and the Apache Portable Runtime apr_thread_cond_timedwait method with a simple test program. The issue was not evident.

Please let me know if you require any additional information.

Asked February 1, 2018 - 4:57am

Answers (0)