2 responses

Hello everyone,

This is my first time posting in this community forum, so any help would be greatly appreciated.

I've been working with NxLog and ElasticSearch for a few months now and had almost no issues until very recently, when a new ElasticSearch index was created to accommodate the new structure of our logs. With that, we also had to update our existing nxlog.conf file.

We have three ElasticSearch endpoints with the same setup, and at one point during the week we ran out of storage space. After increasing the storage size for all three endpoints, two environments resumed sending new data to ElasticSearch with no further problems. However, the third environment's NxLog services appear to be stuck, on both existing AWS instances and newly created ones, repeating the following NxLog log entry over and over:

2016-04-29 15:33:12 INFO connecting to search-stage-logging-udf7h4lq2bsm245ciawp2stcvu.us-east-1.es.amazonaws.com:80
2016-04-29 15:33:12 INFO reconnecting in 1 seconds
2016-04-29 15:33:12 ERROR ### PANIC at line 2456 in module.c/nx_module_pollset_add_socket(): "failed to add descriptor to pollset: Not enough space ; [cannot dump backtrace on this platform]" ###

This was the log entry that initially alerted us we had run out of space in ElasticSearch. However, the ElasticSearch dashboard no longer shows a lack of space, so it's confusing why NxLog in the third environment keeps emitting these entries when the other two environments recovered.

Basically, I have two questions:
1. Is this a scenario where the NxLog service is stuck, unable to see that space is available again? Or does the fault lie with ElasticSearch not reporting storage space correctly?
2. If the NxLog service is stuck in this state, is there a configuration option or some other automated procedure to make NxLog restart itself after repeated failures?
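On question 2: I'm not aware of a documented built-in self-restart in NxLog after a panic, but the service manager can take care of it. A sketch for systemd hosts (the unit name `nxlog` and the drop-in path are assumptions; adjust to your install) -- save as `/etc/systemd/system/nxlog.service.d/restart.conf`, then run `systemctl daemon-reload`:

```ini
# Hypothetical drop-in: have systemd restart nxlog after failures.
[Unit]
# Give up if it fails 10 times within 5 minutes.
StartLimitIntervalSec=300
StartLimitBurst=10

[Service]
Restart=on-failure
RestartSec=5
```

On hosts without systemd, a watchdog such as monit or a cron-driven health check could serve the same purpose.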


If anyone has gone through a similar experience, any tips would be greatly appreciated. Thank you for your time.

Asked May 2, 2016 - 8:37pm

Answer (1)

Sounds like you have run into a bug. The error "Not enough space" usually means the process ran out of memory; it is not related to disk storage space. Can you check the memory usage in a process monitor?
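A quick way to check, as a sketch assuming a Linux host and a daemon process named `nxlog` (both are assumptions; the "cannot dump backtrace on this platform" line suggests your platform may differ, in which case a process monitor is the equivalent):

```shell
# Find the nxlog PID; falls back to this shell just so the commands
# below have something to demonstrate on.
pid=$(pgrep -o nxlog || echo $$)

# Memory usage of the process (virtual and resident, in kB).
grep -E 'VmSize|VmRSS' "/proc/$pid/status"

# It can't hurt to also compare the open-descriptor count against the
# process limit, since the panic happens while adding a descriptor.
echo "open descriptors: $(ls "/proc/$pid/fd" | wc -l)"
grep 'Max open files' "/proc/$pid/limits"
```

If VmRSS keeps climbing between panics, that would support the memory-leak theory.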

Comments (1)

  • horstp's picture

I get a similar message at times. When it happens, my checkpoint connection fails and I have to restart nxlog.

    2017-11-21 12:53:27 ERROR apr_file_write failed;No space left on device
    2017-11-21 12:53:27 ERROR apr_file_write failed;No space left on device
    2017-11-21 12:53:27 ERROR ### PANIC at line 2467 in module.c/nx_module_pollset_add_file(): "failed to add descriptor to pollset: Bad file descriptor ;backtrace:;/opt/nxsec/bin/nxlog(nx_append_backtrace+0x66) [0x441842];/opt/nxsec/bin/nxlog(_nx_panic+0x15c) [0x417a0b];/opt/nxsec/bin/nxlog(nx_module_pollset_add_file+0x1bd) [0x425f43];/opt/nxsec/libexec/nxlog/modules/output/om_file.so(+0x319c) [0x7f9876d4d19c];/opt/nxsec/libexec/nxlog/modules/output/om_file.so(+0x3f20) [0x7f9876d4df20];/opt/nxsec/bin/nxlog(nx_event_process+0x212) [0x419a1b];/opt/nxsec/bin/nxlog() [0x46011b];/opt/nxsec/bin/nxlog() [0x4513f4];/lib64/libpthread.so.0() [0x324cc07aa1];/lib64/libc.so.6(clone+0x6d) [0x324c4e8bcd]" ###
    2017-11-21 12:53:27 ERROR apr_file_write failed;No space left on device
    2017-11-21 12:53:27 ERROR last message repeated 2 times
    2017-11-21 12:53:27 ERROR subprocess '29444' returned a non-zero exit value of 1

    This happens at random intervals, but it seems that every time it happens, I find those log entries in the nxlog logfile. Disk storage is fine, as is the inode count.
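One thing worth ruling out: "No space left on device" is reported for the specific filesystem backing the file being written (the om_file output here, or nxlog's cache/queue directory), which may be a different mount from the one you checked. A sketch:

```shell
# Check block AND inode usage on every mount, not just the root volume;
# the failing write goes to whichever filesystem backs nxlog's output
# and cache/queue paths (take the exact paths from your nxlog.conf).
df -h
df -i
```

If every mount really has free blocks and inodes at the moment of failure, capturing `df` output from a cron job around the failure window may help catch a transient fill-up (e.g. log rotation or a burst of traffic).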