2 responses

Hello everyone,

This is my first time posting in this community forum, so any help would be greatly appreciated.

I've been working with NxLog and ElasticSearch for a few months now and had almost no issues until very recently, when a new ElasticSearch index was created to accommodate the new structure of our logs. With that, we also had to update our existing nxlog.conf file.

We have three ElasticSearch endpoints with the same setup, and at one point during the week we ran out of storage space. After increasing the storage size for all three endpoints, two environments resumed sending new data to ElasticSearch with no further problems. However, the third environment's NxLog services appear to be stuck, on both existing AWS instances and newly created ones, repeating the following NxLog log entry over and over:

2016-04-29 15:33:12 INFO connecting to search-stage-logging-udf7h4lq2bsm245ciawp2stcvu.us-east-1.es.amazonaws.com:80
2016-04-29 15:33:12 INFO reconnecting in 1 seconds
2016-04-29 15:33:12 ERROR ### PANIC at line 2456 in module.c/nx_module_pollset_add_socket(): "failed to add descriptor to pollset: Not enough space ; [cannot dump backtrace on this platform]" ###

This was the log entry that initially alerted us we had run out of space in ElasticSearch. However, the ElasticSearch dashboard no longer shows a lack of space, so it's confusing why NxLog in the third environment keeps emitting these entries when the other two environments recovered.

Basically, I have two questions:
1. Is this a scenario where the NxLog service is stuck, unable to see that space is available again? Or does the fault lie with ElasticSearch not reporting storage space correctly?
2. If the NxLog service is stuck in this state, is there a configuration option or some other automated procedure to make NxLog restart itself after repeated failures?
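On question 2: I'm not aware of a documented built-in self-restart in NxLog after a panic, but the service manager can take care of it. A sketch for systemd hosts (the unit name `nxlog` and the drop-in path are assumptions; adjust to your install) -- save as `/etc/systemd/system/nxlog.service.d/restart.conf`, then run `systemctl daemon-reload`:

```ini
# Hypothetical drop-in: have systemd restart nxlog after failures.
[Unit]
# Give up if it fails 10 times within 5 minutes.
StartLimitIntervalSec=300
StartLimitBurst=10

[Service]
Restart=on-failure
RestartSec=5
```

On hosts without systemd, a watchdog such as monit or a cron-driven health check could serve the same purpose.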


If anyone has gone through a similar experience, any tips would be greatly appreciated. Thank you for your time.

Asked May 2, 2016 - 8:37pm

Answer (1)

Sounds like you have run into a bug. The error "Not enough space" usually means the process ran out of memory; it is not related to disk storage space. Can you check the memory usage in a process monitor?
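A quick way to check, as a sketch assuming a Linux host and a daemon process named `nxlog` (both are assumptions; the "cannot dump backtrace on this platform" line suggests your platform may differ, in which case a process monitor is the equivalent):

```shell
# Find the nxlog PID; falls back to this shell just so the commands
# below have something to demonstrate on.
pid=$(pgrep -o nxlog || echo $$)

# Memory usage of the process (virtual and resident, in kB).
grep -E 'VmSize|VmRSS' "/proc/$pid/status"

# It can't hurt to also compare the open-descriptor count against the
# process limit, since the panic happens while adding a descriptor.
echo "open descriptors: $(ls "/proc/$pid/fd" | wc -l)"
grep 'Max open files' "/proc/$pid/limits"
```

If VmRSS keeps climbing between panics, that would support the memory-leak theory.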

Comments (1)

  • horstp's picture

I get a similar message at times. When it happens, my checkpoint connection fails and I have to restart nxlog.

    2017-11-21 12:53:27 ERROR apr_file_write failed;No space left on device
    2017-11-21 12:53:27 ERROR apr_file_write failed;No space left on device
    2017-11-21 12:53:27 ERROR ### PANIC at line 2467 in module.c/nx_module_pollset_add_file(): "failed to add descriptor to pollset: Bad file descriptor ;backtrace:;/opt/nxsec/bin/nxlog(nx_append_backtrace+0x66) [0x441842];/opt/nxsec/bin/nxlog(_nx_panic+0x15c) [0x417a0b];/opt/nxsec/bin/nxlog(nx_module_pollset_add_file+0x1bd) [0x425f43];/opt/nxsec/libexec/nxlog/modules/output/om_file.so(+0x319c) [0x7f9876d4d19c];/opt/nxsec/libexec/nxlog/modules/output/om_file.so(+0x3f20) [0x7f9876d4df20];/opt/nxsec/bin/nxlog(nx_event_process+0x212) [0x419a1b];/opt/nxsec/bin/nxlog() [0x46011b];/opt/nxsec/bin/nxlog() [0x4513f4];/lib64/libpthread.so.0() [0x324cc07aa1];/lib64/libc.so.6(clone+0x6d) [0x324c4e8bcd]" ###
    2017-11-21 12:53:27 ERROR apr_file_write failed;No space left on device
    2017-11-21 12:53:27 ERROR last message repeated 2 times
    2017-11-21 12:53:27 ERROR subprocess '29444' returned a non-zero exit value of 1

    This happens at random intervals, but it seems that every time it happens, I find those log entries in the nxlog logfile. Disk storage is fine, as is the inode count.
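One thing worth ruling out: "No space left on device" is reported for the specific filesystem backing the file being written (the om_file output here, or nxlog's cache/queue directory), which may be a different mount from the one you checked. A sketch:

```shell
# Check block AND inode usage on every mount, not just the root volume;
# the failing write goes to whichever filesystem backs nxlog's output
# and cache/queue paths (take the exact paths from your nxlog.conf).
df -h
df -i
```

If every mount really has free blocks and inodes at the moment of failure, capturing `df` output from a cron job around the failure window may help catch a transient fill-up (e.g. log rotation or a burst of traffic).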