
Hello,

Here is my config on a Windows machine running nxlog-ce-2.9.1504:

#define ROOT C:\Program Files\nxlog
define ROOT C:\Program Files (x86)\nxlog

Moduledir %ROOT%\modules
CacheDir  %ROOT%\data
Pidfile   %ROOT%\data\nxlog.pid
SpoolDir  %ROOT%\data
LogFile   %ROOT%\data\nxlog.log

###############
# Extensions  #
###############

<Extension syslog>
    Module    xm_syslog
</Extension>

<Extension json>
    Module    xm_json
</Extension>

###########
# Inputs  #
###########

<Input some_input>
    Module     im_file
    File       'C:\Logs\input.log'
    SavePos    TRUE
</Input>

###############
# Processors  #
###############

<Processor buffer>
    Module     pm_buffer
    # 1 GB disk buffer (1048576 kilobytes)
    MaxSize    1048576
    Type       Disk
    Directory  C:\Logs\buffer
</Processor>

############
# Outputs  #
############

<Output tcpout>
    Module    om_tcp
    Port      5170
    Host      fluentd.company.lan
</Output>

############
# Routes   #
############

<Route file>
    Path    some_input => buffer => tcpout
</Route>

Here are the initial test conditions:

1. The service 'fluentd.company.lan' is up and running and listening on port 5170

2. nxlog is up and running with the config above

3. Data arriving in input.log is successfully routed to the output via the buffer and is visible in Kibana

Then:

1. I edit the 'C:\Windows\System32\drivers\etc\hosts' file, add the line '127.0.0.1 fluentd.company.lan', and save the file

2. Using the TCPView tool from Sysinternals, I close the current TCP connection to 'fluentd.company.lan:5170'

3. I see in nxlog.log that it tries to connect to 'fluentd.company.lan:5170' and fails

4. I wait for some new data in input.log

5. New data arrives, and I see the buffer file 'buffer.1.q' created in C:\Logs\buffer, with the relevant data in it

6. I wait for some time (2-3 minutes)

7. I edit the hosts file again, comment out the '127.0.0.1 fluentd.company.lan' line, and save the file

8. nxlog successfully reconnects to fluentd.company.lan:5170

And here's the interesting part: nxlog forwards the new data found in the input file, but I don't see any logs in Kibana from the buffer file with timestamps from the interval in step 6.

Please check this case; it looks like the buffer is not working on Windows. Please confirm and fix this bug.
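
In case it helps with reproduction, the hosts-file toggle from steps 1 and 7 can be scripted. A rough Python sketch (must be run as Administrator; the path and hostname match the steps above):

# Toggle the simulated outage used in steps 1 and 7.
# Run elevated, since the hosts file is only writable by Administrators.
HOSTS = r"C:\Windows\System32\drivers\etc\hosts"
ENTRY = "127.0.0.1 fluentd.company.lan"

def set_outage(enabled):
    with open(HOSTS, "r") as f:
        lines = [l for l in f.read().splitlines() if ENTRY not in l]
    if enabled:
        lines.append(ENTRY)  # point the hostname at localhost
    with open(HOSTS, "w") as f:
        f.write("\n".join(lines) + "\n")

set_outage(True)   # step 1: simulate the outage
set_outage(False)  # step 7: restore normal resolution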

Asked February 19, 2016 - 12:00pm

Comments (7)

  • sa

    Hi Konstantin, I took your config file, installed a vanilla 2.9.1504 version, and tried to reproduce your problem -- without success. NXLog could easily resume its operation after the (artificial) target server outage, and the buffer's contents were read and forwarded correctly, in a timely fashion.

  • Funbit

    For me, nxlog buffering is not working either. I checked three times in different (though similar) environments, with both disk and memory buffers.

    Here is my nxlog configuration:

    <Input sys_in>
        Module  im_udp
        Port    516
    </Input>
    
    <Processor sys_buf>
        Module pm_buffer
        # 512MB buffer
        MaxSize    524288
        Type    Disk
        WarnLimit    65536
    </Processor>
    
    <Output sys_out>
        Module      om_tcp
        Host        log.ourdomain.internal
        Port        9291
    </Output>
    
    <Route sys>    
        Path        sys_in => sys_buf => sys_out
    </Route>

    The output endpoint is a logstash TCP listener behind an AWS load balancer (so log.ourdomain.internal has a CNAME record pointing to the ELB).

    If I stop the logstash service, nxlog starts printing the following messages:

    2016-08-19 12:05:38 INFO reconnecting in 1 seconds
    2016-08-19 12:05:38 INFO connecting to log.ourdomain.internal:9291
    2016-08-19 12:05:38 INFO reconnecting in 1 seconds
    ...

    every second or so. After a minute of waiting, I start logstash back up; nxlog successfully connects to it, but none of the previous log messages are sent...

    The timeline looks like this: http://imgur.com/download/YkZqwZg

    I can confirm that sys_buf.1.q is created (when nxlog is started), but it always has a size of 0 KB.

    I'm not sure where to dig; the configuration looks correct to me. Could you please help resolve the issue?

    PS. We are using Windows Server 2012 R2 + nxlog 2.9.1716 (we had the same issue in previous versions) + logstash (2.3.4).

  • Funbit

    OK, I think I know what's going on... Looking at the Wireshark capture, nxlog just continues to send all the logs to the open TCP port (even though logstash is not running). Is there any way for nxlog to confirm that the endpoint is actually accepting logs?

  • adm (NXLog)

    Plain TCP is not fully reliable, as there is no acknowledgement built into the protocol (i.e., Logstash does not send an OK). Data in socket buffers might be lost on a connection reset. Using pm_buffer will not protect against this.

    2016-08-19 12:05:38 INFO reconnecting in 1 seconds
    2016-08-19 12:05:38 INFO connecting to log.ourdomain.internal:9291

    The above logs suggest that logstash accepts the connection, probably reads some data, and then closes the connection (or crashes). This is exactly the case where you will end up with lost messages.
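
    A minimal Python sketch (not nxlog code; port number borrowed from your config) of why the sender cannot tell: the receiving kernel ACKs and buffers the data even when the peer application never reads it, so send() succeeds and the data silently disappears when the connection closes:

    # Demo: send() succeeds even though nobody ever processes the data.
    import socket, threading, time

    def receiver():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", 9291))
        srv.listen(1)
        conn, _ = srv.accept()
        # The application never calls recv(); the kernel still ACKs the
        # sender's segments, so the sender sees nothing wrong.
        time.sleep(2)
        conn.close()  # whatever sat in the receive buffer is dropped

    threading.Thread(target=receiver, daemon=True).start()
    time.sleep(0.2)

    cli = socket.create_connection(("127.0.0.1", 9291))
    sent = cli.send(b"log line nobody will ever process\n")
    print("send() reported", sent, "bytes written")  # succeeds regardless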


    The NXLog Enterprise Edition has a pair of modules that support protocol-level acknowledgement to provide guaranteed delivery (though this is not supported with Logstash).

  • Funbit

    Thank you very much for the answer; I had figured it out in practice as well.

    To avoid losing log messages, I had to move from the CNAME to an A record pointing directly to the logstash IP (because the AWS ELB proxy always accepts connections even if the target is "unhealthy"). I will have to set up my own TCP proxy server that does not accept incoming TCP connections when the target server is unavailable (some messages might still be lost, 5 seconds' worth or so, but that's not a big deal for me); a sketch of the idea follows below.
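
    A rough Python sketch of that proxy idea (the addresses are made up; one caveat is that the client's TCP handshake has already completed by the time accept() returns, so a client sees an immediate close rather than a refused connection, which nxlog still treats as a failed delivery attempt and retries):

    # Accept a client only if the real target is reachable; otherwise
    # close immediately so the sender falls back to its buffer.
    import socket, threading

    TARGET = ("10.0.0.5", 9291)  # hypothetical logstash address
    LISTEN = ("0.0.0.0", 9291)

    def pump(src, dst):
        try:
            while True:
                data = src.recv(65536)
                if not data:
                    break
                dst.sendall(data)
        finally:
            src.close()
            dst.close()

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(LISTEN)
    srv.listen(5)

    while True:
        client, _ = srv.accept()
        try:
            upstream = socket.create_connection(TARGET, timeout=5)
            upstream.settimeout(None)  # only time out the connect itself
        except OSError:
            client.close()  # target is down: drop the client right away
            continue
        threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pump, args=(upstream, client), daemon=True).start()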

  • Funbit's picture

    I totally forgot about another solution; I hope it will help somebody.

    If you are using the following architecture on AWS:

    [nxlog] -> [logstash] -> [elasticsearch]

    and want to load balance the logstash servers, it is better to use an A record with multiple IP addresses rather than a TCP Elastic Load Balancer, because the ELB will lose your messages when the target servers are unhealthy, while DNS-based IP balancing won't (if a logstash server shuts down, for example). A small sketch of the client-side mechanism this relies on is below.
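
    For illustration, this is the mechanism a client can use with multiple A records: resolve all of them and try each address in turn, skipping dead backends (a Python sketch; whether nxlog's om_tcp itself iterates over all A records is worth verifying against the nxlog documentation):

    # Resolve all A records for a host and connect to the first one alive.
    import socket

    def connect_first_alive(host, port, timeout=5):
        last_err = None
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
                host, port, type=socket.SOCK_STREAM):
            s = socket.socket(family, socktype, proto)
            s.settimeout(timeout)
            try:
                s.connect(sockaddr)
                return s  # this backend answered; use it
            except OSError as e:
                s.close()
                last_err = e  # backend is down, try the next address
        raise last_err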

Answers (0)