Reliable delivery of logs - can you trust TCP?

When considering your log collection strategy, a decision you have to make is which transport protocol to use to transfer logs from source to destination. The choice is often between the two most commonly used protocols, UDP (User Datagram Protocol) and TCP (Transfer Control Protocol). Which one to use depends on the type of logs you need to transfer, and whether performance or reliability is more important.

This blog post will compare these protocols, discuss why TCP is usually the preferred choice, and provide some options to further increase log delivery reliability with NXLog Enterprise Edition.

Protocol differences - TCP vs. UDP

The main difference between these protocols lies in how data packets move from one point to another. UDP is a connection-less protocol that does not guarantee the delivery of data packets to the destination; therefore, it is considered unreliable. On the other hand, TCP is a connection-oriented protocol designed to ensure the delivery of data packets to the destination, and so it is considered a reliable protocol.

Although UDP is an unreliable protocol, it has its uses when performance is preferred over reliability. UDP reduces the overhead of establishing a network session and can attain faster transmission speeds. Since it does not require receipt verification, it also reduces network load. These factors make it suitable for transferring high-volume, non-critical logs. UDP is usually preferred in monitoring applications to minimize overhead on the system you’re monitoring, or when having the latest logging data is of greater importance than having complete data.

When reliable data transfer is important, TCP is the natural choice since it ensures the delivery of packets to their destination. It is not the intent of this blog post to go into the technical details of how TCP works. However, in a nutshell, it achieves this reliability by performing what is known as a three-way handshake, where the client and server exchange a series of SYN (synchronize) and ACK (acknowledge) packets to initiate communication and establish a reliable connection for the actual data transfer to start. After sending data, the client waits a specific length of time for an acknowledgment. If it does not receive an acknowledgment from the server during that period, it will assume the packet was lost and retransmit it. When considering which protocol to use for log forwarding, TCP reliability is an important benefit to have.

Unreliable aspects of TCP

The original intent of the TCP protocol was to ensure that a connection is established and data is transferred reliably. However, you should be aware of some adverse effects when transferring log data over TCP.

The reliability-first approach of the TCP protocol creates some overhead on the system since it needs to create a connection and maintain its state, deliver packets in the correct order, ensure data integrity, and resend packages if required. Even so, it is still possible to lose data if the server prematurely closes the connection while it is transmitting data. In such cases, unsent data stored in the socket buffers will be lost.

Additionally, even though TCP can detect duplicate packets, there are instances where data duplication is possible. For example, this can happen if the ACK packets are lost and data is retransmitted. In other words, although TCP is designed to guarantee that each packet arrives only once, if the connection breaks, there is no way of knowing what exactly arrived.

In most cases, TCP is suitable for transferring logs because its benefits outweigh the disadvantages. Although TCP cannot guarantee that data loss or duplication will never happen, there are other ways to make your log transfer more reliable, and NXLog can help with this.

Improving log delivery reliability with NXLog

With NXLog, whether you choose to transfer your logs via TCP, UDP, or another protocol altogether, you gain access to functionality that helps you mitigate data loss or log duplication, which includes:

Methods to create buffers that allow syncing logs before forwarding them to their destination. The pm_buffer module offers both memory and disk-based buffering. They can even be combined to significantly reduce the risk of data loss due to a system crash or network failure.
Log duplication prevention is provided by the duplicate_guard() function. See Protection against duplication in the NXLog User Guide.
Addressing unreliability in the TCP protocol by using application-level acknowledgment. Two sets of modules are available for this purpose:
- The om_http / im_http modules guarantee message delivery over the network by sending logs (om_http) in an HTTP POST request. The server (im_http or a third-party HTTP server) responds with a status code to indicate if the data was received and processed successfully.
- The om_batchcompress / im_batchcompress modules also acknowledge receipt of data with the added benefit of data compression for faster transfer. In addition, these modules serialize fields so that structured data is preserved for later processing at the destination.
You can configure both options to transfer data securely using TLS/SSL.

Addressing log delivery reliability is required for compliance standards and certifications such as PCI-DSS, SOX, HIPAA, and ISO27001. With these features, you can tailor your log collection solution to ensure the timely delivery of logs while reducing the risk of data loss during transit. Whatever your requirements, NXLog can help you achieve the right balance between efficiency and reliability that best suits your needs. If you are interested in learning more about implementing these solutions, see Reliable message delivery in the NXLog User Guide.