If you capture Windows Event Logs at scale, you know that the more logs you collect, the more resources you need, and the more expensive your SIEM becomes. The main issue is that a large share of the log data you send to your SIEM contains no valuable information. This means a sizable portion of your spend goes to what the industry calls “log noise”.
This post will focus on events collected from Windows Event Log, one of the most common log sources. I know there are many other sources, but this source likely exists in about 99% of corporations, so we’ll use this as an example.
What is Windows Event Log?
The Windows Event Log captures system, security, and application logs on Windows operating systems. It serves as a repository of detailed events generated by the system and is the first resource IT administrators refer to when troubleshooting issues. Besides resolving problems, you can use Windows events to monitor and analyze security-related activities and satisfy compliance mandates.
You can examine logs in Windows Event Log in the Event Viewer MMC snap-in included in Windows. Since the stored logs do not have the entire message (only the event properties are stored), Event Viewer inserts property values into a corresponding localized template to display the full event details.
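To make the template mechanism concrete, here is a minimal Python sketch of how stored property values can be merged into a message template with numbered placeholders. The `render_message` helper is hypothetical, and the template and property values are assumptions modeled on the Event ID 20 sample later in this post; real Windows templates support more placeholder types than this simplification handles.

```python
import re

def render_message(template, properties):
    """Replace %1, %2, ... placeholders with the matching property value."""
    def substitute(match):
        index = int(match.group(1)) - 1  # placeholders are 1-based
        # Leave the placeholder untouched if no property exists for it.
        if 0 <= index < len(properties):
            return properties[index]
        return match.group(0)
    return re.sub(r"%(\d+)", substitute, template)

# Assumed template and property values, for illustration only.
template = ("Installation Failure: Windows failed to install the "
            "following update with error %1: %2.")
properties = ["0x8024200B", "Realtek - Extension - 1.0.0.133"]

print(render_message(template, properties))
```

Only the short property values need to be stored per event; the template text is stored once and shared by every event of that type.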
What is log noise?
Log noise is simply a term for log events that are low value and should not be collected. Such low-value events can severely hamper security analysts' timely access to the most critical security events when the ratio of high-value events to log noise is so low. With the large volume of logs being collected, there is cause for concern that companies are not only collecting too many logs but also neglecting to collect the very logs that would be most useful for monitoring security-related events.
Ironically, many believe that the more events you collect, the better. This couldn’t be further from the truth: unfiltered logging burdens business operations instead of safeguarding them. In reality, most of the logs collected without filtering are low-value events of little use to security analysts.
Why optimize log data?
Does size matter? - How does it affect the log journey?
I am sure it is not surprising, but the log data’s size and state significantly affect how it can be further processed. I would like to name three categories that I consider the most important.
SIEM cost
This is one of the most painful aspects. You are probably aware that most SIEM vendors charge by some measure of data ingestion. In some cases, it is the number of events ingested; in others, it is the volume of data consumed. Charging by consumption is not unreasonable, but the cost can be challenging to cope with if log collection is poorly managed or unmanaged.
Network load
Network load may sound less striking than SIEM cost, but it is equally important, as it also affects cost indirectly. Loading your network with unnecessary traffic is bad practice in itself. It can overload your network devices, leaving no room for other important, time-sensitive traffic. In extreme cases, you may even need to upgrade network equipment.
Manageability
Unoptimized log data full of noise and unnecessary fields makes it hard to extract anything useful. It makes reporting and alerting cumbersome, which can cause false alerts. Regardless of the destination of the log data, it must be filtered, trimmed, and optimized for the best results.
How to optimize log data?
OK, let’s get down to business.
I will only discuss log optimization in the context of cost, network traffic, and manageability.
Which logs you keep and which you discard is not for me to decide. It will come down to the policy in effect on your network, but even if you have to keep most or all of your logs, you can still make them smaller and move them through the network faster.
So far, you know that you have to optimize your log data, but when and where? As a golden rule, the earlier, the better. Optimize logs at the source or, at the very latest, before they leave your network or enter your SIEM or long-term storage. If you have the right tool, it is best to collect only what you need at the source and not process the rest at all. The more unnecessary data you collect, the more you have to move through your network along the way.
Let’s take a look at the different options and what they mean:
- Log filtering
Depending on your business requirements, it might be feasible to filter out entire log events when the log record contains no valuable information or, for example, when the event is a duplicate. A common approach is to filter by severity, since informational events rarely hold valuable information. You can configure NXLog Enterprise Edition to drop unwanted log events by defining a set of attributes to match. Any matching event is dropped, leaving only the events that security analysts need.
- Log trimming
Trimming events refers to reducing log size by removing unwanted data comprised of fields containing low-value information and redundant or duplicated fields. You can easily configure this feature by specifying an "allowlist" of fields to keep. All other event fields will be removed, thus reducing the volume of data processed and forwarded.
- Compressing logs
You can also compress the log data before transmission to reduce network bandwidth usage. Consequently, log data is transmitted faster, and disk storage requirements are reduced if the endpoints are writing the logs to file. This is especially useful if the destination is long-term storage.
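The three options above can be sketched end to end in a few lines of Python. This is only an illustration of the concepts, not how NXLog implements them: the sample events, the `KEEP_SEVERITIES` set, the `KEEP_FIELDS` allowlist, and all function names are assumptions made for the example.

```python
import gzip
import json

# Hypothetical sample events; field names loosely follow NXLog JSON output.
events = [
    {"EventID": 20, "Severity": "ERROR", "Hostname": "PC-01",
     "Message": "Update failed", "ThreadID": 8584},
    {"EventID": 19, "Severity": "INFO", "Hostname": "PC-01",
     "Message": "Update installed", "ThreadID": 8584},
    {"EventID": 20, "Severity": "ERROR", "Hostname": "PC-01",
     "Message": "Update failed", "ThreadID": 8584},
]

KEEP_SEVERITIES = {"CRITICAL", "ERROR", "WARNING"}
KEEP_FIELDS = ("EventID", "Severity", "Hostname", "Message")  # the allowlist

def filter_events(events):
    """Filtering: drop low-severity events and exact duplicates."""
    seen = set()
    for event in events:
        key = json.dumps(event, sort_keys=True)
        if event["Severity"] in KEEP_SEVERITIES and key not in seen:
            seen.add(key)
            yield event

def trim_event(event):
    """Trimming: keep only the allowlisted fields."""
    return {name: event[name] for name in KEEP_FIELDS if name in event}

def compress_events(events):
    """Compressing: one JSON object per line, then gzip."""
    payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return gzip.compress(payload)

kept = [trim_event(e) for e in filter_events(events)]
blob = compress_events(kept)
print(len(kept), "event(s) kept;", len(blob), "compressed bytes")
```

The order matters: filtering first means the trimming and compression steps never touch events that would be discarded anyway.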
Let’s see how it looks in real life
Now that you have all the background information and concepts in place, it is time to get your hands dirty. Let’s see what you need to do in practice.
- Filtering logs collected from Windows Event Log
This one is straightforward: you discard entire log entries you do not need. A general size comparison is difficult to give, as it largely depends on how many of the logs you want to keep. For this example, I will take the log entries from the System channel of the computer on which I am writing this article. It is a Windows 10 laptop I primarily use for work.
At the time of writing, I have 26685 events in the System channel: mainly information, but some warnings and errors. I will first collect all the logs with NXLog Enterprise Edition and save them in JSON format; then, I will do the same but discard all events with a severity level of Information or Verbose. This resembles a real-life scenario in which you filter your logs based on severity. Severity is only one criterion, but it is a common one for eliminating unwanted logs. You can read more about filtering logs in the Filtering logs section of the NXLog User Guide.
This simple configuration uses the im_msvistalog module to collect everything from the system log channel of Windows Event Log. The logs are then formatted as JSON.
You may notice that I use the SavePos and ReadFromLast directives with the im_msvistalog module. You are unlikely to need these in a real-life scenario where you collect logs as a continuous stream. However, they are great for testing and for reading the entire content of a Windows Event Log channel in one go.
nxlog.conf
    # (1)
    <Extension syslog>
        Module          xm_json
    </Extension>

    # (2)
    <Input eventlog>
        Module          im_msvistalog
        SavePos         FALSE
        ReadFromLast    FALSE
        <QueryXML>
            <QueryList>
                <Query Id="0" Path="System">
                    <Select Path="System">*</Select>
                </Query>
            </QueryList>
        </QueryXML>
        Exec            to_json();
    </Input>
(1) The xm_json module formats the logs in JSON format. The input module calls it with the to_json(); procedure.
(2) In this configuration, the im_msvistalog module collects log entries from the Windows Event Log. The QueryXML directive collects all of the logs from the System channel of the Windows Event Log. The Exec directive utilizes the to_json(); procedure to format the output in JSON.
To take it just a tiny step further, the following configuration does the same as the one above, but it only collects logs with the severity of critical, error, and warning.
nxlog.conf
    # (1)
    <Extension syslog>
        Module          xm_json
    </Extension>

    # (2)
    <Input eventlog>
        Module          im_msvistalog
        SavePos         FALSE
        ReadFromLast    FALSE
        <QueryXML>
            <QueryList>
                <Query Id="0" Path="System">
                    <Select Path="System">*[System[(Level=1 or Level=2 or Level=3)]]</Select>
                </Query>
            </QueryList>
        </QueryXML>
        Exec            to_json();
    </Input>
(1) The xm_json module formats the logs in JSON format. The input module calls it with the to_json(); procedure.
(2) In this configuration, the im_msvistalog module collects log entries from the Windows Event Log. The QueryXML directive collects the logs with severity levels 1, 2, and 3 (critical, error, and warning, respectively) from the System channel of the Windows Event Log. The Exec directive utilizes the to_json(); procedure to format the output in JSON.
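The selector used in the QueryXML block follows a regular pattern, so it can be generated from a list of level numbers. The following Python sketch shows the idea; `level_selector` is a hypothetical helper written for this illustration and is not part of NXLog.

```python
def level_selector(levels):
    """Build an XPath selector matching any of the given Windows event levels."""
    if not levels:
        return "*"  # no restriction: select every event
    conditions = " or ".join(f"Level={level}" for level in levels)
    return f"*[System[({conditions})]]"

# Levels 1, 2, and 3 are critical, error, and warning.
print(level_selector([1, 2, 3]))
# *[System[(Level=1 or Level=2 or Level=3)]]
```

Generating the selector this way makes it easy to keep the severity list in one place if several channels need the same filter.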
As an interesting fact, the output of the first example above came to 28,417 KB, whereas the filtered output is only 10,096 KB. As I mentioned previously, the amount you can save depends on how many logs you want to keep. Nevertheless, it is an excellent example of how much you can reduce the size of your logs with a few simple tweaks.
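To put the two figures in proportion, the reduction can be computed directly. This is a small Python sketch, assuming both sizes are whole kilobytes (28,417 KB unfiltered and 10,096 KB filtered); `reduction_percent` is a hypothetical helper.

```python
def reduction_percent(original, reduced):
    """Return how much smaller `reduced` is than `original`, in percent."""
    return (original - reduced) / original * 100

unfiltered_kb = 28_417  # all events from the System channel
filtered_kb = 10_096    # critical, error, and warning events only

print(f"{reduction_percent(unfiltered_kb, filtered_kb):.1f}% smaller")
# 64.5% smaller
```

In other words, severity filtering alone removed roughly two thirds of the volume on this particular machine.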
Now that you have filtered your logs down to those you need and reduced the size by a large amount, it is time to look at individual logs and see if you can trim something off them.
- Trimming logs collected from Windows Event Log
Here, we will look at the individual logs themselves. In most cases, logs collected from Windows Event Log contain far more data than you need, and even some duplicated data.
For consistency, I selected a particular log type from the same system channel, an error. Imagine you are working as a system administrator at a large corporation and you’re tasked with overseeing the health of a fleet of laptops. The task includes watching for Windows Updates and ensuring all machines are in the best state, including installing the latest and most secure drivers. First of all, you can get all of this information from logs, but there could be a lot of related logs.
In this case, you are looking for events where you want to see whether driver updates are going well. You realize that these logs are logged with EventID 20. You are interested in the computer name and the device whose driver failed to update. You are looking for these and nothing else. To begin with, let’s see what the content of EventID 20 looks like when collected with NXLog Enterprise Edition.
This simple configuration collects log entries with EventID 20 from Windows Event Log using the im_msvistalog module. The only change the configuration makes is formatting the output as JSON, so we can examine it quickly and compare its size with the next example at a glance.
This configuration also acts as a filter, since it selects logs with a single Event ID and ignores the rest. I had about 30 of them among the 26685 log entries in the System channel. This is also a great illustration of how much you gain by collecting only the logs that have some value to you. Again, this depends on your specific needs.
nxlog.conf
    # (1)
    <Extension syslog>
        Module          xm_json
    </Extension>

    # (2)
    <Input eventlog>
        Module          im_msvistalog
        SavePos         FALSE
        ReadFromLast    FALSE
        <QueryXML>
            <QueryList>
                <Query Id="0" Path="System">
                    <Select Path="System">*[System[(Level=2) and (EventID=20)]]</Select>
                </Query>
            </QueryList>
        </QueryXML>
        Exec            to_json();
    </Input>
(1) The xm_json module formats the logs in JSON format. The input module calls it with the to_json(); procedure.
(2) In this configuration, the im_msvistalog module collects log entries from the Windows Event Log. The QueryXML directive collects the error-level logs with EventID 20 from the System channel of the Windows Event Log. The Exec directive utilizes the to_json(); procedure to format the output in JSON.
You can see below that the output contains fields whose meaning is unclear, which makes them perfect candidates for discarding. It contains 33 fields.
Output sample in JSON
    {
      "EventTime": "2022-08-03T15:03:33.527090+02:00",
      "Hostname": "DESKTOP-V65N3UR",
      "Keywords": "9223372036854775848",
      "LevelValue": 2,
      "EventType": "ERROR",
      "SeverityValue": 4,
      "Severity": "ERROR",
      "EventID": 20,
      "SourceName": "Microsoft-Windows-WindowsUpdateClient",
      "ProviderGuid": "{945A8954-C147-4ACD-923F-40C45405A658}",
      "Version": 1,
      "TaskValue": 1,
      "OpcodeValue": 13,
      "RecordNumber": 386,
      "ExecutionProcessID": 1300,
      "ExecutionThreadID": 8584,
      "Channel": "System",
      "Domain": "NT AUTHORITY",
      "AccountName": "SYSTEM",
      "UserID": "S-1-5-18",
      "AccountType": "Well Known Group",
      "Message": "Installation Failure: Windows failed to install the following update with error 0x8024200B: Realtek - Extension - 1.0.0.133.",
      "Category": "Windows Update Agent",
      "Opcode": "Installation",
      "Level": "Error",
      "errorCode": "0x8024200b",
      "updateTitle": "Realtek - Extension - 1.0.0.133",
      "updateGuid": "{3e0b7ac2-eff3-487b-99dd-540151d864bc}",
      "updateRevisionNumber": "1",
      "serviceGuid": "{8b24b027-1dee-babb-9a95-3517dfb9c552}",
      "EventReceivedTime": "2023-05-13T11:58:38.867493+02:00",
      "SourceModuleName": "eventlog",
      "SourceModuleType": "im_msvistalog"
    }
The next example builds on the previous one. It collects the very same log entries, but using the xm_rewrite module and its Keep directive, it keeps only 7 of the 33 fields above: only the data that you probably need.
In this example, I tailored the list of fields to the imaginary situation I described above. Your situation might differ, so select fields according to your needs. This simple configuration collects log entries with EventID 20 from Windows Event Log using the im_msvistalog module. It then discards all the unnecessary fields and formats the output as JSON.
nxlog.conf
    # (1)
    <Extension json>
        Module          xm_json
    </Extension>

    # (2)
    <Extension cleanup>
        Module          xm_rewrite
        Keep            EventTime, Hostname, EventID, Severity, SourceName, Message, Domain
    </Extension>

    # (3)
    <Input eventlog>
        Module          im_msvistalog
        SavePos         FALSE
        ReadFromLast    FALSE
        <QueryXML>
            <QueryList>
                <Query Id="0" Path="System">
                    <Select Path="System">*[System[(Level=2) and (EventID=20)]]</Select>
                </Query>
            </QueryList>
        </QueryXML>
        Exec            cleanup->process();
        Exec            to_json();
    </Input>
(1) The xm_json module formats the logs in JSON format. The input module calls it with the to_json(); procedure.
(2) The xm_rewrite module does the actual filtering of the fields. In this case, it uses the Keep directive to keep the fields we need and discard everything else.
(3) In this configuration, the im_msvistalog module collects log entries from the Windows Event Log. The QueryXML directive collects the logs with EventID 20 from the System channel of the Windows Event Log. The first Exec directive instructs the xm_rewrite module to process each log so that only the fields we need remain. Finally, the second Exec directive calls the to_json(); procedure of the xm_json extension module to convert the log data into JSON.
The output here needs very little explanation when compared to the previous one. It is visibly smaller and contains only the fields that you need. This is beneficial not only in terms of size; further processing of the data is much more manageable too.
For the record, the size difference is 368 bytes versus 1275 bytes. Sure, it may seem small for a single log entry, but that is a ratio of roughly 3.5:1, and when you have hundreds or thousands of workstations and millions of logs, that is a huge difference.
Output sample in JSON
    {
      "EventTime": "2022-08-03T15:03:33.527090+02:00",
      "Hostname": "DESKTOP-V65N3UR",
      "Severity": "ERROR",
      "EventID": 20,
      "SourceName": "Microsoft-Windows-WindowsUpdateClient",
      "Domain": "NT AUTHORITY",
      "Message": "Installation Failure: Windows failed to install the following update with error 0x8024200B: Realtek - Extension - 1.0.0.133."
    }
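The per-event sizes quoted above (1275 bytes untrimmed, 368 bytes trimmed) can be extrapolated to a larger volume. In this Python sketch, the figure of one million events is a hypothetical volume chosen purely for illustration.

```python
full_bytes = 1275     # size of the untrimmed event, as quoted above
trimmed_bytes = 368   # size of the trimmed event, as quoted above

ratio = full_bytes / trimmed_bytes
saved_per_event = full_bytes - trimmed_bytes

event_count = 1_000_000  # hypothetical volume, for illustration only
saved_mb = saved_per_event * event_count / 1_000_000

print(f"ratio: {ratio:.2f}:1")                      # ratio: 3.46:1
print(f"saved per event: {saved_per_event} bytes")  # saved per event: 907 bytes
print(f"saved for {event_count} events: {saved_mb:.0f} MB")
```

At that assumed volume, trimming this one event type would save about 907 MB before any compression is applied.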
- Compressing logs collected from Windows Event Log
Compressing logs is one of the best ways to keep size down. I know that most of the hype is about how to get your logs to a SIEM solution. Still, there are scenarios, especially for compliance reasons, in which your logs do not need to be searchable in real time, but you must keep them in storage for a certain period. For example, in the banking sector, it is five years. Imagine the amount of transaction logs a bank needs to retain in that period. The choice is yours: spend the money on hardware or, with the right tool, keep the size down.
For comparison, I will use the same configuration as in the first filtering example, which collects every event from the System channel. Its output was 28,417 KB, and it is the same now. For this example, I will set NXLog to collect the same logs, but instead of just saving them to a file, I use the xm_zlib module to compress the output on the fly.
If you ever decide to decompress your logs and send them to an analytics platform or SIEM, NXLog Enterprise Edition can do that too. The process is fully reversible, which is convenient if you need the logs in the future.
nxlog.conf
    # (1)
    <Extension gzip>
        Module              xm_zlib
        Format              gzip
        CompressionLevel    9
        CompBufSize         16384
    </Extension>

    # (2)
    <Extension syslog>
        Module              xm_json
    </Extension>

    # (3)
    <Input eventlog>
        Module              im_msvistalog
        SavePos             FALSE
        ReadFromLast        FALSE
        <QueryXML>
            <QueryList>
                <Query Id="0" Path="System">
                    <Select Path="System">*</Select>
                </Query>
            </QueryList>
        </QueryXML>
        Exec                to_json();
    </Input>

    # (4)
    <Output file>
        Module              om_file
        OutputType          gzip.compress
        File                'C:/logs/windows_events_compressed.gz'
    </Output>
(1) The xm_zlib extension module compresses the data. In this case, it uses the gzip format with CompressionLevel 9, the highest compression level. It uses a buffer size of 16384 bytes, the default setting.
(2) The xm_json module formats the logs in JSON format. The input module calls it with the to_json(); procedure.
(3) The im_msvistalog module collects the log entries from Windows Event Log. The to_json(); procedure invokes the xm_json extension module.
(4) The om_file module writes the output to a file. The OutputType directive selects the compression method, and the File directive specifies the path and file name.
For the record, the compressed data size is 2,279 KB, roughly 12 times smaller than the original size, and I have not applied any filtering or trimming. Imagine how much you could save on storage by applying filtering and trimming as well.
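The reversibility of gzip compression and the ratio reported above can both be checked with a short Python sketch. It uses the standard-library gzip module rather than NXLog, and the sample records are hypothetical, so the measured ratio will differ from the one in this post.

```python
import gzip
import json

# Hypothetical, repetitive sample records standing in for real events.
records = [
    {"EventID": 20, "Hostname": "DESKTOP-V65N3UR", "RecordNumber": n}
    for n in range(1000)
]
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")

# Level 9 is the highest compression level, as in the configuration above.
compressed = gzip.compress(raw, compresslevel=9)

# Decompressing returns exactly the original bytes.
restored = gzip.decompress(compressed)
assert restored == raw

print(f"sample: {len(raw)} -> {len(compressed)} bytes")

# Ratio from the figures reported in this post (28,417 KB -> 2,279 KB).
reported_ratio = 28_417 / 2_279
print(f"reported ratio: {reported_ratio:.1f}x")  # reported ratio: 12.5x
```

Log data tends to compress well because field names and many values repeat from one event to the next.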
Again, please consider that the purpose of my calculations is purely to show you that these things are easily doable.
Conclusion
We covered quite a few things in this post, starting from the overview of Windows Event Log and discussing the methods of reducing overall log size and how to do it. Knowing how much the number of logs and their size affects SIEM cost, the first and probably most important thing we can conclude is that optimizing your log collection is worth paying attention to. Let me put it differently: you must pay attention to it! It is a no-brainer. It is a prime opportunity to not only save money but to make life easier as well. Who does not want to do that?
Above all, I walked through a couple of examples of how slight tweaks can affect the size and complexity of your log output. I guess this was the most exciting part: seeing through real-life examples how easy it is to do.
I must add that the examples here were really simple, and there is a lot more you can do to fine-tune your log collection. Nevertheless, I hope this post provided ideas and made you think of opportunities where you can use these practices in your own environment.