Structured logging - why should you do that?

Structured logging is on the rise. A lot of tools and logging services are finally moving towards structured logging and JSON seems to be the format of choice for this.

But what is structured logging? Traditionally logs were generated in the form of free form text messages prepended with some basic metadata such as the time of the event, severity and the source of the event. This is what the old RFC3164 style Syslog format looks like:

<30>Nov 21 11:40:27 myhost sshd[26459]: Accepted publickey for myhost from 192.168.1.1 port 424242 ssh2

This traditional syslog fromat has a header consisting of the severity, facility, timestamp, hostname and process name followed by a free form text string optionally containing additional metadata. The advantage of this is that log data in this format can be easily parsed and stored in text files for humans to look at.

The Windows Eventlog is slightly different as it is binary data and you cannot simply open .evt or .evtx files with a text editor to read them and that's why you have Event Viewer for. The old Windows Eventlog format (before Vista) only had a couple fields in addition to the message text. If you look at the EVENTLOGRECORD structure you will see a similar set of fields. Fortunately these fields are extended in Windows Vista and later with a couple more, i.e. a set of system properties and any additional fields present in the eventdata section that an application developer wishes to add to an event record. The disadvantage of this proprietary binary format is that it is only possible to read Windows Eventlog data through Microsoft's API and extract event data and metadata in the program. The im_msvistalog module in NXLog does exactly this.

Now back to the topic of structured logging. NXLog has embraced structured logging starting from its inception. See the section Discarding meta-data and structure about some key points why this is important. So event metadata is important. Period.

Our document titled Collecting Windows Security Audit Log data with NXLog and Sysmon has some examples on what the EventLog data generated by Sysmon looks like. Below is the sample Sysmon generated event converted to JSON showing the  basic set of eventlog fields, the Message and addtional Sysmon metadata:

{
 "EventTime":"2015-04-27 15:23:46",
 "Hostname":"WIN-OUNNPISDHIG",
 "Keywords":-9223372036854775808,
 "EventType":"INFO",
 "SeverityValue":2,
 "Severity":"INFO",
 "EventID":1,
 "SourceName":"Microsoft-Windows-Sysmon",
 "ProviderGuid":"{5770385F-C22A-43E0-BF4C-06F5698FFBD9}",
 "Version":3,
 "Task":1,
 "OpcodeValue":0,
 "RecordNumber":2335906,
 "ProcessID":1680,
 "ThreadID":1728,
 "Channel":"Microsoft-Windows-Sysmon/Operational",
 "Domain":"NT AUTHORITY",
 "AccountName":"SYSTEM",
 "UserID":"SYSTEM",
 "AccountType":"Well Known Group",
 "Message":"Process Create:\r\nUtcTime: 2015.04.27. 13:23\r\nProcessGuid: {00000000-3862-553E-0000-001051D40527}\r\nProcessId: 25848\r\nImage: c:\\Program Files (x86)\\nxlog\\nxlog.exe\r\nCommandLine: \"c:\\Program Files (x86)\\nxlog\\nxlog.exe\" -f\r\nUser: WIN-OUNNPISDHIG\\Administrator\r\nLogonGuid: {00000000-568E-5453-0000-0020D5ED0400}\r\nLogonId: 0x4edd5\r\nTerminalSessionId: 2\r\nIntegrityLevel: High\r\nHashType: SHA1\r\nHash: 1DCE4B0F24C40473CE7B2C57EB4F7E9E3E14BF94\r\nParentProcessGuid: {00000000-3862-553E-0000-001088D30527}\r\nParentProcessId: 26544\r\nParentImage: C:\\msys\\1.0\\bin\\sh.exe\r\nParentCommandLine: C:\\msys\\1.0\\bin\\sh.exe",
 "Opcode":"Info",
 "UtcTime":"2015.04.27. 13:23",
 "ProcessGuid":"{00000000-3862-553E-0000-001051D40527}",
 "Image":"c:\\Program Files (x86)\\nxlog\\nxlog.exe",
 "CommandLine":"\"c:\\Program Files (x86)\\nxlog\\nxlog.exe\" -f",
 "User":"WIN-OUNNPISDHIG\\Administrator",
 "LogonGuid":"{00000000-568E-5453-0000-0020D5ED0400}",
 "LogonId":"0x4edd5",
 "TerminalSessionId":"2",
 "IntegrityLevel":"High",
 "HashType":"SHA1",
 "Hash":"1DCE4B0F24C40473CE7B2C57EB4F7E9E3E14BF94",
 "ParentProcessGuid":"{00000000-3862-553E-0000-001088D30527}",
 "ParentProcessId":"26544",
 "ParentImage":"C:\\msys\\1.0\\bin\\sh.exe",
 "ParentCommandLine":"C:\\msys\\1.0\\bin\\sh.exe",
 "EventReceivedTime":"2015-04-27 15:23:47",
 "SourceModuleName":"in",
 "SourceModuleType":"im_msvistalog"
}

Now if you look closer you will notice that Message basically contains all the additional set of fields dumped into the string. There are still a lot of eventlog collector tools out there such as Snare, eventlog-to-syslog, etc that simply convert this data and format the output as a string by taking the basic set of fields and create a syslog message by adding a header using these (timestamp, event id, hostname, user id and message). The rest of the fields are simply discarded! While it is true that the Message contains most of this (if you are lucky). The problem with this is that this data then will need to be parsed and the fields have to be extracted with regular expressions.

Josh Brower published a paper very recently titled Using Sysmon To Enrich Security Onion’s Host-Level Capabilities. The paper and the concepts outlined there are a good read for the security minded. Unfortunately it goes on to parse received eventlog data using regular expressions, a practice that is not recommeded now that the world has finally started to realize that structured logging is great.  Go read Thoughts on Logging - Part 1 - Structured Logging by Gregory Szorc to understand more what all this rambling is about.

To explain in more depth what's wrong with Josh's approach, let's look at the patterns in Appendix C and Appendix E of the paper. Here are two examples for the syslog-ng patterndb and the OSSEC extractor:

<pattern>@NUMBER::@@ESTRING::(@@ESTRING::)@ @IPv4::@->WinEvtLog
@ESTRING::(@@NUMBER::):@): @ESTRING:::@@ESTRING:::@@ESTRING:::@
@ESTRING:s0::@@ESTRING::{@@ESTRING:s1:}@@ESTRING::Image: @@ESTRING:s2:
CommandLine: @@ESTRING::User: @@ESTRING:s3: LogonGuid:@@ESTRING::Hash:
@@ESTRING:s4: @@ESTRING::ParentImage: @@ESTRING:s5: ParentCommandLine:@</pattern>
<regex offset="after_prematch">Image: (\.*) \s*CommandLine: \.* \s*User: (\.*) \s*LogonGuid: \S*
\s*LogonId: \S* \s*TerminalSessionId: \S* \s*IntegrityLevel: \S* \s*HashType: \S* \s*Hash: (\S*)
\s*ParentProcessGuid: \S* \s*ParentProcessID: \S* \s*ParentImage: (\.*) \s*ParentCommandLine:</regex>

These two basically do the same thing and parse the Message with two different tools twice. This is an ugly system design  on its own IMHO but let's not worry about this. Now imagine if the next version of Sysmon came out with an added field for this event, e.g. ParentProcessUserId. This would most likely be also included in Message causing the SIEM to stop working properly with these events. The SIEM admin will have to first troubleshoot the issue and then fix the regular expression once he found out that the issue is caused by the updated Sysmon producing logs that the patterns no longer match. And it's actually two patterns that the admin will need to fix: the regexp for OSSEC and the syslog-ng pattern used by ELSA.

If these tools included in Security Onion were using structured logging there would be no need to write such parser rules in the first place and a changed Message would not break the functionality after an upgrade. That's a limitation of the tools included with Security Onion and the poor design decisions taken so Josh had no choice but to cook with these. Again the generic concepts in his paper are a great read.

Share this post