Parsing delimited log files with regex

#1 stephen

Hi, I'm using nxlog v2.9.1716.

I've created the following input:

<Input in>
	Module im_file
	File "C:\Program Files\LogFiles\*.log"
	SavePos TRUE
	Recursive TRUE

	Exec if $raw_event =~ /^#/ drop();
	Exec if $raw_event =~ ^([^;]+);([^;]+);([^;]+)(?>;([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);(.+)$)?/gx; \
	{ \
		$date = $1; \
		$time = $2; \
		$site-instance = $3; \
		$event = $4; \
		$client-ip = $5; \
		$via-header = $6; \
		$http-x-forwarded-for = $7; \
		$host-header = $8; \
		$additional-info-1 = $9; \
		$additional-info-2 = $10; \
		$additional-info-3 = $11; \
		$additional-info-4 = $12; \
		$additional-info = $13; \
		$EventTime = parsedate($date + " " + $time); \
		$SourceName = "WAF"; \
	}

</Input>

The regex being used has been successfully tested with https://regex101.com/

Sample data below:

2018-06-28 ; 10:23:52 ; W3SVC2 ; OnPreprocHeaders ; 10.10.10.10 ; ; 8.8.8.8 ; my.domain.com ; GET ; /account/login ; ALERT: '/account/' not allowed in URL ; HTTP/1.0 ; 0 ; ; Actional Intermediary

When I start the nxlog service, I get the following error:

2018-06-28 16:44:51 ERROR Couldn't parse Exec block at C:\Program Files (x86)\nxlog\conf\nxlog.conf:89; couldn't parse statement at line 89, character 24 in C:\Program Files (x86)\nxlog\conf\nxlog.conf; syntax error
2018-06-28 16:44:51 ERROR module 'in' has configuration errors, not adding to route '2' at C:\Program Files (x86)\nxlog\conf\nxlog.conf:116
2018-06-28 16:44:51 ERROR route 2 is not functional without input modules, ignored at C:\Program Files (x86)\nxlog\conf\nxlog.conf:116
2018-06-28 16:44:51 WARNING not starting unused module in
2018-06-28 16:44:51 INFO nxlog-ce-2.9.1716 started
2018-06-28 16:44:51 INFO reconnecting in 1 seconds

I also tried the following:

<Input in>
	Module im_file
	File "C:\Program Files\AQTRONIX Webknight\LogFiles\*.log"
	SavePos TRUE
	Recursive TRUE
	<Exec>
		if $Message =~ /^#/ drop();
		$Message =~ ^(?<date>[^;]+);(?<time>[^;]+);(?<site_instance>[^;]+)(?>;(?<event>[^;]+);(?<client_ip>[^;]+);(?<via_header>[^;]+);(?<http_x_forwarded_for>[^;]+);(?<host_header>[^;]+);(?<additional_info_1>[^;]+);(?<additional_info_2>[^;]+);(?<additional_info_3>[^;]+);(?<additional_info_4>[^;]+);(?<additional_info>.+)$)? /gx;
	</Exec>
</Input>

But I receive the following error on starting nxlog:

2018-06-28 17:15:54 ERROR Couldn't parse Exec block at C:\Program Files (x86)\nxlog\conf\nxlog.conf:70; couldn't parse statement at line 72, character 15 in C:\Program Files (x86)\nxlog\conf\nxlog.conf; syntax error
2018-06-28 17:15:54 ERROR module 'in' has configuration errors, not adding to route '2' at C:\Program Files (x86)\nxlog\conf\nxlog.conf:100
2018-06-28 17:15:54 ERROR route 2 is not functional without input modules, ignored at C:\Program Files (x86)\nxlog\conf\nxlog.conf:100
2018-06-28 17:15:54 WARNING not starting unused module in
2018-06-28 17:15:54 INFO nxlog-ce-2.9.1716 started

I tried various syntax changes, but just cannot see the issue.

This is the first time I've tried using a regex with nxlog.

Any help or guidance much appreciated.

#2 Zhengshi Nxlog ✓

As per the user guide on [fields](https://nxlog.co/documentation/nxlog-user-guide#lang_fields):

> 89.2.3. Fields
>
> Fields are referenced in the NXLog language by prepending a dollar sign ($) to the field name. A field name can contain the characters [a-zA-Z0-9_.] but must begin with a letter or underscore (_), as indicated by the following regular expression:
>
> [[:alpha:]_][[:alnum:]\._]*
>
> Fields containing special characters such as the space or minus (-) can be specified using curly braces such as ${file-size} or ${file size}.

For instance, `$site-instance = $3;` needs to be either `${site-instance} = $3;` or `$site_instance = $3;`.

Using `<Exec>` blocks can make it easier to identify the line and position of errors.

Your regex also needs to start with `/`. Example: `if $raw_event =~ /^([^;]`

I believe your last issue is the modifiers: on `$raw_event` you are only getting one event at a time and matching the entire event, so `/gx` shouldn't be needed.
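
Putting those three changes together, your first input might look roughly like the sketch below. I have not tested it against 2.9.1716, so treat it as a starting point rather than a known-good config. The underscore field names are just one of the two options from the user guide (the `${site-instance}` form should work as well), the `File` path and the numbered captures `$1` to `$13` are copied unchanged from your post, and I am assuming the two-digit capture references behave the way you intended.

```
<Input in>
    Module    im_file
    File      "C:\Program Files\LogFiles\*.log"
    SavePos   TRUE
    Recursive TRUE
    <Exec>
        # Skip header/comment lines that start with "#"
        if $raw_event =~ /^#/ drop();

        # The pattern now opens with "/" and the /gx modifiers are dropped
        if $raw_event =~ /^([^;]+);([^;]+);([^;]+)(?>;([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);(.+)$)?/
        {
            # Field names use underscores instead of "-"
            $date = $1;
            $time = $2;
            $site_instance = $3;
            $event = $4;
            $client_ip = $5;
            $via_header = $6;
            $http_x_forwarded_for = $7;
            $host_header = $8;
            $additional_info_1 = $9;
            $additional_info_2 = $10;
            $additional_info_3 = $11;
            $additional_info_4 = $12;
            $additional_info = $13;
            $EventTime = parsedate($date + " " + $time);
            $SourceName = "WAF";
        }
    </Exec>
</Input>
```

Inside an `<Exec>` block the trailing `\` line continuations are not needed, and a parse error should be reported against the line of the failing statement rather than against one long `Exec` line.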