Parsing delimited log files with regex


#1 stephen

Hi, I'm using nxlog v2.9.1716.

I've created the following input:

<Input in>
	Module im_file
	File "C:\Program Files\LogFiles\*.log"
	SavePos TRUE
	Recursive TRUE

	Exec if $raw_event =~ /^#/ drop();
	Exec if $raw_event =~ ^([^;]+);([^;]+);([^;]+)(?>;([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);(.+)$)?/gx; \
	{ \
		$date = $1; \
		$time = $2; \
		$site-instance = $3; \
		$event = $4; \
		$client-ip = $5; \
		$via-header = $6; \
		$http-x-forwarded-for = $7; \
		$host-header = $8; \
		$additional-info-1 = $9; \
		$additional-info-2 = $10; \
		$additional-info-3 = $11; \
		$additional-info-4 = $12; \
		$additional-info = $13; \
		$EventTime = parsedate($date + " " + $time); \
		$SourceName = "WAF"; \
	}
</Input>

The regex being used has been successfully tested with https://regex101.com/

Sample data below:

2018-06-28 ; 10:23:52 ; W3SVC2 ; OnPreprocHeaders ; 10.10.10.10 ; ; 8.8.8.8 ; my.domain.com ; GET ; /account/login ; ALERT: '/account/' not allowed in URL ; HTTP/1.0 ; 0 ; ; Actional Intermediary

When I start the nxlog service, I get the following error:

2018-06-28 16:44:51 ERROR Couldn't parse Exec block at C:\Program Files (x86)\nxlog\conf\nxlog.conf:89; couldn't parse statement at line 89, character 24 in C:\Program Files (x86)\nxlog\conf\nxlog.conf; syntax error
2018-06-28 16:44:51 ERROR module 'in' has configuration errors, not adding to route '2' at C:\Program Files (x86)\nxlog\conf\nxlog.conf:116
2018-06-28 16:44:51 ERROR route 2 is not functional without input modules, ignored at C:\Program Files (x86)\nxlog\conf\nxlog.conf:116
2018-06-28 16:44:51 WARNING not starting unused module in
2018-06-28 16:44:51 INFO nxlog-ce-2.9.1716 started
2018-06-28 16:44:51 INFO reconnecting in 1 seconds

I also tried the following:

<Input in>
	Module im_file
	File "C:\Program Files\AQTRONIX Webknight\LogFiles\*.log"
	SavePos TRUE
	Recursive TRUE
	<Exec>
		if $Message =~ /^#/ drop();
		$Message =~ ^(?<date>[^;]+);(?<time>[^;]+);(?<site_instance>[^;]+)(?>;(?<event>[^;]+);(?<client_ip>[^;]+);(?<via_header>[^;]+);(?<http_x_forwarded_for>[^;]+);(?<host_header>[^;]+);(?<additional_info_1>[^;]+);(?<additional_info_2>[^;]+);(?<additional_info_3>[^;]+);(?<additional_info_4>[^;]+);(?<additional_info>.+)$)? /gx;
	</Exec>
</Input>

But I receive the following error on starting nxlog:

2018-06-28 17:15:54 ERROR Couldn't parse Exec block at C:\Program Files (x86)\nxlog\conf\nxlog.conf:70; couldn't parse statement at line 72, character 15 in C:\Program Files (x86)\nxlog\conf\nxlog.conf; syntax error
2018-06-28 17:15:54 ERROR module 'in' has configuration errors, not adding to route '2' at C:\Program Files (x86)\nxlog\conf\nxlog.conf:100
2018-06-28 17:15:54 ERROR route 2 is not functional without input modules, ignored at C:\Program Files (x86)\nxlog\conf\nxlog.conf:100
2018-06-28 17:15:54 WARNING not starting unused module in
2018-06-28 17:15:54 INFO nxlog-ce-2.9.1716 started

I tried various syntax changes, but just cannot see the issue.

This is the first time I've tried using a regex with nxlog.

Any help or guidance much appreciated.

#2 Zhengshi Nxlog ✓

As per the user guide on fields:

89.2.3. Fields
Fields are referenced in the NXLog language by prepending a dollar sign ($) to the field name. A field name can contain the characters [a-zA-Z0-9_.] but must begin with a letter or underscore (_), as indicated by the following regular expression:
[[:alpha:]_][[:alnum:]._]*
Fields containing special characters such as the space or minus (-) can be specified using curly braces such as ${file-size} or ${file size}.

For instance, $site-instance = $3; needs to be either ${site-instance} = $3; or $site_instance = $3;
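A minimal sketch of both options for the first few hyphenated fields, assuming the rest of the block stays as posted. The two options are alternatives; pick one form and use it throughout rather than assigning both:

```nxlog
# Option 1: rename the fields so they match [[:alpha:]_][[:alnum:]._]*
$site_instance = $3;
$client_ip = $5;
$http_x_forwarded_for = $7;

# Option 2: keep the hyphenated names, but wrap them in curly braces
${site-instance} = $3;
${client-ip} = $5;
${http-x-forwarded-for} = $7;
```

The remaining hyphenated names ($via-header, $host-header, $additional-info-1 and so on) need the same treatment.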

Using <Exec> blocks instead of single-line Exec directives continued with trailing backslashes makes it easier to pinpoint the line and position of an error.
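A rough sketch of the two layouts, reusing statements from the question. They are alternatives (use one or the other in the Input), shown only to illustrate the layout: the single-line directive turns everything after Exec into one logical line joined by backslashes, while the block form lets each statement sit on its own line.

```nxlog
# Alternative A: single-line directive, continuation lines end with a backslash
Exec if $raw_event =~ /^#/ drop(); \
     $SourceName = "WAF";

# Alternative B: the same statements as a block, no backslashes needed
<Exec>
    if $raw_event =~ /^#/ drop();
    $SourceName = "WAF";
</Exec>
```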

Your regex also needs to start with / (at the moment it only has the closing delimiter), for example: if $raw_event =~ /^([^;]

I believe your last issue is the /gx modifier: $raw_event only holds one event at a time and the regex matches the entire event, so /gx shouldn't be needed.
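Putting the points above together, a minimal sketch of the input from the question might look like the following. It assumes the file path, module settings and field list stay as posted. It uses underscores in the field names, adds the opening / to the regex and drops /gx. It also removes the stray ; that sat between the regex and the opening {, presumably needed so the { } block is governed by the if, and uses else so that comment lines never reach the parsing regex. This has not been run against the sample data. In particular, the captures will still include the spaces around each ; in the sample line, so $date and $time may need trimming before parsedate() can handle them.

```nxlog
<Input in>
    Module    im_file
    File      "C:\Program Files\LogFiles\*.log"
    SavePos   TRUE
    Recursive TRUE
    <Exec>
        # Skip header/comment lines
        if $raw_event =~ /^#/ drop();
        # Regex opens with "/", has no /gx, and no ";" before the block
        else if $raw_event =~ /^([^;]+);([^;]+);([^;]+)(?>;([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);([^;]+);(.+)$)?/
        {
            $date = $1;
            $time = $2;
            $site_instance = $3;
            $event = $4;
            $client_ip = $5;
            $via_header = $6;
            $http_x_forwarded_for = $7;
            $host_header = $8;
            $additional_info_1 = $9;
            $additional_info_2 = $10;
            $additional_info_3 = $11;
            $additional_info_4 = $12;
            $additional_info = $13;
            # Captures may carry surrounding spaces from the " ; " separators
            $EventTime = parsedate($date + " " + $time);
            $SourceName = "WAF";
        }
    </Exec>
</Input>
```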