How to write a regular expression for Traditional Chinese characters


#1 kevinlin

Hi,

I am trying to collect Windows DNS debug logs with the NXLog xm_multiline module, following this guide: Parsing Detailed DNS Logs With Regular Expressions (https://nxlog.co/documentation/nxlog-user-guide/windows-dns-server.html#parsing-detailed)

However, my Windows DNS debug logs include Traditional Chinese characters, and the multi-line entries are not being combined into a single event. What is the correct "HEADER_REGEX" to use?

A sample from the DNS debug log (I believe the problem is 上午; 上午 = AM and 下午 = PM):

2020/3/6 上午 11:58:01 0E80 PACKET 000001D80FE9BD40 UDP Snd 10.0.35.101 a3f5 R Q [8081 DR NOERROR] A (5)e3998(1)d(10)akamaiedge(3)net(0)
UDP response info at 000001D80FE9BD40
  Socket = 724
  Remote addr 10.0.35.101, port 56423
  Time Query=283057, Queued=283057, Expire=283060
  Buf length = 0x0200 (512)
  Msg length = 0x0038 (56)
  Message:
    XID 0xa3f5
    Flags 0x8180
      QR 1 (RESPONSE)
      OPCODE 0 (QUERY)
      AA 0
      TC 0
      RD 1
      RA 1
      Z 0
      CD 0
      AD 0
      RCODE 0 (NOERROR)
    QCOUNT 1
    ACOUNT 1
    NSCOUNT 0
    ARCOUNT 0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name "(5)e3998(1)d(10)akamaiedge(3)net(0)"
      QTYPE A (1)
      QCLASS 1
    ANSWER SECTION:
    Offset = 0x0028, RR count = 0
    Name "C00Ce3998(1)d(10)akamaiedge(3)net(0)"
      TYPE A (1)
      CLASS 1
      TTL 20
      DLEN 4
      DATA 96.7.252.200
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

NXLog configuration sample:

define ROOT C:\Program Files (x86)\nxlog

Moduledir %ROOT%\modules
CacheDir %ROOT%\data
Pidfile %ROOT%\data\nxlog.pid
SpoolDir %ROOT%\data
LogFile %ROOT%\data\nxlog.log

<Extension _charconv>
    Module xm_charconv
    AutodetectCharsets BIG-5, utf-8, utf-16, utf-32, iso8859-2
</Extension>

<Extension gelf>
    Module xm_gelf
</Extension>

define EVENT_REGEX /(?x)(?<Date>\d+(?:\/\d+){2})\s \
                   (?<Time>\d+(?::\d+){2})\s \
                   (?<ThreadId>\w+)\s+ \
                   (?<Context>\w+)\s+ \
                   (?<InternalPacketIdentifier>[[:xdigit:]]+)\s+ \
                   (?<Protocol>\w+)\s+ \
                   (?<SendReceiveIndicator>\w+)\s \
                   (?<RemoteIP>[[:xdigit:].:]+)\s+ \
                   (?<Xid>[[:xdigit:]]+)\s \
                   (?<QueryType>\s|R)\s \
                   (?<Opcode>[A-Z]|\?)\s \
                   (?<QFlags>\[(.*?)\])\s+ \
                   (?<QuestionType>\w+)\s+ \
                   (?<QuestionName>.*)\s+ \
                   (?<LogInfo>.+)\s+.+=\s \
                   (?<Socket>\d+)\s+ Remote\s+ addr\s \
                   (?<RemoteAddr>.+),\sport\s \
                   (?<PortNum>\d+)\s+Time\sQuery= \
                   (?<TimeQuery>\d+),\sQueued= \
                   (?<Queued>\d+),\sExpire= \
                   (?<Expire>\d+)\s+.+\( \
                   (?<BufLen>\d+)\)\s+.+\( \
                   (?<MsgLen>\d+)\)\s+Message:\s+ \
                   (?<Message>(?s).*)/

define HEADER_REGEX /(?x)(?<Date>\d+(?:\/\d+){2})\s \
                    (?<AMPM>\x{e4}\x{b8}\x{8a}\x{e5}\x{8d}\x{88})\s \
                    (?<Time>\d+(?::\d+){2})\s \
                    (?<ThreadId>\w+)\s+ \
                    (?<Context>\w+)\s+ \
                    (?<InternalPacketIdentifier>[[:xdigit:]]+)\s+ \
                    (?<Protocol>\w+)\s+ \
                    (?<SendReceiveIndicator>\w+)\s \
                    (?<RemoteIP>[[:xdigit:].:]+)\s+ \
                    (?<Xid>[[:xdigit:]]+)\s \
                    (?<QueryType>\s|R)\s \
                    (?<Opcode>[A-Z]|\?)\s \
                    (?<QFlags>\[(.*?)\])\s+ \
                    (?<QuestionType>\w+)\s+ \
                    (?<QuestionName>.*)/

<Extension multiline>
    Module xm_multiline
    HeaderLine %HEADER_REGEX%
</Extension>

<Input windnsdetaillog>
    Module im_file
    File 'C:\dns.log'
    Exec convert_fields("BIG-5", "utf-8");
    InputType multiline
    Exec if $raw_event =~ /(\d+)\/(\d+\/\d+)\s(上午\s)(\d+:\d+:\d+\s)((.|\n)*)/ $raw_event = $2 + '/' + $1 + ' ' + $4 + 'AM ' + $5;
    Exec if $raw_event =~ /(\d+)\/(\d+\/\d+)\s(下午\s)(\d+:\d+:\d+\s)((.|\n)*)/ $raw_event = $2 + '/' + $1 + ' ' + $4 + 'PM ' + $5;
    <Exec>
        if $raw_event =~ %EVENT_REGEX%
        {
            $EventTime = parsedate($Date + " " + $Time + " " + $AMPM);
            delete($Date);
            delete($Time);
        }
    </Exec>
</Input>

<Input wineventin>
    Module im_msvistalog
</Input>

<Output windnsdetaillogout>
    Module om_tcp
    Host 192.168.11.3
    Port 12198
    OutputType GELF_TCP
</Output>

<Output wineventout>
    Module om_udp
    Host 192.168.11.3
    Port 12196
    OutputType GELF
</Output>

<Route 1>
    Path wineventin => wineventout
</Route>

<Route 2>
    Path windnsdetaillog => windnsdetaillogout
</Route>

#2 MisazivDeactivated Nxlog ✓

Hi,

You can try adding GB2312 and GBK to the AutodetectCharsets line, which currently reads: BIG-5, utf-8, utf-16, utf-32, iso8859-2.
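As a rough sketch of what that change might look like, the fragment below appends the two charsets to the existing list; where they go in the list is an assumption, since the suggestion does not say. It also shows one possible alternative for the header pattern, which is not part of the suggestion above. The \x{e4}\x{b8}\x{8a}\x{e5}\x{8d}\x{88} escapes in the posted HEADER_REGEX are the UTF-8 bytes of 上午. If HeaderLine is evaluated on the raw file bytes before convert_fields runs (an assumption about processing order), a BIG-5 file would not contain those bytes. The pattern here sidesteps the encoding by accepting any run of non-space bytes where the AM/PM marker sits.

```
# Charset list from the original post with GB2312 and GBK appended
# (position in the list is an assumption).
<Extension _charconv>
    Module              xm_charconv
    AutodetectCharsets  BIG-5, utf-8, utf-16, utf-32, iso8859-2, GB2312, GBK
</Extension>

# Hypothetical, encoding-agnostic header pattern: date, one non-space token
# (the AM/PM marker in whatever encoding the file uses), then the time.
define HEADER_REGEX /^\d+\/\d+\/\d+\s\S+\s\d+:\d+:\d+\s/

<Extension multiline>
    Module      xm_multiline
    HeaderLine  %HEADER_REGEX%
</Extension>
```

With a header that only anchors on the date and time shape, the 上午/下午 rewrite in the Exec statements can still run afterwards on the converted UTF-8 text.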

MisaZ