1
answer

Dear All!

We are trying to read the listener log of an Oracle database with Nxlog.

Sample:
<msg time='2018-12-01T23:50:08.409+01:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='server11'
host_addr='10.10.10.10'>
<txt>01-DEC-2018 23:50:08 * (CONNECT_DATA=(SID=oracle)(CID=(PROGRAM=JDBC Thin Client)(HOST=__jdbc__)(USER=user1$))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.10.10)(PORT=54468)) * establish * db1 * 0
</txt>
</msg>

It is clear that this is a multiline message which can be read by xm-multiline, then with xm_xml it is possible to convert it to XML entries.

After the 'txt' part is available for processing I would like to capture the following info from it:
DATE=01-DEC-2018 23:50:08 (this is simple, just cut the first 20 characters.
Action=establish (this is simple, splitting the text by the '*')
Target=db1 (this is simple, splitting the text by the '*')
Result=0 (this is simple, splitting the text by the '*')
SID=oracle
PROGRAM=JDBC Thin Client
USER=user1$
HOST=10.10.10.10
PORT=54468

In my opinion the SID/PROGRAM/USER/HOST/PORT cannot be captured by the xm_kvp module as it is not separated by the same symbol.
What would you recommend to collect the proper info from that part of the txt?

I have not found any solution for reading the Oracle listener log - that's why it would be a big help to get your opinions!

Thank you!

AskedDecember 5, 2018 - 11:20am

Answer (1)

If all the records are very similar (fields are always in the same order and only the data changes that you want to extract) then a simpler solution would be to use a plain regexp instead of trying to parse it as XML or KVP.

AnsweredDecember 5, 2018 - 11:29am

Comments (5)

  • Pdudas's picture

    Unfortunately the messages are not always the same - not even the same number of fields included in one XML structure.

    I already created some exceptions to drop the unwanted events as the main purpose of the parsing is to get info about external connections to the given database.
    So 'service_update',internal connections (by SYSTEM) are dropped (and recorded only to a daily logfile which is rotated):

    <Exec>
    if $raw_event !~ /^<msg/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    if $raw_event =~ /service_update/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    if $raw_event =~ /USER=OPERADG1/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    if $raw_event =~ /USER=SYSTEM/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    parse_xml();
    # kvp->parse_kvp($txt);
    delete($EventReceivedTime);
    delete($SourceModuleName);
    delete($SourceModuleType);
    delete($level);
    delete($FileName);
    delete($Message);
    to_json();
    file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_message.log", $txt);
    </Exec>

    I'll try to parse it with regexp - maybe that works.
    If I'll be successful i'll share the result here - maybe this will be useful for others.

    December 5, 2018 - 12:08pm
  • Pdudas's picture

    I'm stucked with regex.

    Input string:
    01-DEC-2018 23:55:11 * (CONNECT_DATA=(SERVICE_NAME=opera)(CID=(PROGRAM=c:\python27\python.exe)(HOST=OXI2)(USER=Peter?Dudas))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.148.66.34)(PORT=64040)) * establish * opera * 0

    In an EXEC directive:

    if $txt =~ /^\d{2}-[a-zA-Z]{3}-\d{4}\s\d{2}:\d{2}:\d{2}/ {$EventTime = parsedate($1);}
    if $txt =~ /(?<=(HOST=)).*?(?=\))/ {$HOST = $1;}
    if $txt =~ /(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])/ {$HOSTADDRESS = $1;}
    if $txt =~ /(?<=(USER=)).*?(?=\))/ {$USER = $1;}
    if $txt =~ /(?<=(SERVICE_NAME=)).*?(?=\))/ {$SERVICE_NAME = $1;}
    if $txt =~ /(?<=(PORT=)).*?(?=\))/ {$PORT = $1;}
    if $txt =~ /(?<=(PROTOCOL=)).*?(?=\))/ {$PROTOCOL = $1;}
    if $txt =~ /(?<=(PROGRAM=)).*?(?=\))/ {$PROGRAM = $1;}

    They are capturing the labels (HOST=/HOSTADDRESS=/USER=) instead of the content matched by the regex (OXI2/10.148.66.34/Peter?Dudas).
    Verified on the regex101.com for PCRE - there it works fine.
    Did I make a syntax error or why I cannot capture the desired content?

    Thank you!

    December 5, 2018 - 2:02pm
  • Zhengshi's picture
    (NXLog)

    I think something more like (?<=USER=)(.*?)(?=\)) because your capture group is over the look behind.
    I bet your regex101 showed the user1$ as full match and USER= as capture group.

    December 5, 2018 - 3:25pm
  • Pdudas's picture

    Thank you!

    I was not aware of Nxlog capturing the 1st group with that regexp (and not the full match).

    Working config for Oracle Listener Log Alert (sends logs to graylog in gelf):
    _____________________________________________________________________________________________________________________________________
    define ROOT C:\Program Files (x86)\nxlog

    <Extension gelf>
    Module xm_gelf
    </Extension>

    <Extension 5c0661cde8ad8803c7f36346-multiline>
    Module xm_multiline
    HeaderLine /^<msg/
    EndLine /^</msg>/
    </Extension>

    Moduledir %ROOT%\modules
    CacheDir %ROOT%\data
    Pidfile %ROOT%\data\nxlog.pid
    SpoolDir %ROOT%\data
    LogFile %ROOT%\data\nxlog.log
    LogLevel INFO

    <Extension logrotate>
    Module xm_fileop
    <Schedule>
    When @daily
    Exec file_cycle('%ROOT%\data\nxlog.log', 7);
    Exec file_cycle('%ROOT%\data\oracle_drop.log', 7);
    </Schedule>
    </Extension>

    <Extension xmlparser>
    Module xm_xml
    </Extension>

    <Extension json>
    Module xm_json
    </Extension>

    <Extension charconv>
    Module xm_charconv
    </Extension>

    <Input 5c0661cde8ad8803c7f36346>
    Module im_file
    File 'e:\oracle\log.xml'
    PollInterval 1
    SavePos True
    ReadFromLast True
    Recursive True
    RenameCheck True
    Exec $FileName = file_name(); # Send file name with each message
    InputType 5c0661cde8ad8803c7f36346-multiline
    <Exec>
    if $raw_event !~ /^<msg/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    if $raw_event =~ /service_update/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    if $raw_event =~ /USER=OPERADG1/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    if $raw_event =~ /USER=SYSTEM/ {file_write("C:\\Program Files (x86)\\nxlog\data\\oracle_drop.log", $raw_event); drop(); }
    parse_xml();

    if $txt =~ /^(\d{2}-[a-zA-Z]{3}-\d{4}\s\d{2}:\d{2}:\d{2})/ $DATE = $1;
    if $txt =~ /(?<=SID=)(.*?)(?=\))/ $SID = $1;
    if $txt =~ /(?<=HOST=)(.*?)(?=\))/ $HOST = $1;
    if $txt =~ /(?<=USER=)(.*?)(?=\))/ $USER = $1;
    if $txt =~ /(?<=PORT=)(.*?)(?=\))/ $PORT = $1;
    if $txt =~ /(?<=SERVICE_NAME=)(.*?)(?=\))/ $SERVICE_NAME = $1;
    if $txt =~ /(?<=PROTOCOL=)(.*?)(?=\))/ $PROTOCOL = $1;
    if $txt =~ /(?<=PROGRAM=)(.*?)(?=\))/ $PROGRAM = $1;
    if $txt =~ /((?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]))/ $HOSTADDRESS = $1;

    if $USER !='' $USER = convert($USER, "WIN-1252", "utf-8");
    delete($EventReceivedTime);
    delete($SourceModuleName);
    delete($SourceModuleType);
    delete($level);
    delete($FileName);
    delete($Message);
    to_json();
    </Exec>
    </Input>

    <Output 5c066159e8ad8803c7f362c7>
    Module om_tcp
    Host 10.x.y.z
    Port 12205
    OutputType GELF_TCP
    Exec $short_message = $raw_event; # Avoids truncation of the short_message field.
    Exec $gl2_source_collector = 'hostname of source';
    Exec $Hostname = hostname_fqdn();
    </Output>

    <Route route-0>
    Path 5c0661cde8ad8803c7f36346 => 5c066159e8ad8803c7f362c7
    </Route>
    ___________________________________________________________________________________________________________

    Sample content for the txt part read from the log.xml:
    02-DEC-2018 09:35:48 * (CONNECT_DATA=(SID=OPERA)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.148.1.1)(PORT=61699)) * establish * OPERA * 0

    December 6, 2018 - 11:26am
  • Zhengshi's picture
    (NXLog)

    I was not aware of Nxlog capturing the 1st group with that regexp (and not the full match).

    This is actually how the positive lookbehinds work in PCRE, not just NXLog.
    The standard format is (?<=regex) and when you put in an extra set of parenthesis, it adds a capture group. Since you were using subgroup $1, it chose the first capture group which would have been something like HOST=. You could use full match $0 instead of $1 if you would like with removing the capture group, like so : (?<=HOST=).*?(?=\)).

    See these links for additional details.
    https://nxlog.co/documentation/nxlog-user-guide-full#lang_regexp
    https://www.regular-expressions.info/refadv.html

    December 6, 2018 - 11:25pm