How to match multi-line XML with a regex in NXLog

Time: 2015-09-14 20:59:39

Tags: regex xml elasticsearch multiline nxlog

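I am trying to parse custom log files into JSON with nxlog's to_json() so that I can send them to my Elasticsearch instance. I want to split each entry into three separate fields: the date, the log type identifier, and the message.

The logs have the following format.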

9/10/2015 11:30:05 AM [0-1-1-Pos.xaml.cs-1607] Post button clicked

9/10/2015 11:30:17 AM [0-3-1-SecondaryPortStatus.cs-47] <TRANSACTION>
  <FUNCTION_TYPE>SECONDARYPORT</FUNCTION_TYPE>
  <COMMAND>STATUS</COMMAND>
  <MAC_LABEL>XX</MAC_LABEL>
  <MAC>xOel7QeyKoXaddiyrEeWKRI1DlF9sHzUNfZHFI/gAko=</MAC>
 <COUNTER>XXX</COUNTER>
</TRANSACTION>

9/10/2015 11:30:17 AM [0-3-1-SecondaryPortStatus.cs-57] <RESPONSE>
  <RESPONSE_TEXT>Operation SUCCESSFUL</RESPONSE_TEXT>
  <RESULT>OK</RESULT>
  <RESULT_CODE>-1</RESULT_CODE>
  <TERMINATION_STATUS>SUCCESS</TERMINATION_STATUS>
  <COUNTER>221</COUNTER>
  <SECONDARY_DATA>12</SECONDARY_DATA>
  <MACLABEL_IN_SESSION>P_061</MACLABEL_IN_SESSION>
  <SESSION_DURATION>00:00:16</SESSION_DURATION>
  <INVOICE_SESSION>XX</INVOICE_SESSION>
  <SERIAL_NUMBER>XX</SERIAL_NUMBER>
</RESPONSE>

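I have been able to parse the date stamp and the error selector (everything inside the brackets) using Perl regular expression syntax, as shown below.

1. Date: ^(\d\d|\d)/(\d\d|\d)/(\d\d\d\d)\s(\d\d|\d):(\d\d|\d):(\d\d|\d)\s(AM|PM)
2. Log type identifier: \[(.*)\]
3. Message: this is the part I am trying to figure out.

However, I cannot work out how to pull out everything between the selector and the new line that ends the entry. In this example, I want my message to be the XML that comes before that new line. Does anyone have a suggestion for how to retrieve this data?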

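2 Answers:

Answer 0 (score: 1)

You should be able to use nxlog's xm_multiline module and specify the regular expression in its HeaderLine directive. If you add a capture group to the regexp that matches the XML part (whatever follows [...]), you should then be able to parse the XML with xm_xml's parse_xml().

There is a similar example here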
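
The idea in this answer (start a new record at every header line, then capture whatever follows the bracketed selector) can be illustrated outside nxlog. The following is a minimal Python sketch, not nxlog configuration; the names HEADER, RECORD, group_records and parse_record are hypothetical, and the date pattern is an assumption based on the sample log in the question.

```python
import json
import re

# Assumed header pattern: date stamp, AM/PM, then the opening bracket of the selector.
HEADER = re.compile(r"^\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} (?:AM|PM) \[")

# Date, selector, then everything after it. DOTALL lets "." cross newlines,
# so the third group holds the whole multi-line XML block.
RECORD = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} (?:AM|PM)) \[([^\]]*)\] (.*)$",
    re.DOTALL,
)


def group_records(lines):
    """Start a new record at every header line; other lines continue the current one."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank separator lines carry no data in the sample
        if HEADER.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


def parse_record(record):
    """Split one grouped record into the three fields asked for in the question."""
    m = RECORD.match(record)
    if m is None:
        return None
    date, log_type, message = m.groups()
    return {"date": date, "logtype": log_type, "message": message}


SAMPLE = """\
9/10/2015 11:30:05 AM [0-1-1-Pos.xaml.cs-1607] Post button clicked

9/10/2015 11:30:17 AM [0-3-1-SecondaryPortStatus.cs-47] <TRANSACTION>
  <FUNCTION_TYPE>SECONDARYPORT</FUNCTION_TYPE>
  <COMMAND>STATUS</COMMAND>
</TRANSACTION>
"""

records = [parse_record(r) for r in group_records(SAMPLE.splitlines())]
print(json.dumps(records[1], indent=2))
```

The selector is matched with [^\]]* rather than .* so that it stops at the first closing bracket even once "." is allowed to span lines.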

Answer 1 (score: 0)

Try a multi-line regex:

$ perl -0777 -ne 'print $& if m!<RESPONSE>.*</RESPONSE>!s' file

Setting the input record separator to undef (-0777) reads the whole file into memory as a single string.

Output:

<RESPONSE>
  <RESPONSE_TEXT>Operation SUCCESSFUL</RESPONSE_TEXT>
  <RESULT>OK</RESULT>
  <RESULT_CODE>-1</RESULT_CODE>
  <TERMINATION_STATUS>SUCCESS</TERMINATION_STATUS>
  <COUNTER>221</COUNTER>
  <SECONDARY_DATA>12</SECONDARY_DATA>
  <MACLABEL_IN_SESSION>P_061</MACLABEL_IN_SESSION>
  <SESSION_DURATION>00:00:16</SESSION_DURATION>
  <INVOICE_SESSION>XX</INVOICE_SESSION>
  <SERIAL_NUMBER>XX</SERIAL_NUMBER>
</RESPONSE>

In a script:

BEGIN { $/ = undef; $\ = undef; } # input/output separator as undef
while (defined($_ = <ARGV>)) {
    print $& if m[<RESPONSE>.*</RESPONSE>]s;
}

From perldoc perlre:

Modifiers

 s   Treat string as single line. That is, change "." to match any
     character whatsoever, even a newline, which normally it would not
     match.
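
The effect of the s modifier can be reproduced in other regex engines. As a rough illustration, the Python sketch below uses re.DOTALL, which plays the same role as Perl's /s; the sample string and variable names are made up for the demonstration. It also shows a caveat of the greedy .* used above: if a file contains several <RESPONSE> blocks, the greedy pattern spans from the first opening tag to the last closing tag, while the lazy .*? returns each block separately.

```python
import re

# Made-up sample: one multi-line RESPONSE block surrounded by other lines.
text = "header\n<RESPONSE>\n  <RESULT>OK</RESULT>\n</RESPONSE>\ntrailer\n"

# Without DOTALL, "." stops at newlines, so the multi-line block is not found.
no_dotall = re.search(r"<RESPONSE>.*</RESPONSE>", text)

# With DOTALL (the counterpart of Perl's /s), "." also matches newlines.
with_dotall = re.search(r"<RESPONSE>.*</RESPONSE>", text, re.DOTALL)

# Two blocks in one input: greedy ".*" swallows both, lazy ".*?" splits them.
two_blocks = text + text
greedy = re.findall(r"<RESPONSE>.*</RESPONSE>", two_blocks, re.DOTALL)
lazy = re.findall(r"<RESPONSE>.*?</RESPONSE>", two_blocks, re.DOTALL)

print(no_dotall)
print(with_dotall.group(0))
print(len(greedy), len(lazy))
```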