Notepad++ regex to extract XML messages from a log file

Time: 2017-04-20 04:37:57

Tags: notepad++

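I have a log file whose content looks like the sample below. I am trying to extract the XML fragments that match a few item numbers, for example 6654721, 6654722, and 6654725. The expected output is the complete XML fragment for each of these three item numbers. I used the regex (<Record>.*?</Record>) to find each XML fragment exactly, then tried to add a filter like (<Record>.*?(6654721|6654722|6654725).*?</Record>), but this does not work as expected. Can you help me solve this? Thanks in advance for your response.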

 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654721</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>
 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654722</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>
 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654723</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>
 2017-04-20 some log file
 2017-04-20 some log file
 2017-04-20 some log file
 <Record>
     <itemname>Lego Fire Rescue</itemname>
     <itemnumber>6654725</itemnumber>
     <availableinv>19</availableinv>
     <ageplus>3</ageplus>
     <storeId>19</storeId> 
 </Record>

1 answer:

Answer 0 (score: 1)

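This regex does the job (in Notepad++, check the ". matches newline" option so that . can span the lines of a record):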

<Record[^>]*>(?:(?!</Record>).)*\b(?:6654721|6654722|6654725)\b.*?</Record>

Explanation:

<Record[^>]*>       : '<Record>' with optional attributes
(?:                 : start non-capture group
    (?!             : start negative lookahead, make sure the following does not come next
        </Record>   : literally '</Record>'
    )               : end lookahead
    .               : any character
)*                  : repeat the non-capture group; this part can never run past a </Record>
\b                  : word boundary
(?:                 : non capture group
    6654721         : 6654721
    |               : OR
    6654722         : 6654722
    |               : OR
    6654725         : 6654725
)                   : end group
\b                  : word boundary
.*?                 : 0 or more of any character, non-greedy
</Record>           : literally '</Record>'
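
To see the difference outside Notepad++, here is a minimal Python sketch that runs both the pattern from the question and the pattern from this answer. The shortened LOG sample and the names WANTED, NAIVE, TEMPERED and extract are assumptions made for illustration. re.DOTALL stands in for the ". matches newline" option.

```python
import re

# Hypothetical, shortened sample modeled on the log in the question.
LOG = """\
2017-04-20 some log file
<Record>
    <itemname>Lego Fire Rescue</itemname>
    <itemnumber>6654721</itemnumber>
</Record>
2017-04-20 some log file
<Record>
    <itemname>Lego Fire Rescue</itemname>
    <itemnumber>6654723</itemnumber>
</Record>
2017-04-20 some log file
<Record>
    <itemname>Lego Fire Rescue</itemname>
    <itemnumber>6654725</itemnumber>
</Record>
"""

WANTED = ("6654721", "6654722", "6654725")
ALTERNATION = "|".join(WANTED)

# Pattern from the question: the lazy .*? is free to run past a
# </Record>, so a match can start in an unwanted record.
NAIVE = re.compile(
    rf"<Record>.*?(?:{ALTERNATION}).*?</Record>",
    re.DOTALL,
)

# Pattern from the answer: the lookahead stops the scan at the first
# </Record>, so the item number must sit inside the same record.
TEMPERED = re.compile(
    rf"<Record[^>]*>(?:(?!</Record>).)*\b(?:{ALTERNATION})\b.*?</Record>",
    re.DOTALL,
)


def extract(pattern, text):
    """Return every full match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


naive_hits = extract(NAIVE, LOG)
tempered_hits = extract(TEMPERED, LOG)

for record in tempered_hits:
    print(record)
    print("-" * 20)
```

On this sample, the question's pattern starts a match at the 6654723 record and keeps going until it reaches 6654725 in the next record, so the unwanted record ends up in the output. The pattern from the answer fails on the 6654723 record and matches only the records whose own item number is in the list.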