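I have a log file whose content looks like the sample below. I'm trying to extract the XML segments that match a few item numbers, for example 6654721, 6654722 and 6654725. The expected output is the complete XML segment for each of these three item numbers. I used the regex (<Record>.*?</Record>) to find every XML segment, and then tried to apply a filter such as (<Record>.*?(6654721|6654722|6654725).*?</Record>), but it doesn't work as expected. Can you help me solve this? Thanks in advance for your responses.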
2017-04-20 some log file
2017-04-20 some log file
2017-04-20 some log file
<Record>
<itemname>Lego Fire Rescue</itemname>
<itemnumber>6654721</itemnumber>
<availableinv>19</availableinv>
<ageplus>3</ageplus>
<storeId>19</storeId>
</Record>
2017-04-20 some log file
2017-04-20 some log file
2017-04-20 some log file
<Record>
<itemname>Lego Fire Rescue</itemname>
<itemnumber>6654722</itemnumber>
<availableinv>19</availableinv>
<ageplus>3</ageplus>
<storeId>19</storeId>
</Record>
2017-04-20 some log file
2017-04-20 some log file
2017-04-20 some log file
<Record>
<itemname>Lego Fire Rescue</itemname>
<itemnumber>6654723</itemnumber>
<availableinv>19</availableinv>
<ageplus>3</ageplus>
<storeId>19</storeId>
</Record>
2017-04-20 some log file
2017-04-20 some log file
2017-04-20 some log file
<Record>
<itemname>Lego Fire Rescue</itemname>
<itemnumber>6654725</itemnumber>
<availableinv>19</availableinv>
<ageplus>3</ageplus>
<storeId>19</storeId>
</Record>
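To see what goes wrong with the filter from the question, here is a minimal Python sketch (Python is chosen only for illustration; the question does not name a language). It rebuilds the sample log inline as LOG (the names ITEMS, RECORD, LOG and naive are illustrative) and runs the question's filter with re.DOTALL. Without that flag the filter matches nothing at all, because each record spans several lines. With it, the match for 6654725 starts at the non-matching 6654723 record and swallows two records.

```python
import re

# Rebuild the sample log from the question: three log lines before each record.
ITEMS = ["6654721", "6654722", "6654723", "6654725"]
RECORD = (
    "<Record>\n<itemname>Lego Fire Rescue</itemname>\n"
    "<itemnumber>{}</itemnumber>\n<availableinv>19</availableinv>\n"
    "<ageplus>3</ageplus>\n<storeId>19</storeId>\n</Record>\n"
)
LOG = "".join("2017-04-20 some log file\n" * 3 + RECORD.format(n) for n in ITEMS)

# The filter from the question; DOTALL lets "." cross the newlines in a record.
naive = re.compile(r"(<Record>.*?(6654721|6654722|6654725).*?</Record>)", re.DOTALL)

# For each match, note which number matched and how many records the match spans.
results = [(m.group(2), m.group(1).count("<Record>")) for m in naive.finditer(LOG)]
for item, n_records in results:
    print(item, "->", n_records, "record(s)")
```

For the sample this reports 1, 1 and 2 records respectively: the lazy .*? starting at the 6654723 record keeps expanding past its </Record> until it finds 6654725.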
Answer:
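With DOTALL enabled, the filter in the question misbehaves because .*? is allowed to run past a </Record>: when a record contains none of the numbers (6654723 in the sample), the lazy quantifier keeps expanding into the next record until it reaches 6654725, so that match spans two records. This regex does the job by refusing to cross a </Record> before the item number is found (run it with the DOTALL / single-line flag, s, so that . also matches newlines):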
<Record[^>]*>(?:(?!</Record>).)*\b(?:6654721|6654722|6654725)\b.*?</Record>
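A minimal Python sketch of applying this regex, assuming the whole log has been read into a single string (rebuilt inline here as LOG; in practice it might come from open(path).read(), where path is a placeholder). re.DOTALL is Python's name for the single-line flag, and findall returns the full text of each match because the pattern contains only non-capturing groups. The names ITEMS, RECORD, LOG and PATTERN are illustrative.

```python
import re

# Sample log rebuilt from the question (three log lines before each record).
ITEMS = ["6654721", "6654722", "6654723", "6654725"]
RECORD = (
    "<Record>\n<itemname>Lego Fire Rescue</itemname>\n"
    "<itemnumber>{}</itemnumber>\n<availableinv>19</availableinv>\n"
    "<ageplus>3</ageplus>\n<storeId>19</storeId>\n</Record>\n"
)
LOG = "".join("2017-04-20 some log file\n" * 3 + RECORD.format(n) for n in ITEMS)

# The answer's pattern; re.DOTALL lets "." match the newlines inside a record.
PATTERN = re.compile(
    r"<Record[^>]*>(?:(?!</Record>).)*\b(?:6654721|6654722|6654725)\b.*?</Record>",
    re.DOTALL,
)

# No capturing groups, so findall returns each complete <Record>...</Record> block.
records = PATTERN.findall(LOG)
for rec in records:
    print(rec)
    print("---")
```

For the sample log this yields three records, for 6654721, 6654722 and 6654725. The 6654723 record is skipped because the tempered part (?:(?!</Record>).)* cannot cross its closing tag.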
Explanation:
<Record[^>]*> : '<Record>' with optional attributes
(?: : start non-capturing group
(?! : start negative lookahead: make sure the following text does not come next
</Record> : literally '</Record>'
) : end lookahead
. : any character (including newlines when DOTALL is on)
)* : repeat the non-capturing group; at this point we know we have not passed a </Record>
\b : word boundary
(?: : start non-capturing group
6654721 : 6654721
| : OR
6654722 : 6654722
| : OR
6654725 : 6654725
) : end group
\b : word boundary
.*? : zero or more of any character, lazy (non-greedy)
</Record> : literally '</Record>'
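If the list of item numbers changes, the alternation can be generated instead of typed by hand. The sketch below (Python again; build_pattern and extract_records are hypothetical helper names, not part of any library) assembles the same pattern from a list, escaping each value with re.escape, and returns the matching records.

```python
import re
from typing import Iterable, List


def build_pattern(item_numbers: List[str]) -> "re.Pattern[str]":
    """Compile the answer's regex for an arbitrary, non-empty list of item numbers."""
    # re.escape guards against regex metacharacters in the supplied values.
    alternatives = "|".join(re.escape(n) for n in item_numbers)
    return re.compile(
        r"<Record[^>]*>(?:(?!</Record>).)*\b(?:" + alternatives + r")\b.*?</Record>",
        re.DOTALL,
    )


def extract_records(log: str, item_numbers: Iterable[str]) -> List[str]:
    """Return every complete <Record>...</Record> block that mentions one of the numbers."""
    numbers = list(item_numbers)
    if not numbers:
        # An empty alternation would match practically every record, so bail out early.
        return []
    return build_pattern(numbers).findall(log)


if __name__ == "__main__":
    sample = (
        "2017-04-20 some log file\n"
        "<Record>\n<itemnumber>6654723</itemnumber>\n</Record>\n"
        "2017-04-20 some log file\n"
        "<Record>\n<itemnumber>6654725</itemnumber>\n</Record>\n"
    )
    for rec in extract_records(sample, ["6654721", "6654722", "6654725"]):
        print(rec)
```

One limitation shared with the hand-written pattern: \b6654721\b matches the number anywhere inside the record (for example in <storeId>), not only in <itemnumber>. If that matters, the \b(?:...)\b part can be replaced with <itemnumber>(?:...)</itemnumber>.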