Question

I have an XML file with this structure:

<item><rank>15</rank>...<price>100</price></item>
<item><rank>15</rank>...<price>200</price></item>
<item><rank>15</rank>...<price>500</price></item>

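In the XML above, ... stands for various other tags that describe the item in some way (they could be any tags).

So I need to find the item that contains price=500 and replace its rank. I tried: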

<item><rank>\d+<\/rank>(.*)<price>500<\/price><\/item>

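But the RegExp matches from the first <item> all the way to the final <price>500</price></item>, so the match spans the contents of all three items.

So I need to stop (.*) from matching across </item>.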
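The overmatch described above can be reproduced with a small sketch. It assumes JavaScript, that the three items sit on one line, and that a made-up <name> tag stands in for the elided "..." content; none of that comes from the original post.

```javascript
// Sample data modeled on the question. <name> is a hypothetical stand-in
// for the elided "..." tags; the items are assumed to be on a single line.
const xml =
  "<item><rank>15</rank><name>a</name><price>100</price></item>" +
  "<item><rank>15</rank><name>b</name><price>200</price></item>" +
  "<item><rank>15</rank><name>c</name><price>500</price></item>";

// The pattern from the question: (.*) is free to run across </item>.
const greedy = /<item><rank>\d+<\/rank>(.*)<price>500<\/price><\/item>/;
const m = xml.match(greedy);

console.log(m.index);       // 0
console.log(m[0] === xml);  // true

// A lazy quantifier does not help: the leftmost match still starts at the
// first <item> and stretches until the first <price>500</price></item>.
const lazy = /<item><rank>\d+<\/rank>(.*?)<price>500<\/price><\/item>/;
const lazyMatch = xml.match(lazy);

console.log(lazyMatch[0] === xml);  // true
```

Both variants start at the first <item>, which is why the wildcard itself has to be prevented from consuming </item>.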

Answer 1

See this regex:

/(?:<item>(?:<rank>(\d+)<\/rank>)(?:(?!<\/item>).)*(?:<price>500<\/price>)<\/item>)/igm

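Parentheses create a capturing group; (?:...) is a non-capturing group, meaning you are not interested in its contents. igm stands for case-insensitive, global, and multiline. (?!sth) is a negative lookahead: the match only continues at that position if sth does not follow.

Step by step (working inward from the outer tags):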

(?:<item> ... <\/item>) # we're interested in things beginning with <item> and ending with </item> and we're not capturing the group

... (?:<rank>(\d+)<\/rank>) ... # there's a rank tag, we're not capturing it, but we're capturing the digits within the tag

... (?:(?!<\/item>).)* ... # the crux of the problem: we consume one character at a time, but only at positions where <\/item> does not start, so the match cannot cross an item boundary

... (?:<price>500<\/price>)<\/item>) # the "line" ends with these tags
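As a rough sketch of how the pattern could be used for the actual replacement, the snippet below regroups it so that the text before and after the rank digits is captured instead of the digits themselves. The regrouping, the helper name replaceRank, the sample data, and the <name> tag are all assumptions made for illustration, not part of the original answer.

```javascript
// Sample data modeled on the question. <name> is a hypothetical stand-in
// for the elided "..." tags; the items are assumed to be on a single line.
const xml =
  "<item><rank>15</rank><name>a</name><price>100</price></item>" +
  "<item><rank>15</rank><name>b</name><price>200</price></item>" +
  "<item><rank>15</rank><name>c</name><price>500</price></item>";

// Hypothetical helper: set a new rank on every item whose price is 500.
function replaceRank(source, newRank) {
  // Group 1: everything up to the rank digits.
  // Group 2: everything after them, using the answer's (?:(?!<\/item>).)*
  // so the wildcard can never step over a closing </item>.
  const pattern =
    /(<item><rank>)\d+(<\/rank>(?:(?!<\/item>).)*<price>500<\/price><\/item>)/gi;
  // A callback avoids ambiguity between "$1" and a rank that starts with a digit.
  return source.replace(pattern, (match, head, tail) => head + newRank + tail);
}

const updated = replaceRank(xml, 1);
console.log(updated);
```

With this input only the third item should change, since the first two fail the price check before the pattern can reach another item. Note that `.` does not match line breaks in JavaScript unless the `s` flag is set (the `m` flag only affects ^ and $), so items spread over several lines would need `/s` or `[\s\S]` in place of `.`.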

Hope it helps.
