I have an HTML string like this:
<whatevertag do-not-change-this="word" or-this-word="">
these words should be replaced with a word inside braces,
and also the same word thing for
<whatevertag>
the nested tags that has the word
</whatevertag>
</whatevertag>
I am trying to get output like this:
<whatevertag do-not-change-this="word" or-this-word="">
these {word}s should be replaced with a {word} inside braces,
and also the same {word} thing for
<whatevertag>
the nested tags that has the {word}
</whatevertag>
</whatevertag>
I have tried the expression (>[^>]*?)(word)([^<]*?<)
and for the replacement I used $1{$2}$3
Surprisingly (at least to me), it only works for the first match, and the output is:
<whatevertag do-not-change-this="word" or-this-word="">
these {word}s should be replaced with a word inside braces,
and also the same word thing for
<whatevertag>
the nested tags that has the {word}
</whatevertag>
</whatevertag>
Why does this happen, and how can I fix it?
Answer 0 (score: 2)
The reason your regex fails is:
(>[^>]*?) # read '>', then lazily any character except '>'
(word) # until you encounter 'word'
([^<]*?<) # then lazily read any character except '<' until you find a '<'
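So as soon as the regex has captured the first 'word', the third group keeps reading up to the first '<'. The rest of that text run is consumed by the match, and the next match has to start at a '>' again, which is why the second 'word' is not captured.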
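To see this concretely, here is a minimal sketch in Python. The choice of Python and the variable names are assumptions, since the question does not say which regex engine is in use. Python writes the replacement as \g<1>{\g<2>}\g<3> rather than $1{$2}$3.

```python
import re

# Sample input copied from the question.
html = (
    '<whatevertag do-not-change-this="word" or-this-word="">\n'
    'these words should be replaced with a word inside braces,\n'
    'and also the same word thing for\n'
    '<whatevertag>\n'
    'the nested tags that has the word\n'
    '</whatevertag>\n'
    '</whatevertag>'
)

# The pattern from the question: every match runs from a '>' to the next '<'.
pattern = re.compile(r"(>[^>]*?)(word)([^<]*?<)")

# subn returns the new string and the number of replacements made.
result, count = pattern.subn(r"\g<1>{\g<2>}\g<3>", html)

# Each text run between two tags can produce at most one match,
# so later occurrences of 'word' in the same run are left untouched.
print(count)
print(result)
```

The count shows one replacement per text run rather than one per occurrence of 'word'.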
What you can use instead is:
(?:(?!word).)+(word)
Explanation:
(?: # Do not capture
(?!word).)+ # Negative lookahead for word. Read 1 char
(word) # until you find 'word'
See the example.
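A quick way to check what this pattern matches is to list its matches. This is a rough sketch in Python (an assumed engine, with illustrative variable names). Note that '.' does not cross newlines by default in Python, which is fine here because every 'word' has other characters before it on the same line.

```python
import re

# Sample input copied from the question.
html = (
    '<whatevertag do-not-change-this="word" or-this-word="">\n'
    'these words should be replaced with a word inside braces,\n'
    'and also the same word thing for\n'
    '<whatevertag>\n'
    'the nested tags that has the word\n'
    '</whatevertag>\n'
    '</whatevertag>'
)

# Tempered dot: consume characters as long as 'word' does not start here,
# then capture 'word' itself.
pattern = re.compile(r"(?:(?!word).)+(word)")

found = list(pattern.finditer(html))
matches = [m.group(1) for m in found]

# Matches whose captured 'word' sits before the first '>' are inside the opening tag.
first_tag_end = html.index(">")
inside_tag = [m for m in found if m.start(1) < first_tag_end]

print(len(matches), len(inside_tag))
```

Every occurrence is now found, but that includes the ones inside the attributes of the opening tag, so the pattern on its own does not tell text and tags apart.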
Edit: rereading your question, you explicitly state that you want to capture only what is outside the tags. Take a look at: example 2
The regex is:
((?!word)[^>])+(word)([^<]+) # read all characters, except
# '>' until you encounter 'word'
# read 'word'
# capture all following characters, except '<'
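Since [^>] also accepts '<' and attribute characters, a pattern of this shape can begin a match inside a tag when run against the sample from the question. As an alternative sketch (not the regex above), a lookahead can require that 'word' is followed by a '<' before any '>'. This assumes Python as the engine, that attribute values never contain '<' or '>', and that every text run is followed by a tag.

```python
import re

# Sample input copied from the question.
html = (
    '<whatevertag do-not-change-this="word" or-this-word="">\n'
    'these words should be replaced with a word inside braces,\n'
    'and also the same word thing for\n'
    '<whatevertag>\n'
    'the nested tags that has the word\n'
    '</whatevertag>\n'
    '</whatevertag>'
)

# 'word' counts as text only if the next angle bracket after it is '<'.
# Inside a tag the next angle bracket is '>', so the lookahead fails there.
text_word = re.compile(r"word(?=[^<>]*<)")

# The lookahead consumes nothing, so every occurrence in a text run is matched.
result = text_word.sub(r"{\g<0>}", html)

print(result)
```

Because the lookahead does not consume the characters it inspects, several occurrences in the same text run can each be replaced. For anything beyond simple, well-formed input, an HTML parser is usually the safer tool.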