Question

I have the following HTML string:

<whatevertag do-not-change-this="word" or-this-word="">
  these words should be replaced with a word inside braces,
  and also the same word thing for
  <whatevertag>
      the nested tags that has the word
  </whatevertag>
</whatevertag>

I am trying to get output like this:

<whatevertag do-not-change-this="word" or-this-word="">
  these {word}s should be replaced with a {word} inside braces,
  and also the same {word} thing for
  <whatevertag>
      the nested tags that has the {word}
  </whatevertag>
</whatevertag>

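I have tried the expression (>[^>]*?)(word)([^<]*?<) and, for the replacement, I used $1{$2}$3. Surprisingly (at least to me), it only works for the first match in each piece of text, and the output is: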

<whatevertag do-not-change-this="word" or-this-word="">
    these {word}s should be replaced with a word inside braces,
    and also the same word thing for
    <whatevertag>
        the nested tags that has the {word}
    </whatevertag>
</whatevertag>

Why is this happening, and how can I fix it?

Answer 1

Your regex fails because of how it reads the input:

(>[^>]*?)                  # read '>', then lazily any character except '>'
(word)                     # until you encounter 'word'
([^<]*?<)                  # then lazily read any character except '<' until you find a '<'

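So as soon as you have captured 'word', your regex keeps reading up to and including the next '<'. Any other 'word' in the same stretch of text is consumed by the third group, and the next match cannot start before a later '>'. That is why the second 'word' is not captured.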
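For anyone who wants to reproduce this, here is a minimal Python sketch of the behaviour described above. The variable names are mine, and Python's re module writes the replacement $1{$2}$3 as \g<1>{\g<2>}\g<3>; the pattern itself is the one from the question.

```python
import re

# Sample input copied from the question.
html = '''<whatevertag do-not-change-this="word" or-this-word="">
  these words should be replaced with a word inside braces,
  and also the same word thing for
  <whatevertag>
      the nested tags that has the word
  </whatevertag>
</whatevertag>'''

# Pattern from the question: '>' ... 'word' ... '<'
pattern = re.compile(r'(>[^>]*?)(word)([^<]*?<)')

# Each match runs from a '>' through the next '<', so one text node
# can contribute at most one match.
matches = [m.group(2) for m in pattern.finditer(html)]

# Python spelling of the replacement $1{$2}$3.
result = pattern.sub(r'\g<1>{\g<2>}\g<3>', html)

print(len(matches))
print(result)
```

Only the first 'word' of each text node ends up in braces, which matches the output shown in the question.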

What you can use instead is:

(?:(?!word).)+(word)

Explanation:

(?:                         # Do not capture
(?!word).)+                 # Negative lookahead for word. Read 1 char
(word)                      # until you find 'word'

See example
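A quick way to check what this expression picks up is to list the position of every captured 'word'. This is only a sketch: the one-line sample string and the variable names are my own, not taken from the question.

```python
import re

# Hypothetical one-line sample with 'word' in an attribute and in the text.
sample = '<t a="word">a word and a word</t>'

# Expression from the answer: consume characters that do not start 'word',
# then capture 'word' itself in group 1.
pattern = re.compile(r'(?:(?!word).)+(word)')

# Start offset of each captured 'word'.
starts = [m.start(1) for m in pattern.finditer(sample)]
print(starts)
```

Every occurrence is found, one per match, but that includes the one inside the attribute value, since nothing in the expression looks at the tags. Note also that the text in front of each 'word' is part of the match without being captured, so a replacement would have to put it back.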

Edit: Rereading your question, you explicitly stated that you want to capture every "word" outside of the tags. Have a look at: example 2

The regex is:

((?!word)[^>])+(word)([^<]+) # read all characters, except 
                             # '>' until you encounter 'word'
                             # read 'word'
                             # capture all following characters, except '<'
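One caveat worth checking: nothing in the expression above requires a preceding '>', so on the sample from the question its first match can begin inside the opening tag. The sketch below shows that, and then tries a different pattern that is not from this answer: word(?=[^<>]*<), which accepts 'word' only when the next angle bracket after it is '<', i.e. when it sits in text rather than inside a tag. The names answer_re and text_word are mine, and the approach assumes well-formed markup with no '<' or '>' inside attribute values, comments or CDATA.

```python
import re

# Sample input copied from the question.
html = '''<whatevertag do-not-change-this="word" or-this-word="">
  these words should be replaced with a word inside braces,
  and also the same word thing for
  <whatevertag>
      the nested tags that has the word
  </whatevertag>
</whatevertag>'''

# Expression from the answer, unchanged.
answer_re = re.compile(r'((?!word)[^>])+(word)([^<]+)')
first = answer_re.search(html)
# True when the first captured 'word' lies before the end of the opening tag.
in_opening_tag = first.start(2) < html.index('>')

# Alternative (assumption, not the answer's regex): match 'word' only if
# the next angle bracket that follows it is '<'.
text_word = re.compile(r'word(?=[^<>]*<)')

# \g<0> is the whole match, so each hit becomes {word}.
result = text_word.sub(r'{\g<0>}', html)

print(in_opening_tag)
print(result)
```

Because the lookahead consumes nothing, every 'word' in a text node is matched separately and the attribute values stay untouched. Text after the very last tag would not be matched, and for anything beyond simple input a real XML/HTML parser is the safer tool.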
