Question

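I am struggling with a small regular expression problem.

I want to replace every odd-length run of a particular character with a run of the same length made of a different character. Even-length runs of that character should stay unchanged.

Simplified example: the string contains the letters a, b and y, and every odd-length run of y should be replaced with z: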

abyyyab -> abzzzab

Another example:

ycyayybybcyyyyycyybyyyyyyy

becomes

zczayybzbczzzzzcyybzzzzzzz

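Matching all runs of odd length with a regular expression is not a problem.

Unfortunately, I do not know how to carry the length of each match over into the replacement string. I know I have to use backreferences/capture groups somehow, but even after reading a lot of documentation and Stack Overflow posts I still do not see how to do it correctly.

As for regex engines, I mainly work with Emacs or Vim.

If I have overlooked a simple general solution that does not need a complicated regular expression (for example, a short series of plain search-and-replace commands), that would help too.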

Answer 1

Here is how I would do it in vim:

:s/\vy@<!y(yy)*y@!/\=repeat('z', len(submatch(0)))/g

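Explanation:

The regex we are using is \vy@<!y(yy)*y@!. The \v at the beginning turns on "very magic" mode, so we do not have to escape as much. Without it, we would have to write y\@<!y\(yy\)*y\@!.

The basic idea of the search is that we look for a single 'y', y, followed by any number of pairs of 'y', (yy)*, which together always give an odd count. We then add y@<! to make sure there is no 'y' before the match, and y@! to make sure there is no 'y' after it. Without those two assertions, an odd-length piece of an even-length run would also match.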
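For readers who want to try the pattern outside Vim, here is a minimal sketch of the same idea in Python's re module. It is an illustration, not part of the original answer: Vim's y@<! and y@! are written as (?<!y) and (?!y), and the helper name odd_runs is made up for this example.

```python
import re

# Python spelling of the Vim pattern \vy@<!y(yy)*y@!
# (?<!y)   no 'y' directly before the match
# y(?:yy)* one 'y' plus any number of pairs, so always an odd count
# (?!y)    no 'y' directly after the match
ODD_RUN = re.compile(r"(?<!y)y(?:yy)*(?!y)")


def odd_runs(text):
    """Return (start, length) for every odd-length run of 'y' in text."""
    return [(m.start(), len(m.group(0))) for m in ODD_RUN.finditer(text)]


print(odd_runs("abyyyab"))  # [(2, 3)]
print(odd_runs("ayyb"))     # [] because the run has even length
```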

Then we replace each match using a replacement expression, that is, \=. From :h sub-replace-\=:

*sub-replace-\=* *s/\=*
When the substitute string starts with "\=" the remainder is interpreted as an expression.

The special meaning for characters as mentioned at |sub-replace-special| does not apply except for "<CR>". A <NL> character is used as a line break, you can get one with a double-quote string: "\n". Prepend a backslash to get a real <NL> character (which will be a NUL in the file).

The "\=" notation can also be used inside the third argument {sub} of |substitute()| function. In this case, the special meaning for characters as mentioned at |sub-replace-special| does not apply at all. Especially, <CR> and <NL> are interpreted not as a line break but as a carriage-return and a new-line respectively.

When the result is a |List| then the items are joined with separating line breaks. Thus each item becomes a line, except that they can contain line breaks themselves.

The whole matched text can be accessed with "submatch(0)". The text matched with the first pair of () with "submatch(1)". Likewise for further sub-matches in ().

TL;DR: :s/foo/\=blah replaces foo with the result of evaluating blah as vimscript code. The code we evaluate here is repeat('z', len(submatch(0))), which simply produces one 'z' for every 'y' we matched.
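The "compute the replacement from the match" step can be sketched in Python as well, where re.sub accepts a function in place of a replacement string. This is only an analogy to Vim's \= expression, not the answer's own method. The function name replace_odd_runs and its parameters are hypothetical, and the sketch assumes that old is a single character.

```python
import re


def replace_odd_runs(text, old="y", new="z"):
    """Replace each odd-length run of `old` with an equally long run of `new`.

    Assumes `old` is a single character.
    """
    c = re.escape(old)
    # Same shape as the Vim pattern: lookbehind, one char, pairs, lookahead.
    pattern = rf"(?<!{c}){c}(?:{c}{c})*(?!{c})"
    # The callable plays the role of repeat('z', len(submatch(0))) in Vim.
    return re.sub(pattern, lambda m: new * len(m.group(0)), text)


print(replace_odd_runs("abyyyab"))
# abzzzab
print(replace_odd_runs("ycyayybybcyyyyycyybyyyyyyy"))
# zczayybzbczzzzzcyybzzzzzzz
```

Both calls reproduce the examples from the question; even-length runs such as yy are left untouched because the lookarounds prevent a partial match inside them.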
