Question

I have some text like this:

 proposed that the U n i o n  D i s u n i o n was there

So inside an otherwise normal string there is a stretch with a space between every character.

The expected output matches only "U n i o n  D i s u n i o n", the double-spaced part.

I want a regex that matches only the double-spaced part. [a-zA-Z](?=\s)\s matches a single piece ('U '), but I can't see how to extend it.
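To illustrate why that attempt is not enough on its own, here is a minimal sketch. Python and the variable names are assumptions made for illustration; the question does not name a language. It shows that the pattern returns isolated letter-plus-space pieces, including the last letter of ordinary words.

```python
import re

# Sample text from the question; the two-space gap is built explicitly.
spaced = "U n i o n" + "  " + "D i s u n i o n"
text = " proposed that the " + spaced + " was there"

# The attempt from the question: one letter followed by one whitespace character.
attempt = re.compile(r"[a-zA-Z](?=\s)\s")

pieces = attempt.findall(text)
print(pieces)
```

Each hit is a single letter plus one whitespace character, and the final letters of "proposed", "that", "the" and "was" are hits too, so the pattern cannot tell the spaced-out run from normal text.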

Answer 1

(?<!\w)(?:\w\s+(?=\w\s))+\w

should do it.

(?<!\w) # assert the previous character is not a word character
(?:
    \w\s+ # match a word character and whitespace...
    (?=\w\s) # ...only if another word character and whitespace follow
)+ # one or more times
\w # finally match the last word character (but no whitespace)
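A minimal sketch of trying this pattern on the sample text, assuming Python's re module (the answer itself does not name an engine; the variable names are illustrative):

```python
import re

# Sample text from the question; the two-space gap is built explicitly.
spaced = "U n i o n" + "  " + "D i s u n i o n"
text = " proposed that the " + spaced + " was there"

pattern = re.compile(
    r"""
    (?<!\w)        # previous character is not a word character
    (?:
        \w\s+      # a word character and whitespace...
        (?=\w\s)   # ...only if another word character and whitespace follow
    )+             # one or more times
    \w             # the last word character (no trailing whitespace)
    """,
    re.VERBOSE,
)

matches = pattern.findall(text)
print(matches)
```

One caveat worth checking for your own data: the lookahead requires whitespace after the next letter, so a spaced-out run at the very end of the string loses its final letter.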

Answer 2

Incorporating @Sam's idea from the comments, and the fact that (?:(?<=\s)|^) is equivalent to (?<!\S), we can simplify the regex to:

(?<!\S)\w(?: +\w)+(?!\S)

I also switched the (token separator)* token form used in Rawing's answer to the preferred form token (separator token)*, which backtracks slightly less on a backtracking engine.
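A minimal sketch of the simplified pattern on the sample text, assuming Python's re module (an assumption; any engine with lookbehind should behave similarly):

```python
import re

# Sample text from the question; the two-space gap is built explicitly.
spaced = "U n i o n" + "  " + "D i s u n i o n"
text = " proposed that the " + spaced + " was there"

# token (separator token)* shape: one word character, then one or more
# "spaces + word character" groups, bounded on both sides by whitespace
# or by the start/end of the string.
pattern = re.compile(r"(?<!\S)\w(?: +\w)+(?!\S)")

matches = pattern.findall(text)
at_end = pattern.findall("U n i o n")
print(matches, at_end)
```

Because (?!\S) also succeeds at the end of the string, a spaced-out run in the final position is matched in full.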

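The \s+ in the middle is switched to  + so that only spaces (U+0020) are allowed. Depending on engine support, \h+ might be more appropriate. I don't think you want the match to span lines.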
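The effect of the separator choice can be sketched as follows, again assuming Python. Python's re module does not support \h, so [ \t]+ is used here as a stand-in for horizontal whitespace; that substitution is an assumption of this sketch, not part of the answer.

```python
import re

# Two spaced-out runs on separate lines.
text = "a b c\nd e f"

space_only = re.compile(r"(?<!\S)\w(?: +\w)+(?!\S)")      # separator: U+0020 only
any_space = re.compile(r"(?<!\S)\w(?:\s+\w)+(?!\S)")      # separator: any whitespace
horizontal = re.compile(r"(?<!\S)\w(?:[ \t]+\w)+(?!\S)")  # stand-in for \h+

per_line = space_only.findall(text)
across_lines = any_space.findall(text)
per_line_h = horizontal.findall(text)
print(per_line, across_lines, per_line_h)
```

With \s+ the newline counts as a separator, so both lines merge into a single match; with  + or [ \t]+ each line is matched on its own.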

Regex to match text within normal text
