Question

I want a regex for a|b|c repeated multiple times, separated by spaces, but the regex should not accept a trailing space:

 "a b c c b"  - ok
 "a b c c b " - not ok

So I have "(a|b|c)( (a|b|c))+" instead of "((a|b|c) )+". But my regex has more than 3 words, so the pattern becomes long and unreadable:

"^((?:word1|word2|word3|word4|...)(?: (?:word1|word2|word3|word4|...))+)$"

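I'm just asking for a shorter version, perhaps one that uses a lookahead/lookaround mechanism to detect the last space, or something similar that matches only the inner spaces. How can I change ((a|b|c) )+ to achieve this?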

Answer 1

If these are all alphanumeric words, you can use word boundary anchors:

^(?:(?:word1|word2|word3|word4|...)\b\s*)+\b$

Explanation:

^      # Start of string
(?:    # Start of non-capturing group, first matching...
 (?:word1|word2|word3|word4|...) # ...one of these words,
 \b    # then matching the end of a word,
 \s*   # then matching zero or more whitespace
)+     # one or more times.
\b     # At the end, make sure that the end of the last word...
$      # ...is at the end of the string.

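The first \b ensures that \s* has to match at least one whitespace character between words: with no whitespace, the end of one word would sit directly against the start of the next, and that position is not a word boundary.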
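As a quick check of the pattern above, here is a minimal Python sketch. The word list `WORDS` and the helper `accepts` are hypothetical names introduced for illustration; they stand in for the elided `word1|word2|...` alternation and are not part of the original answer.

```python
import re

# Hypothetical word list standing in for word1|word2|... (an assumption).
WORDS = ["a", "b", "c"]

# Build the alternation once so the pattern itself stays short.
alternation = "|".join(map(re.escape, WORDS))

# Same structure as the pattern in the answer: word, \b, optional whitespace,
# repeated, then a final \b that rejects trailing whitespace.
pattern = re.compile(rf"^(?:(?:{alternation})\b\s*)+\b$")


def accepts(text):
    """Return True if the whole string is a whitespace-separated word list."""
    return pattern.match(text) is not None
```

Note that `\s*` accepts any run of whitespace between words, not only a single space, so `accepts("a  b")` is also true with this pattern.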

Answer 2

>>> import re
>>> compiled = re.compile(r'(?:(?:\s|^)(?:a|b|c))+$')
>>> compiled.match('a b c c b').group(0)
'a b c c b'
>>> compiled.match('a b c c b ') is None  # trailing space is rejected
True
>>> compiled.match('a').group(0)
'a'
>>> compiled.match('a ') is None
True
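The session above can also be run as a script. One thing to be aware of is that, because each word may be preceded by `\s`, this pattern accepts a leading whitespace character as well. The sketch below shows that, next to a hypothetical stricter variant (`strict`, not from either answer) that uses a negative lookahead so a space is only allowed when it is not the last character. The helper names are assumptions made for illustration.

```python
import re

# Pattern from the answer above, unchanged.
compiled = re.compile(r'(?:(?:\s|^)(?:a|b|c))+$')

# Hypothetical stricter variant (an assumption, not from the answer):
# each word is followed either by a space that is not at the end of the
# string, or by the end of the string itself.
strict = re.compile(r'(?:(?:a|b|c)(?: (?!$)|$))+')


def accepts_original(text):
    """Match from the start of the string with the answer's pattern."""
    return compiled.match(text) is not None


def accepts_strict(text):
    """Require the lookahead variant to consume the entire string."""
    return strict.fullmatch(text) is not None
```

With these definitions, `accepts_original(' a')` is true while `accepts_strict(' a')` is false; both reject `'a b c c b '`.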

Best way to write a short version of repeatable words in a regular expression (Python)
