Question

I wrote this regular expression:

import re

sentence = "The quick brown fox jumps over the lazy dog."

myRegex = re.compile(
    r"(\w|\s)*"        #Ideally, zero or more (space characters or word characters) 
    r"(quick brown)"
)
matches = myRegex.findall(sentence)

print(matches)

Ideally, I would like [('The ', 'quick brown')] to be printed to the screen, but instead I get [(' ', 'quick brown')].

Similarly, I also tried changing the regex to:

myRegex = re.compile(
    r"((\w|\s)*)"  
    r"(quick brown)"
)

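This prints [('The ', ' ', 'quick brown')], which is closer to what I want than before, but there is now a second group containing only a space character, which seems wasteful.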

Answer 1

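(\w|\s) means the first group contains only a single character. So, while the regex as a whole matches "The quick brown", the first group holds a single space, because there is only one character inside the first pair of parentheses. The * repeats the group, and a repeated capturing group keeps only the text from its last iteration.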
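As a minimal sketch of the point above (the variable names `repeated` and `fixed` are made up for illustration), the following compares the original pattern with one that makes the inner group non-capturing via `(?:...)` and wraps the whole repetition in a single capturing group. This is one possible fix, not necessarily the only one.

```python
import re

sentence = "The quick brown fox jumps over the lazy dog."

# The * sits outside the capturing group, so the group is overwritten on
# every iteration and only the last matched character (a space) survives.
repeated = re.compile(r"(\w|\s)*(quick brown)")
print(repeated.findall(sentence))

# (?:...) groups without capturing; the outer parentheses capture the
# whole repeated run, and no extra single-character group is produced.
fixed = re.compile(r"((?:\w|\s)*)(quick brown)")
print(fixed.findall(sentence))
```

The first call prints [(' ', 'quick brown')] and the second prints [('The ', 'quick brown')], with no leftover single-space group.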

Answer 2

The right expression really depends on what you are trying to do...

Do you want the word immediately before quick brown? Try this:

import re

sentence = "This is the quick brown fox who jumps over the lazy dog."

myRegex = re.compile(
    r"(\w+)\s*"
    r"(quick brown)"
)

print(myRegex.findall(sentence))

# Result: [('the', 'quick brown')]

Do you also want to keep the whitespace after the word? Try this:

myRegex = re.compile(
    r"(\w+\s*)"
    r"(quick brown)"
)    

# Output: [('the ', 'quick brown')]

Do you want the entire phrase before quick brown? Try this:

myRegex = re.compile(
    r"([\w\s]+)"
    r"(quick brown)"
)

# Result: [('This is the ', 'quick brown')]

Either way, there is no need to apply * (zero or more) to the \w token here, and doing so would cause problems when there is no word to match.
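One way to read the last remark, sketched under the assumption that "problems" means empty captures: with `*`, the pattern still matches when nothing precedes quick brown and returns an empty first group, whereas `+` requires at least one character and reports no match. The sample string `text` and the names `star` and `plus` are made up for illustration.

```python
import re

# Hypothetical input in which nothing precedes "quick brown".
text = "quick brown fox"

# With *, the first group is allowed to match zero characters,
# so the match succeeds with an empty string in the first group.
star = re.compile(r"([\w\s]*)(quick brown)")
print(star.findall(text))

# With +, at least one word or space character must come first,
# so there is no match at all.
plus = re.compile(r"([\w\s]+)(quick brown)")
print(plus.findall(text))
```

Here the `*` version prints [('', 'quick brown')] and the `+` version prints [], which makes the "nothing in front" case easy to detect.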

How do I use "*" (match zero or more) in a regular expression with a pipe (OR)?
