Question

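I'm trying to compile regexes so that I can accumulate a sequence of hashtags (r'#\w+') from a tweet. I want to compile two regexes: one that does this from the start of the tweet and one that does it from the end. I'm using Python 2.7.2, and my code looks like this: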

import re

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                                       #Outermost grouping to match overall regex
#\w+                                    #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)*                          #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                                       #Closing parenthesis of outermost grouping to match overall regex
"""

LEFT_HASHTAG_REGEX_SEQ      = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN , re.VERBOSE | re.IGNORECASE)

When the line that compiles the regex is executed, I get the following error:

sre_constants.error: unbalanced parenthesis

I don't know why I'm getting this, since my regex has no unbalanced parentheses.
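A minimal sketch that reproduces the error with the question's pattern unchanged (the variable name error_message is illustrative only). Under re.VERBOSE every unescaped # outside a character class starts a comment that runs to the end of the line, so most of the pattern is discarded before the regex engine parses it. re.error is the same exception class as sre_constants.error; its exact message varies between Python versions, so none is shown here.

```python
import re

# The question's pattern, unchanged: the literal '#' characters are NOT escaped.
HASHTAG_SEQ_REGEX_PATTERN = r"""
(                                       #Outermost grouping to match overall regex
#\w+                                    #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)*                          #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                                       #Closing parenthesis of outermost grouping to match overall regex
"""

error_message = None
try:
    re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.VERBOSE | re.IGNORECASE)
except re.error as exc:
    # With comments stripped, the engine only sees '^(([:\s,]*)':
    # two opening parentheses but only one closing one.
    error_message = str(exc)
    print("re.error: %s" % error_message)
```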

Answer 1

Everything after the FIRST # on this line is commented out:

        v----comment starts here
([:\s,]*#\w+)*  ...

Escape it:

([:\s,]*\#\w+)*

Escape this line too, although it is not what causes the unbalanced parenthesis :)

v----escape me
#\w+                                    #The hashtag matching ...

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                 # Outermost grouping to match overall regex
\#\w+             # The hashtag matching. It's a valid combination of \w+
([:\s,]*\#\w+)*   # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                 # Closing parenthesis of outermost grouping to match overall regex
"""

Answer 2

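You have some unescaped hashes that you mean to use as literal characters, but VERBOSE is tripping you up by treating them as comment markers: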

\#\w+
([:\s,]*\#\w+)*   #reported issue caused by this hash
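A two-pattern experiment (illustrative patterns, not taken from the question) makes the difference visible: with re.VERBOSE, a bare # ends the pattern, while \# is an ordinary literal.

```python
import re

# Under re.VERBOSE an unescaped '#' starts a comment, so this pattern is just 'a'.
commented = re.compile(r"a#b", re.VERBOSE)
# Escaping the hash makes it a literal character again.
escaped = re.compile(r"a\#b", re.VERBOSE)

print(commented.match("a#b").group(0))  # -> a
print(escaped.match("a#b").group(0))    # -> a#b
```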

Answer 3

If you write the pattern as follows and compile it without re.VERBOSE, you won't run into this problem:

HASHTAG_SEQ_REGEX_PATTERN = (
r'('    #Outermost grouping to match overall regex
r'#\w+'     #The hashtag matching. It's a valid combination of \w+
r'([:\s,]*#\w+)*'    #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
r')'    #Closing parenthesis of outermost grouping to match overall regex
)

Personally, I never use re.VERBOSE; I can never remember its rules about whitespace and the like.
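A sketch of this approach in use, assuming raw-string pieces so that \w reaches the regex engine untouched; the sample input is made up. Python joins adjacent string literals at compile time, so the # comments after each piece are Python comments that never reach re, and the joined pattern is compiled without re.VERBOSE, where # is an ordinary literal.

```python
import re

# Adjacent string literals are concatenated by Python itself;
# the trailing comments are Python comments, not regex comments.
HASHTAG_SEQ_REGEX_PATTERN = (
    r'('                # outermost group
    r'#\w+'             # first hashtag
    r'([:\s,]*#\w+)*'   # 0 or more further hashtags separated by [\s,:]*
    r')'                # close outermost group
)

print(HASHTAG_SEQ_REGEX_PATTERN)  # -> (#\w+([:\s,]*#\w+)*)

# No re.VERBOSE here, so '#' matches a literal hash.
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.IGNORECASE)
print(LEFT_HASHTAG_REGEX_SEQ.match("#a #b text").group(1))  # -> #a #b
```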

Answer 4

Alternatively, use [#] to put a literal # into the regex without it starting a comment:

HASHTAG_SEQ_REGEX_PATTERN           = r"""
(                   #Outermost grouping to match overall regex
[#]\w+                #The hashtag matching. It's a valid combination of \w+
([:\s,]*[#]\w+)*      #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
)                   #Closing parenthesis of outermost grouping to match overall regex
"""

I find this more readable.
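A quick check, using a made-up sample tweet, that [#] survives re.VERBOSE: the re documentation states that only a # outside a character class starts a comment.

```python
import re

HASHTAG_SEQ_REGEX_PATTERN = r"""
(                   # Outermost grouping to match overall regex
[#]\w+              # The hashtag matching; '#' inside [...] never starts a comment
([:\s,]*[#]\w+)*    # Optional (0 or more) further hashtags separated by [\s,:]*
)                   # Closing parenthesis of outermost grouping
"""

LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN,
                                    re.VERBOSE | re.IGNORECASE)

m = LEFT_HASHTAG_REGEX_SEQ.match("#one,#two: some text")
print(m.group(1))  # -> #one,#two
```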

Python re.compile: unbalanced parenthesis error
