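I'm trying to compile a regular expression that collects a sequence of hashtags (r'#\w+') from a tweet. I want to compile two regexes: one that does this from the start of the tweet and one that does it from the end. I'm using Python 2.7.2, and my code looks like this: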
HASHTAG_SEQ_REGEX_PATTERN = r"""
( #Outermost grouping to match overall regex
#\w+ #The hashtag matching. It's a valid combination of \w+
([:\s,]*#\w+)* #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) #Closing parenthesis of outermost grouping to match overall regex
"""
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN , re.VERBOSE | re.IGNORECASE)
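When the line that compiles the regex is executed, I get the following error:
sre_constants.error: unbalanced parenthesis
I don't understand why I get this, since as far as I can see there are no unbalanced parentheses in my regex.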
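Answer 0 (score: 5)
With re.VERBOSE, everything on this line after the FIRST # is treated as a comment, so the closing )* never reaches the regex engine and the group stays open:
        v----comment starts here
([:\s,]*#\w+)* ...
Escape it:
([:\s,]*\#\w+)*
The same goes for this line, although it is not the one that causes the unbalanced parenthesis :)
v----escape me
#\w+ #The hashtag matching ...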
HASHTAG_SEQ_REGEX_PATTERN = r"""
( # Outermost grouping to match overall regex
\#\w+ # The hashtag matching. It's a valid combination of \w+
([:\s,]*\#\w+)* # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) # Closing parenthesis of outermost grouping to match overall regex
"""
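To check the diagnosis, here is a minimal sketch that compiles the original and the escaped pattern side by side. It assumes Python 3, where the failure surfaces as re.error (the same exception class that Python 2.7 prints as sre_constants.error). The names BROKEN_PATTERN, FIXED_PATTERN, FLAGS and RIGHT_HASHTAG_REGEX_SEQ, and the sample tweets, are made up for illustration.

```python
import re

# Pattern from the question: under re.VERBOSE every unescaped '#'
# outside a character class starts a comment that runs to end of line.
BROKEN_PATTERN = r"""
(                   #Outermost grouping
#\w+                #The hashtag matching
([:\s,]*#\w+)*      #Optional sequence of further hashtags
)                   #Closing parenthesis of outermost grouping
"""

# Same pattern with the literal hashes escaped.
FIXED_PATTERN = r"""
(                   # Outermost grouping
\#\w+               # One hashtag
([:\s,]*\#\w+)*     # Zero or more further hashtags
)                   # Closing parenthesis of outermost grouping
"""

FLAGS = re.VERBOSE | re.IGNORECASE

try:
    re.compile('^' + BROKEN_PATTERN, FLAGS)
    broken_compiles = True
except re.error:
    # The inner ')*' was swallowed by the comment, so one '(' stays open.
    broken_compiles = False

LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + FIXED_PATTERN, FLAGS)
# Hypothetical counterpart anchored at the end of the tweet.
RIGHT_HASHTAG_REGEX_SEQ = re.compile(FIXED_PATTERN + '$', FLAGS)

leading = LEFT_HASHTAG_REGEX_SEQ.match("#python, #regex: #re are fun")
trailing = RIGHT_HASHTAG_REGEX_SEQ.search("reading about #python #regex")

print(broken_compiles)
print(leading.group(1))
print(trailing.group(1))
```

Group 1 holds the whole run of hashtags. The left-anchored regex is used with match and the right-anchored one with search, because the trailing run can start anywhere in the tweet.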
Answer 1 (score: 3)
You have some unescaped hashes that you want to use as literal characters, but VERBOSE is tripping you up:
\#\w+
([:\s,]*\#\w+)* #reported issue caused by this hash
Answer 2 (score: 2)
If you write the pattern as follows, you won't run into this problem:
HASHTAG_SEQ_REGEX_PATTERN = (
    r'('               # Outermost grouping to match overall regex
    r'#\w+'            # The hashtag matching. It's a valid combination of \w+
    r'([:\s,]*#\w+)*'  # This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
    r')'               # Closing parenthesis of outermost grouping to match overall regex
)
Personally, I never use re.VERBOSE; I can never remember its rules about whitespace and the rest.
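A small sketch of why this works: adjacent string literals are joined by Python itself, and the comments are Python comments, so they never become part of the pattern. The example uses raw-string pieces, which avoids invalid-escape warnings for \w on recent Python 3 versions. The sample tweet and the variable names joined and match are illustrative assumptions.

```python
import re

HASHTAG_SEQ_REGEX_PATTERN = (
    r'('                 # outermost group
    r'#\w+'              # one hashtag
    r'([:\s,]*#\w+)*'    # zero or more further hashtags
    r')'                 # end of outermost group
)

# The pieces are already one plain string before re sees them.
joined = HASHTAG_SEQ_REGEX_PATTERN

# No re.VERBOSE here, so '#' is an ordinary character in the pattern.
LEFT_HASHTAG_REGEX_SEQ = re.compile('^' + HASHTAG_SEQ_REGEX_PATTERN, re.IGNORECASE)
match = LEFT_HASHTAG_REGEX_SEQ.match("#a,#b trailing text")

print(joined)
print(match.group(1))
```

The trade-off is that, without re.VERBOSE, any whitespace typed inside one of the string pieces is matched literally, so alignment spaces have to stay outside the quotes.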
Answer 3 (score: 0)
Alternatively, use [#] to put a # character into the regex that is not meant to start a comment:
HASHTAG_SEQ_REGEX_PATTERN = r"""
( #Outermost grouping to match overall regex
[#]\w+ #The hashtag matching. It's a valid combination of \w+
([:\s,]*[#]\w+)* #This is an optional (0 or more) sequence of hashtags separated by [\s,:]*
) #Closing parenthesis of outermost grouping to match overall regex
"""
I find this more readable.
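As a quick sanity check, the sketch below compares the backslash-escaped and the character-class spellings under re.VERBOSE. It relies on the documented rule that a # inside a character class does not start a comment. The helper leading_tags, the names ESCAPED, BRACKETED and SAMPLES, and the sample tweets are hypothetical.

```python
import re

FLAGS = re.VERBOSE | re.IGNORECASE

# Hash escaped with a backslash.
ESCAPED = re.compile(r"""
    ^( \#\w+ ([:\s,]*\#\w+)* )
""", FLAGS)

# Hash wrapped in a character class.
BRACKETED = re.compile(r"""
    ^( [#]\w+ ([:\s,]*[#]\w+)* )
""", FLAGS)

SAMPLES = ["#one #two rest", "#A:#b,#c", "no leading tag #x"]


def leading_tags(regex, tweet):
    """Return the leading run of hashtags, or None if the tweet has none."""
    m = regex.match(tweet)
    return m.group(1) if m else None


escaped_results = [leading_tags(ESCAPED, t) for t in SAMPLES]
bracketed_results = [leading_tags(BRACKETED, t) for t in SAMPLES]

print(escaped_results == bracketed_results)
print(bracketed_results)
```

Both spellings behave the same on these inputs, so the choice between \# and [#] comes down to which one reads better.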