Question

I am trying to match the local part of an email address, i.e. the part before the @ character, with:

LOCAL_RE_NOTQUOTED = """
((
\w         # alphanumeric and _
| [!#$%&'*+-/=?^_`{|}~]          # special chars, but no dot at beginning
)
(
\w         # alphanumeric and _
| [!#$%&'*+-/=?^_`{|}~]          # special characters
| ([.](?![.])) # negative lookahead to avoid pairs of dots. 
)*)
(?<!\.)(?:@)           # no end with dot before @
"""

The test:

re.match(LOCAL_RE_NOTQUOTED, "a.a..a@", re.VERBOSE).group()

gives:

'a.a..a@'

Why is the @ printed in the output, even though I am using a non-capturing group (?:@)?

The test:

re.match(LOCAL_RE_NOTQUOTED, "a.a..a@", re.VERBOSE).groups()

gives:

('a.a..a', 'a', 'a', None)

Why doesn't the regex reject the string with a pair of dots '..'?

Answer 1

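You are confusing non-capturing groups (?:...) with lookahead assertions (?=...).

The former do take part in the match (and are therefore part of match.group(), which contains the overall match); they just don't create a backreference ($1 or \1, depending on the regex flavor, for later use). A lookahead, by contrast, only checks what follows and consumes nothing.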
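To make the difference concrete, here is a minimal sketch. The sample address and the variable names `consumed` and `peeked` are made up for illustration and are not from the question.

```python
import re

# Non-capturing group: "@" is consumed, so it is part of the overall match,
# but it does not get a group number of its own.
consumed = re.match(r"(\w+)(?:@)", "user@example.com")
print(consumed.group())   # user@
print(consumed.groups())  # ('user',)

# Lookahead: "@" must follow, but it is not consumed by the match.
peeked = re.match(r"(\w+)(?=@)", "user@example.com")
print(peeked.group())     # user
print(peeked.groups())    # ('user',)
```

In both cases groups() contains only the explicitly captured part; only group() differs.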

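The second problem (why does the double dot match?) is a bit trickier. It is caused by a bug in your regex. You see, when you wrote (shortened to the relevant part)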

[+-/]

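you wrote "match one character between + and /", and in ASCII the dot lies between them (ASCII 43-47: +,-./). Therefore the first character class already matches the dot, and the branch with the lookahead assertion is never reached. You need to put the dash at the end of the character class so that it is treated as a literal dash: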

((
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special chars, but no dot at beginning
)
(
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special characters
| ([.](?![.])) # negative lookahead to avoid pairs of dots. 
)*)
(?<!\.)(?=@)           # no end with dot before @
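The range issue and the effect of the fix can be checked with a short sketch like the following. The names `buggy_class`, `fixed_class` and `FIXED` are chosen here for illustration; the pattern is the corrected one from above, written as a raw string.

```python
import re

# "+-/" inside a character class is a range from "+" (43) to "/" (47).
print([chr(c) for c in range(ord("+"), ord("/") + 1)])  # ['+', ',', '-', '.', '/']

buggy_class = re.compile(r"[+-/]")   # range: also matches "," and "."
fixed_class = re.compile(r"[+/-]")   # trailing "-" is a literal dash
print(bool(buggy_class.fullmatch(".")))  # True
print(bool(fixed_class.fullmatch(".")))  # False
print(bool(fixed_class.fullmatch("-")))  # True

FIXED = r"""
((
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special chars, but no dot at beginning
)
(
\w         # alphanumeric and _
| [!#$%&'*+/=?^_`{|}~-]          # special characters
| ([.](?![.])) # negative lookahead to avoid pairs of dots
)*)
(?<!\.)(?=@)           # no end with dot before @
"""

print(re.match(FIXED, "a.a..a@", re.VERBOSE))          # None
print(re.match(FIXED, "a.a.a@", re.VERBOSE).group())   # a.a.a
```

With the dash moved to the end, the double dot is rejected, and because (?=@) is a lookahead the @ no longer appears in group().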

Of course, if you want to stick with this logic, you can simplify it a bit:

^(?!\.)                   # no dot at the beginning
(?:
[\w!#$%&'*+/=?^_`{|}~-]   # alnums or special characters except dot
| (\.(?![.@]))            # or dot unless it's before a dot or @ 
)*
(?=@)                     # end before @
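As a rough check of the simplified version, it can be run against a few inputs. The name `SIMPLIFIED` and the test strings are assumptions made for this sketch.

```python
import re

SIMPLIFIED = r"""
^(?!\.)                   # no dot at the beginning
(?:
[\w!#$%&'*+/=?^_`{|}~-]   # alnums or special characters except dot
| (\.(?![.@]))            # or dot unless it's before a dot or @
)*
(?=@)                     # end before @
"""

pattern = re.compile(SIMPLIFIED, re.VERBOSE)

print(pattern.match("a.a.a@").group())  # a.a.a
print(pattern.match("a.a..a@"))         # None (pair of dots)
print(pattern.match(".a@"))             # None (leading dot)
print(pattern.match("a.@"))             # None (dot right before @)
```

Note that because of the `*` quantifier this pattern also matches an empty local part, for example the string "@" on its own; using `+` instead would require at least one character.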
