Regex for plain emails and mailto links, but not HTTP basic authentication

Time: 2015-05-03 21:54:45

Tags: python regex python-2.7

I am trying to build a regex that satisfies these conditions:

[DO NOT MATCH]

dont:match@example.com

[MATCH]

mailto:match@example.com
match@example.com
<p>match@example.com</p>

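I can match the last two, but the first example (DO NOT MATCH) matches as well.

How can I make sure the email is only matched when it stands alone or is preceded by an explicit mailto:, and not when some other prefix ending in a colon comes before it?

http://rubular.com/r/HvldBe4Ew9

Regex: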

(?<=mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)

2 answers:

Answer 0: (score: 1)

If the strings are passed as separate values, you can use the anchors ^ and $ to match the start and end of the string:

(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)

Or, without the capturing group:

(?<=>)(?:mailto:)?[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+(?=<)

See the demo.
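As a rough illustration, the capturing-group pattern above can be tried in Python. The pattern uses a lookbehind for > and a lookahead for <, so it only matches an address that sits directly between tags. The names TAGGED_EMAIL and find_tagged_email and the sample list are made up for this sketch.

```python
import re

# Pattern from the answer above (capturing-group variant).
TAGGED_EMAIL = re.compile(
    r"(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)"
)


def find_tagged_email(text):
    """Return the captured address, or None when the pattern does not match."""
    m = TAGGED_EMAIL.search(text)
    return m.group(1) if m else None


# Hypothetical sample inputs, based on the strings in the question.
samples = [
    "<p>match@example.com</p>",
    "<p>mailto:match@example.com</p>",
    "<p>dont:match@example.com</p>",
    "dont:match@example.com",
]
for s in samples:
    print(s, "->", find_tagged_email(s))
```

Because of the lookarounds, a bare match@example.com with no surrounding tags is not matched by this pattern either.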

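Note that [a-zA-Z0-9-.] is problematic: an unescaped hyphen should not sit in the middle of a character class. Put it at the start or the end instead, as in [a-zA-Z0-9.-].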
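A small sketch of why hyphen placement matters, using Python's re module. The variable names are only for illustration. Between two literals a hyphen forms a range, while at the end of the class it is a plain character.

```python
import re

# "+-." is a range from "+" to ".", which also covers "," and "-".
range_class = re.compile(r"[+-.]")
# With the hyphen last, the class contains exactly "+", "." and "-".
literal_class = re.compile(r"[+.-]")

comma_in_range = range_class.fullmatch(",") is not None
comma_in_literal = literal_class.fullmatch(",") is not None

# Python happens to read the hyphen in [a-zA-Z0-9-.] as a literal because it
# follows a completed range, but the trailing-hyphen form is unambiguous.
middle = re.fullmatch(r"[a-zA-Z0-9-.]+", "a-b.c") is not None
trailing = re.fullmatch(r"[a-zA-Z0-9.-]+", "a-b.c") is not None

print(comma_in_range, comma_in_literal, middle, trailing)
```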

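Answer 1: (score: 0)

There is no need for a-zA-Z; just use A-Z and pass re.IGNORECASE to make the regex case-insensitive.
Also make sure to use

^ to assert position at the start of the line (in Python, without re.MULTILINE, that is the start of the string)

$ to assert position at the end of the line (without re.MULTILINE, the end of the string)

Python example: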

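import re

# Sample input; replace with the string to check.
email = "mailto:match@example.com"

# ^ and $ anchor the pattern to the whole string, so any prefix other than
# an optional "mailto:" (such as "dont:") prevents a match.
match = re.search(r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9.-]+)$", email, re.IGNORECASE)
if match:
    result = match.group(1)  # the address without the "mailto:" prefix
else:
    result = ""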

Demo:

https://regex101.com/r/cI1eD6/1
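To see how the anchored pattern behaves on the strings from the question, here is a minimal sketch. The helper name extract_email is hypothetical, and the character class is written with the hyphen last.

```python
import re

ANCHORED_EMAIL = re.compile(
    r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9.-]+)$",
    re.IGNORECASE,
)


def extract_email(text):
    """Return the address without any mailto: prefix, or "" if no match."""
    m = ANCHORED_EMAIL.search(text)
    return m.group(1) if m else ""


for candidate in (
    "mailto:match@example.com",
    "match@example.com",
    "dont:match@example.com",
    "<p>match@example.com</p>",
):
    print(repr(candidate), "->", repr(extract_email(candidate)))
```

Because the anchors require the whole string to be the address, the version wrapped in p tags is rejected here; the tags would have to be stripped first, or the anchors replaced by something that tolerates them.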

Regex explanation:

^(mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$

Options: Case insensitive

Assert position at the beginning of a line «^»
Match the regex below and capture its match into backreference number 1 «(mailto:)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the character string “mailto:” literally «mailto:»
Match the regex below and capture its match into backreference number 2 «([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)»
   Match a single character present in the list below «[A-Z0-9_.+-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “_.+” «_.+»
      The literal character “-” «-»
   Match the character “@” literally «@»
   Match a single character present in the list below «[A-Z0-9-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      The literal character “-” «-»
   Match the character “.” literally «\.»
   Match a single character present in the list below «[A-Z0-9-.]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “-.” «-.»
Assert position at the end of a line «$»
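
The breakdown above can also be kept next to the pattern itself by using re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch: the name EXPLAINED_EMAIL is made up, and the final character class is written with the hyphen last.

```python
import re

EXPLAINED_EMAIL = re.compile(
    r"""
    ^                  # start of the string
    (mailto:)?         # group 1: optional literal "mailto:" prefix
    (                  # group 2: the address itself
      [A-Z0-9_.+-]+    #   local part
      @                #   literal "@"
      [A-Z0-9-]+       #   first domain label
      \.               #   literal "."
      [A-Z0-9.-]+      #   remainder of the domain
    )
    $                  # end of the string
    """,
    re.IGNORECASE | re.VERBOSE,
)

with_prefix = EXPLAINED_EMAIL.search("mailto:match@example.com")
without_prefix = EXPLAINED_EMAIL.search("match@example.com")
rejected = EXPLAINED_EMAIL.search("dont:match@example.com")

print(with_prefix.group(1), with_prefix.group(2))
print(without_prefix.group(1), without_prefix.group(2))
print(rejected)
```

Note that with a capturing (mailto:)? the address is in group 2, whereas the Python example earlier uses a non-capturing (?:mailto:)? and reads group 1.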