I am trying to build a regular expression that satisfies these conditions:
[DO NOT MATCH]
dont:match@example.com
[MATCH]
mailto:match@example.com
match@example.com
<p>match@example.com</p>
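I can match the last two, but the first example (DO NOT MATCH) also matches.
How can I make sure the email is matched only when mailto: explicitly precedes it or nothing precedes it, and not merely a lone : ?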
http://rubular.com/r/HvldBe4Ew9
Regex:
(?<=mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)
Answer 0 (score: 1)
If the strings are passed as separate values, you can use the anchors ^ and $ to match the start/end of the string:
(?<=>)(?:mailto:)?([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)(?=<)
Or, without the capturing group:
(?<=>)(?:mailto:)?[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+(?=<)
See the demo.
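The patterns above only inspect the > and < neighbours, while the sentence before them talks about ^ and $. One way to cover both situations in a single pattern is sketched below in Python. This is an assumption for illustration, not the answerer's own regex: the name extract_email and the combined boundary groups are hypothetical. Python lookbehinds must be fixed-width, so the start condition is written as (?:^|(?<=>)) rather than as a single lookbehind.

```python
import re

# Hypothetical combined pattern: the address must start at the beginning of
# the string or right after ">", and end at the end of the string or right
# before "<". An optional "mailto:" prefix is allowed but not captured.
EMAIL = re.compile(
    r"(?:^|(?<=>))(?:mailto:)?"
    r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)"
    r"(?:$|(?=<))"
)


def extract_email(value):
    """Return the address found in value, or None if nothing qualifies."""
    found = EMAIL.search(value)
    return found.group(1) if found else None


samples = [
    "dont:match@example.com",
    "mailto:match@example.com",
    "match@example.com",
    "<p>match@example.com</p>",
]
for sample in samples:
    print(repr(sample), "->", extract_email(sample))
```

With the four strings from the question, only the dont: variant should come back as None, because its address neither starts the string nor follows a >.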
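Note that [a-zA-Z0-9-.] has a problem: an unescaped hyphen should not appear in the middle of a character class. Put it at the start or end of the class, or escape it.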
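To illustrate why hyphen placement matters, here is a small sketch using Python's re module. The class [+-.] is a made-up example, not one of the patterns above. How an engine treats a hyphen that directly follows a finished range such as 0-9 varies, so placing the hyphen last removes the ambiguity.

```python
import re

# "+" (0x2B) through "." (0x2E) is read as a range, so "," (0x2C) is included.
range_class = re.compile(r"[+-.]")

# With the hyphen last, the class holds exactly the three listed characters.
literal_class = re.compile(r"[+.-]")

print(bool(range_class.fullmatch(",")))    # the accidental range accepts ","
print(bool(literal_class.fullmatch(",")))  # the literal class does not
print(bool(literal_class.fullmatch("-")))  # the hyphen itself still matches
```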
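Answer 1 (score: 0)
There is no need for a-zA-Z; just use A-Z and pass re.IGNORECASE to make the regex case-insensitive.
Also make sure to use ^ to assert the position at the beginning of a line and $ to assert the position at the end of a line (in Python these mean the start and end of the whole string unless re.MULTILINE is set).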
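Python example:
import re

# Sample input (hypothetical); substitute any string from the question.
email = "mailto:match@example.com"

# (?:mailto:)? is non-capturing, so group 1 is the address itself.
match = re.search(r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$", email, re.IGNORECASE)
if match:
    result = match.group(1)
else:
    result = ""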
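If the candidates arrive as lines of one larger string rather than as separate values, ^ and $ act per line only when re.MULTILINE is set. The following is a minimal sketch under that assumption. The names EMAIL_LINE and text are hypothetical, and the hyphen is moved to the end of the last character class.

```python
import re

# Same character classes as the answer's pattern. MULTILINE makes ^ and $
# match at every line boundary instead of only at the ends of the string.
EMAIL_LINE = re.compile(
    r"^(?:mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9.-]+)$",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical multi-line input built from the question's examples.
text = "dont:match@example.com\nmailto:Match@Example.com\nmatch@example.com"

# findall returns the contents of group 1 for every line that matches.
addresses = EMAIL_LINE.findall(text)
print(addresses)
```

The dont: line is skipped because the address does not start the line, while the mixed-case line is accepted thanks to re.IGNORECASE.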
Demo:
https://regex101.com/r/cI1eD6/1
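Regex explanation (this breakdown is for a variant with a capturing (mailto:)? group, so the address is backreference 2 here, whereas the Python example above uses (?:mailto:)? and reads group 1):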
^(mailto:)?([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)$
Options: Case insensitive
Assert position at the beginning of a line «^»
Match the regex below and capture its match into backreference number 1 «(mailto:)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the character string “mailto:” literally «mailto:»
Match the regex below and capture its match into backreference number 2 «([A-Z0-9_.+-]+@[A-Z0-9-]+\.[A-Z0-9-.]+)»
   Match a single character present in the list below «[A-Z0-9_.+-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “_.+” «_.+»
      The literal character “-” «-»
   Match the character “@” literally «@»
   Match a single character present in the list below «[A-Z0-9-]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      The literal character “-” «-»
   Match the character “.” literally «\.»
   Match a single character present in the list below «[A-Z0-9-.]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      A character in the range between “A” and “Z” «A-Z»
      A character in the range between “0” and “9” «0-9»
      A single character from the list “-.” «-.»
Assert position at the end of a line «$»