Question

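I'm trying to extract email addresses from text. I used re.search, which returns the first occurrence, and then switched to re.findall. To my surprise, re.findall found fewer emails than re.search did. What could be the problem?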

Code:

import re

emails = set()  # text holds the input string (defined elsewhere)

searchObj = re.search(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
if searchObj:
    mail = searchObj.group()
    if mail not in emails:
        emails.add(mail)

listEmails = re.findall(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
for mail in listEmails:
    if mail not in emails:
        emails.add(mail)
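A minimal reproduction of the behaviour described, using the question's pattern unchanged and a made-up sample string in place of the real text:

```python
import re

# Pattern copied from the question; note the capturing group (\.|-)
pattern = r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+'

# Hypothetical sample text containing two addresses
text = 'some@mail.com and more email@somemore-here.com'

first = re.search(pattern, text)
print(first.group())              # some@mail.com  (the whole first match)
print(re.findall(pattern, text))  # ['.', '-']  (only the captured group)
```

So re.search(...).group() yields a full address, while re.findall yields just the separator characters, which is why fewer real addresses end up in the set.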

Answer 1

Replace the capturing group (\.|-) with a non-capturing group, (?:\.|-), or even with a character class:

r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'
                               ^^^^
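A quick check, on a made-up sample string, that both fixes (the non-capturing group (?:\.|-) and the character class [.-]) make re.findall return whole addresses:

```python
import re

text = 'some@mail.com and more email@somemore-here.com'

# Non-capturing group: (?:...) groups the alternation without capturing it
non_capturing = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(?:\.|-)[A-Za-z0-9.-]+'
# Character class: [.-] matches a single '.' or '-' with no group at all
char_class = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'

print(re.findall(non_capturing, text))  # ['some@mail.com', 'email@somemore-here.com']
print(re.findall(char_class, text))     # ['some@mail.com', 'email@somemore-here.com']
```

The character class is the simpler choice here because the alternation only ever picks one character.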

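Or even shorter (not strictly equivalent: in Python 3, \w and [^\W_] also match non-ASCII letters and digits unless the re.ASCII flag is used):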

r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'

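Otherwise, since the pattern contains exactly one capturing group, re.findall returns only that group's values (a list of '.' and '-' strings) instead of the full matches.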
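To illustrate the rule, here is a sketch with deliberately simplified patterns (not real email validators) on a made-up sample string. The last part uses re.finditer, an alternative that the answer does not mention:

```python
import re

text = 'some@mail.com and more email@somemore-here.com'

# No groups: a list of whole matches
no_groups = re.findall(r'\w+@\w+', text)
print(no_groups)   # ['some@mail', 'email@somemore']

# One group: a list of that group's text only
one_group = re.findall(r'\w+@(\w+)', text)
print(one_group)   # ['mail', 'somemore']

# Two or more groups: a list of tuples, one item per group
two_groups = re.findall(r'(\w+)@(\w+)', text)
print(two_groups)  # [('some', 'mail'), ('email', 'somemore')]

# Alternative: re.finditer yields match objects, and group(0) is always
# the whole match, whatever capturing groups the pattern contains
pattern = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+'
full = [m.group(0) for m in re.finditer(pattern, text)]
print(full)        # ['some@mail.com', 'email@somemore-here.com']
```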

Python demo:

import re
rx = r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
s = 'some@mail.com and more email@somemore-here.com'
print(re.findall(rx, s))
# => ['some@mail.com', 'email@somemore-here.com']
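Applied to the code in the question, the re.search/re.findall pair can collapse into a single findall call. A minimal sketch, assuming the addresses should end up in a set as in the question; collect_emails and EMAIL_RX are hypothetical names:

```python
import re

# Shorter pattern from the answer (it has no capturing groups)
EMAIL_RX = re.compile(r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+')

def collect_emails(text, emails=None):
    """Hypothetical helper: add every address found in text to a set."""
    if emails is None:
        emails = set()
    # findall returns whole matches because the pattern has no groups;
    # set.update already skips duplicates, so no membership check is needed
    emails.update(EMAIL_RX.findall(text))
    return emails

found = collect_emails('some@mail.com, email@somemore-here.com, some@mail.com')
print(sorted(found))  # ['email@somemore-here.com', 'some@mail.com']
```

The "if mail not in emails" checks in the question are redundant for a set, since adding an existing element has no effect.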
