Extracting email addresses from multi-line strings containing newlines

Time: 2018-01-09 06:29:54

Tags: regex python-3.x

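I have a list that contains names, email addresses, locations, dates, times, and so on.

From this list, I only want to extract the names and email addresses.

The raw text for one entry looks like this: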

Email address: abc103@gmail.com
City/town: Hills, United States
Last access: Saturday, 6 January 2018, 8:46 PM  (17 secs)

So, as a Python list, it looks like this:

import re

lst = [['name1', 'Email address: abc103@gmail.com\nCity/town: Hills , United States\nLast access: Saturday, 6 January 2018, 8:46 PM  (17 secs)'], ['name2', 'Email address: cde123@example.com\nCity/town: San Francisco, United States\nLast access: Saturday, 6 January 2018, 8:46 PM  (48 secs)'], ['name3', 'Email address: nnn9@something.com\nCity/town: Fremont, United States\nLast access: Saturday, 6 January 2018, 8:43 PM  (3 mins 21 secs)'], ['name4', 'City/town: Tenafly, United States\nLast access: Saturday, 6 January 2018, 8:36 PM  (10 mins 14 secs)']]  # ... list goes on.

for i in range(0, len(lst)):
    extract = re.findall(r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)', lst[i][1],re.MULTILINE)
    lst[i][1] = extract

print(lst)

However, the output looks like this:

[['name1', []], ['name2', []], ['name3', []], ....

What is wrong with my regular expression? How can I apply re.findall to a multi-line string that contains newline characters?

1 Answer:

Answer 0: (score: 0)

This works for me:

import re
lst = [['name1', 'Email address: abc103@gmail.com\nCity/town: Hills , United States\nLast access: Saturday, 6 January 2018, 8:46 PM  (17 secs)'], ['name2', 'Email address: cde123@example.com\nCity/town: San Francisco, United States\nLast access: Saturday, 6 January 2018, 8:46 PM  (48 secs)'], ['name3', 'Email address: nnn9@something.com\nCity/town: Fremont, United States\nLast access: Saturday, 6 January 2018, 8:43 PM  (3 mins 21 secs)'], ['name4', 'City/town: Tenafly, United States\nLast access: Saturday, 6 January 2018, 8:36 PM  (10 mins 14 secs)']]
#lst[0][1].findall('([a-zA-Z][a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.][a-zA-Z]+)', expand=True)
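# The ^ and $ anchors are dropped: each address follows the "Email address: "
# label, so it never starts at the beginning of a line.
# Without anchors, re.MULTILINE no longer changes the result.
for i in range(0, len(lst)):
    extract = re.findall(r'([a-zA-Z][a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.][a-zA-Z]+)', lst[i][1],re.MULTILINE)
    lst[i][1] = extract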

print(lst)

Output:

[['name1', ['abc103@gmail.com']], ['name2', ['cde123@example.com']], ['name3', ['nnn9@something.com']], ['name4', []]]
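
To spell out why the pattern in the question returns nothing: with re.MULTILINE, ^ and $ match at the start and end of every line, so the anchored pattern only matches when the address is the entire line. Here each address comes after the "Email address: " label, so no line qualifies. The sketch below reproduces that on one entry from the question and also shows one possible alternative that keeps the anchors but matches the label too. The names `anchored`, `labelled`, `anchored_result`, and `labelled_result` are made up for illustration, and the `labelled` pattern is only an assumption about how the data is laid out, not the pattern used in the answer above.

```python
import re

# One entry copied from the question's list.
text = ('Email address: abc103@gmail.com\n'
        'City/town: Hills , United States\n'
        'Last access: Saturday, 6 January 2018, 8:46 PM  (17 secs)')

# Pattern from the question. With re.MULTILINE, ^ and $ match at the start
# and end of each line, so the address would have to fill a whole line.
anchored = r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)'
anchored_result = re.findall(anchored, text, re.MULTILINE)
print(anchored_result)  # []

# Hypothetical alternative: keep the anchors, but match the label as well
# and capture only the address that follows it on the same line.
labelled = r'^Email address:[ \t]*(\S+@\S+)[ \t]*$'
labelled_result = re.findall(labelled, text, re.MULTILINE)
print(labelled_result)  # ['abc103@gmail.com']
```

Matching on the label is stricter than the unanchored pattern: it only picks up addresses on an "Email address:" line, which may or may not be what you want if addresses can also appear in other fields.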