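I have a list that contains names, email addresses, locations, dates and times, and so on.
From this list I only want to extract the names and the email addresses.
The raw text looks like this: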
Email address: abc103@gmail.com
City/town: Hills, United States
Last access: Saturday, 6 January 2018, 8:46 PM (17 secs)
So in a Python list it looks like this:
import re
lst = [['name1', 'Email address: abc103@gmail.com\nCity/town: Hills , United States\nLast access: Saturday, 6 January 2018, 8:46 PM (17 secs)'], ['name2', 'Email address: cde123@example.com\nCity/town: San Francisco, United States\nLast access: Saturday, 6 January 2018, 8:46 PM (48 secs)'], ['name3', 'Email address: nnn9@something.com\nCity/town: Fremont, United States\nLast access: Saturday, 6 January 2018, 8:43 PM (3 mins 21 secs)'], ['name4', 'City/town: Tenafly, United States\nLast access: Saturday, 6 January 2018, 8:36 PM (10 mins 14 secs)'],... list goes on.
for i in range(0, len(lst)):
    extract = re.findall(r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)', lst[i][1], re.MULTILINE)
    lst[i][1] = extract
print(lst)
However, the output looks like this:
[['name1', []], ['name2', []], ['name3', []], ....
What is wrong with my regular expression? How do I apply re.findall to multi-line text that contains newline characters?
Answer 0 (score: 0)
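This works for me. Your pattern is anchored with ^ and $, so with re.MULTILINE an entire line would have to be an email address, but the line starts with "Email address: ". Dropping the anchors lets re.findall pick up the address wherever it appears in the text: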
import re
lst = [['name1', 'Email address: abc103@gmail.com\nCity/town: Hills , United States\nLast access: Saturday, 6 January 2018, 8:46 PM (17 secs)'], ['name2', 'Email address: cde123@example.com\nCity/town: San Francisco, United States\nLast access: Saturday, 6 January 2018, 8:46 PM (48 secs)'], ['name3', 'Email address: nnn9@something.com\nCity/town: Fremont, United States\nLast access: Saturday, 6 January 2018, 8:43 PM (3 mins 21 secs)'], ['name4', 'City/town: Tenafly, United States\nLast access: Saturday, 6 January 2018, 8:36 PM (10 mins 14 secs)']]
#lst[0][1].findall('([a-zA-Z][a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.][a-zA-Z]+)', expand=True)
for i in range(0, len(lst)):
    # No ^/$ anchors here, so the address can sit anywhere within a line.
    extract = re.findall(r'([a-zA-Z][a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.][a-zA-Z]+)', lst[i][1], re.MULTILINE)
    lst[i][1] = extract
print(lst)
Output:
[['name1', ['abc103@gmail.com']], ['name2', ['cde123@example.com']], ['name3', ['nnn9@something.com']], ['name4', []]]
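To make the difference concrete, here is a minimal sketch that runs an anchored and an unanchored pattern against one record from the question. The `EMAIL` pattern is a simplified stand-in rather than the exact regex from either post, and `extract_email` is a hypothetical helper that assumes every address follows the literal label "Email address:" as in the sample data.

```python
import re

# One record in the same shape as the question's data.
text = ('Email address: abc103@gmail.com\n'
        'City/town: Hills , United States\n'
        'Last access: Saturday, 6 January 2018, 8:46 PM (17 secs)')

# Simplified email pattern, used here for illustration only.
EMAIL = r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+'

# Anchored: with re.MULTILINE, ^ and $ match at line boundaries, so the
# whole line would have to be an address. The "Email address: " prefix
# makes that impossible, and nothing is found.
anchored = re.findall('^' + EMAIL + '$', text, re.MULTILINE)

# Unanchored: the address may appear anywhere in the text.
unanchored = re.findall(EMAIL, text)

def extract_email(details):
    """Return the value after 'Email address:' or None if the label is absent.

    Hypothetical helper: relies on the label, not on the shape of the address.
    """
    match = re.search(r'^Email address:[ \t]*(\S+)', details, re.MULTILINE)
    return match.group(1) if match else None

print(anchored)             # []
print(unanchored)           # ['abc103@gmail.com']
print(extract_email(text))  # abc103@gmail.com
```

Keying on the label returns a single string or None instead of a list, which may be more convenient for records such as name4 that have no email line.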