我尝试了很多正则表达式代码来从具有此格式的电子邮件中提取日期,但我无法:
Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)
Sent: Thursday, November 08, 2001 10:25 AM
这在所有电子邮件中看起来如何,我想将它们都提取出来。
提前谢谢
答案 0 :(得分:1)
你可以使用这种模式做这样的事情:
使用Python3:
import re
data = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
final = re.findall(r"Date: (\w+), ([0-9]+) (\w+) ([0-9]+)", data)
print("{0}, {1}".format(final[0][0], " ".join(final[0][1:])))
print(" ".join(final[0][1:]))
使用Python2:
import re
data = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
final = re.findall(r"Date: (\w+), ([0-9]+) (\w+) ([0-9]+)", data)
print "%s, %s" % (final[0][0], " ".join(final[0][1:]))
print " ".join(final[0][1:])
<强>输出:强>
Tue, 13 Nov 2001
13 Nov 2001
修改强>
快速回答问题的新更新,您可以执行以下操作:
import re
email = '''Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)
Sent: Thursday, November 08, 2001 10:25 AM'''
data = email.split("\n")
pattern = r"(\w+: \w+, [0-9]+ \w+ [0-9]+)|(\w+: \w+, \w+ [0-9]+, [0-9]+)"
final = []
for k in data:
final += re.findall(pattern, k)
final = [j.split(":") for k in final for j in k if j != '']
# Python3
print(final)
# Python2
# print final
输出:
[['Date', ' Tue, 13 Nov 2001'], ['Sent', ' Thursday, November 08, 2001']]
答案 1 :(得分:0)
import re
my_email = 'Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)'
match = re.search(r': (\w{3,3}, \d{2,2} \w{3,3} \d{4,4})', my_email)
print(match.group(1))
答案 2 :(得分:0)
我不是正则表达式专家,但这是一个解决方案,你可以为它编写一些测试
d = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
dates = [re.search('(\d+ \w+ \d+)',date).groups()[0] for date in re.search('(Date: \w+, \d+ \w+ \d+)', d).groups()]
['2001年11月13日']
答案 3 :(得分:0)
如果只提取相同的字符串模型,则可以使用split()
而不是使用正则表达式:
email_date = "Date: Tue, 13 Nov 2001 08:41:49 -0800 (PST)"
email_date = '%s %s %s %s' % (tuple(email_date.split(' ')[1:5]))
输出:
Tue, 13 Nov 2001