With Python 3.5 or 3.6, after using the email package to load a message whose Date header contains an invalid hour, trying to access the date header raises a ValueError exception:
>>> import email
>>> from email import policy
>>> m = email.message_from_binary_file(open('bad_date.txt', 'rb'), policy=policy.default)
>>> m['date']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/usr/lib/python3.6/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.6/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.6/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.6/email/headerregistry.py", line 303, in parse
    value = utils.parsedate_to_datetime(value)
  File "/usr/lib/python3.6/email/utils.py", line 214, in parsedate_to_datetime
    tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)))
ValueError: hour must be in 0..23
This is the header in the email:
Date: Tue, 06 Jun 2017 27:39:33 +0600
(I am analyzing spam, and someone's spam-sending program apparently does not understand how time zone conversion works. I have also seen negative hours...)
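As far as I can tell from the last frame of the traceback, the exception comes from the datetime.datetime constructor itself, which rejects any hour outside 0..23. A minimal sketch of that behavior follows; try_build is a hypothetical helper used only for illustration, and the date fields are taken from the header above.

```python
import datetime


def try_build(hour):
    """Build a datetime with the given hour, or return the ValueError raised.

    Hypothetical helper for illustration; not part of the email package.
    """
    tz = datetime.timezone(datetime.timedelta(hours=6))  # the +0600 offset
    try:
        return datetime.datetime(2017, 6, 6, hour, 39, 33, tzinfo=tz)
    except ValueError as exc:
        return exc


# 23 is accepted; 27 and a negative hour are both rejected.
for hour in (23, 27, -1):
    print(hour, repr(try_build(hour)))
```

So an hour of 27, or a negative hour, fails before the email package has any chance to record a defect.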
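The email package is designed to deal with problems it encounters while parsing a message by registering them as defects, so raising an exception seems like the wrong outcome in this case.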
I could try replacing the header_factory of the default policy to handle this case, but it seems more like a bug in Python that parsedate_to_datetime behaves this way. (Apparently this behavior is on purpose.)
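If changing the policy is not an option, a caller-side guard is another possibility. The sketch below is only an illustration: safe_date is a hypothetical helper name, and it assumes that treating an unparseable Date as "no date" is acceptable for your use case.

```python
import email
from email import policy


def safe_date(msg):
    """Return the parsed Date as a datetime, or None if missing or unparseable.

    Hypothetical helper; not part of the email package.
    """
    try:
        header = msg['date']
    except ValueError:
        # The Python 3.5/3.6 behavior described above.
        return None
    # A missing header is None, and a plain-str header has no .datetime;
    # both cases fall back to None here.
    return getattr(header, 'datetime', None)


bad = email.message_from_string(
    'Date: Tue, 06 Jun 2017 27:39:33 +0600\n\nbody\n', policy=policy.default)
good = email.message_from_string(
    'Date: Tue, 06 Jun 2017 21:39:33 +0600\n\nbody\n', policy=policy.default)

print(safe_date(bad))
print(safe_date(good))
```

The drawback is that every access to the header has to go through the helper, which is why fixing it at the policy level is more attractive.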
Update: I have raised this as a Python bug.
Answer 0 (score: 0)
Here is the workaround I am using for now:
from email import policy
from email import errors
from email import _header_value_parser as parser
from email.headerregistry import HeaderRegistry, DateHeader


class DateHeaderRobust(DateHeader):
    """
    Extends DateHeader from email/headerregistry.py to handle the
    ValueError raised by parsedate_to_datetime when a date header
    has an invalid hour value (outside 0..23).
    """

    @classmethod
    def parse(cls, value, kwds):
        try:
            super().parse(value, kwds)
        except ValueError:
            # Record a defect instead of propagating the exception,
            # and keep the raw header text as the decoded value.
            kwds['defects'].append(
                errors.InvalidHeaderDefect('Invalid value in date'))
            kwds['datetime'] = None
            kwds['decoded'] = value
            kwds['parse_tree'] = parser.TokenList()


class UniqueDateHeader(DateHeaderRobust):
    max_count = 1


# Register the robust class for the 'date' header on a fresh registry.
header_factory = HeaderRegistry()
header_factory.map_to_type('date', UniqueDateHeader)
email_policy = policy.default.clone(header_factory=header_factory)
Then, when reading a message (for example with email.message_from_binary_file), pass policy=email_policy as a keyword argument.
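For example, here is a self-contained sketch that repeats the workaround so it runs on its own and feeds it the header from the question. The raw message bytes and the Subject line are made up for illustration, and the exact defect class you see may depend on the Python version.

```python
import email
from email import errors, policy
from email import _header_value_parser as parser
from email.headerregistry import DateHeader, HeaderRegistry


class DateHeaderRobust(DateHeader):
    """Turn a ValueError from date parsing into a registered defect."""

    @classmethod
    def parse(cls, value, kwds):
        try:
            super().parse(value, kwds)
        except ValueError:
            kwds['defects'].append(
                errors.InvalidHeaderDefect('Invalid value in date'))
            kwds['datetime'] = None
            kwds['decoded'] = value
            kwds['parse_tree'] = parser.TokenList()


class UniqueDateHeader(DateHeaderRobust):
    max_count = 1


header_factory = HeaderRegistry()
header_factory.map_to_type('date', UniqueDateHeader)
email_policy = policy.default.clone(header_factory=header_factory)

# Made-up message carrying the bad Date header from the question.
raw = (b'Date: Tue, 06 Jun 2017 27:39:33 +0600\r\n'
       b'Subject: test\r\n'
       b'\r\n'
       b'body\r\n')
msg = email.message_from_bytes(raw, policy=email_policy)

date_header = msg['date']     # no exception with the custom policy
print(date_header.datetime)   # no usable datetime for the bad header
print(date_header.defects)    # the problem is reported as a defect
```

Code that needs the timestamp can then check date_header.datetime for None and look at date_header.defects to see why parsing failed.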