An email with an invalid hour in its Date header raises an exception

Time: 2017-06-14 23:54:29

Tags: python email datetime python-3.6

With Python 3.5 or 3.6, after using the email package to load an email whose Date header contains an invalid hour, trying to access the date header raises a ValueError exception:

>>> import email
>>> from email import policy
>>> m = email.message_from_binary_file(open('bad_date.txt', 'rb'), policy=policy.default)
>>> m['date']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/usr/lib/python3.6/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.6/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.6/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.6/email/headerregistry.py", line 303, in parse
    value = utils.parsedate_to_datetime(value)
  File "/usr/lib/python3.6/email/utils.py", line 214, in parsedate_to_datetime
    tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)))
ValueError: hour must be in 0..23

Here is the header from the email:

Date: Tue, 06 Jun 2017 27:39:33 +0600

(I'm analyzing spam, and someone's spam-sending program apparently doesn't understand how time zone conversion works. I've seen negative hours as well...)
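The failure can be reproduced without the message file by calling email.utils.parsedate_to_datetime directly on the header value. The following is a minimal sketch; the helper name safe_parse_date is hypothetical and is not part of the email package.

```python
import datetime
from email.utils import parsedate_to_datetime


def safe_parse_date(value):
    """Return a datetime for an email date string, or None if it cannot be parsed.

    Hypothetical helper for illustration only; not part of the email package.
    """
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # An hour of 27 is rejected when the datetime object is constructed.
        return None


# The header value from the question: hour 27 is out of range.
bad = safe_parse_date('Tue, 06 Jun 2017 27:39:33 +0600')

# The same value with a valid hour parses normally.
good = safe_parse_date('Tue, 06 Jun 2017 23:39:33 +0600')
```

Catching the exception at this level only helps when the raw header string is at hand; with policy.default the parsing happens inside the header lookup itself.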

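The email package is designed to handle problems encountered while parsing an email by registering them as defects, so raising an exception in this case seems like the wrong outcome.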

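I could try to replace the header_factory of the default policy to handle this case, but it seems more like a bug in Python that parsedate_to_datetime behaves this way. (Apparently this behavior is on purpose.)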

Update: I have raised this as a Python bug.

1 answer:

Answer 0 (score: 0)

Here is the workaround I'm using for now:

from email import policy
from email import errors
from email import _header_value_parser as parser
from email.headerregistry import HeaderRegistry, DateHeader


class DateHeaderRobust(DateHeader):
    """
    Copied and updated from email/headerregistry.py to handle the
    ValueError raised by parsedate_to_datetime when a date header
    has an invalid hour value (outside 0..23)
    """

    @classmethod
    def parse(cls, value, kwds):
        try:
            super().parse(value, kwds)
        except ValueError:
            kwds['defects'].append(
                errors.InvalidHeaderDefect('Invalid value in date'))
            kwds['datetime'] = None
            kwds['decoded'] = value
            kwds['parse_tree'] = parser.TokenList()


class UniqueDateHeader(DateHeaderRobust):
    max_count = 1


header_factory = HeaderRegistry()
header_factory.map_to_type('date', UniqueDateHeader)

email_policy = policy.default.clone(header_factory=header_factory)

Then, when reading a message (for example with email.message_from_binary_file), pass policy=email_policy as a keyword argument.
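As a rough end-to-end sketch of that usage, the block below repeats a condensed form of the workaround so that it runs on its own, then parses an in-memory message containing the header from the question. The class name RobustDateHeader and the sample message bytes are assumptions made for illustration. Which defect class ends up attached to the header may depend on the Python version, so the sketch only checks that a defect was recorded.

```python
import email
from email import errors, policy
from email import _header_value_parser as parser
from email.headerregistry import DateHeader, HeaderRegistry


class RobustDateHeader(DateHeader):
    # Condensed form of the workaround: at most one Date header per message.
    max_count = 1

    @classmethod
    def parse(cls, value, kwds):
        try:
            super().parse(value, kwds)
        except ValueError:
            # Reached on interpreters where DateHeader.parse lets the
            # ValueError escape (3.6 in the question); record a defect instead.
            kwds['defects'].append(
                errors.InvalidHeaderDefect('Invalid value in date'))
            kwds['datetime'] = None
            kwds['decoded'] = value
            kwds['parse_tree'] = parser.TokenList()


header_factory = HeaderRegistry()
header_factory.map_to_type('date', RobustDateHeader)
email_policy = policy.default.clone(header_factory=header_factory)

# Sample message (made up for this sketch) with the header from the question.
raw = (b'Date: Tue, 06 Jun 2017 27:39:33 +0600\r\n'
       b'Subject: test\r\n'
       b'\r\n'
       b'body\r\n')
msg = email.message_from_bytes(raw, policy=email_policy)

date_header = msg['date']            # the lookup no longer raises
parsed = date_header.datetime        # None, since the hour is out of range
defects = list(date_header.defects)  # the problem is reported here instead
```

Checking date_header.defects before using date_header.datetime keeps malformed dates visible without interrupting the processing of the rest of the message.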