How do I parse an email FROM header containing parentheses in Python?

Posted: 2018-12-18 22:12:29

Tags: python email parsing

I'm having trouble using the Python email module to parse emails whose FROM header contains parentheses. The problem only occurs with email.policy.default, not with email.policy.compat32.

Is there a way to work around this other than switching policies?

Here is a minimal working example for Python 3.6.5:

import email
import email.policy as email_policy

raw_mime_msg=b"from: James Mishra \\(says hi\\) <james@example.com>"

compat32_obj = email.message_from_bytes(
    raw_mime_msg, policy=email_policy.compat32)

default_obj = email.message_from_bytes(
    raw_mime_msg, policy=email_policy.default)

print(compat32_obj['from'])
print(default_obj['from'])

The first print statement outputs James Mishra \(says hi\) <james@example.com>, while the second one raises:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1908, in get_address
    token, value = get_group(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1867, in get_group
    "display name but found '{}'".format(value))
email.errors.HeaderParseError: expected ':' at end of group display name but found '\(says hi\) <james@example.com>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1734, in get_mailbox
    token, value = get_name_addr(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1720, in get_name_addr
    token, value = get_angle_addr(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1646, in get_angle_addr
    "expected angle-addr but found '{}'".format(value))
email.errors.HeaderParseError: expected angle-addr but found '\(says hi\) <james@example.com>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_email.py", line 12, in <module>
    print(default_obj['from'])
  File "/usr/local/lib/python3.6/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/usr/local/lib/python3.6/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/local/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/local/lib/python3.6/email/headerregistry.py", line 589, in __call__
    return self[name](name, value)
  File "/usr/local/lib/python3.6/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/usr/local/lib/python3.6/email/headerregistry.py", line 340, in parse
    kwds['parse_tree'] = address_list = cls.value_parser(value)
  File "/usr/local/lib/python3.6/email/headerregistry.py", line 331, in value_parser
    address_list, value = parser.get_address_list(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1931, in get_address_list
    token, value = get_address(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1911, in get_address
    token, value = get_mailbox(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1737, in get_mailbox
    token, value = get_addr_spec(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1583, in get_addr_spec
    token, value = get_local_part(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1413, in get_local_part
    obs_local_part, value = get_obs_local_part(str(local_part) + value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1454, in get_obs_local_part
    token, value = get_word(value)
  File "/usr/local/lib/python3.6/email/_header_value_parser.py", line 1340, in get_word
    if value[0]=='"':
IndexError: string index out of range

1 Answer:

Answer 0 (score: 1)

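email.policy.default is designed to comply with the email RFCs, and your message does not comply with RFC 5322: a backslash escape such as \( is only allowed inside a quoted string or a comment, not in a bare display name. If the parenthesized part is meant to be a comment, a compliant message would look like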

raw_mime_msg=b"from: James Mishra (says hi) <james@example.com>"

If it is not meant to be a comment, the parentheses should appear inside a quoted string, which might look like

raw_mime_msg=b'from: "James Mishra (says hi)" <james@example.com>'
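One way to check that both compliant spellings work is to parse them with email.policy.default and inspect the resulting email.headerregistry.Address objects. This is a sketch; first_from_address is a hypothetical helper name. In the quoted form, (says hi) is part of the display name; in the comment form it is a comment, so the exact display_name produced there is not relied on.

```python
import email
import email.policy

# The two RFC 5322-compliant spellings of the From header discussed above.
COMMENT_FORM = b"from: James Mishra (says hi) <james@example.com>"
QUOTED_FORM = b'from: "James Mishra (says hi)" <james@example.com>'


def first_from_address(raw: bytes):
    """Parse raw bytes with the strict default policy and return the first
    From address as a (display_name, addr_spec) tuple."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    addr = msg["from"].addresses[0]  # an email.headerregistry.Address
    return addr.display_name, addr.addr_spec


if __name__ == "__main__":
    for raw in (COMMENT_FORM, QUOTED_FORM):
        print(first_from_address(raw))
```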

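Because your message is not compliant, a policy that expects compliance is not a good fit. If you have to handle non-compliant messages, email.policy.compat32 is a better choice than email.policy.default.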
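If you want the structured headers of email.policy.default for well-formed mail but still need to cope with malformed senders, a hybrid is possible: try the strict policy first and fall back to compat32 only when reading the header raises or the parsed header reports defects (header objects expose a defects tuple). This is a sketch, not a library feature; get_from_header is a hypothetical helper, and it re-parses the whole message, which is simple but not efficient.

```python
import email
import email.policy


def get_from_header(raw: bytes):
    """Hypothetical helper: return the From header, preferring the strict
    default policy and falling back to compat32 for malformed input."""
    strict_msg = email.message_from_bytes(raw, policy=email.policy.default)
    try:
        header = strict_msg["from"]
    except Exception:
        # Python 3.6.5 raises IndexError here for the header in the question.
        header = None
    if header is not None and not header.defects:
        return header  # structured header object with an .addresses attribute

    # Fallback: compat32 returns the header value essentially unparsed.
    lenient_msg = email.message_from_bytes(raw, policy=email.policy.compat32)
    return lenient_msg["from"]


if __name__ == "__main__":
    bad = b"from: James Mishra \\(says hi\\) <james@example.com>"
    good = b'from: "James Mishra (says hi)" <james@example.com>'
    print(get_from_header(bad))
    print(get_from_header(good).addresses[0].addr_spec)
```

Callers must handle two return types: a structured header for compliant input and a plain string for the fallback.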