How do I get the body (or bodies) of an email from the Message object returned by email.parser.Parser?

Date: 2016-11-29 21:33:31

Tags: python rfc2822

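I am reading the Python 3 docs, and I must be blind or something... Where does it say how to get the body of a message?

What I want to do is open a message and loop over its text-based bodies, skipping binary attachments. Pseudocode: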

def read_all_bodies(local_email_file):
    email = Parser().parse(open(local_email_file, 'r'))
    for pseudo_body in email.pseudo_bodies:
        if pseudo_body.pseudo_is_binary():
            continue
        # Pseudo-parse the body here

How should I do this? Is the Message class even the right class? Isn't it just for headers?

1 answer:

Answer 0: (score: 1)

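This is best done with two functions:

  1. The first opens the file and parses it. If the message is single-part, get_payload returns the message body as a string; if the message is multipart, it returns a list of sub-messages.
  2. The second processes the text of a single payload.

Here is how to do it: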

    from email.parser import Parser

    def parse_file_bodies(filename):
        # Open the file and parse the email; the with block closes the file
        with open(filename, 'r') as f:
            email = Parser().parse(f)
        # For multipart emails, all bodies will be handled in a loop
        if email.is_multipart():
            for msg in email.get_payload():
                parse_single_body(msg)
        else:
            # A single-part message is passed directly
            parse_single_body(email)
    
    def parse_single_body(email):
        payload = email.get_payload(decode=True)
        # With decode=True the payload comes back as bytes (or None if
        # this part is itself multipart). It must be decoded to a Python
        # string using the part's charset, which may vary between messages
        if payload is None:
            return
        try:
            text = payload.decode("utf-8")
            # Now you can work with text as with any other string:
            ...
        except UnicodeDecodeError:
            print("Error: cannot parse message as UTF-8")
            return
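
For the original goal (looping over the text bodies while skipping binary attachments), a minimal sketch built on Message.walk() might look like the following. The names iter_text_bodies and RAW are hypothetical and exist only for illustration. Falling back to UTF-8 when a part declares no charset, or an unknown one, is an assumption rather than anything the email package mandates, and get_content_disposition() requires Python 3.5 or later.

```python
from email import message_from_string

def iter_text_bodies(msg):
    """Yield the decoded text of each text/* part that is not an attachment."""
    # walk() visits the message itself and every nested sub-part
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        if part.get_content_maintype() != "text":
            continue  # skip images, archives and other binary parts
        if part.get_content_disposition() == "attachment":
            continue  # skip text files sent as attachments
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 if no charset is declared
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # The declared charset is unknown to Python
            yield payload.decode("utf-8", errors="replace")

# Hypothetical sample message: one text part and one binary part
RAW = """\
MIME-Version: 1.0
Subject: demo
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain; charset="utf-8"

Hello body
--BOUND
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

AAEC
--BOUND--
"""

bodies = list(iter_text_bodies(message_from_string(RAW)))
for body in bodies:
    print(body)
```

Because walk() descends into nested multipart containers, this also reaches text parts that a single loop over get_payload() would miss, and it answers the question about Message: the object holds the bodies as well as the headers.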