在python中解析Email BodyStructure

时间:2012-10-05 04:38:34

标签: python parsing email

  

可能重复:
  parsing parenthesized list in python's imaplib

Python的电子邮件包可让您获取电子邮件的全文并将其解析为部分并浏览各个部分。但是,是否有一个库可以将IMAP协议返回的BODYSTRUCTURE响应解析为email.message.Message个部分?

修改

PHP中的等效方法是imap_fetchbody(),它自动处理结构的解析。

EDIT2

此问题作为重复内容被错误地关闭。它是将BODYSTRUCTURE嵌套表达式解析为有用的格式(如字典),而不是将原始字符串解析为python类型。在任何情况下,我最终都推出了自己的解决方案,这是将来遇到类似问题的人的代码。它旨在与imapclient库一起使用

# ----- Parsing BODYSTRUCTURE into parts dictionaries ----- #

def tuple2dict(pairs):
    """get dict from (key, value, key, value, ...) tuple"""
    if not pairs:
        return None
    return dict([(k, tuple2dict(v) if isinstance(v, tuple) else v)
                 for k, v in zip(pairs[::2], pairs[1::2])])

def parse_singlepart(var, part_no):
    """convert non-multipart into dic"""
    # Basic fields for non-multipart (Required)
    part = dict(zip(['maintype', 'subtype', 'params', 'id', 'description', 'encoding', 'size'], var[:7]), part_no=part_no)
    part['params'] = tuple2dict(part['params'])

    # Type specific fields (Required for 'message' or 'text' type)
    index = 7
    if part['maintype'].lower() == 'message' and part['subtype'].lower() == 'rfc822':
        part.update(zip(['envelope', 'bodystructure', 'lines'], var[7:10]))
        index = 10
    elif part['maintype'].lower() == 'text':
        part['lines'] = var[7]
        index = 8

    # Extension fields for non-multipart (Optional)
    part.update(zip(['md5', 'disposition', 'language', 'location'], var[index:]))
    part['disposition'] = tuple2dict(part['disposition'])

    return part

def parse_multipart(var, part_no):
    """convert the multipart into dict"""
    part = { 'child_parts': [], 'part_no': part_no }

    # First parse the child parts
    index = 0
    if isinstance(var[0], list):
        part['child_parts'] = [parse_part(v, ('%s.%d' % (part_no, i+1)).replace('TEXT.', '')) for i, v in enumerate(var[0])]
        index = 1
    elif isinstance(var[0], tuple):
        while isinstance(var[index], tuple):
            part['child_parts'].append(parse_part(var[index], ('%s.%d' % (part_no, index+1)).replace('TEXT.', '')))
            index += 1

    # Then parse the required field subtype and optional extension fields
    part.update(zip(['subtype', 'params', 'disposition', 'language', 'location'], var[index:]))
    part['params'] = tuple2dict(part['params'])
    part['disposition'] = tuple2dict(part['disposition'])

    return part

def parse_part(var, part_no=None):
    """Parse IMAP email BODYSTRUCTURE into nested dictionary

    See http://tools.ietf.org/html/rfc3501#section-6.4.5 for structure of email messages
    See http://tools.ietf.org/html/rfc3501#section-7.4.2 for specification of BODYSTRUCTURE
    """
    if isinstance(var[0], (tuple, list)):
        return parse_multipart(var, part_no or 'TEXT')
    else:
        return parse_singlepart(var, part_no or '1')

0 个答案:

没有答案