Question

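On Amazon SES, I have a rule that saves incoming emails to an S3 bucket. Amazon saves them in MIME format.

A .txt attachment in these emails appears in the MIME file with Content-Type: text/plain, Content-Disposition: attachment ... .txt, and Content-Transfer-Encoding: quoted-printable or base64.

I can parse those fine with Python.

I am having trouble decoding the content of a .txt file attachment when it is compressed (i.e. Content-Type: application/zip); it behaves as if the encoding were not base64.

My code: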

import base64
s = unicode(base64.b64decode(attachment_content), "utf-8")

It throws this error:

Traceback (most recent call last):
  File "<input>", line 796, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcf in position 10: invalid continuation byte

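Below are the first few lines of the "base64" string in attachment_content. Its length is 53683 plus "==" at the end, and I thought the length of a base64 string should be a multiple of 4 (??). So maybe decoding fails because the compression changes attachment_content and I need some other operation before or after decoding? I really have no idea...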

UEsDBBQAAAAIAM9Ah0otgkpwx5oAADMTAgAJAAAAX2NoYXQudHh0tL3bjiRJkiX23sD+g0U3iOxu
REWGu8c1l2Ag8lKd0V2ZWajM3kLuC6Hubu5uFeZm3nYJL6+n4T4Ry8EOdwCSMyQXBRBLgMQ+7CP5
QPBj5gdYn0CRI6JqFxWv7hlyszursiJV1G6qonI5cmQyeT6dPp9cnCaT6Yvp5Yvz6xfJe7cp8P/k
1SbL8xfJu0OSvUvr2q3TOnFVWjxrknWZFeuk2VRlu978s19MRvNMrHneOv51SOZlGUtMLYnfp0nd

...

I have also tried "latin-1", but got gibberish.

Answer 1

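The problem was that, after the base64 conversion, I was dealing with data in zip format, e.g. "PK\x03\x04\x3c\xa\x0c ...", and I needed to unzip it before converting it to UTF-8 unicode.

This code worked for me: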
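To check the two doubts raised in the question, here is a small Python 3 sketch (the code in this thread is Python 2). It decodes only the first line of the base64 sample quoted in the question; the variable names are illustrative. It suggests that base64.b64decode discards line breaks by default, so the length counted with newlines need not be a multiple of 4, and that the decoded bytes start with the zip local-file signature PK\x03\x04 rather than UTF-8 text.

```python
import base64

# First line of attachment_content, copied from the question (76 characters).
first_line = (
    "UEsDBBQAAAAIAM9Ah0otgkpwx5oAADMTAgAJAAAAX2NoYXQudHh0tL3bjiRJkiX23sD+g0U3iOxu"
)

# With the default validate=False, characters outside the base64 alphabet
# (such as the newline at the end of each MIME line) are discarded.
decoded = base64.b64decode(first_line + "\n")

# A zip local file header starts with the 4-byte signature PK\x03\x04.
is_zip = decoded[:4] == b"PK\x03\x04"

# The fixed part of the local file header is 30 bytes; the member name follows.
member_name = decoded[30:39]

# Byte 10 is 0xcf, the byte named in the UnicodeDecodeError: it belongs to the
# binary zip header, so decoding the data as UTF-8 cannot work.
offending_byte = decoded[10]
```

Here member_name comes out as b"_chat.txt", so the payload is a zip archive wrapping the text file, and it has to be unzipped before any text decoding.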

import base64
import email
import zipfile
from cStringIO import StringIO

# Parse the raw MIME text that SES saved to S3
received_email = email.message_from_string(email_text)
for part in received_email.walk():
    c_type = part.get_content_type()
    c_enco = (part.get('Content-Transfer-Encoding') or '').lower()

    # Only base64-encoded parts are handled here; skipping the others
    # (e.g. the multipart container) avoids using an undefined decoded_file
    if c_enco != 'base64':
        continue

    decoded_file = base64.b64decode(part.get_payload())
    print("File decoded from base64")

    if c_type == "application/zip":
        # The decoded bytes are a zip archive: read its first member
        zfp = zipfile.ZipFile(StringIO(decoded_file), "r")
        decoded_file = zfp.read(zfp.namelist()[0])
        print('And un-zipped')

    result = unicode(decoded_file, "utf-8")
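For readers on Python 3, the same idea can be sketched with the standard library only. This is a minimal sketch, not the code I ran: extract_attachment_text and build_sample_email are hypothetical helper names, the sample message is made-up data, and it assumes the attachments are UTF-8 text. It relies on get_payload(decode=True) to undo base64 or quoted-printable, then unzips application/zip parts in memory.

```python
import email
import io
import zipfile
from email.message import EmailMessage


def extract_attachment_text(raw_email, encoding="utf-8"):
    """Return {name: text} for each attachment, unzipping zip archives."""
    message = email.message_from_bytes(raw_email)
    texts = {}
    for part in message.walk():
        if part.get_content_disposition() != "attachment":
            continue
        # decode=True reverses the Content-Transfer-Encoding and returns bytes.
        data = part.get_payload(decode=True)
        if part.get_content_type() == "application/zip":
            # Unzip in memory, then decode each member as text.
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for name in archive.namelist():
                    texts[name] = archive.read(name).decode(encoding)
        else:
            texts[part.get_filename()] = data.decode(encoding)
    return texts


def build_sample_email():
    """Build a made-up message carrying one zipped .txt attachment."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("_chat.txt", "h\u00e9llo from the archive\n")
    message = EmailMessage()
    message["Subject"] = "sample"
    message.set_content("see attachment")
    message.add_attachment(buffer.getvalue(), maintype="application",
                           subtype="zip", filename="chat.zip")
    return message.as_bytes()


sample_texts = extract_attachment_text(build_sample_email())
```

With a real SES message, the bytes read from the S3 object would be passed to extract_attachment_text in place of the sample.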
