Question

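I'm trying to pull a single sentence out of a large number of HTML emails. The sentence sits in exactly the same place in every email (including the same line, if you look at the source).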

So far I have used imaplib to set up a connection to the correct mailbox, search it, and fetch the email body.

response_code_fetch, data_fetch = mail.fetch('1', '(BODY.PEEK[TEXT])')
if response_code_fetch == "OK":
    print("Body Text: " + str(data_fetch[0]))
else:
    print("Unable to find requested messages")

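However, what I get back is an incoherent list, with the entire email body sitting at index [0] of the returned list. I tried str(data_fetch[0]) followed by the splitlines method, but that doesn't work.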

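I also found the following suggestion online, which uses the email module, but it doesn't seem to work either, because it ends up printing the else branch.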

my_email = email.message_from_string(data_fetch)
body = ""
if my_email.is_multipart():
    for part in my_email.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        print(ctype, cdispo)

# not multipart - i.e. plain text, no attachments, keeping fingers crossed
else:
    print("Email is not multipart")
    body = my_email.get_payload(decode=True)
    print(body)

I won't include the whole result because it is very long, but it basically looks like I get the entire email: HTML formatting, body and all.

Body Text: [(b'1 (BODY[TEXT] {78687}', b'--_av-
uaAIyctTRCxY0f6Fw54pvw\r\nContent-Type: text/plain; charset=utf-
8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n

Does anyone know how I can get that one sentence out of the body?

Answer 1

I think the b in front of the string marks it as a bytes literal. What happens if you append .decode('UTF-8') to your Body Text string?
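To make the suggestion concrete, here is a minimal sketch of decoding the fetched bytes and picking out one line. It assumes the payload is the second element of the first tuple in data_fetch (as the output in the question suggests) and that the body is quoted-printable UTF-8, as its Content-Transfer-Encoding header says. The sample data_fetch value and the extract_line helper are made up for illustration; with a real multipart body, the boundary and part-header lines count toward the line index too.

```python
import quopri

# Hypothetical stand-in for the result of mail.fetch('1', '(BODY.PEEK[TEXT])'):
# a list whose first item is a (response header, payload) tuple of bytes.
data_fetch = [
    (b'1 (BODY[TEXT] {44}',
     b'Hello,\r\nYour code is =\r\nABC123 today.\r\nBye\r\n'),
    b')',
]


def extract_line(data, line_number, charset='utf-8'):
    """Return one line (0-based index) of the decoded body text."""
    raw = data[0][1]                 # the payload bytes, not the whole tuple
    raw = quopri.decodestring(raw)   # undo quoted-printable soft breaks and =XX escapes
    text = raw.decode(charset, errors='replace')  # bytes -> str
    return text.splitlines()[line_number]


sentence = extract_line(data_fetch, 1)
print(sentence)
```

Calling str() on the tuple only produces its repr, with the \r\n sequences escaped, which is probably why splitlines appeared to do nothing; decoding the bytes first gives real line breaks to split on.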

Extract a single line of text from an email
