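Python: extracting the body of an email returns a garbage string

Date: 2018-10-16 21:50:43

Tags: python email

I am trying to get the body of an email using the email library.

I can already access the server, the account, the inbox, and the message.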

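import email
import imaplib

import credentials  # the asker's own module holding the login details (not shown)

def connect(server, user, password):
    # Open an SSL connection, log in, and select the default mailbox (INBOX)
    m = imaplib.IMAP4_SSL(server)
    m.login(user, password)
    m.select()
    return m

def read_email(m, emailid):
    # BODY[TEXT] fetches only the message text, without the headers
    resp, data = m.fetch(emailid, "(UID BODY[TEXT])")
    email_body = data[0][1]
    mail = email.message_from_string(email_body)

    # extract email body

    if mail.is_multipart():
        for payload in mail.get_payload():
            print payload.get_payload()
    else:
        print mail.get_payload()

m = connect('outlook.office365.com', credentials.mailusername,
            credentials.mailpassword)
m.select('INBOX', readonly=True)
typ, emailid = m.search(None, header)  # `header` is defined elsewhere (not shown)
read_email(m, emailid[0])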

The result of my .get_payload() call is:

PGh0bWw+DQo8aGVhZD4NCjxtZXRhIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIgY29udGVudD0i
dGV4dC9odG1sOyBjaGFyc2V0PXV0Zi04Ij4NCjwvaGVhZD4NCjxib2R5IGRpcj0iYXV0byI+DQpI
aSBNYXR0LA0KPGRpdj48YnI+DQo8L2Rpdj4NCjxkaXY+TXkgdHJhaW4gaXMgZHVlIHRvIGFycml2

I have searched, but I cannot find what I am doing wrong.

Any help?

Thanks.

1 answer:

Answer 0: (score: 2)

Your payload is base64-encoded:

echo "PGh0bWw+DQo8aGVhZD4NCjxtZXRhIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIgY29udGVudD0idGV4dC9odG1sOyBjaGFyc2V0PXV0Zi04Ij4NCjwvaGVhZD4NCjxib2R5IGRpcj0iYXV0byI+DQpIaSBNYXR0LA0KPGRpdj48YnI+DQo8L2Rpdj4NCjxkaXY+TXkgdHJhaW4gaXMgZHVlIHRvIGFycml2" | base64 --decode

which produces:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body dir="auto">
Hi Matt,
<div><br>
</div>
<div>My train is due to arriv

You can decode it programmatically with Python's base64 module. In some cases you can also call get_payload() with decode=True (see the docs) to have it decoded automatically.
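To illustrate the decode=True route, here is a minimal Python 3 sketch. The two headers in raw_message are invented for the example (the question does not show the real ones), and the variable names are illustrative; only the base64 text comes from the question.

```python
import email

# Hypothetical raw message: the headers are assumptions made for this
# example, the body is the base64 text shown in the question.
raw_message = (
    "Content-Type: text/html; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGh0bWw+DQo8aGVhZD4NCjxtZXRhIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIgY29udGVudD0i\n"
    "dGV4dC9odG1sOyBjaGFyc2V0PXV0Zi04Ij4NCjwvaGVhZD4NCjxib2R5IGRpcj0iYXV0byI+DQpI\n"
    "aSBNYXR0LA0KPGRpdj48YnI+DQo8L2Rpdj4NCjxkaXY+TXkgdHJhaW4gaXMgZHVlIHRvIGFycml2\n"
)

mail = email.message_from_string(raw_message)

# decode=True undoes the Content-Transfer-Encoding and returns bytes.
body_bytes = mail.get_payload(decode=True)

# Turn the bytes into text using the charset declared in Content-Type,
# falling back to UTF-8 if none is declared.
charset = mail.get_content_charset() or "utf-8"
body_text = body_bytes.decode(charset)
print(body_text)
```

Note that this only works when the parser can see the Content-Transfer-Encoding header. A BODY[TEXT] fetch returns the message text without its headers, which is probably why decode=True helps only in some cases; fetching the complete message first would likely be needed in the asker's setup.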

Example with the base64 module:

python2 -c "import base64; print base64.b64decode('PGh0bWw+DQo8aGVhZD4NCjxtZXRhIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIgY29udGVudD0idGV4dC9odG1sOyBjaGFyc2V0PXV0Zi04Ij4NCjwvaGVhZD4NCjxib2R5IGRpcj0iYXV0byI+DQpIaSBNYXR0LA0KPGRpdj48YnI+DQo8L2Rpdj4NCjxkaXY+TXkgdHJhaW4gaXMgZHVlIHRvIGFycml2')"
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body dir="auto">
Hi Matt,
<div><br>
</div>
<div>My train is due to arriv
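
The one-liner above is Python 2. On Python 3, print is a function and b64decode returns bytes, so an extra decoding step is needed to get text. A minimal sketch, assuming the payload is UTF-8 (as the meta tag in the output suggests); the variable names are illustrative:

```python
import base64

# The base64 payload from the question, joined into one string.
encoded = (
    "PGh0bWw+DQo8aGVhZD4NCjxtZXRhIGh0dHAtZXF1aXY9IkNvbnRlbnQtVHlwZSIgY29udGVudD0i"
    "dGV4dC9odG1sOyBjaGFyc2V0PXV0Zi04Ij4NCjwvaGVhZD4NCjxib2R5IGRpcj0iYXV0byI+DQpI"
    "aSBNYXR0LA0KPGRpdj48YnI+DQo8L2Rpdj4NCjxkaXY+TXkgdHJhaW4gaXMgZHVlIHRvIGFycml2"
)

# b64decode returns bytes on Python 3.
decoded_bytes = base64.b64decode(encoded)

# Assumption: the content is UTF-8 encoded.
html = decoded_bytes.decode("utf-8")
print(html)
```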