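I need to strip the HTML out of emails. The code I wrote works for other emails, but for emails from one particular sender it returns one big string instead of the HTML.
Update: the string I receive is base64-encoded, but my code still only retrieves the base64 part of the email rather than the HTML, so the problem remains.
Here is what my code looks like: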
import datetime
import email
import email.utils
import imaplib

m = imaplib.IMAP4_SSL('imap.mail.yahoo.com')
m.login('xxxxxx', 'xxxxxxxx')

# List the available mailboxes.
rv, mailboxes = m.list()
if rv == 'OK':
    print("Mailboxes:")
    print(mailboxes)

def process_mailbox(m):
    rv, data = m.search(None, "ALL")
    if rv != 'OK':
        print("No messages found!")
        return
    for num in data[0].split():
        rv, data = m.fetch(num, '(RFC822)')
        if rv != 'OK':
            print("ERROR getting message", num)
            return
        # On Python 3, imaplib returns bytes, so parse with message_from_bytes.
        msg = email.message_from_bytes(data[0][1])
        print('Message %s: %s' % (num, msg['Subject']))
        print('Raw Date:', msg['Date'])
        date_tuple = email.utils.parsedate_tz(msg['Date'])
        if date_tuple:
            local_date = datetime.datetime.fromtimestamp(
                email.utils.mktime_tz(date_tuple))
            print("Local Date:",
                  local_date.strftime("%a, %d %b %Y %H:%M:%S"))

m.select('MAILBOX', readonly=True)
resp, items = m.search(None, "ALL")
items = items[0].split()  # get the message IDs
for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)")
    raw_email = data[0][1]
    print(raw_email)
Normally at this point I get the raw email, but this time all I get is one long string of characters, and the actual HTML never appears:
Content-Length: 9617
X-Antivirus: Avast (VPS 190503-4, 05/03/2019), Inbound message
X-Antivirus-Status: Clean

PHRhYmxlIHN0eWxlPSJmb250LWZhbWlseTogVGFob21hLCBHZW5ldmEsIHNhbnMtc2Vy
aWY7IiB3aWR0aD0iNjMwIiBjZWxsc3BhY2luZz0iMCIgY2VsbHBhZGRpbmc9IjEwIj4g
PHRib2R5PgogPHRyPgogPHRkPgogPHRhYmxlIHN0eWxlPSJmb250LWZhbWlseTogVGFo
b21hLCBHZW5ldmEsIHNhbnMtc2VyaWY7IiB3aWR0aD0iMTAwJSIgY2VsbHNwYWNpbmc9
IjAiIGNlbGxwYWRkaW5nPSIwIiBib3JkZXI9IjAiPiA8dGJvZHk+CiA8dHI+CiA8dGQg
d2lkdGg9IjEwMCUiPjxjZW50ZXI+PGEgaHJlZj0iaHR0cHM6Ly9zaG9wLm1lcmNvbGEu
Y29tIj48aW1nIHNyYz0iaHR0cHM6Ly9tZWRpYS5tZXJjb2xhLmNvbS9hc3NldHMvaW1h
Z2VzL3Nob3Bsb2dvL01lcmNvbGFfTG9nb3YyLnBuZyIgd2lkdGg9IjMxNCIgaGVpZ2h0
PSIzOSIgYm9yZGVyPSIwIiAvPjwvYT48L2NlbnRlcj48L3RkPgogPC90cj4KIDx0cj4K
IDx0ZD4KPGhyIHN0eWxlPSJjb2xvcjogI2VjZWNlYzsgd2lkdGg6IDEwMCU7IiAvPjwv
dGQ+CiA8L3RyPgogPC90Ym9keT4KIDwvdGFibGU+CiA8L3RkPgogPC90cj4KIDx0cj4K
IDx0ZCBzdHlsZT0icGFkZGluZzogMTBweCAzMHB4IDMwcHggMzBweDsiPjxzcGFuIHN0
eWxlPSJmb250LXNpemU6IDE1cHQ7IGZvbnQtd2VpZ2h0OiBib2xkOyBjb2xvcjogIzEy
NmFhYTsiPlNoaXBwaW5nIENvbmZpcm1hdGlvbjwvc3Bhbj48YnIgLz48YnIgLz48Yj48
c3BhbiBzdHlsZT0iZm9udC1zaXplOiAxMnB0OyI+RGVhciBQYXRyaWNpYSBTY2hsZXVz
bmVyLDwvc3Bhbj48L2I+PGJyIC8+PGJyIC8+PHNwYW4gc3R5bGU9ImZvbnQtc2l6ZTog
MTJwdDsiPlRoYW5rIHlvdSBmb3IgeW91ciByZWNlbnQgb3JkZXIgZnJvbSA8YSBocmVm
PSJodHRwczovL3Nob3AubWVyY29sYS5jb20iPk1lcmNvbGE8L2E+LiBXZSBhcmUgcGxl
YXNlZCB0byBpbmZvcm0geW91IHRoYXQgeW91IGFyZSBub3cgb25lIHN0ZXAgY2xvc2Vy
IHRvIHRha2luZyBjb250cm9sIG9mIHlvdXIgaGVhbHRoISBZb3VyIG9yZGVyIG51bWJl
ciBPMTUwOTMxMDkgaGFzIGJlZW4gc2hpcHBlZCBhbmQgaXMgb24gaXRzIHdheSB0byB5
b3UuPGJyIC8+PGJyIC8+VGhlIHNoaXBtZW50IGRldGFpbHMgYXJlIGFzIGJlbG93Ojwv
c3Bhbj48YnIgLz48YnIgLz4gPHRhYmxlIHN0eWxlPSJmb250LXNpemU6IDEycHQ7IGZv
bnQtZmFtaWx5OiBUYWhvbWEsIEdlbmV2YSwgc2Fucy1zZXJpZjsgdGV4dC1hbGlnbjog
bGVmdDsiIHdpZHRoPSIxMDAlIiBjZWxsc3BhY2luZz0iMCIgY2VsbHBhZGRpbmc9Ijci
Answer 0 (score: 1)
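Since you are already able to create a Message object from the raw data, you can use that object's own functionality to extract the information you need.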
import email
from email import policy

# Set the policy to create an EmailMessage instance.
# On Python 3, imaplib returns bytes, so parse with message_from_bytes.
msg = email.message_from_bytes(data[0][1], policy=policy.default)

# Get the part most likely to be the preferred body.
body = msg.get_body()

# get_content() will automatically decode from base64 or quoted-printable.
print(body.get_content())
Setting the policy to policy.default when creating the message object ensures that an EmailMessage instance is returned. That object provides the get_body and get_content methods.
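To see what the policy changes, here is a minimal sketch that parses the same bytes twice. The raw message is a made-up stand-in, not the email from the question.

```python
import email
from email import policy
from email.message import EmailMessage, Message

# Hypothetical raw message, used only for illustration.
raw = b"Subject: demo\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nhello\r\n"

# Without a policy argument the parser uses the legacy compat32 policy.
legacy = email.message_from_bytes(raw)
# With policy.default the parser builds an EmailMessage instead.
modern = email.message_from_bytes(raw, policy=policy.default)

print(type(legacy).__name__, hasattr(legacy, "get_body"))
print(type(modern).__name__, hasattr(modern, "get_body"))
```

The legacy object has no get_body, so calling it on a message parsed without the policy should raise AttributeError.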
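get_body returns the MIME part that is the best candidate to be the "body" of the message. You can pass a list of subtypes through its preferencelist argument to guide the choice, for example to prefer the HTML part.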
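As a rough illustration of the preference list, the sketch below builds a stand-in message with a plain-text part and a base64-encoded HTML part, then asks get_body for the HTML part only. The subject and body text are invented for the example. In the real script, raw_email would come from m.fetch instead.

```python
import email
from email import policy
from email.message import EmailMessage

# Build a hypothetical message: plain text plus a base64-encoded HTML alternative.
sample = EmailMessage()
sample["Subject"] = "Shipping Confirmation"
sample.set_content("Your order has shipped.")
sample.add_alternative(
    "<table><tr><td>Your order has shipped.</td></tr></table>",
    subtype="html",
    cte="base64",
)
raw_email = sample.as_bytes()  # stands in for data[0][1] from m.fetch

# Parse with policy.default so that get_body and get_content are available.
msg = email.message_from_bytes(raw_email, policy=policy.default)

# Ask only for an HTML body; get_body returns None if there is no such part.
html_part = msg.get_body(preferencelist=("html",))
if html_part is not None:
    # get_content() undoes the base64 transfer encoding and returns a str.
    html = html_part.get_content()
    print(html)
```

Checking for None matters because a plain-text-only email has no HTML part to return.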