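I am trying to work out how to get only the plain-text part of an email. With the code below I can get the body, but it is always followed by the HTML version of the email, which I don't need. How do I tell my script to ignore the HTML?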
import imaplib
import email

def extract_body(payload):
    if isinstance(payload, str):
        return payload
    else:
        return '\n'.join([extract_body(part.get_payload()) for part in payload])

conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
conn.login("username", "password")
conn.select()
typ, data = conn.search(None, 'UNSEEN')
try:
    for num in data[0].split():
        typ, msg_data = conn.fetch(num, '(RFC822)')
        for response_part in msg_data:
            if isinstance(response_part, tuple):
                msg = email.message_from_string(response_part[1])
                subject = msg['subject']
                print(subject)
                payload = msg.get_payload()
                body = extract_body(payload)
                print(body)
        typ, response = conn.store(num, '+FLAGS', r'(\Seen)')
finally:
    try:
        conn.close()
    except:
        pass
    conn.logout()
Answer 0 (score: 0)
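You are calling get_payload() on every item in the multipart container and joining the results together, which is why the HTML part ends up after the text. Instead, iterate over the payloads in the multipart container and keep only the one whose Content-Type is the one you are looking for (text/plain in your case).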
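As a minimal sketch of that idea, assuming Python 3 and the standard library email package: the helper below walks every part of a message and keeps only the text/plain ones. The name extract_plain_text and the sample message are made up for illustration and are not from the question. The choice to skip attachments and to fall back to UTF-8 when no charset is declared is also an assumption.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_plain_text(msg):
    """Return the text/plain parts of a message, ignoring text/html parts.

    Hypothetical helper for illustration; not part of the original script.
    """
    chunks = []
    # walk() yields the message itself and then every nested part.
    for part in msg.walk():
        # Multipart containers and text/html parts are skipped here.
        if part.get_content_type() != "text/plain":
            continue
        # Assumption: plain-text attachments are not wanted in the body.
        if part.get_content_disposition() == "attachment":
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        raw = part.get_payload(decode=True)
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        chunks.append(raw.decode(charset, errors="replace"))
    return "\n".join(chunks)


# Build a small multipart/alternative message to try the helper on.
sample = MIMEMultipart("alternative")
sample["Subject"] = "demo"
sample.attach(MIMEText("Hello in plain text", "plain"))
sample.attach(MIMEText("<p>Hello in HTML</p>", "html"))

# Round-trip through bytes, similar to what an IMAP fetch hands back.
parsed = email.message_from_bytes(sample.as_bytes())
print(extract_plain_text(parsed))
```

In the loop from the question, this would presumably replace the msg.get_payload() and extract_body(payload) pair with a single call such as body = extract_plain_text(msg). On Python 3, imaplib returns bytes, so the message would likely need to be parsed with email.message_from_bytes(response_part[1]) rather than email.message_from_string.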