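What software can I use to process raw email text and remove signatures, quoted reply text, and so on?
For example, here is an email. I want to get the "Thank you guys." text, or more text if there is more. I don't want the HTML signature (in the first red block) or the old email that the person replied to (in the second red block).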
Answer 0 (score: 0)
You can try Python's email message handling package (the standard-library email module).
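import email

# Read the raw message text from disk.
with open('test.txt', 'r') as myfile:
    data = myfile.read()

# Parse the raw text into a Message object.
body = email.message_from_string(data)

if body.is_multipart():
    # A multipart message holds a list of sub-messages; print each one.
    for payload in body.get_payload():
        print(payload.get_payload().strip())
else:
    # A single-part message holds its text directly.
    print(body.get_payload().strip())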
Output:
this is the body text
this is the attachment text
The file test.txt contains the following:
From: John Doe <example@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="XXXXboundary text"

This is a multipart message in MIME format.

--XXXXboundary text
Content-Type: text/plain

this is the body text

--XXXXboundary text
Content-Type: text/plain;
Content-Disposition: attachment;
        filename="test.txt"

this is the attachment text

--XXXXboundary text--
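
Note that the snippet above prints every part, including the attachment, and it does nothing about signatures or quoted replies. Below is a rough sketch of one way to go further using only the standard library. It is a heuristic, not a complete solution: it assumes a plain-text body, a signature introduced by the conventional "-- " delimiter line, quoted lines prefixed with ">", and an "On ... wrote:" attribution line, which varies between mail clients and languages. The sample message, the regular expressions, and the function names plain_text_body and strip_reply_and_signature are all made up for illustration. HTML signatures would need separate handling.

```python
import email
import re

# Hypothetical sample: a reply with a signature, a quoted original, and an attachment.
RAW = """\
From: John Doe <example@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XXXXboundary text"

--XXXXboundary text
Content-Type: text/plain

Thank you guys.

--
John Doe
Example Corp

On Mon, 1 Jan 2018, Jane Roe wrote:
> Here is the report.
--XXXXboundary text
Content-Type: text/plain
Content-Disposition: attachment; filename="test.txt"

this is the attachment text
--XXXXboundary text--
"""

# Assumed patterns; real mail is messier than this.
SIGNATURE = re.compile(r"^--\s*$")              # "-- " delimiter, trailing space optional
ATTRIBUTION = re.compile(r"^On .+ wrote:\s*$")  # reply header used by many clients


def plain_text_body(msg):
    """Join all text/plain parts that are not marked as attachments."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue  # skips multipart containers and HTML parts
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def strip_reply_and_signature(text):
    """Keep lines up to the first signature or attribution line, dropping '>' quotes."""
    kept = []
    for line in text.splitlines():
        if SIGNATURE.match(line) or ATTRIBUTION.match(line):
            break  # everything below is treated as signature or quoted history
        if line.lstrip().startswith(">"):
            continue  # drop individually quoted lines
        kept.append(line)
    return "\n".join(kept).strip()


msg = email.message_from_string(RAW)
body_text = plain_text_body(msg)
reply = strip_reply_and_signature(body_text)
print(reply)  # Thank you guys.
```

Cutting at the first delimiter is deliberately aggressive: it works for top-posted replies like the one in the question, but it would also discard anything the sender wrote below an inline quote, so the patterns need tuning against real mail.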